JP2004348394A

JP2004348394A - Environment change device, and behavioral guideline information generation and presentation device

Info

Publication number: JP2004348394A
Application number: JP2003144192A
Authority: JP
Inventors: Tetsuo Kurahashi; 哲郎倉橋; Kazunori Higuchi; 和則樋口; Yoshiyuki Umemura; 祥之梅村; Youji Yamada; 陽滋山田
Original assignee: Toyota Motor Corp; Toyota Central R&D Labs Inc; Toyota Gauken
Current assignee: Toyota Motor Corp; Toyota Central R&D Labs Inc; Toyota Gauken
Priority date: 2003-05-21
Filing date: 2003-05-21
Publication date: 2004-12-09

Abstract

PROBLEM TO BE SOLVED: To efficiently obtain the next control amount or behavioral guideline information. SOLUTION: A state evaluation value V^π(s_t) is computed from the reward for the result of the preceding behavior output (542), a TD error δ_t is computed (544), and an evaluation value T(s_t) is computed (546). A trial value Δa(s_t) is then computed (548). In this computation, the influence on the amount of change of the difference between a value determined from a teaching behavior and the preceding behavior output is set larger than the influence of a behavior output determined by a given probability. The resulting behavior output is output (554). COPYRIGHT: (C)2005,JPO&NCIPI

Description

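[0001]
[Technical Field of the Invention]
The present invention relates to an environment change device and to a behavioral guideline information generation and presentation device. More specifically, it relates to an environment change device that changes an environment by controlling a changing means on the basis of a control amount. It also relates to a behavioral guideline information generation and presentation device that calculates behavioral guideline information serving as a guideline for behavior and presents the calculated information.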
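[0002]
[Prior Art]
Reinforcement learning has conventionally been used for environment change control, for example in robot control. In this scheme, the robot is made to select and perform an action in a given state. A value function, called the state value function or the action value function, is then computed successively. It expresses how much reward the robot can expect to obtain in the future, that is, the expected return. Selecting the action that maximizes one of these value functions is said to yield a desired behavior pattern, namely the next control amount of the robot (see Non-Patent Document 1).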
[0003]
[Non-Patent Document 1]
"Grasping Control of a Robot Hand by Reinforcement Learning," IEEJ Transactions C, Vol. 121, No. 4, pp. 710-717 (April 2001).
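[0004]
[Problem to Be Solved by the Invention]
In the early stage of learning, however, no information about the environment is yet available. The desired behavior pattern (the desired control amount of the robot) must therefore be searched for by random trials, so learning efficiency is poor. In other words, deriving the next behavior pattern (the next control amount of the robot) from the previous one is inefficient. This is especially true as the number of state dimensions grows: an optimal solution can no longer be found within the short time available for producing the next behavior pattern, and the approach becomes impractical.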
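[0005]
The present invention has been made in view of the above facts. Its object is to provide an environment change device and a behavioral guideline information generation and presentation device that can efficiently obtain the next control amount or the next behavioral guideline information.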
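[0006]
[Means for Solving the Problem]
To achieve this object, the environment change device according to claim 1 comprises:
- changing means for changing an environment;
- control means for controlling the changing means, based on a control amount, so that the environment changes; and
- calculation means for calculating the control amount.

When the evaluation of the environment as changed by a previous control amount falls within a predetermined good evaluation range, the calculation means calculates the next control amount. It does so by determining the amount of change from the previous control amount based on a predetermined reference control amount.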
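[0007]
The calculation means of the present invention calculates the control amount for the control means. The control means then controls the changing means according to that control amount so that the environment changes.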
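[0008]
When calculating a control amount that will bring the environment to a desired state, one could simply change the previous control amount at random. Random changes, however, do not find the optimal solution efficiently.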
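[0009]
The calculation means of the present invention therefore works as follows. When the evaluation of the environment as changed by the previous control amount falls within a predetermined good evaluation range, it calculates the next control amount by determining the change from the previous control amount based on a predetermined reference control amount. The reference control amount is a reference value used for calculating the next control amount.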
[0010]
As in claim 5, the previous control amount may itself be a control amount determined from the reference control amount.
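[0011]
As in claim 2, the calculation means may determine the amount of change from a difference. That difference is between a value derived from the reference control amount according to a predetermined probability and the previous control amount.

Two variants are possible:
- As in claim 3, the influence of this difference on the amount of change may be made larger than the influence of a control amount determined according to a predetermined probability.
- As in claim 4, the amount of change may be determined from this difference alone.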
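[0012]
Under this scheme, the next control amount is computed from the predetermined reference control amount whenever the evaluation of the environment as changed by the previous control amount lies within the predetermined good evaluation range. The change from the previous control amount is determined on that basis. The next control amount can therefore be obtained more efficiently than by changing the previous control amount at random.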
[0013]
The predetermined reference control amount, however, is not always an appropriate reference value for calculating the next control amount.
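[0014]
As in claim 6, a plurality of mutually different predetermined reference control amounts may therefore be provided. Suppose the previous control amount was determined from one of these reference control amounts. If the evaluation of the environment as changed by that control amount falls outside the predetermined good evaluation range, the calculation means switches to another reference control amount. It then calculates the next control amount by determining the change from the previous control amount based on that other reference.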
[0015]
The reference control amount may be obtained as follows.
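[0016]
As in claim 7, the device includes operation means that is operated by a person, and computing means that computes the control amount from the operation information obtained when the operation means is operated. The control means controls the changing means according to the control amount computed by the computing means so that the environment changes. As in claim 8, the computing means computes the reference control amount from the operation information.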
[0017]
The environment represents the state of motion of an object, for example whether the object is at rest.
[0018]
As in claim 11, the environment change device of any one of claims 1 to 10 may be a robot that has robot hardware as the changing means.
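[0019]
The behavioral guideline information generation and presentation device according to claim 12 comprises:
- calculation means for calculating behavioral guideline information that serves as a guideline for behavior; and
- presentation means for presenting the behavioral guideline information calculated by the calculation means.

When the evaluation of the behavior as changed by previous behavioral guideline information falls within a predetermined good evaluation range, the calculation means calculates the next behavioral guideline information. It does so by determining the change from the previous behavioral guideline information based on predetermined reference behavioral guideline information.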
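[0020]
The calculation means calculates the behavioral guideline information that serves as a guideline for behavior. The presentation means presents the information calculated by the calculation means.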
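[0021]
When calculating behavioral guideline information that will turn a behavior into the desired behavior, one could change the previous behavioral guideline information at random. Random changes, however, do not find the optimal solution efficiently.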
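[0022]
The calculation means of the present invention therefore works as follows. When the evaluation of the behavior as changed by the previous behavioral guideline information falls within a predetermined good evaluation range, it calculates the next behavioral guideline information. It does so by determining the change from the previous information based on predetermined reference behavioral guideline information. The reference behavioral guideline information is a reference value used for calculating the next behavioral guideline information.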
[0023]
As in claim 16, the previous behavioral guideline information may itself be information determined from the reference behavioral guideline information.
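[0024]
As in claim 13, the calculation means may calculate the next behavioral guideline information by determining the amount of change from a difference. That difference is between a value derived from the reference behavioral guideline information according to a predetermined probability and the previous behavioral guideline information.

Two variants are possible:
- As in claim 14, the influence of this difference on the amount of change may be made larger than the influence of behavioral guideline information determined according to a predetermined probability.
- As in claim 15, the amount of change may be determined from this difference alone.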
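[0025]
Under this scheme, the next behavioral guideline information is computed from the predetermined reference behavioral guideline information whenever the evaluation of the behavior as changed by the previous information lies within the predetermined good evaluation range. The change from the previous information is determined on that basis. The next behavioral guideline information can therefore be obtained more efficiently than by changing the previous information at random.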
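[0026]
The predetermined reference behavioral guideline information, however, is not always an appropriate reference value for calculating the next behavioral guideline information.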
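[0027]
As in claim 17, a plurality of mutually different predetermined pieces of reference behavioral guideline information may therefore be provided. Suppose the previous behavioral guideline information was determined from one of these references. If the evaluation of the behavior as changed by that information falls outside the predetermined good evaluation range, the calculation means switches to another reference. It then calculates the next behavioral guideline information by determining the change from the previous information based on that other reference.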
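[0028]
The behavior may be that of an occupant driving a vehicle while the vehicle is in a predetermined driving state. Examples of such occupant behavior include:
- gaze behaviors such as gaze direction, gaze duration and gaze distribution; and
- steering-wheel operation.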
[0029]
[Embodiments of the Invention]
A first embodiment of the present invention is described in detail below with reference to the drawings.
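[0030]
As shown in FIG. 1, the robot 100 of this embodiment assumes a "human-robot-work environment (operation target, etc.)" structure and the corresponding flow of information.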
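[0031]
As shown in FIG. 2, the robot 100 serving as the environment change device of this embodiment includes a force detection sensor 512. The sensor detects two forces:
- the operating force that a human applies to an operated part (operation means, not shown), which is the part a person handles; and
- the reaction force that the environment applies to the part that grasps and moves objects. In this embodiment the environment is, for example, an object on a desk.

The force detection sensor 512 is mounted on the tip of a known robot hardware unit (a robot manipulator) serving as the changing means, and a finger is attached to the end of the sensor. The fingertip is coated with urethane to avoid rigid contact with the target object.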
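[0032]
The robot 100 further includes the following components:
- A teaching geometric information generation device 514, serving as the computing means. It generates geometric information, used as teaching information for the robot, from the operating force detected by the force detection sensor 512.
- An input device 520 for entering information.
- A cooperative motion planning device 516. In the human/robot cooperative work mode, set by mode designation information entered through the input device 520, it generates the robot's cooperative motion from the geometric information supplied by device 514.
- A robot behavior planning device 522, serving as the calculation means. In the autonomous work mode, in which the robot works alone and which is likewise set through the input device 520, it autonomously generates an interaction force pattern between the robot and the work environment that is desirable for the robot. It does so from the geometric information supplied by device 514.
- A robot motion control device 518, serving as the control means. It drives the robot hardware according to commands (the behavior output described later) from whichever of device 516 or device 522 corresponds to the selected work mode.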
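[0033]
As noted above, the input device 520 is connected to the cooperative motion planning device 516 and to the robot behavior planning device 522. The robot behavior planning device 522 is also connected to the force detection sensor 512 and to a motion state detection sensor 524. Sensor 524 detects the operating state of the robot motion control device 518, and from it the motion state of the robot hardware that device 518 moves.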
[0034]
The operation of this embodiment is described next.
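[0035]
As an example, this embodiment considers a robot finger that a human touches and guides. Following the human's operation, the finger moves an object from a start position on a desk (not shown) horizontally (in the X direction) and stops it at a designated position (the end position).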
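[0036]
As preparation, the start position and the designated destination of the object (the end position) are first entered as described below.

This embodiment controls the robot's motion with the Actor-Critic method, a reinforcement learning method based on the TD (Temporal Difference) error described later. In the Actor-Critic method, the device:
1. outputs control information for controlling the robot's motion;
2. receives a reward for the result of the motion performed under that information; and
3. calculates the next control information.

Rewards and penalties are therefore entered through the input device 520 as follows. In the above example, a reward of 1 is given when the object on the desk has been moved from rest (the start position) toward the target position (the end position). A reward of 1 is also given when the object has been brought to rest within a certain range centered on the target position. A penalty of -1 is set in any of these cases:
- the object comes to rest outside a predetermined range that includes the target position;
- the finger slips off the object;
- the force information from the force detection sensor shows a vertical (Z-direction) force larger than a predetermined (allowable) value; or
- the robot reaches the limit of its range of motion.

In this way, the start position (the object's position on the desk) and the target position (the end position) are specified through these rewards and penalties.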
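The reward and penalty rules above can be sketched as a small function. This is a minimal illustration, not the patent's implementation. The `TrialOutcome` record, the tolerance and the force limit are hypothetical names chosen for the sketch. For simplicity the sketch only scores terminal outcomes (+1 for success, -1 for any failure) and returns 0 while a trial is still running, which is an assumption beyond the text.

```python
from dataclasses import dataclass


@dataclass
class TrialOutcome:
    """Hypothetical summary of one trial of the object-moving task."""
    stopped: bool            # object has come to rest
    final_x: float           # final X position of the object [m]
    slipped: bool            # finger slipped off the object
    max_force_z: float       # largest vertical (Z) force observed [N]
    hit_motion_limit: bool   # robot reached the limit of its range of motion


def reward(outcome: TrialOutcome, target_x: float, tolerance: float,
           force_z_limit: float) -> int:
    """Return +1 on success, -1 on any failure, 0 while the trial is running (assumed)."""
    # Failure conditions listed in [0036]: slip, excessive Z force, motion limit.
    if outcome.slipped or outcome.max_force_z > force_z_limit or outcome.hit_motion_limit:
        return -1
    if outcome.stopped:
        # Rest inside the range centered on the target is a success, otherwise a penalty.
        return 1 if abs(outcome.final_x - target_x) <= tolerance else -1
    return 0
```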
[0037]
The human/robot cooperative work mode is described next.
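[0038]
The human/robot cooperative work mode is set through the input device 520. A human then operates the operated part so that the robot hardware moves the object on the desk from the start position to the target position (the end position). The following then takes place:
- The force detection sensor 512 detects the resultant of the operating force on the operated part and the reaction force that the object exerts in response.
- From this force information (the teaching information), the teaching geometric information generation device 514 generates geometric information for the robot. In the above example this is the moving speed in the X direction, which is the teaching behavior A(s) described later.
- From this geometric information, the cooperative motion planning device 516 computes the control amount of the robot motion control device 518 needed for the robot hardware to move cooperatively.
- Device 516 outputs this control amount to device 518 and so controls device 518 so that the robot hardware moves cooperatively.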
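[0039]
During cooperative motion, the motion state detection sensor 524 detects the operating state of the robot motion control device 518, and from it the motion state of the robot hardware that device 518 moves. In the above example, sensor 524 detects position information that shows whether the robot hardware has moved the object on the desk from the start position to the target position (the end position).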
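[0040]
The teaching geometric information generation device 514 stores each generated moving speed as a teaching behavior. In this embodiment the human repeats the above procedure several times, so that several mutually different teaching behaviors (moving speeds, in the above example) are generated and stored.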
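[0041]
The autonomous work mode is described next. In the autonomous mode, the robot controls itself to move the object.

In the human/robot cooperative work mode, the sensor detects the resultant of the operating force and the object's reaction force, and the object can be brought to the target position. In the autonomous mode, however, the force detection sensor 512 detects only the object's reaction force while the robot moves the object. It is therefore difficult to bring the object to the target position using the teaching behavior alone.

In the autonomous mode, the control amount of the robot motion control device 518 (the behavior output a(s)) that brings the object to the target position is therefore calculated from the teaching behavior, as follows.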
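[0042]
When the autonomous mode is set through the input device 520, the robot behavior planning device 522 executes the behavior pattern (interaction force pattern) generation routine shown in FIG. 3.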
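[0043]
In the first step, 532, a variable i that identifies each of the stored teaching behaviors (moving speeds, in the above example) is initialized to 0. In step 534, i is incremented by 1 and the routine checks whether i is still within the total number I of teaching behaviors.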
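[0044]
If i exceeds I, it is concluded that no optimal solution was found with any teaching behavior. The routine then ends, or switches to a mode that searches entirely random values.

If i is within I, the routine checks whether this is the second or a later calculation of the behavior output (described below) since the routine started. If it is the first calculation, the behavior output a(s_t) is calculated as a value determined by a predetermined probability, and processing proceeds to step 554. The behavior output a(s_t) is the control amount that the robot motion control device 518 uses to make the robot hardware act.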
[0045]
In step 554, the calculated behavior output a(s_t) is output to the robot motion control device 518, which controls the motion of the robot hardware accordingly.
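[0046]
The behavior output a(s_t) calculated in this way is the speed at which the robot finger moves the object. The robot finger moves the object from the start position in the X direction at the speed given by a(s_t) and tries to stop it at the designated position (the end position).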
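[0047]
In step 556, the reward r_{t+1} is calculated. The motion state detection sensor 524 monitors the operating state of the robot motion control device 518 while the robot hardware acts. Specifically, it detects whether the object on the desk has been brought to rest within a certain range centered on the target position, and r_{t+1} is calculated from this detection result.

Other sources of reward are also possible:
- If the object slips from the robot finger on the way to the target position, a human gives the reward through the input device 520.
- Alternatively, a contact sensor may be provided on the robot finger to detect slipping, and the reward may be calculated from the contact sensor's output.
- The reward may also be calculated from the force information of the force detection sensor when a vertical (Z-direction) force larger than the predetermined (allowable) value is applied.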
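[0048]
In step 558, the routine uses the reward r_{t+1} from step 556 to decide whether the object on the desk was brought to rest at the target position (the end position). For the first behavior output, step 558 is normally negative. In that case, step 560 uses r_{t+1} to determine whether any of the following occurred:
- the object came to rest outside the range that includes the target position;
- the finger slipped off the object;
- the force detection sensor registered a vertical (Z-direction) force larger than the predetermined (allowable) value; or
- the robot reached the limit of its range of motion.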
[0049]
If none of these conditions has occurred, that is, the robot has not reached its limit, processing proceeds to step 542.
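[0050]
In step 542, the state evaluation value V^π(s_t) of state s_t is calculated from the following equation. Here π is a policy, that is, the probabilities of selecting each action in a given state.
[0051]
[Equation 1]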

[0052]
In step 544, the TD error δ_t is calculated from the following equation.
[0053]
[Equation 2]

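[0054]
In the above equation, V(s_t) is an estimate of the state value V^π(s_t) and γ is the discount rate. V(s_{t+1}) is the estimated value of the next state, derived from the current value estimate. The TD error δ_t therefore depends on the value at the next, temporally advanced time step.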
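Equation 2 itself is not reproduced in this text. Given the definitions in [0054], the conventional one-step (TD(0)) error used in Actor-Critic methods is the most likely form. The sketch below shows that standard form as an assumption rather than a transcription of the patent's equation. The `terminal` flag is a hypothetical addition that drops the bootstrap term at the end of an episode.

```python
def td_error(r_next: float, v_s: float, v_next: float, gamma: float,
             terminal: bool = False) -> float:
    """Standard TD(0) error: delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)."""
    # No bootstrapping from the next state once the episode has ended (assumed).
    bootstrap = 0.0 if terminal else gamma * v_next
    return r_next + bootstrap - v_s
```

For example, with r_{t+1} = 1, V(s_t) = 0.5, V(s_{t+1}) = 0.2 and γ = 0.9, the error is 1 + 0.18 - 0.5 = 0.68. A positive error tells the learner that the outcome was better than expected.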
[0055]
In step 546, the evaluation value T(s_t) for the teaching information is calculated from the following equation.
[0056]
[Equation 3]

[0057]
In step 548, the trial value Δa(s_t) is calculated from the following equation.
[0058]
[Equation 4]

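[0059]
Here, n denotes a trial drawn from an assumed Gaussian distribution. A(s_t) is a quantity derived from the teaching behavior described above (the reference control amount). It is drawn from a predetermined probability distribution with a predetermined mean (offset) and variance (spread). It is hereinafter called "the value determined from the teaching behavior".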
[0060]
In the above equation, the term
[0061]
[Equation 5]

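[0062]
expresses the following idea. If the teaching behavior is effective for the environment, learning pulls the robot's behavior output toward the value determined from the teaching behavior. If the teaching behavior is not very effective, the robot searches for its target output by random trial and error.

Concretely, when the reward r_{t+1} is positive, the TD error δ_t is positive and T(s_t) is also positive. In that case ρ(T(s_t)) is close to 1 and ρ(-T(s_t)) is close to 0. In other words, when the evaluation of the object's movement under the previous behavior output falls within the predetermined good evaluation range, the change from the previous behavior output is determined from the predetermined teaching behavior.

More precisely, the change is based on the difference between the value determined from the teaching behavior and the previous behavior output. More precisely still, this difference is given a larger influence on the amount of change than the behavior output determined by the predetermined probability.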
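Equations 4 and 5 are not reproduced here, so the following is only a hypothetical reading of [0062]. It assumes:
- ρ is a logistic sigmoid;
- Δa(s_t) = ρ(T)·(A(s_t) - a(s_t)) + ρ(-T)·n, where A(s_t) is drawn around the teaching behavior and n is Gaussian exploration noise.

The function names and parameters are illustrative only. The sketch reproduces the qualitative behavior described in the text: a positive T(s_t) pulls the output toward the teacher, and a negative T(s_t) falls back to random search.

```python
import math
import random


def rho(x: float) -> float:
    """Logistic sigmoid, written with tanh so large |x| cannot overflow (assumed form of rho)."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def trial_value(a_prev: float, teach_mean: float, teach_std: float,
                t_eval: float, noise_std: float, rng: random.Random) -> float:
    """Hypothetical form of Equation 4: blend of teacher pull and Gaussian exploration."""
    a_teach = rng.gauss(teach_mean, teach_std)   # A(s_t): drawn around the teaching behavior
    n = rng.gauss(0.0, noise_std)                # n: Gaussian trial term
    # T(s_t) >> 0: rho(T) ~ 1, so the step points from a_prev toward A(s_t).
    # T(s_t) << 0: rho(-T) ~ 1, so the step is dominated by the random trial n.
    return rho(t_eval) * (a_teach - a_prev) + rho(-t_eval) * n
```

With zero spread and zero noise, a strongly positive T(s_t) returns almost exactly A(s_t) - a(s_t), and a strongly negative T(s_t) returns almost 0.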
[0063]
In step 550, the state value V(s_t) is updated by the following equation.
[0064]
[Equation 6]

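[0065]
In step 552, the behavior output a(s_t) is calculated from the following equation. The behavior output is the control amount that the robot motion control device 518 uses to make the robot hardware act.
[0066]
[Equation 7]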

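[0067]
Here, β is a step-size parameter and δ_t is the TD (Temporal Difference) error described above. The first term on the right-hand side, a(s_t), is the previous behavior output. The second term, βδ_tΔa(s_t), is therefore the change from the previous behavior output to the current one.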
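Steps 544, 550 and 552 can be combined into one update for a scalar state value and a scalar action. The actor update follows [0067]: a ← a + βδ_tΔa(s_t). Equations 2 and 6 are not reproduced in this text, so two forms are assumed:
- the standard TD error, δ = r + γV(s') - V(s);
- the usual critic update, V ← V + αδ.

The function name and the learning rate α are hypothetical.

```python
def actor_critic_step(v_s: float, v_next: float, a_prev: float, delta_a: float,
                      r_next: float, gamma: float, alpha: float, beta: float):
    """One critic/actor update; returns (new V(s_t), new a(s_t), delta_t)."""
    delta = r_next + gamma * v_next - v_s      # TD error (assumed standard form, step 544)
    v_new = v_s + alpha * delta                # critic update (assumed form of Equation 6)
    a_new = a_prev + beta * delta * delta_a    # actor update as described in [0067]
    return v_new, a_new, delta
```

Because δ_t multiplies Δa(s_t), a surprisingly good outcome reinforces the direction just tried, while a negative δ_t reverses it.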
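[0068]
Then, as above, the calculated behavior output a(s_t) is output to the robot motion control device 518 in step 554, and device 518 controls the motion of the robot hardware accordingly. The robot finger moves the object from the start position in the X direction at the speed given by a(s_t) and tries to stop it at the designated position (the end position).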
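[0069]
In step 556, the reward r_{t+1} is calculated as described above. In step 558, the routine uses this reward to decide whether the object on the desk was brought to rest at the target position (the end position). If it was, the moving speed given by the current behavior output is taken as the optimal value and the routine ends.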
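[0070]
Suppose instead that the object could not be brought to rest at the target position, and step 560 determines that the robot reached its limit of motion and is outside the allowable range. In that case, the teaching behavior identified by i is considered unsuitable for determining the amount of change. Processing returns to step 534, i is incremented by 1, and an optimal solution is sought with another teaching behavior.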
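The outer loop of the FIG. 3 routine (steps 532, 534 and 560) tries the stored teaching behaviors one after another. The sketch below is a minimal illustration under the assumption that the inner learning episode (steps 542 to 560) is wrapped in a caller-supplied function. `run_learning` and `search_over_teachings` are hypothetical names. `run_learning` returns the learned behavior output on success, or `None` when the trial leaves the allowable range.

```python
from typing import Callable, Optional, Sequence


def search_over_teachings(
        teachings: Sequence[float],
        run_learning: Callable[[float], Optional[float]]) -> Optional[float]:
    """Sketch of steps 532/534/560: fall through the teaching behaviors in order."""
    for teaching in teachings:          # i = 1 .. I (steps 532 and 534)
        learned = run_learning(teaching)
        if learned is not None:
            return learned              # step 558: object stopped at the target
        # step 560: teaching behavior i judged unsuitable; try the next one
    return None  # i > I: end the routine (or switch to a fully random search)
```

Keeping several teaching behaviors and abandoning one only when it drives the robot out of range is what [0075] credits with lowering the search cost around each teaching behavior.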
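[0071]
The results of simulation experiments with the above embodiment are described next. For the cases with and without teaching information, Table 1 lists the mean and standard deviation, over 100 simulation runs, of the number of episodes needed to achieve the task. The task is achieved when the object is brought to rest within a certain range centered on the target position.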
[0072]
[Table 1]

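[0073]
As Table 1 shows, supplying a teaching behavior that suits the environment leads to a stable solution faster than supplying none.

FIG. 4 shows how the robot finger behaves under the behavior output obtained once learning has been completed in terms of task achievement. It plots the object's velocity in the X direction, its distance from the designated position, and the contact force between the object and the fingertip. The object's mass is 150 g. The static friction coefficient between object and finger is 1.00. Between object and floor, the static friction coefficient is 0.20 and the dynamic friction coefficient is 0.18.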
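[0074]
As described above, this embodiment sets the change from the previous behavior output a(s_t) to the current one according to how effective the teaching behavior is for the environment. The desired behavior pattern is therefore not searched for by random trials, even in the early stage of learning. Learning is efficient, optimal solutions are easy to obtain even when the state space has many dimensions, and the method is highly practical.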
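[0075]
In addition, several pieces of teaching information are entered during preparation to obtain mutually different teaching behaviors, and the behavior output follows one of them. This lowers the search cost around each individual teaching behavior.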
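[0076]
In the embodiment above, the next behavior output is calculated with a particular weighting (see Equation 4). The difference between the value determined from the teaching behavior and the previous behavior output is given a larger influence on the amount of change than the behavior output determined by the predetermined probability. The invention is not limited to this. The next behavior output may instead be calculated with the amount of change based solely on the difference between the teaching behavior and the previous behavior output, as follows.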
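[0077]
For example, instead of Equation 4, the following may be used for the teaching behavior identified by the variable i:
[0078]
[Equation 8]

[0079]
The amount of change may then be determined by this value, where
[0080]
[Equation 9]

[0081]
and u is the evaluation value T(s).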
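[0082]
The above embodiment uses one-dimensional horizontal movement of an object as the environment. The invention is not limited to this and can be applied in the same way to two- or three-dimensional movement of an object.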
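[0083]
For a task made up of several parts, each part can be assessed to decide whether it should be done in the human/robot cooperative work mode or in the robot's autonomous mode. For a part to be done autonomously, teaching behaviors are first obtained in the cooperative mode as described above. Once an optimal solution has been found, the part is carried out in the autonomous mode, which makes the work more efficient.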
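[0084]
An example of a second embodiment of the present invention is described in detail next. In this embodiment, the invention is applied to an advice information generation and presentation device. During travel, the device judges whether the occupant's (driver's) checking behavior in response to the vehicle's behavior is good or poor. From that judgment it generates and presents advice information, which serves as behavioral guideline information for the occupant.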
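[0085]
As shown in FIG. 6, the advice information generation and presentation device 10 of this embodiment includes a computer that can be mounted in a vehicle. In this computer, a CPU 12, a ROM 14, a RAM 16 and an input/output interface 18 are connected by a bus so that they can exchange data and commands. A memory 20 that stores various data is connected to the input/output interface 18.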
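[0086]
The following are also connected to the input/output interface 18:
- various sensors 22;
- a media reader/writer 24;
- a printer 26 as an output device;
- a keyboard 28 as an input device; and
- a monitor 30 as a display device.

The printer 26 is optional and is connected only when the evaluation results of device 10 are to be printed. The media reader/writer 24 is also optional. It is connected when reading or writing the processing routines described later, the evaluation result data of device 10, or other data.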
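[0087]
The various sensors 22 form the driving state measurement equipment. They include vehicle-behavior detection sensors for detecting the behavior of the vehicle and occupant-behavior detection sensors for detecting the behavior of the occupant.

Examples of vehicle-behavior detection sensors are a steering angle sensor, a vehicle speed sensor, a yaw rate sensor, a brake pedal depression sensor, an accelerator pedal depression sensor and a turn signal sensor. Other equipment may also be included:
- a lane detection device or similar sensor, to capture the vehicle's behavior relative to the lanes of the road being traveled. One example of a lane detection device is the technique disclosed in JP H08-261756 A;
- a position detection sensor such as GPS, to determine the vehicle's position; and
- a route guidance device such as a navigation system, which holds map data, guides the vehicle along a route and stores the route history.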
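[0088]
Examples of occupant-behavior detection sensors are a device that detects the occupant's gaze direction and face orientation, and an identification device that identifies the occupant from a captured image. Such devices include the techniques disclosed in JP H11-276438 A and JP 2000-168502 A, and a commercial product (faceLAB from Seeing Machines).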
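[0089]
As shown in FIG. 5, the advice information generation and presentation device 10 uses the above hardware resources and the software resources described later. It is functionally divided into three parts:
- a driving state measurement device 32, which measures the driving state;
- an advice information generation device 34, serving as the calculation means, which uses the measured driving state to judge the quality of the occupant's checking behavior and generates advice information from that judgment; and
- an advice information presentation device 36, serving as the presentation means, which presents the advice information generated by device 34.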
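[0090]
The advice information presentation device 36 presents the judgment of the occupant's checking behavior by printing or displaying it. Examples include a printing device such as a printer, a display device such as a screen, and a sound output device such as a speaker.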
[0091]
The driving state measurement device 32 measures the driving state, which includes both the vehicle's behavior and the occupant's behavior, from the detection values of the various sensors.
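[0092]
Specifically, the driving state is obtained by capturing the vehicle's behavior with the steering angle sensor, vehicle speed sensor, yaw rate sensor, brake pedal depression sensor, accelerator pedal depression sensor, turn signal sensor and similar sensors. Further information can also be captured:
- the lanes of the road being traveled, from the detection values of the lane detection device or similar; and
- the vehicle's position, heading and surroundings, from the position data of GPS (Global Positioning System) or GIS (Geographic Information System).

Device 32 can also measure the occupant's behavior from the values of the gaze-direction and face-orientation detection device, and can identify the occupant from captured images.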
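[0093]
The advice information generation device 34 judges the occupant's checking behavior from the measurements of the driving state measurement device 32 and generates advice information. It consists of five units:
- a basic behavior classification unit 40;
- an intra-basic-behavior state computation unit 42;
- a gaze distribution computation unit 44;
- a reward computation unit 46; and
- an advice information generation unit 48.

Units 40, 42, 44, 46 and 48 are all connected to device 32 so that they receive its measurement signals.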
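[0094]
The basic behavior classification unit 40 uses the measurement signals from device 32 to determine the vehicle's behavior and the road form at the time that behavior occurred. It then classifies the occupant's driving state into basic behaviors, each of which is a combination of a vehicle behavior and a road form. Unit 40 receives the measurement signals of device 32, and its output goes to the intra-basic-behavior state computation unit 42 and the reward computation unit 46.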
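[0095]
Unit 40 first determines the vehicle's behavior (action) from the measurement signals of device 32. It then determines, from the same signals, the road form at the time that behavior took place. The combination of vehicle behavior and road form constitutes a basic behavior. Unit 40 then classifies the occupant's driving state, as represented by the measurement signals, into one of these basic behaviors.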
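[0096]
FIG. 7 shows an example of a classification table 41 of basic behaviors, each combining a vehicle behavior with a road form.

This embodiment uses six vehicle behaviors (actions):
- right turn;
- left turn;
- right lane change;
- left lane change;
- going straight through an intersection; and
- right U-turn.

It also uses five road forms:
- signalized intersection with four or more branches;
- signalized intersection with fewer than four branches;
- unsignalized intersection entered from the priority side;
- unsignalized intersection entered from the non-priority side; and
- non-intersection.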
[0097]
This embodiment adopts the 21 basic behaviors marked "○" in FIG. 7. From the measurements of device 32, the occupant's driving state is therefore classified as one of these 21 basic behaviors.
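[0098]
In this embodiment, each basic behavior (a combination of vehicle behavior and road form) is associated with a sheet 50. A sheet 50 is a data structure that represents a basic behavior, including the transitions of vehicle behavior and road form within it. FIGS. 8 to 12 show examples of the sheets 50 used for the basic behaviors classified by unit 40. These sheets are stored in the memory 20 of device 10.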
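[0099]
FIG. 8 shows a sheet for basic behaviors classified as right turns. Sheet 50A is used when the basic behavior is any of the following right turns:
- at a signalized intersection with four or more branches;
- at a signalized intersection with fewer than four branches;
- at an unsignalized intersection entered from the priority side;
- at an unsignalized intersection entered from the non-priority side; or
- at a non-intersection.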
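[0100]
Sheet 50A is divided into cells. Their ranges are defined by vehicle speed at predetermined intervals on the horizontal axis (the speed axis) and by vehicle heading change at predetermined intervals on the vertical axis (the steering axis). Each cell carries a cell number that identifies its position; in FIG. 8 this is a two-digit number near the lower-left corner of each cell.

The first digit, 1 to 3, identifies the heading-change (steering) range:
- 1: right-turn preparation;
- 2: from the start of the right turn to mid-turn;
- 3: from mid-turn to the end of the right turn.

The second digit, 0 to 4, identifies the speed range:
- 0: 0 to 1 km/h with the brake on;
- 1: 0 to 1 km/h with the brake off, or 1 to 10 km/h;
- 2: 10 to 25 km/h;
- 3: 25 to 40 km/h;
- 4: 40 km/h or more.

For example, a state at 10 to 25 km/h between the start of the turn and mid-turn corresponds to cell number "22".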
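The two-digit cell numbering can be expressed as a small lookup. The sketch below is illustrative. `speed_digit`, `turn_cell` and the phase keys are hypothetical names, and each speed band is assumed to include its lower bound. The optional `lane_change` flag applies the extra band used on the lane-change sheet 50D (FIG. 11), where the 40 km/h-and-above band is split at 70 km/h.

```python
def speed_digit(speed_kmh: float, brake_on: bool, lane_change: bool = False) -> int:
    """Second digit of a cell number (speed band); band lower bounds assumed inclusive."""
    if speed_kmh < 1.0:
        return 0 if brake_on else 1   # 0-1 km/h: brake on -> 0, brake off -> 1
    if speed_kmh < 10.0:
        return 1                      # 1-10 km/h
    if speed_kmh < 25.0:
        return 2                      # 10-25 km/h
    if speed_kmh < 40.0:
        return 3                      # 25-40 km/h
    if not lane_change or speed_kmh < 70.0:
        return 4                      # >= 40 km/h (turn sheets) or 40-70 km/h (sheet 50D)
    return 5                          # >= 70 km/h, lane-change sheet only


TURN_PHASES = {"preparation": 1, "start_to_mid": 2, "mid_to_end": 3}  # first digit on sheet 50A


def turn_cell(phase: str, speed_kmh: float, brake_on: bool) -> str:
    """Cell number on the right-turn sheet 50A, e.g. '22'."""
    return f"{TURN_PHASES[phase]}{speed_digit(speed_kmh, brake_on)}"
```

`turn_cell("start_to_mid", 15.0, False)` reproduces the example cell "22" from the text.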
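[0101]
FIG. 9 shows a sheet for basic behaviors classified as left turns. Sheet 50B is used when the basic behavior is any of the following left turns:
- at a signalized intersection with four or more branches;
- at a signalized intersection with fewer than four branches;
- at an unsignalized intersection entered from the priority side;
- at an unsignalized intersection entered from the non-priority side; or
- at a non-intersection.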
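[0102]
Like sheet 50A, sheet 50B is divided into cells by vehicle speed at predetermined intervals on the horizontal axis (the speed axis) and by vehicle heading change at predetermined intervals on the vertical axis (the steering axis). Its cell numbers follow the same scheme. The first digit, 1 to 3, identifies the heading-change (steering) range, and the second digit, 0 to 4, identifies the speed range.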
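[0103]
FIG. 10 shows a sheet for basic behaviors classified as right U-turns. Sheet 50C is used when the basic behavior is any of the following right U-turns:
- at a signalized intersection with four or more branches;
- at a signalized intersection with fewer than four branches;
- at an unsignalized intersection entered from the priority side;
- at an unsignalized intersection entered from the non-priority side; or
- at a non-intersection.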
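[0104]
Sheet 50C is divided into cells by vehicle speed at predetermined intervals on the horizontal axis (the speed axis) and by vehicle heading change at predetermined intervals on the vertical axis (the steering axis). A U-turn may involve many different vehicle motions, such as back-and-forth maneuvering, so its cells are divided as follows.

Heading change (steering), first digit 0 to 3:
- Two preparation stages cover the period from the start of the behavior up to the U-turn preparation.
  - Preparation B is the motion from a leftward steering operation back to the steering neutral position.
  - Preparation A is the motion from a rightward steering operation back to the neutral position.
- The remaining stages are from the start of the U-turn to mid-U-turn, and from mid-U-turn to the end of the U-turn.

Speed, second digit "-1" and "0" to "4":
- Because the vehicle may reverse, a reverse range is added and marked with a minus sign ("-") to distinguish it from forward motion.
- The digits identify, in order: reverse; 0 to 1 km/h with the brake on; 0 to 1 km/h with the brake off or 1 to 10 km/h; 10 to 25 km/h; 25 to 40 km/h; and 40 km/h or more.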
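[0105]
FIG. 11 shows sheet 50D, used when the basic behavior is classified as a right lane change at a non-intersection. Sheet 50D is divided into cells by vehicle speed at predetermined intervals on the horizontal axis (the speed axis). On the vertical axis, the cells are divided by the lane-straddling state, measured from the moment straddling begins. The sheet for a left lane change is the same as sheet 50D, except that the lane being straddled is on the left.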
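[0106]
A lane change may take place at high speed, so sheet 50D has more speed cells. The range previously defined as 40 km/h or more is changed to 40 to 70 km/h, and a new range of 70 km/h or more is added.

The first digit, "1" or "2", identifies the lane-straddling range: lane-change preparation or lane change in progress.

The second digit, "0" to "5", identifies the speed range:
- 0: 0 to 1 km/h with the brake on;
- 1: 0 to 1 km/h with the brake off, or 1 to 10 km/h;
- 2: 10 to 25 km/h;
- 3: 25 to 40 km/h;
- 4: 40 to 70 km/h;
- 5: 70 km/h or more.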
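[0107]
FIG. 12 shows a sheet for basic behaviors classified as going straight through an intersection. Sheet 50E is used when the basic behavior is going straight through any of the following:
- a signalized intersection with four or more branches;
- a signalized intersection with fewer than four branches;
- an unsignalized intersection entered from the priority side; or
- an unsignalized intersection entered from the non-priority side.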
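[0108]
Sheet 50E is divided into cells by vehicle speed at predetermined intervals on the horizontal axis (the speed axis). On the vertical axis, the cells are divided by the vehicle's position relative to the center of the intersection, at predetermined intervals.

The first digit, 1 to 3, identifies the vehicle's position: before passing, just before passing, or passing through the intersection. The second digit, 0 to 4, identifies the speed range.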
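[0109]
An example of the algorithm that the basic behavior classification unit 40 uses to classify basic behaviors is described next with reference to FIGS. 13 to 15.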
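[0110]
As shown in FIG. 13, unit 40 executes a processing routine that classifies basic behaviors.

In step 102, the vehicle's travel data are read. In step 104, the vehicle's travel trajectory is derived from the travel data of device 32 and matched against prestored map data. The trajectory can be taken from a navigation system, and the map data can come from the navigation system's maps or from GIS. The matching uses the geometric data of the trajectory and of the map.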
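[0111]
In step 106, the routine checks whether there is an intersection on the trajectory. If there is none, step 108 sets the road form to "non-intersection". Step 110 then checks whether the trajectory crossed a road boundary line. If it did, step 112 runs the right/left turn determination routine described later to set either a left turn or a right turn, and the routine ends.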
[0112]
When step 112 is reached via step 108, the basic behavior is set to either "left turn at a non-intersection" or "right turn at a non-intersection" (described later).
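[0113]
If step 110 is negative, step 114 checks whether the vehicle reversed its travel direction by crossing the road centerline within the boundaries of the same road. If so, step 116 runs the right U-turn determination routine described later to determine the U-turn behavior, and the routine ends.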
[0114]
When step 116 is reached via step 108, the basic behavior is set to either "right U-turn at a non-intersection" or "right U-turn at an intersection" (described later).
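[0115]
If step 114 is negative, step 118 checks whether the vehicle crossed the lane line on its right. If so, step 120 designates the 4 seconds before the crossing and the 2 seconds after it as a right lane change, and the routine ends. In step 120, the basic behavior is set to "right lane change at a non-intersection".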
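[0116]
If step 118 is negative, step 122 checks whether the vehicle crossed the lane line on its left. If so, step 124 designates the 4 seconds before the crossing and the 2 seconds after it as a left lane change, and the routine ends. In step 124, the basic behavior is set to "left lane change at a non-intersection".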
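[0117]
If step 122 is negative, step 126 sets "following the road" and the routine ends. No basic behavior needs to be set in step 126. If checking behavior while following the road is to be evaluated, however, such a basic behavior may be set.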
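[0118]
If there is an intersection on the trajectory (step 106 is positive), step 128 extracts the intersection's form as the road form. Step 128 first checks the map data to determine whether the intersection has traffic signals:
- If it is signalized, the step determines whether it has four or more branches.
- If it is unsignalized, the step determines whether the vehicle entered from the priority side or the non-priority side.

Step 128 then sets the road form to one of four values: signalized with four or more branches; signalized with fewer than four branches; unsignalized, entered from the priority side; or unsignalized, entered from the non-priority side.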
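[0119]
In step 130, the routine checks whether the vehicle's travel direction changed by at least 30 degrees and less than 135 degrees across the intersection, measured before and after it (for example over 20 m on each side). If so, processing goes to step 112 and the right/left turn determination routine is executed.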
[0120]
When step 112 is reached via step 128, the basic behavior combines one of the four intersection road forms with either a left turn or a right turn.
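[0121]
Step 130 uses a direction change of at least 30 and less than 135 degrees, but the invention is not limited to these values. It is sufficient to set upper and lower limits on the heading change that occurs when the vehicle turns right or left. These limits may be determined experimentally or statistically.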
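[0122]
If step 130 is negative, step 132 checks whether the vehicle reversed its travel direction by crossing the road centerline within the boundaries of the same road. If so, processing goes to step 116 and the right U-turn determination routine is executed.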
[0123]
When step 116 is reached via step 128, the basic behavior combines one of the four intersection road forms with a right U-turn.
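[0124]
If step 132 is negative, step 134 designates the period from 60 m before the intersection to 30 m after it as going straight through the intersection, and the routine ends. In step 134, the basic behavior combines one of the four intersection road forms with the vehicle behavior "going straight".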
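The decision tree of FIG. 13 (steps 106 to 134) can be sketched as a function over pre-computed trajectory features. This is a simplified illustration. The `TripFeatures` fields are hypothetical stand-ins for the trajectory and map-matching results, and extracting the time window of each behavior (for example 60 m before to 30 m after the intersection) is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TripFeatures:
    """Hypothetical features of one trajectory segment after map matching."""
    at_intersection: bool            # step 106
    road_form: Optional[str]         # one of the four intersection forms (step 128)
    crossed_road_boundary: bool      # step 110
    reversed_across_centerline: bool # steps 114 / 132
    crossed_right_lane: bool         # step 118
    crossed_left_lane: bool          # step 122
    heading_change_deg: float        # change across the intersection (step 130)
    turned_right: bool               # direction of the turn (step 142)


def classify(f: TripFeatures) -> Tuple[Optional[str], str]:
    """Return (road form, vehicle behavior) following FIG. 13."""
    turn = "right turn" if f.turned_right else "left turn"
    if not f.at_intersection:                     # step 106: no intersection
        form = "non-intersection"                 # step 108
        if f.crossed_road_boundary:               # step 110
            return form, turn                     # step 112
        if f.reversed_across_centerline:          # step 114
            return form, "right U-turn"           # step 116
        if f.crossed_right_lane:                  # step 118
            return form, "right lane change"      # step 120
        if f.crossed_left_lane:                   # step 122
            return form, "left lane change"       # step 124
        return form, "following road"             # step 126
    form = f.road_form                            # step 128
    if 30.0 <= f.heading_change_deg < 135.0:      # step 130
        return form, turn                         # step 112
    if f.reversed_across_centerline:              # step 132
        return form, "right U-turn"               # step 116
    return form, "straight through"               # step 134
```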
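[0125]
The above routine assumes classification into the basic behaviors of the classification table 41 in FIG. 7, and therefore excludes lane changes near intersections. This is not a limitation: lane changes near intersections may of course be added as basic behaviors.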
[0126]
As shown in FIG. 14, step 116 executes the right U-turn determination routine.
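[0127]
In this routine, step 136 first checks whether the vehicle passed through an intersection, using the result of step 106. If it did not (non-intersection), processing goes to step 138. If it did (an intersection was passed), processing goes to step 140.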
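[0128]
Step 138 designates as the right U-turn the period from the moment the vehicle is 60 m before crossing the road centerline until it is 20 m past the crossing point. Step 140 designates as the right U-turn the period from 60 m before the intersection where the vehicle turned right until 30 m after it.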
[0129]
As shown in FIG. 15, step 112 executes the right/left turn determination routine.
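[0130]
In this routine, step 142 first determines whether the right or left turn was a right turn. It checks whether the travel direction changed to the right across the intersection (over 20 m on each side), or whether the trajectory crossed the road boundary line on the right-hand side.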
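[0131]
For a right turn (step 142 positive), step 144 checks whether the vehicle passed through an intersection. If it did, step 146 designates the period from 60 m before the intersection where the vehicle turned right to 30 m after it as the right turn, and the routine ends.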
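[0132]
If step 144 is negative, step 148 checks whether the vehicle moved from inside the road boundary to outside it.
- If it did, step 150 designates as the right turn the period from 60 m before the vehicle crosses the road centerline until 1 second after it leaves the road boundary. The routine then ends.
- If it did not, that is, the vehicle entered the road from outside the boundary, step 152 designates as the right turn the period from 2 seconds before it enters the road boundary until it has traveled 20 m after crossing the road centerline. The routine then ends.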
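[0133]
For a left turn (step 142 negative), step 154 checks whether the vehicle passed through an intersection. If it did, step 156 designates the period from 60 m before the intersection where the vehicle turned left to 30 m after it as the left turn, and the routine ends.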
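[0134]
If step 154 is negative, step 158 checks whether the vehicle moved from inside the road boundary to outside it.
- If it did, step 160 designates as the left turn the period from 60 m before the vehicle crosses the road centerline until 1 second after it leaves the road boundary. The routine then ends.
- If it did not, that is, the vehicle entered the road from outside the boundary, step 162 designates as the left turn the period from 2 seconds before it enters the road boundary until it has traveled 20 m after crossing the road centerline. The routine then ends.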
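[0135]
As an example, suppose the vehicle passes through an intersection and its heading changes by at least 30 and less than 135 degrees within 20 m before and after the intersection. Suppose also that the map data for that intersection show it to be a signalized intersection with four or more branches. The period from 60 m before to 30 m after the intersection where the vehicle turned right is then extracted. The basic behavior "right turn at a signalized intersection with four or more branches" results.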
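[0136]
The basic behavior classification unit 40 therefore classifies basic behaviors after the occupant has acted, using the resulting vehicle behavior and the road form. An example is "right turn at a signalized intersection with four or more branches". For each basic behavior, unit 40 selects a sheet 50 and outputs it to the intra-basic-behavior state computation unit 42.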
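[0137]
The description above sets basic behaviors from the travel trajectory and map data. The invention is not limited to this: basic behaviors may be set from any of the measured values of device 32. As an example, the use of the turn signal (direction indicator) is described below.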
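[0138]
As shown in FIG. 16, unit 40 executes a routine that classifies basic behaviors using the turn signal. In step 202, the vehicle's travel data are read. In step 204, the travel trajectory is derived from the travel data of device 32 and matched against the prestored map data.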
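[0139]
In step 206, the routine checks whether there is an intersection on the trajectory. If there is none, step 208 sets the road form to "non-intersection". Step 210 then checks whether the turn signal (direction indicator) was switched on. If it was not, step 218 sets "following the road" and the routine ends.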
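[0140]
If step 210 is positive, step 212 checks whether the change in travel direction is less than 30 degrees. The change is measured from 1 second before the turn signal was switched on to 1 second after it was switched off. If the change is below 30 degrees, step 216 checks whether the vehicle straddled a lane line. If it did not, processing goes to step 218; if it did, processing goes to step 220.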
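[0141]
Step 220 checks whether the right turn signal was switched on.
- If so, step 222 designates as a right lane change the period during which the signal was on, extended by 3 seconds before and after. The routine then ends.
- If not, step 224 designates the same period as a left lane change.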
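[0142]
If the change in travel direction is 30 degrees or more, step 212 is negative and processing goes to step 214. Step 214 checks whether the change is at least 30 and less than 135 degrees. If so, step 240 checks whether the right turn signal was switched on.
- If it was, step 242 designates as a right turn the period during which the signal was on, extended by 3 seconds before and after. The routine then ends.
- If it was not, step 244 designates the same period as a left turn.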
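[0143]
If step 214 is negative, step 234 checks whether the right turn signal was switched on.
- If it was, step 238 designates as a right U-turn the period during which the signal was on, extended by 3 seconds before and after, together with the period until the steering has been held at neutral for 1 second. The routine then ends.
- If it was not, step 236 does the mirror image of step 238 and designates the same period as a left U-turn.

A left U-turn is not included in the classification table 41 of FIG. 7, but it can be handled in the same way as a right U-turn.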
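[0144]
If there is an intersection on the trajectory (step 206 positive), step 226 extracts the intersection's form as the road form. As in step 128, it sets the road form to one of four values: signalized with four or more branches; signalized with fewer than four branches; unsignalized, entered from the priority side; or unsignalized, entered from the non-priority side.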
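[0145]
Step 228 checks whether the turn signal was switched on near the intersection. If it was, step 230 checks whether the vehicle's travel direction changed by at least 30 and less than 135 degrees across the intersection (for example over 20 m on each side). If step 230 is negative, step 232 checks whether the change is 135 degrees or more. If it is, processing goes to step 234; otherwise it goes to step 216.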
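[0146]
If step 230 is positive, step 240 checks whether the right turn signal was switched on.
- If it was, step 242 designates as a right turn the period during which the signal was on, extended by 3 seconds before and after. The routine then ends.
- If it was not, step 244 designates the same period as a left turn.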
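[0147]
If step 228 is negative, processing goes to step 246. Step 246 checks whether the vehicle's travel direction changed by less than 30 degrees across the intersection.
- If so, the period from 40 m before the intersection to 20 m after it is designated as going straight through the intersection, and the routine ends.
- If not, no basic behavior is set. Instead, step 250 records a "turn signal left on" state, and the routine ends.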
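Once a turn signal is on, the turn-signal variant of FIG. 16 reduces to a choice based on the size of the heading change:
- below 30 degrees: a lane change, if a lane line was straddled, otherwise following the road;
- 30 to 135 degrees: a turn;
- 135 degrees or more: a U-turn.

The signal side picks right or left. The sketch below assumes the inputs have already been extracted; the function and argument names are hypothetical. It returns only the behavior label, not its time window.

```python
def classify_by_signal(at_intersection: bool, signal_on: bool, right_signal: bool,
                       heading_change_deg: float, straddled_lane: bool) -> str:
    """Behavior label following FIG. 16 (steps 206-250); time windows omitted."""
    d = abs(heading_change_deg)
    if not signal_on:
        if not at_intersection:
            return "following road"                                   # steps 210 -> 218
        return "straight through" if d < 30.0 else "signal left on"   # steps 246 / 250
    side = "right" if right_signal else "left"                        # steps 220 / 234 / 240
    if d < 30.0:                                                      # steps 212 / 232 -> 216
        return f"{side} lane change" if straddled_lane else "following road"
    if d < 135.0:                                                     # steps 214 / 230 -> 240
        return f"{side} turn"
    return f"{side} U-turn"                                           # steps 234-238
```

The intersection and non-intersection branches of the flowchart converge on the same three-way split, which is why a single function can cover both.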
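[0148]
The intra-basic-behavior state computation unit 42 shown in FIG. 5 determines the states within a basic behavior. That is, it determines the sequence of transitions through which the basic behavior classified by unit 40 passed.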
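[0149]
Specifically, unit 42 uses the measurements of device 32 to determine the transitions of the vehicle's behavior in response to the occupant's driving, which make up the basic behavior. Unit 42 receives the measurement signals of device 32, and its output goes to the gaze distribution computation unit 44 and the reward computation unit 46.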
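[0150]
Unit 42 generates and outputs a state map 52. The state map records, on the sheet 50 of the classified basic behavior, which state the vehicle started from and how it moved between states. To build it, unit 42 tracks the transitions of the basic behavior in units of sheet 50 cells. In other words, unit 42 divides the basic behavior into sheet cells and treats the relationships between those cells as transitions.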
[0151]
FIGS. 17 to 20 show examples of the criteria used to divide the state transitions within a basic behavior into cells.
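[0152]
FIG. 17 shows example vehicle-behavior criteria for right-turn and left-turn basic behaviors. These criteria divide sheets 50A and 50B into cells using the steering angle θ and the vehicle heading angle φ.

The right turn is described first. On sheet 50A, the speed classification on the horizontal axis is the same as in FIG. 8. The heading-change (steering) axis is classified into right-turn preparation, start of turn to mid-turn, and mid-turn to end of turn. This embodiment uses two thresholds: a steering angle of 30 degrees, measured from the steering neutral position taken as 0 degrees, and the point at which the heading change reaches 65 percent of its total.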
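[0153]
Within the time interval extracted as a right turn, the steering neutral position is taken as 0 degrees.
- Starting from the moment the steering angle θ reaches its rightward maximum and working backward, everything up to the point where θ falls below 30 degrees to the right is classified as the right-turn preparation state.
- The heading angle φ at the end of the preparation state is set to 0. The total heading change up to the end of the right turn is denoted AC.
- After the preparation state, the period until φ reaches 65 percent of AC is classified as start of turn to mid-turn.
- The remainder is classified as mid-turn to end of turn.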
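The segmentation rule in [0153] can be written directly over sampled steering and heading angles. This is a sketch under stated assumptions:
- the samples are already restricted to the extracted right-turn interval;
- angles are in degrees, with right positive;
- the sample at which θ first drops below 30 degrees, walking backward from the peak, is counted as the last preparation sample.

The function name is hypothetical. The labels 1, 2 and 3 match the first digit of the sheet 50A cell numbers.

```python
from typing import List, Sequence


def segment_right_turn(theta: Sequence[float], phi: Sequence[float],
                       prep_deg: float = 30.0, ratio: float = 0.65) -> List[int]:
    """Label each sample 1 (preparation), 2 (start to mid-turn) or 3 (mid-turn to end)."""
    n = len(theta)
    k_max = max(range(n), key=lambda k: theta[k])       # rightward steering peak
    k_prep = k_max
    while k_prep > 0 and theta[k_prep] >= prep_deg:     # walk back until theta < 30 deg
        k_prep -= 1
    phi0 = phi[k_prep]                                  # heading reset to 0 here
    ac = phi[-1] - phi0                                 # total heading change AC
    labels = []
    for k in range(n):
        if k <= k_prep:
            labels.append(1)
        elif phi[k] - phi0 < ratio * ac:
            labels.append(2)
        else:
            labels.append(3)
    return labels
```

With a steering trace that peaks at 80 degrees and a heading that sweeps to 90 degrees, the first three samples come out as preparation. Samples whose heading gain stays below 65 percent of AC come out as start-to-mid, and the rest as mid-to-end.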
[0154]
A left turn is the mirror image of a right turn and is handled in almost the same way, with "right" replaced by "left", so its description is omitted.
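[0155]
The criteria above use 30 degrees from the steering neutral position and the point at 65 percent of the heading change. The invention is not limited to these values; any angle or ratio determined experimentally or statistically may be used.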
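[0156]
FIG. 18 shows example vehicle-behavior criteria for the right U-turn basic behavior. These criteria divide sheet 50C into cells using the steering angle θ and the vehicle heading angle φ.

The speed classification on the horizontal axis of sheet 50C is as shown in FIG. 10. Forward and reverse motion are distinguished by the gear range or the vehicle speed, and forward speeds are subdivided as above. The heading-change (steering) axis is classified into U-turn preparation, start of U-turn to mid-U-turn, and mid-U-turn to end of U-turn. This embodiment uses three thresholds: steering angles of 10 and 30 degrees, measured from the steering neutral position taken as 0 degrees, and the point at which the heading change reaches 90 percent of its total.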
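[0157]
Within the time interval extracted as a right U-turn, the heading angle φ at 10 m before the intersection is set to 0.
- While φ is less than 90 degrees to the right, start from the moment the steering angle θ reaches its rightward maximum and work backward. Everything up to the point where θ falls below 30 degrees to the right is classified as the U-turn preparation state.
- The preparation state is further divided in two. When the steering is to the right of the point 10 degrees left of neutral, the state is preparation A; otherwise it is preparation B.
- After the preparation state, the period until the heading change reaches 90 percent of AC is classified as start of U-turn to mid-U-turn. Here AC is the heading change from the start to the end of the U-turn.
- The remainder is classified as mid-U-turn to end of U-turn.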
[0158]
A left U-turn is not described in this embodiment. It can be handled in almost the same way as a right U-turn, with "right" replaced by "left" in the classification above.
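[0159]
The criteria above use 10 and 30 degrees from the steering neutral position and the point at 90 percent of the heading change. The invention is not limited to these values; any angle or ratio determined experimentally or statistically may be used.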
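[0160]
FIG. 19 shows example vehicle-behavior criteria for the right lane change basic behavior, used to divide sheet 50D into cells.

The right lane change is described first. The speed classification on the horizontal axis of sheet 50D is the same as in FIG. 11. The lane-straddling axis is classified into right lane change preparation and right lane change in progress. As the dividing point, this embodiment uses the moment the vehicle's right edge reaches the lane line on its right.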
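[0161]
Within the time interval extracted as a right lane change, the period from the start of the behavior until the vehicle's right edge reaches the right lane line is the right lane change preparation state. The period from the end of the preparation state to the end of the behavior is classified as lane change in progress to end of behavior. The end of the behavior should preferably include a predetermined time (2 seconds) after the lane change is completed.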
[0162]
A left lane change is handled in almost the same way as a right lane change, with "right" replaced by "left", so its description is omitted.
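[0163]
The criterion above uses the moment the vehicle's right edge reaches the right lane line. The invention is not limited to this; a predetermined point on the vehicle, or a position a predetermined distance from the lane line, may be used instead.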
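[0164]
FIG. 20 shows example vehicle-behavior criteria for the straight-through-intersection basic behavior. They divide sheet 50E into cells according to the distance between the vehicle and the intersection.

The speed classification on the horizontal axis of sheet 50E is the same as in FIG. 12. The distance axis is classified into before passing, just before passing, and passing through to after passing. This embodiment uses the moments at which the vehicle is 10 m and 30 m from the intersection.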
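[0165]
Within the period extracted as going straight through the intersection, the reference position of the intersection is taken as 0.
- From the start of the behavior until the vehicle is 10 m before the reference position, the state is before passing.
- From the end of that state (10 m before the reference position) to the reference position, the state is just before passing.
- After that, until the end of the behavior, the state is passing through to after passing.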
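[0166]
The criteria above use the moments at which the vehicle is 10 m and 30 m from the intersection. The invention is not limited to these values. The distances may be varied with vehicle speed, or the intersection's reference position may be given a tolerance.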
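[0167]
FIG. 21 illustrates an example of a state map 52 recorded by unit 42. It is the state map for a basic behavior classified as a right turn at a signalized intersection with four or more branches. The dots (●) and arrows show the states through which one particular right turn passed. With the steering at neutral, the vehicle decelerates from 40 km/h or more and stops once. It then starts off, steers right and turns right through the intersection while accelerating.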
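[0168]
An arrow from a start cell to an end cell represents the relationship between cells, and its direction indicates the action taken. In FIG. 21:
- "A" means acceleration;
- "D" means deceleration;
- "H" means a change in the steering state.

In state map 52, a cell number stands for a state in the transitions within the basic behavior, and the arrow leaving that cell shows the action. Each individual transition can therefore be expressed as a cell number combined with an action letter. This combination of start-cell number and action letter is defined as the state map number. For example, a deceleration from cell 14 (right-turn preparation at 40 km/h or more) is written as state map number "14-D".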
[0169]
Unit 42 outputs these state map numbers. An example of the output is shown in Table 2.
[0170]
[Table 2]

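[0171]
The gaze distribution computation unit 44 shown in FIG. 5 determines the occupant's gaze direction, as the occupant's checking behavior, for each basic behavior classified by unit 40.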
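[0172]
Specifically, unit 44 uses the measurements of device 32 to determine the gaze direction, which is the checking behavior the occupant performed during the basic behavior, including the state transitions found by unit 42. Unit 44 is connected to unit 42 and to device 32, and its output goes to the reward computation unit 46.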
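[0173]
As shown in FIG. 22, unit 44 predefines ten gaze directions: LL, FL, FF, FR, RR, BR, BB, BL, EI and ET. The occupant's forward direction when seated is taken as 0 degrees. Clockwise angles are positive and counterclockwise angles are negative. The range from -9 to 9 degrees is the front direction FF. The ranges from 171 to 180 degrees and from -171 to -180 degrees together form the rear direction BB.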
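[0174]
The remaining angular directions are defined as follows:
- 9 to 25 degrees: right front, FR;
- 25 to 135 degrees: right, RR;
- 135 to 171 degrees: right rear, BR;
- -9 to -25 degrees: left front, FL;
- -25 to -135 degrees: left, LL;
- -135 to -171 degrees: left rear, BL.

Two further directions are also defined. EI is the direction toward in-vehicle equipment such as the instrument panel and gauges. ET covers all other directions, for example the gaze falling on the ceiling or on objects placed on a seat.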
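The angular part of FIG. 22 maps a horizontal gaze angle to one of eight labels. The sketch covers only the angle-based directions. EI and ET, and the mirror-based assignment of rear directions described in [0175], need the 3-D geometry and are out of scope. Because adjacent ranges share their end points in the text, the sketch assumes each boundary belongs to the more frontal direction. The function name is hypothetical.

```python
def gaze_direction(angle_deg: float) -> str:
    """Map a horizontal gaze angle (0 = straight ahead, right positive) to FIG. 22 labels."""
    a = (angle_deg + 180.0) % 360.0 - 180.0   # normalize to [-180, 180)
    mag = abs(a)
    side = "R" if a > 0 else "L"
    if mag <= 9.0:
        return "FF"              # front
    if mag <= 25.0:
        return "F" + side        # FR / FL
    if mag <= 135.0:
        return side * 2          # RR / LL
    if mag <= 171.0:
        return "B" + side        # BR / BL
    return "BB"                  # rear
```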
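[0175]
FIG. 23 shows how unit 44 classifies gaze directions.
- As shown in FIG. 23(A), the positions of equipment inside and outside the vehicle are read from a memory in which they were stored in advance, and their positions relative to the occupant are determined.
- As shown in FIG. 23(B), a virtual origin X is placed on the vehicle. The occupant's gaze is represented as a straight line in 3-D space, obtained from the output of the occupant-behavior detection sensor described above.
- If this line passes through one of the mirror regions or the instrument panel region, the direction is set to BR, BB, BL or EI accordingly.
- If the line passes through none of these regions and not through a window glass region either, the direction is set to ET.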
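[0176]
As shown in FIG. 23(C), the directions LL, FL, FF, FR and RR are classified only for gaze lines that have not been assigned BR, BB, BL, EI or ET and that pass through a window glass region. For these, the gaze direction is recomputed as an angle seen from the virtual origin on the vehicle, using the driver's head position, face orientation and gaze direction.

The recomputation works as follows. Take a cylindrical surface of radius 10 m centered on the virtual origin, and find where it intersects the gaze line that starts at the occupant's head. The straight line from the virtual origin through this intersection point is taken as the gaze direction at the virtual origin. Its angle is the gaze direction.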
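Re-expressing the gaze angle at the virtual origin is a ray-circle intersection when viewed in the horizontal plane. The sketch below makes several assumptions:
- it works in 2-D (x forward, y to the right, metres), so the 10 m cylinder becomes a circle;
- the head is inside the circle;
- the gaze angle is measured at the head with the same sign convention as FIG. 22.

The function name is hypothetical. It solves |h + t·d| = R for the positive root t and returns the angle of the hit point seen from the origin.

```python
import math


def gaze_angle_at_origin(head_x: float, head_y: float, gaze_deg: float,
                         radius: float = 10.0) -> float:
    """Angle (deg, right positive) from the virtual origin to where the gaze ray meets the cylinder."""
    dx = math.cos(math.radians(gaze_deg))       # unit gaze direction
    dy = math.sin(math.radians(gaze_deg))
    b = head_x * dx + head_y * dy               # h . d
    c = head_x ** 2 + head_y ** 2 - radius ** 2 # |h|^2 - R^2 (negative inside the cylinder)
    t = -b + math.sqrt(b * b - c)               # positive root of t^2 + 2bt + c = 0
    px, py = head_x + t * dx, head_y + t * dy   # intersection point on the cylinder
    return math.degrees(math.atan2(py, px))
```

For a driver seated 0.4 m left of the origin and looking straight ahead, the recomputed angle is only about -2.3 degrees. This shows why a distant reference surface makes the classification largely insensitive to head position.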
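[0177]
For each of the ten directions, unit 44 measures the holding time (gaze duration) of every glance that lasts at least a predetermined time (100 ms in this embodiment), and compiles the results into a table. Unit 44 outputs this table; an example is shown in Table 3. The Table 3 example is the output shown in Table 2 with gaze direction and holding time added. The gaze distribution is obtained from the gaze directions and their holding times (gaze durations).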
[0178]
[Table 3]

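[0179]
As shown in FIG. 5, the reward computation unit 46 compares the normative gaze distribution with the occupant's gaze distribution for each basic behavior classified by unit 40. From this comparison it determines the reward for the checking behavior of the occupant, who acted according to the previous advice information.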
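[0180]
To make this comparison, unit 46 uses four inputs:
- the measurements of device 32;
- the basic behavior classified by unit 40;
- the state map 52 obtained by unit 42; and
- the gaze distribution obtained by unit 44.

Unit 46 is therefore connected to device 32, so that it receives its measurement signals, and to units 40, 42 and 44. Its output goes to the advice information generation unit 48.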
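[0181]
Unit 46 first extracts the normative gaze distribution map 56 for the basic behavior from a normative gaze distribution database 54. It then builds a gaze distribution map 58 of the occupant's gaze directions for the same basic behavior. It compares the two maps, calculates a reward 60 and outputs it to the advice information generation unit 48.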
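[0182]
As shown in FIG. 24, unit 46 consists of five components:
- a gaze direction result input unit 70;
- a calculation circuit 72;
- a state map number input unit 74;
- a retrieval circuit 76; and
- a gaze distribution comparison and scoring circuit 78.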
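[0183]
The gaze direction result input unit 70 receives the state map 52 with the gaze data shown in Table 3. The state map number input unit 74 receives the state map numbers output by unit 42, shown in Table 2. The output of input unit 70 goes to the calculation circuit 72. For each basic behavior and state map, circuit 72 calculates the gaze distribution and the holding time in each direction. It does so over a specified time before each state transition (2 seconds in this embodiment).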
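The work of the calculation circuit 72 can be sketched as windowed bookkeeping over a list of glances. This is a hypothetical sketch with the following assumptions:
- each glance is a (start, end, direction) tuple in seconds;
- glances shorter than 100 ms are discarded before windowing;
- the distribution is each direction's share of the 2-second window before the transition;
- the mean holding time uses the full duration of each glance that overlaps the window.

The exact definitions used by the patent (Table 4) are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Fixation = Tuple[float, float, str]   # (start_s, end_s, direction), hypothetical format


def gaze_distribution(fixations: List[Fixation], t_transition: float,
                      window_s: float = 2.0, min_hold_s: float = 0.1
                      ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Share of the pre-transition window per direction, and mean holding time per direction."""
    w0 = t_transition - window_s
    time_in: Dict[str, float] = defaultdict(float)
    holds: Dict[str, List[float]] = defaultdict(list)
    for start, end, direction in fixations:
        if end - start < min_hold_s:
            continue                                   # ignore glances shorter than 100 ms
        overlap = min(end, t_transition) - max(start, w0)
        if overlap <= 0:
            continue                                   # glance lies outside the window
        time_in[direction] += overlap
        holds[direction].append(end - start)
    share = {d: t / window_s for d, t in time_in.items()}
    mean_hold = {d: sum(h) / len(h) for d, h in holds.items()}
    return share, mean_hold
```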
[0184]
Table 4 shows an example of the results that circuit 72 calculates from the output of input unit 70: the gaze distribution and the mean holding time in each direction.
[0185]
[Table 4]

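[0186]
The state map numbers from input unit 74 are sent to the retrieval circuit 76. Circuit 76 holds the normative gaze distribution database 54, which stores the normative gaze distribution and related data for each basic behavior and state map. Circuit 76 looks up the data that correspond to the incoming state map number. Database 54 stores, for example, a gaze distribution score table (FIG. 25) and a score table for the mean holding time in each gaze direction.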
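[0187]
FIG. 25 shows an example of the gaze distribution score table stored in database 54. It is the normative gaze distribution map for a right turn at a signalized intersection with four or more branches, with state map number 14-D. The table relates each gaze direction and its share of the gaze distribution within the specified time (2 seconds in this embodiment) to a score. Table 5 lists the numerical values of this normative map for the right turn with state map number 14-D.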
[0188]
[Table 5]

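[0189]
Based on the output of input unit 74, the retrieval circuit 76 fetches from database 54 the gaze distribution score table (for example, FIG. 25) and the score table for the mean holding time in each direction, and outputs them.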
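[0190]
These score tables are preferably obtained by supervised learning from the judgment criteria that experts, such as driving instructors, apply to checking behavior. Alternatively, tables calculated from the frequency distributions in driving data recorded from such experts may be used.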
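[0191]
The outputs of the calculation circuit 72 and the retrieval circuit 76 are connected to the comparison inputs of the reward calculation circuit 78. Circuit 78 compares the two outputs and produces a scoring result. Its output, the comparison result, is fed to the advice information generation unit 48.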
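[0192]
As shown in Equation 10, the reward calculation circuit 78 interpolates a score Q_{L,j,k} for each direction i. It uses two kinds of input:
- from circuit 72, for each basic behavior L and state map, the gaze distribution g_{L,i,k} and the mean holding time h_{L,i,k};
- from circuit 76, the reference maps G_{L,j} and H_{L,j}, scaled so that the highest score is 1.

The subscripts mean the following:
- i denotes the direction;
- j denotes the state map number within basic behavior L;
- k denotes the order in which the state map numbers were passed through within one basic behavior, so that the k-th state map number is j.

N is the maximum value of k, that is, the number of state map transitions within the basic behavior.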
[0193]
[Equation 10]

[0194]
The judgment value for basic behavior L is then given by Equation 11.
[0195]
[Equation 11]

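[0196]
The reward calculation circuit 78 outputs the lowest value of this calculation to the advice information generation unit 48 as the reward. In this embodiment, circuit 78 also outputs to unit 48 the current basic behavior L, the state map number j, and the values of the reference maps G_{L,j}(g_{L,i,k}) and H_{L,j}(h_{L,i,k}).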
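Equations 10 and 11 are not reproduced in this text, so the following is only a hypothetical sketch of the scoring idea in [0192] and [0196]:
1. Look up each observed share g and mean holding time h in piecewise-linear score tables whose maximum is 1.
2. Combine the two scores per direction (here, as an assumption, by multiplication).
3. Report the minimum over all directions and transitions as the reward.

Directions that appear in the normative table but were never looked at are scored with g = h = 0, which is also an assumption. All names are illustrative.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Table = Sequence[Tuple[float, float]]   # (x, score) breakpoints with ascending x


def interp(table: Table, x: float) -> float:
    """Piecewise-linear interpolation, clamped at both ends of the table."""
    xs = [p[0] for p in table]
    ys = [p[1] for p in table]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x)             # xs[k-1] <= x < xs[k]
    x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def reward_from_gaze(observed: List[Dict[str, Tuple[float, float]]],
                     g_tables: Dict[str, Table], h_tables: Dict[str, Table]) -> float:
    """observed[k][direction] = (share g, mean hold h) for the k-th state transition."""
    scores = []
    for per_dir in observed:
        for direction in g_tables:      # every direction the norm cares about
            g, h = per_dir.get(direction, (0.0, 0.0))   # unseen direction scores as zero gaze
            q = interp(g_tables[direction], g) * interp(h_tables[direction], h)  # assumed combination
            scores.append(q)
    return min(scores)                  # the lowest score becomes the reward ([0196])
```

Taking the minimum means a single neglected check, such as never glancing to the right before a right turn, dominates the reward.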
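[0197]
As shown in FIG. 5, the advice information generation unit 48 includes a personal characteristic database 64 and an accident case database 62. From the reward output by the reward calculation unit 46, it generates advice information on the confirmation behavior of the occupant (driver).
[0198]
That is, the advice information generation unit 48 generates advice information on the occupant's gaze distribution. It uses the reward that the reward calculation unit 46 calculated from the measurement results of the driving state measurement device 32. For this purpose, the output of the reward calculation unit 46 is connected so as to be output to the advice information presentation device 36. The advice information generation unit 48 is also connected to the driving state measurement device 32 so as to receive its measurement signal.
[0199]
As shown in FIG. 26, the advice information generation unit 48 has an input circuit 80 for entering the name of the occupant (driver). The input circuit 80 is connected to a personal DB processing circuit 82, which holds the personal characteristic database 64. The personal DB processing circuit 82 is in turn connected to a case DB processing circuit 84, which holds the accident case database 62.
[0200]
The advice information generation unit 48 also has a reward input unit 86 that receives the reward output by the reward calculation unit 46. The output of the reward input unit 86 is connected to an advice information generation circuit 90, which generates advice information from the reward.
[0201]
The output of the advice information generation circuit 90 is connected to the personal DB processing circuit 82 and to an advice information presentation output signal generation circuit 92. The output of the advice information presentation output signal generation circuit 92 is connected to the personal DB processing circuit 82 and to the advice information presentation device 36.
[0202]
When a reward arrives from the reward input unit 86, the advice information generation circuit 90 executes the advice information generation processing routine shown in FIG. 31. This routine shares parts with the action pattern generation processing routine of the first embodiment described above. Those parts carry the same reference numerals, and their detailed description is omitted.
[0203]
That is, when a reward arrives from the reward input unit 86, the advice information generation processing routine of FIG. 31 starts. In step 302, the routine determines from the input reward whether the previously generated and output advice information was optimal. If it was, in step 304 the routine outputs information indicating this to the advice information presentation device 36. If it was not, in step 306 the routine determines from the input reward whether the occupant's confirmation behavior is within an allowable range. If it is, the same processing is executed as in steps 542 to 554 of the action pattern generation processing routine of the first embodiment.
[0204]
That is, in step 542 the state evaluation value V^π(s_t) is calculated from Equation 1 above, using the reward calculated by the reward calculation unit 46. In step 544 the TD error δ_t is calculated from Equation 2 above, and in step 546 the evaluation value T(s_t) is calculated from Equation 3 above.
[0205]
In step 548, the trial value Δa(s_t) is calculated from Equation 4 above. In this embodiment, A(s) is a value determined with a predetermined probability from a reference value of the gaze distribution (reference action guideline information). It is hereinafter called the value determined from the reference value. Because gaze distribution differs between individuals, this embodiment stores several different representative gaze distributions as reference values. Each of these reference values is identified by a variable i.
[0206]
As described above, Equation 5 expresses two behaviors. If the reference value is effective for the occupant, the advice information is learned so as to approach the reference value. If the reference value is not very effective for the occupant, the advice information is searched for by random trial and error. That is, when the reward is positive, the TD error δ_t is positive, and the evaluation value T(s_t) is also positive, ρ(T(s_t)) takes a value close to 1 and ρ(−T(s_t)) takes a value close to 0. In other words, suppose the evaluation of the occupant's gaze distribution, as changed by the previous advice information, falls within a predetermined good evaluation range. The amount of change from the previous advice information is then determined from the predetermined reference value. More specifically, it is based on the difference between the value determined from the reference value and the previous advice information. That difference is given a larger influence on the amount of change than advice information determined according to the predetermined probability.
[0207]
Even when the reference value is not very effective for the occupant and the advice information is searched for by random trial and error, the advice information cannot take unbounded values. An allowable range is set in advance for each basic behavior, and the random values are taken within that range.
[0208]
In step 550, the state value V(s_t) is updated by Equation 6 above. In step 552, the advice information a(s_t) for the occupant is calculated by Equation 7 above, and in step 554 the advice information a(s_t) is output.
[0209]
If step 306 finds that the occupant's confirmation behavior is not within the allowable range, the gaze distribution (reference value) used to obtain the previous advice information can be judged inappropriate. In step 308, therefore, the variable i identifying the reference value of the gaze distribution is incremented by 1, and the routine checks whether i exceeds the total number of reference values. If it does not, steps 542 to 554 are executed with the variable i incremented in step 308.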
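The branching of the FIG. 31 routine (steps 302, 306 and 308) can be summarized in a small decision function. The text only says the decisions are made "based on the input reward." Two reward thresholds therefore stand in for "optimal" and "within the allowable range," and these thresholds are assumptions. What happens when i runs past the last reference value is not specified here. The sketch merely reports that case. All names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    OPTIMAL = "optimal"             # step 302 yes -> step 304: report that the advice is optimal
    LEARN = "learn"                 # step 306 yes -> steps 542-554 with the current reference value
    NEXT_REFERENCE = "next"         # step 306 no -> step 308: retry steps 542-554 with reference i + 1
    EXHAUSTED = "exhausted"         # i exceeded the number of stored reference values


def advice_routine_step(reward, i, n_references, optimal_threshold, allowable_threshold):
    """Hypothetical decision logic for one pass of the advice generation routine.

    The reward thresholds are assumptions. Returns (outcome, reference index to use next).
    """
    if reward >= optimal_threshold:
        return Outcome.OPTIMAL, i
    if reward >= allowable_threshold:
        return Outcome.LEARN, i
    i += 1  # step 308: the current reference gaze distribution does not suit this occupant
    if i > n_references:
        return Outcome.EXHAUSTED, i
    return Outcome.NEXT_REFERENCE, i
```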
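[0210]
In this embodiment, the advice information generation circuit 90 also outputs to the advice information presentation output signal generation circuit 92 the values received through the reward input unit 86. These are the basic behavior (L), the state map number (j), and the reference maps G_{L,j}(g_{L,i,k}) and H_{L,j}(h_{L,i,k}).
[0211]
From the values of the reference maps G_{L,j}(g_{L,i,k}) and H_{L,j}(h_{L,i,k}), the advice information generation circuit 90 extracts the gaze directions i with low values. It then outputs the basic behavior (L), the state map number (j) and the extraction result (gaze directions i) to the advice information presentation output signal generation circuit 92 and to the personal DB processing circuit 82.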
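The extraction of low-valued gaze directions could look like the sketch below. The text does not say what counts as "low," so the threshold of 0.5 on the 0-to-1 score scale is an assumption. The function name is hypothetical.

```python
def low_score_directions(g_scores, h_scores, threshold=0.5):
    """Gaze directions whose worse score (distribution map G or holding-time map H) is below threshold.

    A direction missing from one map is treated as scoring 1.0 there (no evidence of a problem).
    The result is sorted worst first, with ties broken by name for a stable order.
    """
    low = {}
    for direction in set(g_scores) | set(h_scores):
        worst = min(g_scores.get(direction, 1.0), h_scores.get(direction, 1.0))
        if worst < threshold:
            low[direction] = worst
    return sorted(low, key=lambda d: (low[d], d))
```

With G scores of 0.4 (right), 0.9 (front) and 0.7 (left), and H scores of 0.8, 0.95 and 0.3 respectively, the sketch flags left first and then right.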
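[0212]
The personal DB processing circuit 82 records, for each occupant, the advice information output from the advice information generation circuit 90, keyed to the occupant's name entered through the input circuit 80. The identification data entered through the input circuit 80 gives access to that occupant's personal characteristic database 64 in the personal DB processing circuit 82. The advice information from the advice information generation circuit 90 is then recorded per occupant. As a result, the personal characteristic database 64 can store all advice information, whether the reward was good or poor. Advice information can also be extracted (searched) according to whether the reward was good or poor.
[0213]
For example, the personal DB processing circuit 82 can output to the advice information presentation device 36 the results of a search for past confirmation behavior that was not judged good. The personal characteristic database 64 holds the accumulated data on such past confirmation behavior. It therefore suffices to extract the matching data in response to a search command and output it as the search result.
[0214]
The advice information presentation output signal generation circuit 92 generates a display signal that shows the advice information output by the advice information generation circuit 90 on the advice information presentation device 36. In this embodiment, the circuit 92 can also generate display signals that present the other information described above, besides the advice information, on the determination result presentation device 36.
[0215]
As shown in FIG. 27, the advice information presentation device 36 is mounted on the instrument panel in the vehicle. In this arrangement it works as a safety-confirmation adequacy meter, and the advice information presentation output signal generation circuit 92 sends presentation signals to it. The left side of FIG. 27 shows an example display screen 300 of the determination result presentation device 36 acting as this meter. The display screen 300 shows the determination results for confirmation behavior up to the present as a time-series bar graph. The latest determination result for each confirmation behavior is therefore added to the screen as soon as it is obtained.
[0216]
The display screen 300 includes a detail button 302 for requesting the details of the determination results. The detail button 302 accepts a request only when the occupant makes one and the vehicle is stopped. The details are then presented as images, an example of which is shown in FIG. 28. Making the determination result presentation device 36 a touch panel makes it easy for the occupant to give input instructions.
[0217]
FIG. 28 shows an example of the detail screen 304 presented when the detail button 302 is pressed. It presents the confirmation behavior as images of its time-series transition along with the vehicle's movement. It also shows advice information 306 on confirmation behavior within that transition that needs particular attention in the determination results or that statistically warrants a caution. As noted above, when the advice information is optimal, the occupant's confirmation behavior is also optimal, and a message to that effect is displayed.
[0218]
The detail screen 304 also includes a habit button 308 for requesting a presentation of the occupant's confirmation-behavior habits. The habit button 308 accepts a request, and the information is presented, only when the occupant makes one and the vehicle is stopped.
[0219]
FIG. 29 shows an example of the habit display screen 310 presented when the habit button 308 is pressed. Using the personal characteristic database 64, it presents data on the occupant's good confirmation behavior, poor confirmation behavior, or both. Specifically, it can present a predetermined number (two in the illustrated example) of the most frequent confirmation behaviors judged not good, in descending order of frequency. Each is identified by its basic behavior (L), state map number (j) and gaze direction (i). It can also present a predetermined number (one in the illustrated example) of the most frequent good confirmation behaviors, identified in the same way.
[0220]
The habit display screen 310 also includes an accident fortune-telling button 312. It accepts a request and presents the accident fortune-telling result only when the occupant makes one and the vehicle is stopped. Accident fortune-telling uses accident cases to show the emergency situations the occupant is prone to, based on his or her past driving behavior. For example, the case DB processing circuit 84 holds an accident case database 62 whose cases are flagged with the behavior, the road configuration and the dangerous direction, as in well-known hazard prediction training. The accident case database 62 is searched with the basic behavior (L), state map number (j) and gaze direction (i) of the most frequent poor confirmation behavior retrieved from the personal characteristic database. Accident cases and hazard prediction training examples are then output that fit the situations in which the driver tends to make poor confirmations.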
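The habit screen and the accident fortune-telling search reduce to frequency counting over the personal records and a keyed lookup. This sketch assumes each stored record is a tuple (L, j, i, is_good). It also assumes each accident case is a dictionary with `behavior`, `state_map` and `direction` fields. Both record layouts and the function names are hypothetical. In particular, matching a case's road configuration through the state map number is an assumption.

```python
from collections import Counter


def habit_summary(records, n_bad=2, n_good=1):
    """Most frequent (L, j, i) keys for poor and for good confirmation behavior.

    records: iterable of (L, j, i, is_good) tuples stored for one occupant.
    """
    bad = Counter((L, j, i) for L, j, i, is_good in records if not is_good)
    good = Counter((L, j, i) for L, j, i, is_good in records if is_good)
    return ([key for key, _ in bad.most_common(n_bad)],
            [key for key, _ in good.most_common(n_good)])


def accident_cases_for(key, case_db):
    """Accident cases flagged with the same behavior, state map and dangerous direction."""
    behavior, state_map, direction = key
    return [case for case in case_db
            if case["behavior"] == behavior
            and case["state_map"] == state_map
            and case["direction"] == direction]
```

The defaults of two poor habits and one good habit mirror the counts in the FIG. 29 example.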
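[0221]
FIG. 30 shows an example of the accident fortune-telling result display screen 314 presented when the accident fortune-telling button 312 is pressed.
When a loudspeaker is installed as the determination result presentation device 36, the determination result presentation output signal generation circuit 92 can generate the corresponding voice information. For example, suppose the occupant has chosen voice output. When a determination result is reported while driving, the circuit 92 can then use the device 36 to announce something like "Your confirmation behavior in the last right turn scored ○ points."
[0222]
When a navigation system is installed and its route guidance is in use, the mistakes the driver tends to make in each basic behavior can be retrieved from the personal characteristic database. The points to watch can then be announced by voice during route guidance.
[0223]
In this embodiment, then, basic behaviors are classified according to the occupant's driving state, and the confirmation behavior for each basic behavior is evaluated (judged). Advice information is then presented. The presentation is kept to a level that barely affects driving, so shortcomings or errors in confirmation behavior are not pointed out during driving in a way that forces the occupant's driving. The occupant's confirmation behavior is also monitored constantly and the occupant's habits are identified. At moments when it does not interfere with driving, the occupant can be told about scenes with inadequate confirmation behavior and scenes with good confirmation behavior, which improves safe behavior. Constant monitoring also effectively encourages better confirmation behavior and helps reduce the chance of emergency situations such as accidents. The occupant's confirmation behavior can therefore be evaluated while safety is ensured.
[0224]
General knowledge about which directions to look in, and for how long, in each state of a basic behavior can also be evaluated and judged quantitatively. Examples are that one should look well to the right before steering right, and that on an expressway one should not spend long on checks that swing the head widely left and right.
[0225]
In the above embodiment, basic behaviors are classified using sheets. The present invention is not limited to sheets, however; mathematical expressions in a vector representation that capture the basic behavior may be used instead.
[0226]
As described above, in this embodiment the amount of change from the previous advice information to the current advice information a(s_t) depends on how effective the reference values of the gaze distribution are. The desired behavior pattern is therefore not searched for by random trials, even in the early stage of learning. Learning is efficient, an optimal solution is easy to obtain even when the number of state dimensions is large, and the method is highly practical.
[0227]
Several reference values of the gaze distribution are also defined, and the action output is chosen so as to follow one of them. This lowers the cost of searching around each individual reference value.
[0228]
In the embodiment described above, the next advice information is calculated with the amount of change set so that the difference between the reference value and the previous advice information has a larger influence on it than advice information determined according to the predetermined probability (see Equation 4). The present invention is not limited to this, however. As in Equation 8 above, the amount of change may be based only on the difference between the reference value and the previous advice information, and the next action output may be calculated from it.
[0229]
In the embodiment described above, the advice information presentation device first displays the safety-confirmation adequacy together with the detail button, and presents the advice information when the detail button is pressed. The present invention is not limited to this, and the advice information may be presented first.
[0230]
In the embodiment described above, the behavior is the gaze distribution of the occupant driving the vehicle while the vehicle is in a predetermined driving state. The present invention is not limited to this. It applies equally to gaze behaviors other than gaze distribution, such as gaze direction and gaze time, and to steering wheel operation.
[0231]
[Effects of the Invention]
As described above, the invention of claim 1 calculates the next control amount in a particular way when the evaluation of the environment changed by the previous control amount is within a predetermined good evaluation range. In that case it determines the amount of change from the previous control amount based on a predetermined reference control amount. It therefore has the effect that the next control amount can be obtained efficiently.
[0232]
Likewise, the invention of claim 12 calculates the next action guideline information in a particular way when the evaluation of the environment changed by the previous action guideline information is within a predetermined good evaluation range. In that case it determines the amount of change from the previous action guideline information based on predetermined reference action guideline information. It therefore has the effect that the next action guideline information can be obtained more efficiently than by changing the previous action guideline information at random.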
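[Brief Description of the Drawings]
FIG. 1 is a conceptual diagram showing the "human-robot-work environment" structure according to the present embodiment.
FIG. 2 is an overall view of the device configuration of the robot.
FIG. 3 is a flowchart showing the action pattern generation processing routine executed by the robot behavior planning device.
FIG. 4 shows experimental results.
FIG. 5 is a block diagram showing the schematic configuration of the advice information generation and presentation device according to an embodiment of the present invention.
FIG. 6 is a block diagram showing the basic configuration of the advice information generation and presentation device according to an embodiment of the present invention.
FIG. 7 is a conceptual diagram of the classification table used to classify basic behaviors in the advice information generation and presentation device.
FIG. 8 is a conceptual diagram of the sheet for right-turn behavior.
FIG. 9 is a conceptual diagram of the sheet for left-turn behavior.
FIG. 10 is a conceptual diagram of the sheet for right U-turn behavior.
FIG. 11 is a conceptual diagram of the sheet for right lane-change behavior.
FIG. 12 is a conceptual diagram of the sheet for going straight through an intersection.
FIG. 13 is a flowchart of the process by which the basic behavior classification unit of the confirmation behavior determination device classifies basic behaviors.
FIG. 14 is a flowchart showing the details of step 116 in FIG. 13.
FIG. 15 is a flowchart showing the details of step 112 in FIG. 13.
FIG. 16 is a flowchart of the process by which the basic behavior classification unit of the confirmation behavior determination device classifies basic behaviors using the turn signal.
FIG. 17 is an explanatory diagram of example vehicle-behavior criteria for dividing right-turn or left-turn behavior into cells using the steering angle θ and the vehicle heading angle φ.
FIG. 18 is an explanatory diagram of example vehicle-behavior criteria for dividing right U-turn behavior into cells using the steering angle θ and the vehicle heading angle φ.
FIG. 19 is an explanatory diagram of example criteria for dividing lane-change behavior into cells.
FIG. 20 is an explanatory diagram of example criteria for dividing straight-through-intersection behavior into cells.
FIG. 21 is a conceptual diagram of a state map.
FIG. 22 is a conceptual diagram of the range of the occupant's gaze directions.
FIG. 23 illustrates how the gaze distribution calculation unit classifies gaze directions: (A) and (B) show the positional relationships of equipment inside and outside the vehicle, and (C) illustrates the case where the gaze passes through the window glass area.
FIG. 24 is a block diagram showing the details of the comparison calculation unit included in the confirmation behavior determination device.
FIG. 25 is a conceptual diagram of an example gaze distribution score table stored in the normative gaze distribution database.
FIG. 26 is a block diagram showing the details of the display processing unit included in the confirmation behavior determination device.
FIG. 27 is a conceptual diagram of an example determination result presentation device installed in the vehicle.
FIG. 28 is a conceptual diagram showing the details of the confirmation-behavior determination results presented on the determination result presentation device.
FIG. 29 is a conceptual diagram of an example habit display screen showing data on the occupant's habits.
FIG. 30 is a conceptual diagram of an example accident fortune-telling result display screen.
FIG. 31 is a flowchart showing the advice information generation processing routine.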
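[Explanation of Reference Numerals]
100 Robot
512 Force detection sensor
514 Teaching geometry information generation device
516 Cooperative motion planning device
518 Robot operation control device
520 Input device
522 Robot behavior planning device
524 Position detection sensor
10 Advice information generation and presentation device
32 Driving state measurement device
34 Advice information generation device
36 Advice information presentation device
40 Basic behavior classification unit
42 In-basic-behavior state calculation unit
44 Gaze distribution calculation unit
46 Reward calculation unit
48 Advice information calculation unit
50 Sheet
52 State map
54 Normative gaze distribution database
56 Normative gaze distribution map
58 Gaze distribution map
60 Score result
62 Accident case database
64 Personal characteristic database
[0001]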
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an environment change device and to an action guideline information generation and presentation device. More particularly, it relates to an environment change device that changes an environment by controlling a change means based on a control amount. It also relates to an action guideline information generation and presentation device that calculates action guideline information serving as a guideline for action and presents the calculated information.
[0002]
[Prior art]
Conventionally, reinforcement learning has been used, for example in robot control as the environment change control of an environment change device. When a robot selects and performs an action in a given state, a function called the state value function or the action value function is computed step by step. This function expresses how much reward the robot can expect to obtain in the future, that is, the expected value of the return. Selecting the action that maximizes either of these value functions is said to yield a desired behavior pattern (the next control amount of the robot) (see Non-Patent Document 1).
[0003]
[Non-patent document 1]
"Grasp Control of a Robot Hand by Reinforcement Learning", IEEJ Transactions C (Denki Gakkai Ronbunshi C), Vol. 121, No. 4, pp. 710-717 (April 2001)
[0004]
[Problems to be solved by the invention]
However, particularly in the early stage of learning, no information about the environment is available yet. The desired behavior pattern (the desired control amount of the robot) must then be searched for by random trials, so learning is inefficient. In other words, deriving the next behavior pattern (the next control amount of the robot) from the previous behavior pattern (control amount) is inefficient. In particular, when the number of state dimensions becomes large, no optimal solution could be found within the short time available for computing the next behavior pattern, and the method lacked practicality.
[0005]
The present invention has been made in view of the above facts. Its object is to provide an environment change device and an action guideline information generation and presentation device that can efficiently obtain the next control amount or the next action guideline information.
[0006]
[Means for Solving the Problems]
To achieve the above object, an environment change device according to the first aspect of the present invention comprises three means. A change means changes an environment. A control means controls the change means so that the environment changes based on a control amount. A calculation means calculates the control amount. When the evaluation of the environment changed based on the previous control amount is within a predetermined good evaluation range, the calculation means calculates the next control amount. It does so by determining the amount of change from the previous control amount based on a predetermined reference control amount.
[0007]
The calculating means of the present invention calculates the control amount of the control means, and the control means controls the changing means based on the control amount so that the environment changes.
[0008]
Here, when calculating the control amount for changing the environment to a desired state, the previous control amount may be changed at random. However, even if the previous control amount is changed at random, the optimum solution cannot be obtained efficiently.
[0009]
Therefore, suppose the evaluation of the environment changed based on the previous control amount is within the predetermined good evaluation range. The calculation means according to the present invention then calculates the next control amount by determining the amount of change from the previous control amount based on the predetermined reference control amount. The reference control amount is a reference value for calculating the next control amount.
[0010]
Note that the previous control amount may be a control amount determined based on the reference control amount.
[0011]
The calculation means may also determine the amount of change based on the difference between the previous control amount and a value determined from the reference control amount according to a predetermined probability. In this case, as in claim 3, the calculation means may give this difference a larger influence on the amount of change than a control amount determined according to the predetermined probability. Alternatively, as in claim 4, the amount of change may be based only on the difference between the previous control amount and the value determined from the reference control amount according to the predetermined probability.
[0012]
As described above, when the evaluation of the environment changed based on the previous control amount is within the predetermined good evaluation range, the next control amount is calculated by determining the amount of change from the previous control amount based on the predetermined reference control amount. The next control amount can therefore be obtained more efficiently than by changing the previous control amount at random.
[0013]
Incidentally, the predetermined reference control amount may not always be an appropriate reference value for calculating the next control amount.
[0014]
Therefore, as in claim 6, several mutually different reference control amounts may be predetermined. Suppose the previous control amount was determined from one of these reference control amounts, and the evaluation of the environment changed by it falls outside the predetermined good evaluation range. The calculation means may then calculate the next control amount by determining the amount of change from the previous control amount based on a reference control amount other than that one.
[0015]
The reference control amount may be obtained as follows.
[0016]
That is, as in claim 7, the device may further comprise an operating means that a person operates and a computing means that computes a control amount from the operation information obtained when the operating means is operated. The control means then controls the change means, based on the control amount computed by the computing means, so that the environment changes. As in claim 8, the calculation means may calculate the reference control amount from this operation information.
[0017]
The environment is, for example, the motion state of an object, such as whether the object is stationary.
[0018]
Further, the environment changing device according to any one of claims 1 to 10 may be a robot provided with robot hardware as changing means as in claim 11.
[0019]
An action guideline information generation and presentation device according to the twelfth aspect of the present invention comprises a calculation means and a presentation means. The calculation means calculates action guideline information serving as a guideline for action, and the presentation means presents the information so calculated. When the evaluation of the action changed based on the previous action guideline information is within a predetermined good evaluation range, the calculation means calculates the next action guideline information. It does so by determining the amount of change from the previous action guideline information based on predetermined reference action guideline information.
[0020]
The calculating means calculates action guideline information serving as a guideline of the action, and the presentation means presents the action guideline information calculated by the calculating means.
[0021]
Here, when calculating the action guideline information for turning the action into a desired action, the previous action guideline information may be randomly changed. However, even if the previous action guideline information is randomly changed, the optimal solution cannot be efficiently obtained.
[0022]
Therefore, suppose the evaluation of the action changed based on the previous action guideline information is within the predetermined good evaluation range. The calculation means according to the present invention then calculates the next action guideline information by determining the amount of change from the previous action guideline information based on the predetermined reference action guideline information. The reference action guideline information is a reference value for calculating the next action guideline information.
[0023]
The previous action guideline information may be action guideline information determined based on the reference action guideline information.
[0024]
The calculation means may also calculate the next action guideline information by setting the amount of change from a difference. That difference is between the previous action guideline information and a value determined from the reference action guideline information according to a predetermined probability. In this case, as in claim 14, the calculation means may give this difference a larger influence on the amount of change than action guideline information determined according to the predetermined probability. Alternatively, as in claim 15, the amount of change may be based only on the difference between the previous action guideline information and the value determined from the reference action guideline information according to the predetermined probability.
[0025]
As described above, when the evaluation of the environment changed based on the previous action guideline information is within the predetermined good evaluation range, the next action guideline information is calculated. It is obtained by determining the amount of change from the previous action guideline information based on the predetermined reference action guideline information. The next action guideline information can therefore be obtained more efficiently than by changing the previous action guideline information at random.
[0026]
However, the predetermined reference action guideline information is not always an appropriate reference value for calculating the next action guideline information.
[0027]
In view of this, several mutually different pieces of reference action guideline information may be predetermined. Suppose the previous action guideline information was determined from one of them, and the evaluation of the environment changed by it falls outside the predetermined good evaluation range. The calculation means may then calculate the next action guideline information by determining the amount of change from the previous action guideline information based on reference action guideline information other than that one.
[0028]
Note that the above-mentioned action may be an action of an occupant operating the vehicle when the vehicle is in a predetermined driving state. The occupant's behavior includes, for example, gaze behavior such as gaze direction, gaze time, and gaze distribution, and steering wheel operation.
[0029]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, a first embodiment of the present invention will be described in detail with reference to the drawings.
[0030]
As shown in FIG. 1, the robot 100 according to the present embodiment is premised on the structure of “human-robot-work environment (operation target and the like)” and information flow.
[0031]
As shown in FIG. 2, the robot 100, which serves as the environment change device of this embodiment, includes a force detection sensor 512. This sensor detects the operating force that a human applies to an operated portion (operating means, not shown) and the reaction force from the environment. In this embodiment the environment is, for example, an object on a desk that the robot grasps or moves. The force detection sensor 512 is mounted at the tip of known robot hardware (a robot manipulator) serving as the change means, and a finger is attached to its tip. The fingertips are covered with a urethane coating to avoid rigid contact with the object being handled.
[0032]
The robot 100 further includes the following devices:
a teaching geometry information generation device 514, serving as a computing means, which generates geometric information as teaching information for the robot from the operating force detected by the force detection sensor 512;
a cooperative motion planning device 516, which plans the robot's cooperative motion from the geometric information supplied by the teaching geometry information generation device 514 in the human/robot cooperative work mode set through the input device 520;
a robot behavior planning device 522, serving as the calculation means, which autonomously generates a desired interaction force pattern between the robot and the work environment from the geometric information supplied by the teaching geometry information generation device 514, in the autonomous work mode (robot alone) set by the mode designation information entered through the input device 520;
a robot operation control device 518, serving as the control means, which outputs to the robot hardware the operation commands determined, according to the set work mode, by the cooperative motion planning device 516 or the robot behavior planning device 522 (the action output described later).
[0033]
As described above, the input device 520 is connected to the cooperative motion planning device 516 and the robot behavior planning device 522. The robot behavior planning device 522 is also connected to the force detection sensor 512 described above and to an operation state detection sensor 524. The operation state detection sensor 524 detects the operating state of the robot operation control device 518 and thereby the operating state of the robot hardware it moves.
[0034]
Next, the operation of the present embodiment will be described.
[0035]
As an example, this embodiment describes the following case. An object at a start position on a desk (not shown) is moved in the horizontal direction (X direction) by the robot finger, which a human touches and operates. The object is then stopped at a designated position (end position).
[0036]
First, as preparation, the start position and the designated position (end position) of the object are input as follows. In this embodiment, the robot's operation uses the actor-critic method, one of the reinforcement learning methods based on the TD (temporal difference) error described later. In the actor-critic method, control information for the robot's operation is output first. A reward for the result of the robot operating under that control information is then received, and the next control information is calculated. Rewards and penalties are therefore set through the input device 520 as follows. In the above example, a reward of 1 is given when the object on the desk has been moved from its stationary state (start position) toward the target position (end position) and stopped within a certain range centered on the target position. Conversely, a penalty of −1 is given in any of these cases: the object comes to rest outside the predetermined range including the target position; the finger slips off the object; the force detection sensor reports a force in the vertical direction (Z direction) larger than a predetermined value (allowable value); or the robot reaches its maximum movable range. Setting the reward and penalty in this way defines the start position and the target position (end position) of the object on the desk.
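The reward and penalty rules above can be written as a small function. This is a sketch only. The `TrialResult` fields, their units and the tolerance parameters are hypothetical names for the quantities the text mentions. Returning 0 while a trial is still in progress is an added assumption; the text defines only the +1 and −1 cases.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    stopped: bool         # object has come to rest
    final_x: float        # object position along X when it stopped [m]
    slipped: bool         # finger slipped off the object
    max_fz: float         # largest vertical (Z) force seen during the trial [N]
    at_joint_limit: bool  # robot reached its maximum movable range


def reward(result, target_x, tolerance, fz_allowed):
    """+1 for stopping within tolerance of the target, -1 for any failure, 0 (assumed) while running."""
    if result.slipped or result.max_fz > fz_allowed or result.at_joint_limit:
        return -1
    if result.stopped:
        return 1 if abs(result.final_x - target_x) <= tolerance else -1
    return 0
```

Failures are checked before success, so a trial that stops on target but exceeded the allowable Z force still receives the penalty.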
[0037]
Next, the human / robot cooperative work mode will be described.
[0038]
Suppose the human/robot cooperative work mode is set through the input device 520, and the human operates the operated portion so that the robot hardware moves the object on the desk from the start position to the target position (end position). The force detection sensor 512 then detects the resultant of the operating force on the operated portion and the corresponding reaction force from the object. From the force information (teaching information) detected by the force detection sensor 512, the teaching geometry information generation device 514 outputs geometric information for the robot. In the above example this is the moving speed in the X direction (the teaching action A(s) described later). The cooperative motion planning device 516 uses the geometric information from the teaching geometry information generation device 514 to calculate the control amount with which the robot operation control device 518 makes the robot hardware perform the cooperative motion. It outputs this control amount to the robot operation control device 518, which then controls the robot hardware so that it moves cooperatively.
[0039]
While the robot hardware performs the cooperative motion, the operation state detection sensor 524 detects the operating state of the robot operation control device 518, and thereby the operating state of the robot hardware that the device 518 moves. In the above example, the sensor 524 detects position information indicating whether the robot hardware has moved the object on the desk from the start position to the target position (end position).
[0040]
The teaching geometry information generation device 514 stores the generated moving speed as a teaching action. In this embodiment, the human performs the above process several times. Several different teaching actions (moving speeds, in the above example) are therefore generated and stored.
[0041]
Next, the autonomous work mode will be described. The autonomous mode is a mode in which the robot controls itself and moves an object. In the human / robot cooperative operation mode, the operation force on the operated portion and the resultant force of the reaction force from the object are detected, and the object can be positioned at the target position. However, in the autonomous mode, when the robot moves the object, the force detection sensor 512 detects only the reaction force from the object. Therefore, it is difficult to position the object at the target position only by the above teaching action. Therefore, in the autonomous mode, the control amount (action output a (s)) of the robot operation control device 518 for positioning the object at the target position is calculated as follows based on the teaching action.
[0042]
That is, when the autonomous mode is set by the input device 520, the robot behavior planning device 522 executes the behavior pattern (interaction force pattern) generation processing routine shown in FIG.
[0043]
In the first step 532, a variable i identifying each of the teaching actions (moving speeds in the above example) is initialized to 0. In step 534, i is incremented by 1 and the routine checks whether i is within the total number I of teaching actions.
[0044]
If i is larger than the total number I of teaching actions, no optimal solution could be obtained with any of them. The routine then ends or switches to a mode that searches completely random values. If i is within I, the routine checks whether the action output calculation described later is the second or a later one since the routine started. If it is the first calculation, the action output a(s_t) is calculated as a value determined with a predetermined probability, and the process proceeds to step 554. The action output a(s_t) is the control amount that the robot operation control device 518 uses to make the robot hardware act.
[0045]
In step 554, the calculated action output a(s_t) is output to the robot operation control device 518, which controls the operation of the robot hardware according to a(s_t).
[0046]
As described above, the action output a(s_t) calculated from the above equation gives the moving speed at which the robot finger moves the object. That is, the robot finger moves the object from the start position in the X direction at the moving speed a(s_t) and stops it at the designated position (end position).
[0047]
In step 556, the reward r_{t+1} is calculated. The operation state detection sensor 524 detects the operating state of the robot operation control device 518 while the robot hardware acts. Specifically, it detects whether the object on the desk could be brought to rest within a certain range around the target position, and r_{t+1} is calculated from this result. If the object slips from the robot finger while being moved to the target position, a human gives the reward through the input device 520. Alternatively, a contact sensor on the robot finger may detect whether the object has slipped, and the reward may be calculated from its output. The reward may also be calculated from the force information of the force detection sensor when a force larger than a predetermined value (allowable value) is applied in the vertical direction (Z direction).
[0048]
In step 558, the routine uses the reward r_{t+1} from step 556 to check whether the object on the desk could be stopped at the target position (end position). For the first action output this check is normally negative. In that case, step 560 uses r_{t+1} to check for four failure conditions. These are: the object stopped outside the range including the target position; the finger slipped off the object; the force detection sensor reported a vertical (Z direction) force larger than the predetermined value (allowable value); or the robot reached its movable limit.
[0049]
If the robot has not reached its limit movable range, the process proceeds to step 542.
[0050]
In step 542, the state evaluation value V^π(s_t) of the state s_t under a certain policy π (that is, the probability of selecting each action in a given state) is calculated from the following equation.
[0051]
(Equation 1)

[0052]
In step 544, the TD error δt is calculated from the following equation.
[0053]
(Equation 2)

[0054]
In the above equation, V(s_t) is the state value V^π(s_t), and γ is a discount rate. V(s_{t+1}) is the estimated state value of the state that follows the current state s_t. The TD error δ_t therefore incorporates the value at the next time point, one step ahead in time.
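A numeric illustration of step 544 may help. Equation 2 itself is not reproduced in this text, so the sketch below assumes the standard one-step form δ_t = r_{t+1} + γV(s_{t+1}) − V(s_t), which uses exactly the symbols described above; the function name `td_error` is hypothetical.

```python
def td_error(reward_next: float, v_next: float, v_current: float,
             gamma: float) -> float:
    """One-step TD error, assuming the standard TD(0) form:
    delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("discount rate gamma must lie in [0, 1]")
    return reward_next + gamma * v_next - v_current


# Reward 1.0, next-state estimate 0.5, current estimate 0.8, gamma 0.9:
# 1.0 + 0.45 - 0.8 is approximately 0.65 (a positive TD error).
delta = td_error(1.0, 0.5, 0.8, 0.9)
```

A positive δ_t means the outcome was better than the current estimate V(s_t) predicted, which is the condition the text later links to a positive evaluation value T(s_t).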
[0055]
In step 546, the evaluation value T(s_t) is calculated from the following equation.
[0056]
[Equation 3]

[0057]
In step 548, the trial value Δa(s_t) is calculated from the following equation.
[0058]
(Equation 4)

[0059]
Here, n denotes a random trial drawn from an assumed Gaussian distribution. A(s_t) (hereinafter referred to as the “teaching action”) is an amount determined from a predetermined probability distribution whose mean (offset) and variance (variation) are fixed at predetermined amounts relative to the teaching action (reference control amount) described above.
[0060]
In the above equation,
[0061]
(Equation 5)

[0062]
means that the robot is trained so that its action output approaches the value determined from the teaching action when the teaching action is effective for the environment, and that the target output of the robot is searched for by random trial and error when the teaching action is not very effective. For example, when the reward r_{t+1} is positive, the TD error δ_t is positive and the evaluation value T(s_t) is also positive, so ρ(T(s_t)) takes a value close to 1 and ρ(−T(s_t)) takes a value close to 0. In other words, when the evaluation of the result of moving the object with the previous action output falls within a predetermined good evaluation range, the amount of change from the previous action output is determined based on the predetermined teaching action. More specifically, the amount of change is determined so that the difference between the value determined from the teaching action and the previous action output influences it more strongly than the action output determined from the predetermined probability does.
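The weighting described above can be sketched as follows. Equations 4 and 5 are not reproduced in this text, so this is only a sketch under two assumptions: that ρ is the logistic sigmoid (consistent with ρ(T) near 1 and ρ(−T) near 0 for positive T), and that Δa(s_t) is a ρ-weighted mix of the teaching-action difference A(s_t) − a(s_t) and a Gaussian trial n. The names `rho`, `trial_value`, and `sigma` are hypothetical.

```python
import math
import random
from typing import Optional


def rho(x: float) -> float:
    """Numerically stable logistic sigmoid (assumed form of rho)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def trial_value(t_eval: float, teach: float, prev_action: float,
                sigma: float = 0.1,
                rng: Optional[random.Random] = None) -> float:
    """Hypothetical change amount Delta a(s_t).

    rho(T) weights the step toward the teaching action A(s_t);
    rho(-T) weights a random Gaussian trial n (exploration)."""
    if rng is None:
        rng = random.Random()
    n = rng.gauss(0.0, sigma)  # random trial term n
    return rho(t_eval) * (teach - prev_action) + rho(-t_eval) * n
```

With a strongly positive T(s_t) the change is almost entirely the step toward the teaching action; with a strongly negative T(s_t) it is almost entirely random exploration, which is the behavior the paragraph above describes.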
[0063]
In step 550, the state value V(s_t) is updated by the following equation.
[0064]
(Equation 6)

[0065]
In step 552, the action output a(s_t), which is the control amount given to the robot operation control device 518 to make the robot hardware act, is calculated from the following equation.
[0066]
(Equation 7)

[0067]
Here, β is a step size parameter, and δ_t is the TD (temporal difference) error described above. The first term on the right side of the above equation, a(s_t), is the previous action output. Therefore, the second term on the right side, βδ_t Δa(s_t), is the amount of change from the previous action output a(s_t) to the current action output a(s_t).
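Steps 550 and 552 amount to two incremental updates. The action update below follows the description above (the previous output plus βδ_t Δa(s_t)). The value update is an assumption, because Equation 6 is not reproduced in this text: it uses the common TD(0) form V(s_t) ← V(s_t) + αδ_t with a hypothetical step size `alpha`. The class name and default values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ActorCriticState:
    """Hypothetical holder for V(s_t) and a(s_t) of a single state."""
    value: float = 0.0   # state value V(s_t)
    action: float = 0.0  # action output a(s_t), e.g. object moving speed
    alpha: float = 0.1   # assumed critic step size (not given in the text)
    beta: float = 0.5    # step size parameter beta

    def update_value(self, delta: float) -> float:
        # Step 550 (assumed TD(0) form): V(s_t) <- V(s_t) + alpha * delta_t
        self.value += self.alpha * delta
        return self.value

    def update_action(self, delta: float, trial: float) -> float:
        # Step 552: a(s_t) <- a(s_t) + beta * delta_t * Delta a(s_t)
        self.action += self.beta * delta * trial
        return self.action
```

For example, with a previous output of 0.2, β = 0.5, δ_t = 0.4, and Δa(s_t) = 0.5, the new output is 0.2 + 0.5 × 0.4 × 0.5 = 0.3.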
[0068]
Then, as described above, in step 554, the calculated action output a(s_t) is output to the robot operation control device 518, which controls the operation of the robot hardware according to this action output. That is, as described above, the robot finger moves the object from the start point position in the X direction at the movement speed given by the calculated action output a(s_t), and stops it at the designated position (end point position).
[0069]
In step 556, the reward r_{t+1} is calculated as described above, and in step 558 it is determined, based on the reward r_{t+1} calculated in step 556, whether the object on the desk could be stopped at the target position (end point position). If it could, the routine ends, taking the moving speed given by the obtained action output as the optimum value.
[0070]
On the other hand, if the object on the desk cannot be stopped at the target position (end point position), and in step 560 the robot is determined to have reached its limit movable range and to be outside the allowable range, the teaching action identified by the variable i is considered inappropriate as a teaching action for obtaining the amount of change. The procedure therefore returns to step 534, the variable i is incremented by one as described above, and an optimum solution is sought using another teaching action.
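The overall flow of steps 534 to 560 (try teaching action i, iterate until success or until the allowable range is exceeded, then move on to teaching action i + 1) can be sketched as below. The environment callback, its return tuple, and the `update` function standing in for steps 542 to 552 are all hypothetical; the sketch shows only the control flow, not the patent's update equations.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical environment: speed -> (reward, reached_target, out_of_range)
Env = Callable[[float], Tuple[float, bool, bool]]
# Hypothetical stand-in for steps 542-552: (action, teach, reward) -> new action
Update = Callable[[float, float, float], float]


def search_with_teaching(env: Env, teaching_actions: Sequence[float],
                         update: Update, initial_action: float,
                         max_steps: int = 100) -> Optional[Tuple[int, float]]:
    """Return (index i of the teaching action used, optimum action), or None."""
    for i, teach in enumerate(teaching_actions):          # step 534
        action = initial_action
        for _ in range(max_steps):
            reward, reached, out_of_range = env(action)   # steps 554-556
            if reached:                                   # step 558
                return i, action
            if out_of_range:                              # step 560
                break  # teaching action i judged inappropriate
            action = update(action, teach, reward)        # steps 542-552
    return None
```

Usage with a toy environment whose target speed is 0.5: a teaching action of 3.0 drives the output out of range and is abandoned, while a teaching action of 0.5 converges.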
[0071]
Next, an actual-machine experiment by simulation of the above embodiment is described. Table 1 shows the mean and standard deviation of the number of episodes required to achieve the task (stopping the object within a certain range centered on the target position) over 100 simulations, for the case where teaching information is given and the case where it is not.
[0072]
[Table 1]

[0073]
As shown in Table 1 above, when a teaching action appropriate for handling the environment is given, a stable solution is reached faster than when no teaching action is given. FIG. 4 shows, from the viewpoint of task achievement, the behavior of the robot finger under the action output obtained after learning was completed: the speed of the object in the X direction, the distance from the designated position, and the change in the contact force between the object and the tip of the robot finger. The mass of the object is 150 g, the coefficient of static friction between the object and the finger is 1.00, and the coefficients of static and dynamic friction between the object and the floor are 0.20 and 0.18, respectively.
[0074]
As described above, in the present embodiment, the amount of change from the previous action output a(s_t) to the current action output a(s_t) is determined based on the effectiveness of the teaching action with respect to the environment. Consequently, even in the initial stage of learning, the desired behavior pattern does not have to be searched for by random trial. In particular, even when the number of state dimensions becomes large, the optimal solution is easy to obtain, so the approach is highly practical.
[0075]
Also, in the preparation stage, a plurality of pieces of teaching information are input so that different teaching actions are obtained, and an action output that follows one of those teaching actions is determined. This makes it possible to reduce the search cost in the vicinity of each teaching action.
[0076]
In the embodiment described above, the amount of change is determined by making the influence, on its value, of the difference between the value determined from the teaching action and the previous action output greater than the influence of the action output determined from the predetermined probability, and the next action output is then calculated (see Equation 4). However, the present invention is not limited to this. The next action output may also be calculated as follows, with the amount of change determined based only on the difference between the teaching action and the previous action output.
[0077]
For example, in place of Equation 4, the amount of change for the teaching action identified by the variable i
[0078]
(Equation 8)

[0079]
may be determined by the above equation, where
[0080]
(Equation 9)

[0081]
Here, u is the evaluation value T(s_t).
[0082]
In the embodiment described above, one-dimensional horizontal movement of the object has been described as an example of the environment. However, the present invention is not limited to this and may similarly be applied to two-dimensional or three-dimensional movement of the object.
[0083]
For a task composed of a plurality of work parts, it is examined for each work part whether it is to be performed in the human/robot cooperative mode or in the autonomous mode by the robot. For a work part to be performed in the autonomous mode, the teaching action is first obtained in the human/robot cooperative mode as described above, and once the optimal solution has been obtained, the part is performed in the autonomous mode, which improves work efficiency.
[0084]
Next, an example of the second embodiment of the present invention will be described in detail. In this embodiment, the present invention is applied to an advice information generation/presentation device that generates and presents advice information, as action guideline information serving as a guideline for the occupant's actions, based on the result of determining whether the occupant (driver) performs confirmation actions while the vehicle is traveling.
[0085]
As shown in FIG. 6, the advice information generation/presentation apparatus 10 according to the present embodiment includes a computer in which a CPU 12, a ROM 14, a RAM 16, and an input/output interface 18 are connected by a bus so that data and commands can be exchanged, and it can be mounted on a vehicle. A memory 20 storing various data is connected to the input/output interface 18.
[0086]
The input/output interface 18 is connected to various sensors 22, a media read/write device 24, a printer 26 as an output device, a keyboard 28 as an input device, and a monitor 30 as a display device. The printer 26 is not an essential component; it is connected when evaluation results of the advice information generation/presentation apparatus 10 of the present embodiment are to be printed. Likewise, the media read/write device 24 is not an essential component; it is connected when a processing routine described later, evaluation result data of the advice information generation/presentation device 10, or various other data are to be read or written.
[0087]
The various sensors 22 include, as the driving state measuring device, a vehicle behavior detection sensor that detects the behavior of the vehicle and an occupant action detection sensor that detects the actions of the vehicle's occupant. Examples of the vehicle behavior detection sensor include a steering angle sensor, a vehicle speed sensor, a yaw rate sensor, a brake pedal depression amount sensor, an accelerator pedal depression amount sensor, and a blinker sensor. The sensors may also include a detection sensor such as a lane detection device, which detects lanes so that the behavior of the vehicle relative to a lane on the road being traveled can be grasped, and a position detection sensor such as GPS, so that the position of the vehicle can be grasped. One example of a lane detection device is the technology disclosed in JP-A-08-261756. A route guidance device that has map data and guides the vehicle's route or stores its route history, as in a navigation system, may also be included.
[0088]
Examples of the sensor that detects the occupant's actions include a detection device that detects the occupant's gaze direction and face direction, and an identification device that identifies the occupant from an image of the occupant. Examples of these devices include the technologies disclosed in Japanese Patent Application Laid-Open No. 11-276438 and Japanese Patent Application Laid-Open No. 2000-168502, and a commercially available product (faceLab manufactured by seeingmachines).
[0089]
As shown in FIG. 5, the advice information generation/presentation device 10, using the above hardware resources and the software resources described below, is functionally divided into a driving state measuring device 32 that measures the driving state; an advice information generation device 34, serving as a calculation unit, that generates advice information based on the quality of the occupant's confirmation actions using the driving-state values measured by the driving state measuring device 32; and an advice information presenting device 36 that presents the advice information generated by the advice information generation device 34.
[0090]
Note that the advice information presenting device 36 is a device that presents the determination result of the occupant's confirmation actions by printing or displaying it; examples include a printing device such as a printer, a display device such as a display, and a sound output device such as a speaker.
[0091]
The driving state measuring device 32 included in the advice information generation/presentation device 10 measures a driving state that includes the behavior of the vehicle and the actions of the occupant, using the detection values of the various sensors.
[0092]
That is, the behavior of the vehicle is grasped from the steering angle sensor, vehicle speed sensor, yaw rate sensor, brake pedal depression amount sensor, accelerator pedal depression amount sensor, blinker sensor, and the like, and the driving state is obtained from it. In addition, the lane on the road being traveled can be grasped from the detection values of a lane detection device or the like, and the position, heading, and surroundings of the vehicle can be grasped from position detection values such as those of GPS (Global Positioning System) or GIS (geographic information system). The driving state measuring device 32 can also measure the occupant's actions from the detected gaze direction and face direction of the occupant, and can perform measurement to identify the occupant from an image of the occupant.
[0093]
The advice information generation device 34 determines the occupant's confirmation actions from the measurement results of the driving state measuring device 32 and generates advice information. It includes a basic action classification unit 40, a basic action in-state calculation unit 42, a gaze distribution calculation unit 44, a reward calculator 46, and an advice information generator 48. The units 40, 42, 44, 46, and 48 are each connected to the driving state measuring device 32 so as to receive its measurement signal.
[0094]
The basic behavior classification unit 40 uses the measurement signal output from the driving state measuring device 32 to obtain the behavior of the vehicle and the road form at the time of that behavior, and classifies the occupant's driving state into basic actions, each consisting of a combination of the behavior and the road form. The basic behavior classification unit 40 is connected to the driving state measuring device 32 so as to receive its measurement signal, and its output is connected to the basic behavior in-state computation unit 42 and the comparison computation unit 46.
[0095]
The basic behavior classification unit 40 first obtains the behavior (action) of the vehicle from the measurement signal output by the driving state measuring device 32, and then obtains, also from that measurement signal, the road form at the time the behavior (action) was performed. A combination of a vehicle behavior and a road form is defined as a basic action. The basic behavior classification unit 40 then classifies the occupant's driving state corresponding to the measurement signal into a basic action consisting of such a combination.
[0096]
FIG. 7 shows an example of a basic behavior classification table 41 composed of combinations of the vehicle behaviors and road forms described above. In the present embodiment, six vehicle behaviors (actions) are adopted: right turn, left turn, right lane change, left lane change, going straight through an intersection, and right-turning (a U-turn to the right). Five road forms are adopted: a signalized intersection with four or more legs, a signalized intersection with fewer than four legs, entry into an unsignalized intersection from the priority side, entry into an unsignalized intersection from the non-priority side, and a non-intersection.
[0097]
Further, in the present embodiment, 21 basic actions are adopted, as indicated by the “O” marks in FIG. 7. Based on the measurement results of the driving state measuring device 32, the occupant's driving state is therefore classified into one of these 21 basic actions.
[0098]
Further, in the present embodiment, each basic action (a combination of vehicle behavior (action) and road form) is associated with a sheet 50. A sheet 50 is a set of information that expresses a basic action in terms of the transition of the vehicle behavior and road form. FIGS. 8 to 12 show examples of the sheets 50 used for the basic actions classified by the basic behavior classification unit 40 in the present embodiment. These sheets 50 are stored in the memory 20 of the advice information generation/presentation device 10.
[0099]
FIG. 8 shows an example of a sheet for basic actions classified as right-turn behavior, namely sheet 50A, which is used when the basic action is classified as one of the following: a right turn at a signalized intersection with four or more legs, a right turn at a signalized intersection with fewer than four legs, a right turn when entering an unsignalized intersection from the priority side, a right turn when entering an unsignalized intersection from the non-priority side, or a right turn at a non-intersection.
[0100]
Sheet 50A is divided into a plurality of cells whose ranges are defined by vehicle speed (speed axis) at predetermined intervals on the horizontal axis and by vehicle turning (steering axis) at predetermined intervals on the vertical axis. Each cell is assigned a cell number that specifies its position; in the example of FIG. 8, this is the two-digit number near the lower left of each cell. The first digit, “1” to “3”, identifies the range of vehicle turning (steering axis): right-turn preparation, from the start of the right turn to mid-turn, and from mid-turn to the end of the right turn. The second digit, “0” to “4”, identifies the vehicle speed range (speed axis): 0-1 km/h with the brake on; 0-1 km/h with the brake off, or 1-10 km/h; 10-25 km/h; 25-40 km/h; and 40 km/h or more. For example, when the vehicle speed is 10-25 km/h and the vehicle is between the start of the right turn and mid-turn, the cell number is “22”.
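The cell numbering of sheet 50A can be written as a small lookup. This sketch assumes half-open speed ranges (for example, exactly 10 km/h falls in 10-25 km/h) and uses hypothetical phase labels; it reproduces only the numbering scheme, not how the turning phase is detected from the sensors.

```python
# Hypothetical labels for the three turning phases (first digit)
TURN_PHASE = {"preparation": 1, "start_to_mid": 2, "mid_to_end": 3}


def speed_digit(speed_kmh: float, brake_on: bool) -> int:
    """Second digit of the cell number on sheets 50A/50B (speed axis)."""
    if speed_kmh < 0:
        raise ValueError("reverse travel is not represented on this sheet")
    if speed_kmh < 1.0 and brake_on:
        return 0  # 0-1 km/h, brake on
    if speed_kmh < 10.0:
        return 1  # 0-1 km/h brake off, or 1-10 km/h
    if speed_kmh < 25.0:
        return 2  # 10-25 km/h
    if speed_kmh < 40.0:
        return 3  # 25-40 km/h
    return 4      # 40 km/h or more


def cell_number(phase: str, speed_kmh: float, brake_on: bool) -> str:
    """Two-digit cell number: turning phase followed by speed range."""
    return f"{TURN_PHASE[phase]}{speed_digit(speed_kmh, brake_on)}"
```

For the example in the text, `cell_number("start_to_mid", 15.0, False)` gives “22”.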
[0101]
FIG. 9 shows an example of a sheet for basic actions classified as left-turn behavior, namely sheet 50B, which is used when the basic action is classified as one of the following: a left turn at a signalized intersection with four or more legs, a left turn at a signalized intersection with fewer than four legs, a left turn when entering an unsignalized intersection from the priority side, a left turn when entering an unsignalized intersection from the non-priority side, or a left turn at a non-intersection.
[0102]
Like sheet 50A, sheet 50B is divided into a plurality of cells whose ranges are defined by vehicle speed (speed axis) at predetermined intervals on the horizontal axis and by vehicle turning (steering axis) at predetermined intervals on the vertical axis. Likewise, the first digit of each cell number of sheet 50B, “1” to “3”, identifies the range of vehicle turning (steering axis), and the second digit, “0” to “4”, identifies the vehicle speed range (speed axis).
[0103]
FIG. 10 shows an example of a sheet for basic actions classified as right-turning (U-turn) behavior, namely sheet 50C, which is used when the basic action is classified as one of the following: right-turning at a signalized intersection with four or more legs, right-turning at a signalized intersection with fewer than four legs, right-turning when entering an unsignalized intersection from the priority side, right-turning when entering an unsignalized intersection from the non-priority side, or right-turning at a non-intersection.
[0104]
Sheet 50C is divided into a plurality of cells whose ranges are defined by vehicle speed (speed axis) at predetermined intervals on the horizontal axis and by vehicle turning (steering axis) at predetermined intervals on the vertical axis. Because the basic action of sheet 50C is a turning-around action, the vehicle behavior may include various maneuvers such as turning back, so the cells are divided as follows. On the vehicle turning (steering axis), two preparation stages, preparation A and preparation B, are provided from the start of the action up to the preparation for the right turn: preparation B is the behavior from a leftward steering operation to the steering neutral position, and preparation A is the behavior from a rightward steering operation to the steering neutral position. The remaining behavior is divided into the stage from the start of the right turn to mid-turn and the stage from mid-turn to the end of the right turn. These stages are identified by the first digit of the cell number, “0” to “3”. Because the vehicle may also move backward, a reverse range is provided on the speed axis and marked with a minus sign “-” to distinguish it from forward movement. That is, the second digit, “-1” and “0” to “4”, identifies the vehicle speed range (speed axis): reverse; 0-1 km/h with the brake on; 0-1 km/h with the brake off, or 1-10 km/h; 10-25 km/h; 25-40 km/h; and 40 km/h or more.
[0105]
FIG. 11 shows sheet 50D, which is used when the basic action is classified as a right lane change at a non-intersection. Sheet 50D is divided into a plurality of cells whose ranges are defined by vehicle speed (speed axis) at predetermined intervals on the horizontal axis and by the lane-straddling state (relative to the start of straddling) on the vertical axis. The sheet 50 for a left lane change is the same as sheet 50D, except that the lane being straddled is on the left side.
[0106]
The basic action of sheet 50D is a lane change, which may involve high-speed travel, so the number of cells on the vehicle speed (speed axis) is increased: the “40 km/h or more” range used above is changed to 40-70 km/h, and a “70 km/h or more” range is added. Accordingly, the first digit of the cell number, “1” or “2”, identifies the lane-straddling range (lane change preparation or lane change in progress), and the second digit, “0” to “5”, identifies the vehicle speed range (speed axis): 0-1 km/h with the brake on; 0-1 km/h with the brake off, or 1-10 km/h; 10-25 km/h; 25-40 km/h; 40-70 km/h; and 70 km/h or more.
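With the extra 40-70 km/h and 70 km/h-or-more ranges of sheet 50D, a table-driven lookup is convenient. A minimal sketch using Python's standard `bisect` module, again assuming half-open speed ranges; `lane_change_cell` and the bound list are hypothetical names.

```python
import bisect

# Upper bounds (km/h) of the speed ranges above the "0-1 km/h, brake on" cell:
# [<10] -> 1, [10, 25) -> 2, [25, 40) -> 3, [40, 70) -> 4, [70, inf) -> 5
LANE_CHANGE_BOUNDS = [10.0, 25.0, 40.0, 70.0]


def lane_change_cell(phase: int, speed_kmh: float, brake_on: bool) -> str:
    """Cell number on sheet 50D.

    phase: 1 = lane change preparation, 2 = lane change in progress."""
    if phase not in (1, 2):
        raise ValueError("phase must be 1 (preparation) or 2 (in progress)")
    if speed_kmh < 1.0 and brake_on:
        digit = 0  # 0-1 km/h, brake on
    else:
        digit = 1 + bisect.bisect_right(LANE_CHANGE_BOUNDS, speed_kmh)
    return f"{phase}{digit}"
```

Keeping the bounds in a list means a further speed range could be added by appending one value, without touching the lookup logic.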
[0107]
FIG. 12 shows an example of a sheet for basic actions classified as going straight through an intersection, namely sheet 50E, which is used when the basic action is classified as one of the following: going straight through a signalized intersection with four or more legs, going straight through a signalized intersection with fewer than four legs, going straight when entering an unsignalized intersection from the priority side, or going straight when entering an unsignalized intersection from the non-priority side.
[0108]
Sheet 50E is divided into a plurality of cells whose ranges are defined by vehicle speed (speed axis) at predetermined intervals on the horizontal axis and by vehicle position relative to the center of the intersection at predetermined intervals on the vertical axis. The first digit of each cell number of sheet 50E, “1” to “3”, identifies whether the vehicle is before the intersection, immediately before it, or inside it, and the second digit, “0” to “4”, identifies the vehicle speed range (speed axis).
[0109]
Next, an example of an algorithm for classifying the basic behavior in the basic behavior classification unit 40 will be described with reference to FIGS.
[0110]
As shown in FIG. 13, in the basic action classifying unit 40, a processing routine for classifying the basic action is executed, and the process proceeds to step 102, where the traveling data of the vehicle is read. In the next step 104, the traveling locus of the vehicle is obtained from the traveling data from the driving state measuring device 32, and the traveling locus is compared with the map data stored in advance. The traveling locus can be extracted from the navigation system, and the map data can be obtained from map data or GIS used in the navigation system. The collation between the traveling locus and the map data is performed using the geometric data of the traveling locus and the map data.
[0111]
In the next step 106, it is determined whether there is an intersection on the traveling locus. If not, the road form is set to non-intersection in step 108, and in step 110 it is determined whether the traveling locus crosses a road boundary line. If the determination in step 110 is affirmative, a left-turn or right-turn action is set in step 112 by the right/left-turn action determination routine described later, and the routine ends.
[0112]
In step 112 after step 108, the basic action is set to either a left turn action at a non-intersection or a right turn action at a non-intersection (described later).
[0113]
If the determination in step 110 is negative, the process proceeds to step 114, where it is determined whether the traveling direction has been reversed across the road center line within the boundaries of the same road. If the determination in step 114 is affirmative, the action is determined in step 116 by the right-turning action determination routine described later, and the routine ends.
[0114]
In step 116 after step 108, the basic action is set to either a right-turning action at a non-intersection or a right-turning action at an intersection (described later).
[0115]
If the determination in step 114 is negative, the process proceeds to step 118, where it is determined whether the vehicle has crossed into the right lane. If so, in step 120 the period from 4 seconds before the vehicle crosses into the right lane to 2 seconds after is set as a right lane change, and the routine ends. In step 120, a right lane change at a non-intersection is set as the basic action.
[0116]
If the determination in step 118 is negative, the process proceeds to step 122, where it is determined whether the vehicle has crossed into the left lane. If so, in step 124 the period from 4 seconds before the vehicle crosses into the left lane to 2 seconds after is set as a left lane change, and the routine ends. In step 124, a left lane change at a non-intersection is set as the basic action.
[0117]
If the determination in step 122 is negative, the process proceeds to step 126, where the vehicle is set as traveling along the road, and the routine ends. No basic action needs to be set in step 126; however, one may be set if confirmation actions during road travel are to be evaluated.
[0118]
If there is an intersection on the traveling locus and the determination in step 106 is affirmative, the form of the intersection is extracted as the road form in step 128. In step 128, the presence or absence of a traffic signal at the intersection is first determined by referring to the map data. Next, if there is a signal, it is determined whether the intersection has four or more legs; if there is no signal, it is determined whether the vehicle enters from the priority side or the non-priority side. In this way, one of the following is set as the road form in step 128: a signalized intersection with four or more legs, a signalized intersection with fewer than four legs, entry into an unsignalized intersection from the priority side, or entry into an unsignalized intersection from the non-priority side.
[0119]
In the next step 130, it is determined whether the change in the vehicle's traveling direction between before and after passing through the intersection (for example, over 20 m on each side) is at least 30 degrees and less than 135 degrees. If the determination in step 130 is affirmative, the process proceeds to step 112, where the right/left-turn action determination routine is executed.
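The angle test of step 130 needs the heading change to be wrapped so that, for example, a change from 350° to 20° counts as 30°. A minimal sketch, assuming headings in degrees measured clockwise (so a positive change means turning right) and already estimated from the trajectory 20 m before and after the intersection; the function names are hypothetical.

```python
def heading_change_deg(before: float, after: float) -> float:
    """Signed heading change wrapped to [-180, 180).

    Assumes compass-style headings (clockwise positive), so a positive
    result means the vehicle turned to the right."""
    return (after - before + 180.0) % 360.0 - 180.0


def is_turn(before: float, after: float,
            lower: float = 30.0, upper: float = 135.0) -> bool:
    """Step 130 test: lower <= |change| < upper (30 and 135 degrees by default)."""
    return lower <= abs(heading_change_deg(before, after)) < upper


def turn_side(before: float, after: float) -> str:
    """Rough right/left decision from the sign of the heading change."""
    return "right" if heading_change_deg(before, after) > 0 else "left"
```

The limits are parameters because, as noted below, the 30 and 135 degree values are only one possible choice.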
[0120]
Here, the basic action set in step 112 via step 128 is a combination of one of the road forms (a signalized intersection with four or more legs, a signalized intersection with fewer than four legs, entry into an unsignalized intersection from the priority side, or entry into an unsignalized intersection from the non-priority side) with either a left-turn or a right-turn action.
[0121]
In the determination of step 130, the change in the vehicle's traveling direction is required to be at least 30 degrees and less than 135 degrees, but the invention is not limited to these values; it suffices to set an upper limit and a lower limit for the change in traveling-direction angle. These values may be obtained experimentally or statistically.
[0122]
If the determination in step 130 is negative, the process proceeds to step 132, where it is determined whether the traveling direction has been reversed across the road center line within the boundaries of the same road. If the determination in step 132 is affirmative, the process proceeds to step 116, where the right-turning action determination routine is executed.
[0123]
Here, the basic action set in step 116 via step 128 is a combination of one of the road forms (a signalized intersection with four or more legs, a signalized intersection with fewer than four legs, entry into an unsignalized intersection from the priority side, or entry into an unsignalized intersection from the non-priority side) with the right-turning action.
[0124]
If the determination in step 132 is negative, the process proceeds to step 134, where the period from 60 m before the intersection to 30 m after it is set as going straight through the intersection, and the routine ends. In step 134, the basic action is set to a combination of one of the road forms (a signalized intersection with four or more legs, a signalized intersection with fewer than four legs, entry into an unsignalized intersection from the priority side, or entry into an unsignalized intersection from the non-priority side) with the vehicle behavior of going straight through the intersection.
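The branch order of FIG. 13 described above can be summarized as a small decision function. This is a sketch under stated assumptions: the boolean features (`SegmentFeatures` and its fields are hypothetical) are presumed to have been computed already from the traveling locus and map data, step 110 is read as a road-boundary crossing test, and the 30 to 135 degree range of step 130 is hard-coded even though the text allows other limits.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SegmentFeatures:
    """Hypothetical per-segment features; their extraction is not shown."""
    intersection_on_locus: bool               # step 106
    crosses_road_boundary: bool = False       # step 110 (assumed meaning)
    reversed_across_centerline: bool = False  # steps 114 / 132
    crossed_right_lane: bool = False          # step 118
    crossed_left_lane: bool = False           # step 122
    heading_change_abs_deg: float = 0.0       # step 130, absolute change
    intersection_form: Optional[str] = None   # result of step 128


def classify_segment(f: SegmentFeatures) -> Tuple[str, str]:
    """Return (road form, vehicle action) following the branch order of FIG. 13."""
    if not f.intersection_on_locus:
        form = "non-intersection"                      # step 108
        if f.crosses_road_boundary:
            return form, "right/left turn"             # to step 112
        if f.reversed_across_centerline:
            return form, "right-turning (U-turn)"      # to step 116
        if f.crossed_right_lane:
            return form, "right lane change"           # step 120
        if f.crossed_left_lane:
            return form, "left lane change"            # step 124
        return form, "road running"                    # step 126
    form = f.intersection_form or "intersection (form unknown)"  # step 128
    if 30.0 <= f.heading_change_abs_deg < 135.0:
        return form, "right/left turn"                 # to step 112
    if f.reversed_across_centerline:
        return form, "right-turning (U-turn)"          # to step 116
    return form, "straight through intersection"       # step 134
```

The order of the `if` tests matters: as in the flowchart, a segment that both crosses a lane line and reverses direction is classified as right-turning, because step 114 is checked before step 118.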
[0125]
The above routine assumes that basic actions are classified according to the classification table 41 shown in FIG. 7, so lane changes near an intersection are excluded. The invention is not limited to this; lane changes near an intersection may, of course, also be added as basic actions.
[0126]
FIG. 14 shows the right-turning behavior determination routine executed in step 116.
[0127]
In the right-turning behavior determination routine, it is first determined in step 136 whether the vehicle has passed through an intersection, based on the determination result of step 106 described above. If the determination is negative (non-intersection), the process proceeds to step 138; if it is affirmative, the process proceeds to step 140.
[0128]
In step 138, the period from 60 m before the point where the vehicle crosses the road centerline to 20 m past that point is set as the right-turning action. In step 140, the period from 60 m before the intersection at which the traveling direction changed to the right to 30 m after that intersection is set as the right-turning action.
[0129]
As shown in FIG. 15, a right/left turn behavior determination routine is executed in step 112.
[0130]
In the right/left turn behavior determination routine, it is first determined in step 142 whether the change in the vehicle traveling direction before and after the intersection (measured over 20 m on each side) is toward the right, or whether the traveling trajectory crosses the road boundary line on the right side of the traveling direction. That is, it is determined whether the vehicle behavior is a right-turn action or a left-turn action.
[0131]
If the determination in step 142 is affirmative, the process proceeds to step 144 to determine whether the vehicle has passed through an intersection. If the determination in step 144 is affirmative, the routine proceeds to step 146, in which the period from 60 m before the intersection at which the traveling direction changed to the right to 30 m after that intersection is set as the right-turn action, and the routine ends.
[0132]
On the other hand, if the determination in step 144 is negative, it is determined in step 148 whether the vehicle moves from inside the road boundary to outside it. If the determination in step 148 is affirmative, the routine proceeds to step 150, in which the period from 60 m before crossing the road centerline to 1 second after exiting the road boundary is set as the right-turn action, and the routine ends. If the determination in step 148 is negative, that is, if the vehicle enters the road boundary from outside, the process proceeds to step 152, in which the period from 2 seconds before entering the road boundary to the point 20 m past the road centerline crossing is set as the right-turn action, and the routine ends.
[0133]
If the determination in step 142 is negative, the process proceeds to step 154 to determine whether the vehicle has passed through an intersection. If the determination in step 154 is affirmative, the process proceeds to step 156, in which the period from 60 m before the intersection at which the traveling direction changed to the left to 30 m after that intersection is set as the left-turn action, and the routine ends.
[0134]
On the other hand, if the determination in step 154 is negative, it is determined in step 158 whether the vehicle moves from inside the road boundary to outside it. If the determination in step 158 is affirmative, the process proceeds to step 160, in which the period from 60 m before crossing the road centerline to 1 second after exiting the road boundary is set as the left-turn action, and the routine ends. If the determination in step 158 is negative, that is, if the vehicle enters the road boundary from outside, the process proceeds to step 162, in which the period from 2 seconds before entering the road boundary to the point 20 m past the road centerline crossing is set as the left-turn action, and the routine ends.
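The right/left turn routine of steps 142 to 162 reduces to choosing one of three extraction windows for each turning direction. The sketch below is illustrative only: `ActionWindow`, `right_left_turn_window`, and the boolean inputs (standing for the outcomes of steps 142, 144/154, and 148/158) are hypothetical names, and the windows are returned as text descriptions rather than computed from real trajectory data.

```python
from typing import NamedTuple


class ActionWindow(NamedTuple):
    action: str  # "right_turn" or "left_turn"
    start: str   # description of where the extracted period starts
    end: str     # description of where the extracted period ends


def right_left_turn_window(turns_right: bool, passed_intersection: bool,
                           leaves_road: bool) -> ActionWindow:
    """Hypothetical sketch of steps 142-162."""
    action = "right_turn" if turns_right else "left_turn"  # step 142
    if passed_intersection:  # step 144 / 154 affirmative
        # Steps 146 / 156.
        return ActionWindow(action, "60 m before intersection",
                            "30 m after intersection")
    if leaves_road:  # step 148 / 158 affirmative: inside -> outside boundary
        # Steps 150 / 160.
        return ActionWindow(action, "60 m before crossing centerline",
                            "1 s after exiting road boundary")
    # Steps 152 / 162: the vehicle enters the road boundary from outside.
    return ActionWindow(action, "2 s before entering road boundary",
                        "20 m past crossing centerline")
```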
[0135]
As described above, suppose, for example, that the vehicle passes through an intersection with a heading change of 30 degrees or more and less than 135 degrees between 20 m before and 20 m after the intersection, and that the map data for the intersection at which the action took place show it to be a signalized intersection with four or more legs. In that case, the period from 60 m before the intersection at which the traveling direction changed to the right to 30 m after it is extracted as the basic action "right turn at a signalized intersection with four or more legs".
[0136]
In this way, the basic action classifying unit 40 classifies the occupant's actions into basic actions, such as a right turn at a signalized intersection with four or more legs, according to the vehicle behavior and the road form at which the action took place, selects the sheet 50 corresponding to each basic action, and outputs it to the in-basic-action state calculation unit 42.
[0137]
The above description covers the case where the basic action is set from the traveling locus and the map data. However, the invention is not limited to this; the basic action may be set using any of the measurement values of the driving state measurement device 32. As an example, a case where the basic action is set using the turn signal (direction indicator) signal is described below.
[0138]
As shown in FIG. 16, the basic action classifying unit 40 executes a processing routine that classifies the basic action using the blinker signal. In step 202, the traveling data of the vehicle are read. In the next step 204, the traveling locus of the vehicle is obtained from the traveling data supplied by the driving state measuring device 32 and is compared with the map data stored in advance.
[0139]
In the next step 206, it is determined whether there is an intersection on the traveling locus. If not, the road form is set to non-intersection in step 208, and in step 210 it is determined whether the turn signal (direction indicator) has been turned on, that is, whether a direction has been indicated. If the determination in step 210 is negative, road traveling is set in step 218, and the routine ends.
[0140]
On the other hand, if the determination in step 210 is affirmative, it is determined in step 212 whether the change in the vehicle traveling direction from 1 second before to 1 second after the blinker was turned on is less than 30 degrees. If so, the process proceeds to step 216, where it is determined whether the vehicle has crossed a lane. If the determination in step 216 is negative, the process proceeds to step 218; if it is affirmative, the process proceeds to step 220.
[0141]
In step 220, it is determined whether the right blinker was turned on. If so, in step 222 the period in which the blinker was on, extended by 3 seconds before and after, is set as the right lane change action, and the routine ends. If the determination in step 220 is negative, the process proceeds to step 224, where the period in which the blinker was on, extended by 3 seconds before and after, is set as the left lane change action.
[0142]
If the change in the vehicle traveling direction is 30 degrees or more, the determination in step 212 is negative, and the process proceeds to step 214, where it is determined whether the change is 30 degrees or more and less than 135 degrees. If so, the process proceeds to step 240, where it is determined whether the right blinker was turned on. If so, in step 242 the period in which the blinker was on, extended by 3 seconds before and after, is set as the right-turn action, and the routine ends. If the determination in step 240 is negative, the process proceeds to step 244, where the same period is set as the left-turn action.
[0143]
On the other hand, if the determination in step 214 is negative, the process proceeds to step 234, where it is determined whether the right blinker was turned on. If so, in step 238 the period that covers the blinker-on interval extended by 3 seconds before and after, and continues until the steering wheel has held its neutral position for 1 second, is set as the right-turning action, and the routine ends. If the determination in step 234 is negative, the process proceeds to step 236, which mirrors step 238: the same period (the blinker-on interval extended by 3 seconds before and after, continuing until the steering wheel has held its neutral position for 1 second) is set as the left-turning action. The left-turning action is not included in the classification table 41 of FIG. 7, but it can be handled in the same manner as the right-turning action.
[0144]
If there is an intersection on the traveling locus and the determination in step 206 is affirmative, the form of the intersection is extracted as the road form in step 226. As in step 128, the road form is set to one of the following: a signalized intersection with four or more legs, a signalized intersection with fewer than four legs, entry from the priority side at an unsignalized intersection, or entry from the non-priority side at an unsignalized intersection.
[0145]
In the next step 228, it is determined whether the blinker was turned on near the intersection. If the determination in step 228 is affirmative, the process proceeds to step 230, where it is determined whether the change in the vehicle traveling direction before and after the intersection position (for example, over 20 m on each side) is 30 degrees or more and less than 135 degrees. If the determination in step 230 is negative, it is determined in step 232 whether the change in the vehicle traveling direction is 135 degrees or more. If so, the process proceeds to step 234; if not, the process proceeds to step 216.
[0146]
On the other hand, if the determination in step 230 is affirmative, the process proceeds to step 240, where it is determined whether the right blinker was turned on. If so, in step 242 the period in which the blinker was on, extended by 3 seconds before and after, is set as the right-turn action, and the routine ends. If the determination in step 240 is negative, the process proceeds to step 244, where the same period is set as the left-turn action.
[0147]
If the determination in step 228 is negative, the process proceeds to step 246, where it is determined whether the change in the vehicle traveling direction before and after the intersection position is less than 30 degrees. If so, the period from 40 m before the intersection to 20 m after it is set as the intersection straight-travel action, and the routine ends. If the determination in step 246 is negative, no basic action is set; instead, in step 250 the routine records that the blinker operation was forgotten, and the routine ends.
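The blinker-based routine of FIG. 16 (steps 206 to 250) can be condensed into one decision function. A minimal sketch, assuming the caller has already measured the heading change over the window the text specifies for each branch (1 second either side of blinker activation away from intersections, 20 m either side of an intersection) and has evaluated the lane-crossing test of step 216; all names and labels are hypothetical. Once the blinker is on, the intersection and non-intersection branches apply the same 30-degree and 135-degree thresholds, so they collapse into a single chain.

```python
def classify_by_blinker(at_intersection: bool, blinker_on: bool,
                        right_blinker: bool, heading_change_deg: float,
                        crossed_lane: bool) -> str:
    """Hypothetical sketch of the FIG. 16 routine (steps 206-250)."""
    d = abs(heading_change_deg)
    side = "right" if right_blinker else "left"
    if not blinker_on:
        if not at_intersection:
            return "road_traveling"                 # step 210 -> step 218
        if d < 30.0:
            return "straight_through_intersection"  # step 246 affirmative
        return "blinker_omission"                   # step 250, no basic action
    if d < 30.0:
        # Step 216: a lane crossing means a lane change (steps 220-224);
        # otherwise ordinary road traveling (step 218).
        return f"{side}_lane_change" if crossed_lane else "road_traveling"
    if d < 135.0:
        return f"{side}_turn"                       # steps 240-244
    return f"{side}_turning"                        # steps 234-238
```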
[0148]
The in-basic-action state calculation unit 42 shown in FIG. 1 obtains the transition process within each basic action classified by the basic action classifying unit 40, that is, how the state changed during the basic action (the state within the basic action).
[0149]
In other words, from the measurement results of the driving state measuring device 32, the in-basic-action state calculation unit 42 calculates how the vehicle behavior transitions in response to the vehicle operations performed by the occupant, which constitute the basic action. For this purpose, the unit 42 is connected to the driving state measuring device 32 so as to receive its measurement signals, and its output is supplied to the gaze distribution calculation unit 44 and the comparison calculation unit 46.
[0150]
The in-basic-action state calculation unit 42 generates and outputs a state map 52 in which states and transitions are recorded on the basic action sheet 50 selected by the basic action classifying unit 40. To generate the state map 52, the unit 42 tracks the transitions within the basic action cell by cell on the sheet 50. In other words, it divides the basic action into units corresponding to the cells of the sheet 50 and treats the relationships between these cells as transitions.
[0151]
FIGS. 17 to 20 show examples of criteria for dividing the state transitions within a basic action into cells.
[0152]
FIG. 17 shows an example of the reference for the right-turn and left-turn actions, that is, an example of the vehicle behavior used to divide the sheets 50A and 50B into cells using the steering angle θ and the vehicle turning angle φ. The right-turn action is described first. The vehicle speed classification on the horizontal axis of the sheet 50A is the same as in FIG. The vertical axis (vehicle turning angle, steering angle) is divided according to three behaviors: right-turn preparation, right-turn start to mid-turn, and mid-turn to right-turn end. In this embodiment, with the neutral position of the steering wheel taken as the 0-degree origin, the classification uses the time point at which the steering angle reaches 30 degrees and the time point at which the vehicle turning angle reaches 65% of its total change.
[0153]
That is, within the time section extracted as the right-turn action, with the neutral steering position taken as 0 degrees, the interval before the steering angle θ reaches its rightward maximum during which θ is still less than 30 degrees to the right is classified as the right-turn preparation state. Next, the vehicle turning angle φ at the end of the right-turn preparation state is taken as 0, and the change in the vehicle turning angle up to the end of the right-turn action is denoted AC. The interval from the end of the preparation state until φ reaches 65% of AC is classified as the right-turn start to mid-turn state, and the remainder as the mid-turn to right-turn end state.
[0154]
The left-turn action is symmetrical to, and otherwise almost the same as, the right-turn action; it is obtained by replacing "right" with "left" in the above description, so its description is omitted.
[0155]
Further, although the reference uses, with the neutral steering position as the 0-degree origin, the time points at 30 degrees of steering and at 65% of the change in the vehicle turning angle, the invention is not limited to these values; any angle or ratio determined experimentally or statistically may be used as the reference.
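The three-state split of FIG. 17 can be sketched on sampled signals. This assumes that the steering angle θ and the vehicle turning angle φ are sampled at common time steps over the extracted right-turn section, in degrees with rightward positive; the function name and labels are hypothetical, and the thresholds are parameters because the text allows other experimentally or statistically chosen values.

```python
def right_turn_phases(theta, phi, steer_deg=30.0, ratio=0.65):
    """Label each sample of a right-turn section (hypothetical sketch).

    theta: steering angle per sample, degrees, 0 = neutral, right positive.
    phi:   vehicle turning angle per sample, degrees, right positive.
    """
    n = len(theta)
    k_peak = max(range(n), key=lambda k: theta[k])  # rightward steering peak
    # Preparation ends at the first sample up to the peak where theta
    # reaches steer_deg (falling back to the peak itself).
    k_prep = next((k for k in range(k_peak + 1) if theta[k] >= steer_deg),
                  k_peak)
    phi0 = phi[k_prep]   # phi is taken as 0 at the end of preparation
    ac = phi[-1] - phi0  # turning-angle change AC up to the action end
    labels = []
    for k in range(n):
        if k < k_prep:
            labels.append("preparation")
        elif phi[k] - phi0 < ratio * ac:
            labels.append("start_to_mid")
        else:
            labels.append("mid_to_end")
    return labels
```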
[0156]
FIG. 18 shows an example of the reference for the right-turning action, that is, an example of the vehicle behavior used to divide the sheet 50C into cells using the steering angle θ and the vehicle turning angle φ. The vehicle speed classification on the horizontal axis of the sheet 50C is as shown in FIG. 10, except that forward and backward movement are additionally distinguished based on the gear range or the vehicle speed; the detailed classification of forward speeds is the same as described above. The vertical axis (vehicle turning angle, steering angle) is divided according to three behaviors: right-turning preparation, right-turning start to mid-turn, and mid-turn to right-turning end. In this embodiment, with the neutral steering position as the 0-degree origin, the classification uses 10 degrees, 30 degrees, and 90% of the change in the vehicle turning angle.
[0157]
That is, within the time section extracted as the right-turning action, the vehicle turning angle φ at 10 m before the intersection is taken as 0. The interval in which φ is less than 90 degrees to the right and the steering angle θ, before reaching its rightward maximum, is still less than 30 degrees to the right is classified as the right-turning preparation state. This preparation state is further divided into two cases: preparation A, in which the steering is to the right of a point 10 degrees left of the neutral position, and preparation B otherwise. The interval from the end of the preparation state until the vehicle turning angle reaches 90% of its change AC from the start point to the end point of the right-turning action is classified as the right-turning start to mid-turn state, and the remainder as the mid-turn to right-turning end state.
[0158]
In the present embodiment, the description of the left-turning behavior is omitted, but the left-turning behavior can be treated in substantially the same manner as the right-turning behavior, and the above-described right-turning can be replaced with the left-turning.
[0159]
Further, when the neutral position of the steering wheel is set as the origin of 0 degree, the reference points are 10 degrees, 30 degrees, and 90% of the change in the turning angle of the vehicle, but the present invention is not limited thereto. Instead, any angle or any ratio determined experimentally or statistically may be used as a reference.
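A similar sketch for the right-turning action of FIG. 18 adds the division of the preparation state into preparation A and B. It assumes rightward-positive angles in degrees, φ already zeroed at the point 10 m before the intersection, AC measured from the first to the last sample of the action, and the preparation state ending at the first sample (up to the rightward steering peak) where θ reaches 30 degrees or φ reaches 90 degrees; these boundary choices and all names are hypothetical.

```python
def right_turning_phases(theta, phi, steer_deg=30.0, phi_limit_deg=90.0,
                         left_limit_deg=-10.0, ratio=0.90):
    """Label each sample of a right-turning section (hypothetical sketch).

    theta: steering angle, degrees, 0 = neutral, right positive.
    phi:   vehicle turning angle, degrees, right positive, assumed to be
           zeroed at the point 10 m before the intersection.
    """
    n = len(theta)
    k_peak = max(range(n), key=lambda k: theta[k])
    k_prep = next((k for k in range(k_peak + 1)
                   if theta[k] >= steer_deg or phi[k] >= phi_limit_deg),
                  k_peak)
    ac = phi[-1] - phi[0]  # assumed: AC from the start to the end of the action
    labels = []
    for k in range(n):
        if k < k_prep:
            # Preparation A: steering right of the point 10 deg left of neutral.
            labels.append("preparation_A" if theta[k] > left_limit_deg
                          else "preparation_B")
        elif phi[k] - phi[0] < ratio * ac:
            labels.append("start_to_mid")
        else:
            labels.append("mid_to_end")
    return labels
```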
[0160]
FIG. 19 shows an example of the reference for the right lane change action, that is, an example of the vehicle behavior used to divide the sheet 50D into cells. The right lane change action is described first. The vehicle speed classification on the horizontal axis of the sheet 50D is the same as in FIG. The vertical axis (lane crossing) is divided into right lane change preparation and right lane change in progress. In this embodiment, the classification uses the time point at which, in the positional relationship between the lane and the vehicle, the right end of the vehicle reaches the right lane.
[0161]
That is, within the time section extracted as the right lane change action, the interval from the start of the action until the right end of the vehicle reaches the right lane is classified as the right lane change preparation state, and the interval from the end of the preparation state to the end of the action is classified as the right lane change in progress to action end state. The action end preferably includes a predetermined period (2 seconds) after the right lane change is completed.
[0162]
The left lane change action is almost the same as the right lane change action; it is obtained by replacing "right lane change" with "left lane change" above, so its description is omitted.
[0163]
Further, although the reference uses the time point at which the right end of the vehicle reaches the right lane, the invention is not limited to this; a predetermined position on the vehicle, or a position separated by a predetermined distance from the lane, may be used as the reference instead.
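The two-state split of FIG. 19 can be sketched as follows, assuming per-sample flags that say whether the vehicle's right end has reached the right lane, and a known completion time for the lane change. The function name, inputs, and labels are hypothetical; the 2-second tail follows the preferred action end described in the text.

```python
def lane_change_phases(times, right_edge_in_right_lane, t_complete,
                       tail_s=2.0):
    """Label samples of a right lane change action (hypothetical sketch).

    times: sample times in seconds.
    right_edge_in_right_lane: per-sample flag, True once the vehicle's right
        end has reached the right lane.
    t_complete: time at which the lane change is judged complete.
    """
    t_end = t_complete + tail_s  # the action end includes a 2 s tail
    reached = False
    labels = []
    for t, flag in zip(times, right_edge_in_right_lane):
        if t > t_end:
            labels.append(None)  # outside the extracted action
            continue
        reached = reached or flag  # once reached, stay in the second state
        labels.append("changing_to_end" if reached else "preparation")
    return labels
```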
[0164]
FIG. 20 shows an example of the reference for the intersection straight-travel action, that is, the vehicle behavior used to divide the sheet 50E into cells using the distance between the vehicle and the intersection. The vehicle speed classification on the horizontal axis of the sheet 50E is the same as in FIG. The vertical axis (distance between the vehicle and the intersection) is divided according to three behaviors: before passing, immediately before passing, and during or after passing. In this embodiment, the classification uses the time points at which the distance from the vehicle to the intersection is 10 m or 30 m.
[0165]
That is, within the period extracted as the intersection straight-travel action, the reference position of the intersection is taken as 0, and the interval from the start of the action until the vehicle is 10 m before the intersection is classified as the before-passing state. The interval from the end of the before-passing state (10 m before the reference position) to the reference position of the intersection is classified as the immediately-before-passing state, and the interval from the end of that state to the end of the action is classified as the during-or-after-passing state.
[0166]
Further, although the reference uses the time points at which the distance from the vehicle to the intersection is 10 m or 30 m, the invention is not limited to this; the distance may be changed according to the vehicle speed, or the reference position of the intersection may be given a predetermined tolerance.
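For the intersection straight-travel action of FIG. 20, the state depends only on the signed distance to the intersection reference position. A minimal sketch, assuming the distance is negative before the reference position and positive after it; the function name, labels, and the treatment of the exact 10 m boundary are hypothetical, and `near_m` is a parameter because the text allows the distance to vary, for example with vehicle speed.

```python
def straight_phase(distance_m: float, near_m: float = 10.0) -> str:
    """State of the intersection straight-travel action (hypothetical sketch).

    distance_m: signed distance from the intersection reference position,
        negative before it and positive after it.
    """
    if distance_m < -near_m:
        return "before_passing"
    if distance_m < 0.0:
        return "immediately_before_passing"
    return "during_or_after_passing"
```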
[0167]
FIG. 21 shows, as an image, an example of the state map 52 recorded by the in-basic-action state calculation unit 42 for a basic action classified as a right turn at a signalized intersection with four or more legs. The marks and arrows in FIG. 21 indicate how the state changed during the right-turn action: with the steering in the neutral state, the vehicle decelerates from 40 km/h or more, stops once, starts again, turns the steering to the right, and turns right at the intersection while accelerating.
[0168]
Between cells, an arrow points from the cell on the starting-point side to the cell on the ending-point side, and the direction of the arrow represents the action. In FIG. 21, "A" denotes acceleration, "D" deceleration, and "H" a change in the steering state. In the state map 52, each cell number corresponds to a state within the basic action, and the action is indicated by the direction of the arrow leaving the cell. An individual transition can therefore be represented by combining a cell number with a letter representing the action; the combination of the starting-point cell number and the action letter is defined as the state map number. For example, a transition in which deceleration was performed from cell number 14 (right-turn preparation at a vehicle speed of 40 km/h or more) is written as state map number "14-D".
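The state map number is a plain string encoding of a transition, so it can be built and parsed directly. A sketch with hypothetical helper names; the action letters A, D, and H are those defined for FIG. 21, and the first test case reproduces the "14-D" example from the text.

```python
# Action letters used in the state map of FIG. 21.
ACTIONS = {"A": "acceleration", "D": "deceleration", "H": "steering-state change"}


def state_map_number(cell: int, action: str) -> str:
    """Combine a starting-point cell number and an action letter."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action letter: {action!r}")
    return f"{cell}-{action}"


def parse_state_map_number(number: str) -> tuple[int, str]:
    """Split a state map number such as "14-D" back into its parts."""
    cell, action = number.split("-")
    if action not in ACTIONS:
        raise ValueError(f"unknown action letter: {action!r}")
    return int(cell), action


def state_map_numbers(cells: list[int], actions: list[str]) -> list[str]:
    """One number per transition: cells[k] -> cells[k + 1] via actions[k]."""
    if len(actions) != len(cells) - 1:
        raise ValueError("need exactly one action per transition")
    return [state_map_number(c, a) for c, a in zip(cells, actions)]
```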
[0169]
The in-basic-action state calculation unit 42 outputs this state map number; an example of the output is shown in Table 2 below.
[0170]
[Table 2]

[0171]
The gaze distribution calculation unit 44 shown in FIG. 5 obtains the occupant's line-of-sight direction as the occupant's confirmation action for each basic action classified by the basic action classifying unit 40.
[0172]
That is, from the measurement results of the driving state measuring device 32, the gaze distribution calculation unit 44 obtains the gaze direction, which is the confirmation action performed by the occupant, corresponding to the basic action including the state transitions computed by the in-basic-action state calculation unit 42. For this purpose, it is connected to the in-basic-action state calculation unit 42 and the driving state measuring device 32, and its output is supplied to the comparison calculation unit 46.
[0173]
As shown in FIG. 22, the gaze distribution calculation unit 44 presets ten line-of-sight directions for the occupant: LL, FL, FF, FR, RR, BR, BB, BL, EI, and ET. The front direction of the seated occupant is taken as 0 degrees, with clockwise angles positive and counterclockwise angles negative. The range from -9 to 9 degrees is defined as the front direction FF, and the ranges from 171 to 180 degrees and from -171 to -180 degrees as the backward direction BB.
[0174]
Likewise, 9 to 25 degrees is defined as the right front direction FR, 25 to 135 degrees as the right direction RR, 135 to 171 degrees as the right rear direction BR, -9 to -25 degrees as the left front direction FL, -25 to -135 degrees as the left direction LL, and -135 to -171 degrees as the left rear direction BL. In addition, a direction EI toward in-vehicle equipment such as the instrument panel and gauges, and a direction ET covering all other directions, are defined. An example of another direction is a gaze directed at the ceiling or at an object placed on a seat.
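The angular ranges of FIG. 22 map a horizontal gaze angle to one of the eight angle-based directions. A minimal sketch; the text does not say which side of each boundary (9, 25, 135, and 171 degrees) is inclusive, so the choice below is an assumption, as is the function name. EI and ET are not angle ranges and are determined separately from the in-vehicle geometry.

```python
def classify_gaze_angle(angle_deg: float) -> str:
    """Map a horizontal gaze angle to FF/FR/RR/BR/BB/BL/LL/FL (sketch).

    0 degrees = occupant's front, clockwise (rightward) positive.
    Assumed boundary handling: FF, FR/FL, and RR/LL include their outer
    edge, BR/BL exclude it, so exactly +/-171 degrees counts as BB.
    """
    a = (angle_deg + 180.0) % 360.0 - 180.0  # normalize to [-180, 180)
    mag = abs(a)
    right = a > 0.0
    if mag <= 9.0:
        return "FF"
    if mag <= 25.0:
        return "FR" if right else "FL"
    if mag <= 135.0:
        return "RR" if right else "LL"
    if mag < 171.0:
        return "BR" if right else "BL"
    return "BB"
```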
[0175]
FIG. 23 shows how the gaze direction is classified in the gaze distribution calculation unit 44. As shown in FIG. 23A, the positions of equipment inside and outside the vehicle are read from a memory in which they are stored in advance, and their positions relative to the occupant are obtained. As shown in FIG. 23B, to classify the line of sight, a virtual origin X is set on the vehicle, and a straight line in three-dimensional space representing the occupant's line of sight is obtained from the output of the action detection sensor described above. When this line passes through a mirror area or the instrument panel area, the corresponding direction BR, BB, BL, or EI is set; when it passes through none of these areas and does not pass through a window glass area, the direction ET is set.
[0176]
Further, as shown in FIG. 23C, the directions LL, FL, FF, FR, and RR are classified for the case in which none of the directions BR, BB, BL, EI, and ET has been determined and the line of sight passes through a window glass area. In this case, the gaze direction is obtained by recalculating the angle from the virtual origin on the vehicle using the driver's head position, face direction, and gaze direction. That is, the intersection of the line of sight starting at the occupant's head with a cylindrical surface of radius 10 m centered on the virtual origin is found; the straight line from the virtual origin through this intersection is taken as the line of sight at the virtual origin, and its angle is the gaze direction.
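The recalculation of FIG. 23C amounts to intersecting the gaze ray with a vertical cylinder. Because the cylinder's axis is vertical, only the horizontal components of the ray affect where it meets the cylinder in plan view, so a 2-D ray-circle intersection gives the azimuth. This sketch assumes a vehicle frame with x pointing forward and y pointing right (so that `atan2(y, x)` yields the clockwise-positive angle of FIG. 22), a head position inside the cylinder, and a gaze angle measured in the same convention; the function name is hypothetical.

```python
import math


def gaze_angle_at_origin(head_xy, gaze_deg, radius_m=10.0):
    """Azimuth of the gaze as seen from the virtual origin (sketch).

    head_xy:  (x, y) head position relative to the virtual origin, metres,
              x forward and y to the right.
    gaze_deg: horizontal gaze angle from the head, clockwise positive.
    """
    hx, hy = head_xy
    ux = math.cos(math.radians(gaze_deg))
    uy = math.sin(math.radians(gaze_deg))
    # Solve |h + t*u|^2 = R^2 for the positive root t (|u| = 1).
    b = hx * ux + hy * uy
    c = hx * hx + hy * hy - radius_m ** 2
    if c >= 0.0:
        raise ValueError("head position must lie inside the cylinder")
    t = -b + math.sqrt(b * b - c)
    px, py = hx + t * ux, hy + t * uy  # point on the cylinder surface
    return math.degrees(math.atan2(py, px))
```

With the head exactly at the virtual origin the result equals the gaze angle; an offset head shifts it slightly, which is why the text recomputes the angle from the origin.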
[0177]
The gaze distribution calculation unit 44 classifies the occupant's gaze into one of the ten directions, extracts each instance in which a direction is held for at least a predetermined time (100 ms or more in this embodiment), and creates a table of these directions and holding times. An example of the table output by the gaze distribution calculation unit 44 is shown in Table 3 below; it adds the direction and the holding time to the output of Table 1 described above. The gaze distribution is determined from the gaze direction and the holding time (gaze time).
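Extracting gaze holdings of at least 100 ms is a run-length encoding of the sampled gaze direction. A minimal sketch, assuming time-stamped samples in integer milliseconds (integers avoid floating-point trouble at the 100 ms threshold) and that a run lasts until the next sample with a different direction; the function name is hypothetical, and shorter glances are simply dropped rather than merged into neighboring runs, which the text does not specify.

```python
def gaze_holdings(samples, end_ms, min_hold_ms=100):
    """Run-length encode gaze samples and keep holdings >= min_hold_ms.

    samples: list of (time_ms, direction) sorted by time.
    end_ms:  time at which the last run ends.
    Returns a list of (start_ms, direction, hold_ms).
    """
    holdings = []
    i, n = 0, len(samples)
    while i < n:
        start, direction = samples[i]
        j = i
        while j + 1 < n and samples[j + 1][1] == direction:
            j += 1  # extend the run while the direction is unchanged
        stop = samples[j + 1][0] if j + 1 < n else end_ms
        if stop - start >= min_hold_ms:
            holdings.append((start, direction, stop - start))
        i = j + 1
    return holdings
```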
[0178]
[Table 3]

[0179]
As illustrated in FIG. 5, the reward calculation unit 46 compares the reference (normative) gaze distribution with the occupant's gaze distribution for each basic action classified by the basic action classifying unit 40, and thereby obtains a reward for the confirmation action of the occupant who acted in accordance with the previous advice information.
[0180]
That is, the reward calculation unit 46 compares the reference gaze distribution with the occupant's gaze distribution using the measurement results of the driving state measuring device 32, the basic action classified by the basic action classifying unit 40, the state map 52 obtained by the in-basic-action state calculation unit 42, and the gaze distribution obtained by the gaze distribution calculation unit 44. For this purpose, the reward calculation unit 46 is connected to the driving state measuring device 32 so as to receive its measurement signals, is also connected to the basic action classifying unit 40, the in-basic-action state calculation unit 42, and the gaze distribution calculation unit 44, and supplies its output to the advice information generating unit 48.
[0181]
The reward calculation unit 46 extracts the reference gaze distribution map 56 for the basic action from the reference gaze distribution database 54, obtains a gaze distribution map 58 of the occupant's gaze directions for the same basic action, compares the two to calculate a reward 60, and outputs the reward to the advice information generating unit 48.
[0182]
As shown in FIG. 24, the reward calculation unit 46 includes a gaze direction result input unit 70, a calculation circuit 72, a state map number input unit 74, a calling circuit 76, and a gaze distribution comparison and scoring circuit 78.
[0183]
The gaze direction result input unit 70 receives the gaze direction results shown in Table 3 above, and the state map number input unit 74 receives the state map numbers shown in Table 2 above, which are output from the in-basic-action state calculation unit 42. The data received by the gaze direction result input unit 70 are output to the calculation circuit 72, which calculates, for each basic action and state map, the gaze distribution over a prescribed time (2 seconds in this embodiment) immediately before each state transition, together with the holding time in each direction.
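The per-transition computation of the calculation circuit 72 can be sketched as clipping each gaze holding to the 2-second window before a transition. This assumes holdings given as `(start_ms, direction, hold_ms)` tuples and defines the average holding time over the clipped durations; both choices and the function name are hypothetical, not taken from the patent.

```python
from collections import defaultdict


def window_distribution(holdings, t_transition_ms, window_ms=2000):
    """Gaze ratio and mean holding time per direction in the window
    [t_transition_ms - window_ms, t_transition_ms) (hypothetical sketch).

    holdings: iterable of (start_ms, direction, hold_ms).
    """
    w0 = t_transition_ms - window_ms
    total = defaultdict(int)  # clipped time spent in each direction
    count = defaultdict(int)  # number of holdings overlapping the window
    for start, direction, hold in holdings:
        s, e = max(start, w0), min(start + hold, t_transition_ms)
        if e > s:
            total[direction] += e - s
            count[direction] += 1
    ratio = {d: total[d] / window_ms for d in total}
    mean_hold = {d: total[d] / count[d] for d in total}
    return ratio, mean_hold
```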
[0184]
Table 4 below shows an example of a calculation result obtained by calculating the gaze distribution and the average holding time in each direction from the output of the gaze direction result input unit 70 in the calculation circuit 72.
[0185]
[Table 4]

[0186]
The state map number from the state map number input unit 74 is output to the calling circuit 76. The calling circuit 76 includes a reference gaze distribution database 54 that stores, for each basic action and state map, the corresponding reference gaze distribution, and it retrieves from this database the data corresponding to the input state map number. The reference gaze distribution database 54 stores, for example, a gaze distribution score table (FIG. 25) and a score table of the average holding time in each gaze direction.
[0187]
FIG. 25 shows an example of a gaze distribution score table stored in the reference gaze distribution database 54, namely the reference gaze distribution map for a right turn at a signalized intersection with four or more legs with state map number 14-D. For each gaze direction, the gaze distribution score table gives the correspondence between the gaze distribution ratio within the specified time (2 seconds in this embodiment) and the score. Table 5 below lists the numerical values of this reference gaze distribution map of FIG. 25 for the right-turn action with state map number 14-D.
[0188]
[Table 5]

[0189]
Based on the state map number output from the state map number input unit 74, the calling circuit 76 extracts the corresponding gaze distribution score table (for example, FIG. 25) and score table of the average holding time in each direction from the reference gaze distribution database 54 and outputs them.
[0190]
The gaze distribution score table and the average holding time score table for each gaze direction are preferably obtained by a supervised learning method based on the criteria an expert, such as a driving instructor, uses to judge confirmation actions. Alternatively, the score tables may be calculated as frequency distributions of data recorded while an expert such as a driving instructor is driving.
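The frequency-distribution alternative can be sketched for a single gaze direction. The ten equal-width bins and the normalisation by the most frequent bin are assumptions, chosen so that the highest score is 1, as for the reference maps in this embodiment; `score_table_from_expert` is a hypothetical name, and the expert data format is assumed.

```python
from collections import Counter


def score_table_from_expert(ratios, n_bins=10):
    """Score table for one gaze direction built from an expert's frequency distribution.

    ratios: gaze-distribution ratios in [0, 1], one per prescribed window,
            e.g. taken from a driving instructor's logs (hypothetical input).
    Returns a list of (bin_centre, score) pairs. The most frequent bin scores 1.0,
    so the table is normalised with its highest score equal to 1.
    """
    if not ratios:
        raise ValueError("no expert data")
    # Map each ratio to a bin index; a ratio of exactly 1.0 goes into the last bin.
    counts = Counter(min(int(r * n_bins), n_bins - 1) for r in ratios)
    peak = max(counts.values())
    return [((b + 0.5) / n_bins, counts.get(b, 0) / peak) for b in range(n_bins)]
```

Bins the expert never visited score 0, so a gaze ratio far from the instructor's habits is rated poorly.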
[0191]
The output sides of the calculation circuit 72 and the calling circuit 76 are connected to the comparison inputs of the reward calculation circuit 78. The reward calculation circuit 78 compares the output of the calculation circuit 72 with the output of the calling circuit 76 and outputs a scoring result. This output, that is, the comparison result, is input to the advice information generation unit 48.
[0192]
For each basic action (L) and state map, the reward calculation circuit 78 takes the gaze distribution $g_{L,i,k}$ and the average holding time $h_{L,i,k}$ output from the calculation circuit 72, and the reference maps $G_{L,j}$ and $H_{L,j}$ output from the calling circuit 76 (normalized so that the highest score is 1), and calculates the score $Q_{L,j,k}$ by interpolation for each direction i. Here, the suffix i denotes the gaze direction, the suffix j denotes the state map number within the basic action (L), and the suffix k denotes the order in which the state map number changes during one execution of the basic action, the k-th state map number being j. N is the maximum value of k, that is, the number of state map transitions during the basic action.
[0193]
(Equation 10)

[0194]
The determination value for the basic action (L) is then given by Expression 11 below.
[0195]
(Equation 11)

[0196]
The reward calculation circuit 78 outputs the lowest of the calculated values to the advice information generation unit 48 as the reward. In this embodiment, the reward calculation circuit 78 also outputs the basic action (L), the state map number (j), and the reference maps $G_{L,j}(g_{L,i,k})$ and $H_{L,j}(h_{L,i,k})$ at that time to the advice information generation unit 48.
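Equations 10 and 11 are not reproduced in this text, so the following is only a sketch under stated assumptions: each measured gaze ratio is scored by linear interpolation in its per-direction score table (such as the one in FIG. 25), and the reward is taken as the lowest of these scores, mirroring the circuit's output of the lowest value. Holding-time scores from $H_{L,j}$ could be handled the same way. The data layout and function names are hypothetical.

```python
from bisect import bisect_right


def interp_score(table, x):
    """Piecewise-linear lookup in a score table [(x0, s0), (x1, s1), ...] sorted by x.

    Values outside the table are clamped to the first/last score.
    """
    xs = [px for px, _ in table]
    if x <= xs[0]:
        return table[0][1]
    if x >= xs[-1]:
        return table[-1][1]
    k = bisect_right(xs, x)              # xs[k-1] <= x < xs[k]
    (x0, s0), (x1, s1) = table[k - 1], table[k]
    return s0 + (s1 - s0) * (x - x0) / (x1 - x0)


def lowest_score_reward(measured, ref_tables):
    """Reward as the lowest per-direction score (stand-in for Equations 10 and 11).

    measured:   {gaze direction i: measured gaze ratio}   (hypothetical layout)
    ref_tables: {gaze direction i: score table for that direction}
    """
    return min(interp_score(ref_tables[i], g) for i, g in measured.items())
```

Taking the minimum means a single badly neglected direction dominates the reward, which matches the intent of flagging the weakest part of the confirmation action.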
[0197]
As shown in FIG. 5, the advice information generation unit 48 includes a personal characteristics database 64 and an accident case database 62, and generates advice information on the occupant's (driver's) confirmation action based on the reward output from the reward calculation unit 46.
[0198]
That is, the advice information generation unit 48 generates advice information on the occupant's gaze distribution using the reward that the reward calculation unit 46 calculates from the measurement results of the driving state measurement device 32. Accordingly, the connections are arranged so that the output of the reward calculation unit 46 is passed on to the advice information presentation device 36. The advice information generation unit 48 is also connected to the driving state measurement device 32 so as to receive its measurement signal.
[0199]
As shown in FIG. 26, the advice information generation unit 48 includes an input circuit 80 for inputting the name of the occupant (driver). The input circuit 80 is connected to a personal DB processing circuit 82 that includes the personal characteristics database 64, and the personal DB processing circuit 82 is connected to a case DB processing circuit 84 that includes the accident case database 62.
[0200]
The advice information generation unit 48 further includes a reward input unit 86 for inputting the reward output from the reward calculation unit 46. The output side of the reward input unit 86 is connected to an advice information generation circuit 90, which generates advice information based on the reward.
[0201]
The output side of the advice information generation circuit 90 is connected to the personal DB processing circuit 82 and to the advice information presentation output signal generation circuit 92. The output side of the advice information presentation output signal generation circuit 92 is connected to the personal DB processing circuit 82 and the advice information presentation device 36.
[0202]
When a reward is input from the reward input unit 86, the advice information generation circuit 90 executes the advice information generation processing routine shown in FIG. 31. This routine shares some steps with the action pattern generation processing routine of the first embodiment described above; those steps are denoted by the same reference numerals, and their detailed description is omitted.
[0203]
That is, when a reward is input from the reward input unit 86, the advice information generation processing routine shown in FIG. 31 starts, and in step 302 it is determined, based on the input reward, whether the advice information generated and output last time is optimal. If it is, information indicating that the previous advice information is optimal is output to the advice information presentation device 36 in step 304. If it is not, it is determined in step 306, based on the input reward, whether the occupant's confirmation action is within an allowable range. If it is within the allowable range, the same processing as steps 542 to 554 of the action pattern generation processing routine of the first embodiment is executed.
[0204]
That is, in step 542, the state evaluation value $V^{\pi}(s_t)$ is calculated from Equation 1 above using the reward calculated by the reward calculation unit 46; in step 544, the TD error $\delta_t$ is calculated from Equation 2 above; and in step 546, the evaluation value $T(s_t)$ is calculated from Equation 3 above.
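Equations 1 to 3 are defined earlier in the document and are not reproduced in this section. As a point of reference only, the sketch below computes the textbook TD(0) error and value update that actor-critic schemes of this kind commonly use; the patent's own equations may differ in detail, the evaluation value $T(s_t)$ is not modelled, and the learning rate and discount factor are assumed values. State map numbers serve as hypothetical state labels.

```python
from collections import defaultdict


class TabularCritic:
    """Textbook TD(0) critic over discrete states (illustrative, not Equations 1-3)."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)
        self.V = defaultdict(float)   # state value estimates, initially 0.0

    def td_error(self, s, r, s_next):
        # delta_t = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)
        return r + self.gamma * self.V[s_next] - self.V[s]

    def update(self, s, r, s_next):
        # V(s_t) <- V(s_t) + alpha * delta_t; returns delta_t
        delta = self.td_error(s, r, s_next)
        self.V[s] += self.alpha * delta
        return delta
```

A positive $\delta_t$ means the outcome was better than the critic expected, which is the signal that later favours moving the advice toward the reference value.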
[0205]
In step 548, the trial value $\Delta a(s_t)$ is calculated from Equation 4 above. In the present embodiment, the reference-derived quantity in that equation is a value determined with a predetermined probability from a reference value of the gaze distribution (the reference action guideline information); this is hereinafter called the value determined from the reference value. Because suitable gaze distributions differ, this embodiment stores a plurality of mutually different representative gaze distributions as reference values, each identified by a variable i.
[0206]
As described above, Equation 5 means that the advice information is learned so as to approach the reference value when the reference value is effective for the occupant, and is searched for by random trial and error when the reference value is not very effective for the occupant. That is, when the reward is positive, the TD error $\delta_t$ is positive and the evaluation value $T(s_t)$ is also positive, so $\rho(T(s_t))$ takes a value close to 1 and $\rho(-T(s_t))$ takes a value close to 0. In other words, when the evaluation of the occupant's gaze distribution, changed on the basis of the previous advice information, is within a predetermined good evaluation range, the amount of change from the previous advice information is determined based on the predetermined reference value. More specifically, the amount of change is determined from the difference between the value determined from the reference value and the previous advice information, such that the influence of this difference on the amount of change is greater than the influence of the advice information determined based on a predetermined probability.
[0207]
Even when the reference value is not very effective for the occupant and the advice information is searched for by random trial and error, the random value is not unbounded: an allowable range is determined in advance for each basic action, and the random value is taken within that range.
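The qualitative behaviour described for the trial value can be sketched as follows. Equations 4, 5 and 7 are not reproduced here, so this is a hypothetical form, not the patent's formula: it assumes $\rho$ is the logistic function, models the value determined from the reference value as the reference plus small Gaussian noise (spread `sigma`, an assumed parameter), and draws the random trial uniformly within the allowable range. The `reference_only` flag mirrors the variant that keeps only the reference-difference term (Equation 8).

```python
import math
import random


def rho(x):
    """Numerically stable logistic function: rho(T) -> 1 as T grows, rho(-T) -> 0."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def trial_value(a_prev, a_ref, T, lo, hi, sigma=0.02, rng=None, reference_only=False):
    """Hypothetical trial value Delta a(s_t) for one advice vector (not Equation 4 itself).

    a_prev : previous advice, e.g. a gaze ratio per direction
    a_ref  : reference gaze distribution currently selected by the index i
    T      : evaluation value T(s_t)
    lo, hi : allowable range of each component for the current basic action
    sigma  : spread of the "value determined from the reference value" (assumed)
    reference_only : keep only the reference-difference term (Equation 8 style)
    """
    rng = rng or random.Random()
    delta = []
    for p, r, l, h in zip(a_prev, a_ref, lo, hi):
        guided = (r + rng.gauss(0.0, sigma)) - p   # pull toward the reference value
        if reference_only:
            delta.append(guided)
            continue
        explore = rng.uniform(l, h) - p            # random trial inside the allowable range
        delta.append(rho(T) * guided + rho(-T) * explore)
    return delta
```

Because $\rho(T) + \rho(-T) = 1$, with `sigma=0` the next advice `a_prev + delta` (an assumed stand-in for Equation 7) is a weighted average of the reference and a random value in range, so it stays in the allowable range whenever the reference does. A large positive $T$ makes the step follow the reference; a large negative $T$ turns it into random search.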
[0208]
In step 550, the state value $V(s_t)$ is updated from Equation 6 above; in step 552, the advice information $a(s_t)$ for the occupant is calculated from Equation 7 above; and in step 554, the advice information $a(s_t)$ is output.
[0209]
If it is determined in step 306 that the occupant's confirmation action is not within the allowable range, the gaze distribution (reference value) used to obtain the previous advice information can be judged inappropriate. Therefore, in step 308, the variable i identifying the reference value is incremented by one, and it is determined whether i exceeds the total number of reference values. If it does not, steps 542 to 554 are executed using the incremented variable i.
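The branching of the routine in FIG. 31 (steps 302 to 308) can be summarised in code. This is a sketch with hypothetical hook functions standing in for the optimality test, the allowable-range test, and steps 542 to 554; what happens once every reference value has been tried is not stated in the source, so the code simply reports it.

```python
def advice_routine_step(reward, state, is_optimal, within_allowable, learn_step):
    """One pass of the FIG. 31 routine (steps 302-308), with hypothetical hooks.

    state            : dict with "i" (1-based index of the reference in use) and
                       "n_refs" (total number of stored reference values)
    is_optimal       : reward -> bool, the step 302 test (assumed callable)
    within_allowable : reward -> bool, the step 306 test (assumed callable)
    learn_step       : (reward, i) -> advice, standing in for steps 542-554
    """
    if is_optimal(reward):                        # step 302
        return "previous advice optimal"          # step 304: report to device 36
    if not within_allowable(reward):              # step 306: reference judged unsuitable
        state["i"] += 1                           # step 308: move to the next reference
        if state["i"] > state["n_refs"]:
            return "references exhausted"         # not specified in the source; assumed stop
    return learn_step(reward, state["i"])         # steps 542-554 using reference i
```

Keeping the reference index in the caller's state means a poor result permanently moves the learner on to the next representative gaze distribution, rather than retrying the rejected one.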
[0210]
In the present embodiment, the advice information generation circuit 90 also outputs the basic action (L), the state map number (j), and the reference maps $G_{L,j}(g_{L,i,k})$ and $H_{L,j}(h_{L,i,k})$ input via the reward input unit 86 to the advice information presentation output signal generation circuit 92.
[0211]
The advice information generation circuit 90 extracts the gaze directions i whose scores on the reference maps $G_{L,j}(g_{L,i,k})$ and $H_{L,j}(h_{L,i,k})$ are low, and then outputs the basic action (L), the state map number (j), and the extraction result (gaze direction i) to the advice information presentation output signal generation circuit 92 and the personal DB processing circuit 82.
[0212]
The personal DB processing circuit 82 records the advice information output from the advice information generation circuit 90 for each occupant, according to the occupant's name from the input circuit 80. That is, using the identification data entered via the input circuit 80, the personal DB processing circuit 82 accesses that occupant's personal characteristics database 64 and records the advice information from the advice information generation circuit 90 each time it is generated. As a result, all advice information is stored in the personal characteristics database 64 of the personal DB processing circuit 82 regardless of whether the reward is good or poor, and the relevant advice information can be extracted (searched) according to the quality of the reward.
[0213]
For example, the personal DB processing circuit 82 may have a function of outputting to the advice information presentation device 36 the search results for past confirmation actions that were not judged good. Since the accumulated data on such confirmation actions exist in the personal characteristics database 64, it suffices to extract the corresponding data from the personal characteristics database 64 in response to a search command and output it as the search result.
[0214]
The advice information presentation output signal generation circuit 92 generates a display signal for causing the advice information presentation device 36 to display the advice information output from the advice information generation circuit 90. In the present embodiment, the circuit 92 can also generate display signals for showing the other information described above, besides the advice information, on the advice information presentation device 36.
[0215]
As shown in FIG. 27, the advice information presentation device 36 is provided on the instrument panel of the vehicle. Here, the advice information presentation device 36 functions as a safety confirmation appropriateness meter, and the advice information presentation output signal generation circuit 92 outputs a presentation signal to it. The left side of FIG. 27 shows an example display screen 300 of the advice information presentation device 36 functioning as this meter. The display screen 300 shows the determination results for the confirmation actions up to the present as a time-series bar graph, so the most recent determination result is always displayed.
[0216]
The display screen 300 includes a detail button 302, which is an instruction button for requesting the details of the determination result. The detail button 302 accepts a request only when the occupant gives a request instruction and the vehicle is stopped; the screen shown in FIG. 28 is then presented as an example. Here, the advice information presentation device 36 is a touch panel, so the occupant can easily input instructions.
[0217]
FIG. 28 shows an example of the detail screen 304 presented when the detail button 302 is pressed. In FIG. 28, the confirmation action is presented as a time-series transition following the behavior of the vehicle. Along this transition, the detail screen 304 presents both the confirmation actions that require special attention among the determination results and advice information 306 on confirmation actions that statistically require attention. As described above, when the advice information is optimal, the occupant's confirmation action is also optimal, and a message to that effect is displayed.
[0218]
Further, the detail screen 304 includes a habit button 308 for instructing a request to present the habit of the occupant's confirmation action. The habit button 308 accepts the request and presents the information only when the request is made by the occupant and only when the vehicle is stopped.
[0219]
FIG. 29 shows an example of the habit display screen 310 presented when the habit button 308 is pressed. In FIG. 29, at least one of the good and poor data on the occupant's confirmation actions is presented by referring to the personal characteristics database 64. That is, the basic action (L), state map number (j), and gaze direction (i) of the poor confirmation actions found most often in the personal characteristics database can be presented in descending order of frequency, up to a predetermined number (two in the illustrated example), together with the basic action (L), state map number (j), and gaze direction (i) of the best confirmation actions found most often, up to a predetermined number (one in the illustrated example).
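The ranking shown on the habit display screen 310 amounts to counting (L, j, i) triples. A minimal sketch, assuming the personal characteristics database can be read as a list of records carrying a good/poor flag (a hypothetical layout); the defaults of two poor and one best entry follow the illustrated example.

```python
from collections import Counter


def habit_summary(records, n_bad=2, n_good=1):
    """Most frequent poor and best confirmation actions for one occupant.

    records: iterable of (L, j, i, good) tuples, i.e. basic action, state map
             number, gaze direction, and whether the action was judged good
             (hypothetical layout for the personal characteristics database).
    Returns (worst, best): lists of (L, j, i) ordered by descending frequency.
    """
    records = list(records)                  # allow generators: the data is scanned twice
    bad = Counter((L, j, i) for L, j, i, good in records if not good)
    best = Counter((L, j, i) for L, j, i, good in records if good)
    return ([k for k, _ in bad.most_common(n_bad)],
            [k for k, _ in best.most_common(n_good)])
```

The action labels and state map numbers used with this function are whatever identifiers the database stores; the ones in a quick test are purely illustrative.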
[0220]
The habit display screen 310 includes an accident fortune-telling button 312. The accident fortune-telling button 312 accepts a request only when the occupant makes one and the vehicle is stopped, and then presents the result of the accident fortune-telling. Accident fortune-telling presents, on the basis of accident cases, the kind of emergency the occupant is likely to fall into given the occupant's past driving behavior. For example, the case DB processing circuit 84 holds, as the accident case database 62, a database of accident cases flagged with the behavior involved, the road form, and the dangerous direction, such as those used in well-known hazard prediction training. The accident case database 62 is then searched using the basic action (L), the state map number (j), and the gaze direction (i) of the most frequent poor confirmation action found in the personal characteristics database, and accident cases and hazard prediction training examples that fit situations in which the driver tends to perform poor confirmation actions are output.
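The accident case search can be pictured as a flag match. The sketch assumes a hypothetical record schema for the accident case database 62 and an assumed lookup from state map number to road form; the source does not specify how the state map number (j) is matched against the road-form flag.

```python
def accident_fortune(habit, cases, road_form_of):
    """Accident cases matching the occupant's most frequent poor confirmation action.

    habit        : (L, j, i) = basic action, state map number, gaze direction
    cases        : list of dicts with "title", "behavior", "road_form" and
                   "danger_dirs" flags (hypothetical schema for database 62)
    road_form_of : mapping from state map number to road form (assumed lookup)
    """
    L, j, i = habit
    road = road_form_of.get(j)
    return [c["title"] for c in cases
            if c["behavior"] == L and c["road_form"] == road and i in c["danger_dirs"]]
```

A case is returned only when all three flags agree, so the occupant sees cases that match the specific situation and direction in which the poor confirmation habit occurs.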
[0221]
FIG. 30 shows an example of an accident fortune-telling result display screen 314 that is presented when the accident fortune-telling button 312 is instructed.
When a loudspeaker is provided as the advice information presentation device 36, the advice information presentation output signal generation circuit 92 can generate corresponding sound information. For example, when the determination result of a confirmation action is reported while the vehicle is traveling and the occupant has enabled voice output, the circuit 92 may have the advice information presentation device 36 announce the score of the confirmation action at the previous right turn.
[0222]
In addition, when the vehicle is equipped with a navigation system and route guidance is in use, the personal characteristics database can be searched for mistakes the driver tends to make in each basic action, and points requiring attention can be presented by voice during route guidance.
[0223]
As described above, in the present embodiment the basic behavior is classified according to the driving state of the occupant, the confirmation behavior for each basic behavior is evaluated (determined), and advice information is presented. Because the advice is presented in a manner that has little effect on driving, there is no need to point out deficiencies or errors in the confirmation behavior while the occupant is driving, which would burden the occupant. In addition, the device constantly monitors the occupant's confirmation behavior, grasps the occupant's habits, and, at a time when driving will not be disturbed, can teach the situations in which the occupant showed inappropriate or good confirmation behavior, thereby improving safety behavior. Since the confirmation behavior is monitored constantly, it can be effectively steered in a favorable direction. As a result, the occurrence of emergencies such as accidents can be reduced, and the occupant's confirmation behavior can be evaluated while safety is ensured.
[0224]
It is also possible to quantitatively evaluate and judge compliance with general knowledge about which direction to look, and how much, in each state of a basic action; for example, looking sufficiently to the right before turning the steering wheel to the right, or not prolonging a confirmation that requires a large head turn on an expressway.
[0225]
In the above embodiment, the basic behavior is described as being classified into sheets. However, the present invention is not limited to sheets, and the basic behavior may be classified in other ways.
[0226]
As described above, in the present embodiment, the change from the previous advice information to the current advice information $a(s_t)$ is determined according to how effective the reference gaze distribution is. Thus, even in the initial stage of learning, a desired behavior pattern need not be searched for purely by random trial, so learning is efficient; even when the state has many dimensions, an optimal solution is easy to obtain, and the practicality is high.
[0227]
In addition, since a plurality of reference values for gaze distribution are determined and the action output is determined to follow one of the reference values, the search cost in the vicinity of each reference value can be reduced.
[0228]
In the embodiment described above, the next advice information is calculated by determining the amount of change so that the influence of the difference between the reference value and the previous advice information on the amount of change is greater than the influence of the advice information determined based on a predetermined probability (see Equation 4). However, the present invention is not limited to this; as in Equation 8 above, the amount of change may be determined based only on the difference between the reference value and the previous advice information, and the next advice information calculated from it.
[0229]
In the above-described embodiment, the advice information presenting apparatus first displays the safety confirmation appropriateness and displays the detail button, and presents the advice information when the detail button is instructed. However, the present invention is not limited to this, and advice information may be presented first.
[0230]
In the above-described embodiment, the behavior is the gaze distribution of the occupant operating the vehicle when the vehicle is in a predetermined driving state. However, the present invention is not limited to this and is equally applicable to gaze behavior other than the gaze distribution, such as the gaze holding time, and to steering wheel operation.
[0231]
【The invention's effect】
As described above, according to the invention of claim 1, when the evaluation of the environment changed based on the previous control amount is within a predetermined good evaluation range, the next control amount is calculated by determining the amount of change from the previous control amount based on a predetermined reference control amount. This has the effect that the next control amount can be obtained efficiently.
[0232]
Further, according to the invention of claim 12, when the evaluation of the environment changed based on the previous action guideline information is within a predetermined good evaluation range, the next action guideline information is calculated by determining the amount of change from the previous action guideline information based on predetermined reference action guideline information. This has the effect that the next action guideline information can be obtained more efficiently than by randomly changing the previous action guideline information.
[Brief description of the drawings]
FIG. 1 is a conceptual diagram showing a “human-robot-work environment” structure according to the present embodiment.
FIG. 2 is an overall configuration diagram of a robot.
FIG. 3 is a flowchart illustrating a behavior pattern generation processing routine executed by the robot behavior planning device.
FIG. 4 is a diagram showing experimental results.
FIG. 5 is a block diagram showing a schematic configuration of an advice information generation / presentation device according to the embodiment of the present invention.
FIG. 6 is a block diagram showing a basic configuration of an advice information generation / presentation device according to an embodiment of the present invention.
FIG. 7 is an image diagram showing a classification table for classifying basic actions in the advice information generation / presentation device.
FIG. 8 is an image diagram showing a sheet for a right-turn action.
FIG. 9 is an image diagram showing a sheet of a left turn action.
FIG. 10 is an image diagram showing a sheet for turning right.
FIG. 11 is an image diagram showing a sheet for a right lane change action.
FIG. 12 is an image diagram showing a sheet for a straight-ahead action at an intersection.
FIG. 13 is a flowchart illustrating the process of classifying a basic behavior in the basic behavior classification unit of the confirmation behavior determination device.
FIG. 14 is a flowchart showing details of step 116 in FIG.
FIG. 15 is a flowchart showing details of step 112 in FIG.
FIG. 16 is a flowchart illustrating the process of classifying a basic action using the direction indicator signal in the basic action classification unit of the confirmation action determination device.
FIG. 17 is an explanatory diagram of a reference example regarding a vehicle behavior for classifying a right-turning behavior or a left-turning behavior on a cell-by-cell basis using a steering angle θ and a vehicle turning angle φ.
FIG. 18 is an explanatory diagram of a reference example regarding a behavior of a vehicle for classifying a right-turning behavior on a cell-by-cell basis using a steering angle θ and a vehicle turning angle φ.
FIG. 19 is an explanatory diagram of a reference example regarding division of a lane change action in units of cells.
FIG. 20 is an explanatory diagram of a reference example regarding the division of an intersection straight traveling motion in units of cells.
FIG. 21 is an image diagram showing a state map.
FIG. 22 is an image diagram showing a range in the line of sight of the occupant.
FIG. 23 illustrates the classification of the gaze direction in the gaze distribution calculation unit; FIGS. 23A and 23B show the positional relationship of the equipment inside and outside the vehicle.
FIG. 24 is a block diagram illustrating details of a comparison operation unit included in the confirmation action determination device.
FIG. 25 is an image diagram showing an example of a gaze distribution score table stored in a reference gaze distribution database.
FIG. 26 is a block diagram illustrating details of a display processing unit included in the confirmation action determination device.
FIG. 27 is an image diagram showing an example of a determination result presentation device installed in a vehicle.
FIG. 28 is an image diagram illustrating details of a determination result regarding a confirmation action presented to the determination result presentation device.
FIG. 29 is an image diagram showing an example of the habit display screen for the occupant's habits.
FIG. 30 is an image diagram showing an example of an accident fortune-telling result display screen.
FIG. 31 is a flowchart illustrating an advice information generation processing routine.
[Explanation of symbols]
100 Robot
512 Force detection sensor
514 Teaching geometry information generation device
516 Cooperative motion planning device
518 Robot motion controller
520 input device
522 Robot Action Planning Device
524 position detection sensor
10 Advice information generation and presentation device
32 Operating condition measuring device
34 Advice information generation device
36 Advice information presentation device
40 Basic Behavior Classification Unit
42 Basic operation state calculation unit
44 Gaze distribution calculation unit
46 Reward calculator
48 Advice information calculation unit
50 Sheet
52 State Map
54 Normative gaze distribution database
56 norm gaze distribution map
58 Gaze distribution map
60 score results
62 Accident case database
64 Personal Characteristics Database

Claims

Means for changing the environment;
Control means for controlling the change means such that the environment changes based on a control amount;
Calculating means for calculating the control amount;
An environment change device comprising:
When the evaluation of the environment changed based on the previous control amount is within a predetermined good evaluation range, the calculating means calculates the next control amount by determining the amount of change from the previous control amount based on a predetermined reference control amount,
An environmental change device characterized by the above-mentioned.

The calculation means calculates a next control amount by determining the change amount based on a difference between a value determined based on a predetermined probability from the reference control amount and a previous control amount. The environment change device according to claim 1.

The environment change device according to claim 2, wherein the calculating means determines the change amount such that the influence, on the value of the change amount, of the difference between the value determined based on a predetermined probability from the reference control amount and the previous control amount is larger than the influence, on the value of the change amount, of a control amount determined based on a predetermined probability.

The environment change device according to claim 2, wherein the calculating means determines the change amount based only on the difference between a value determined based on a predetermined probability from the reference control amount and the previous control amount.

The environment change device according to claim 1, wherein the previous control amount is a control amount determined based on the reference control amount.

The environment change device according to claim 5, wherein a plurality of mutually different predetermined reference control amounts are provided, and
when the evaluation of the environment changed based on a previous control amount determined from one of the plurality of reference control amounts is outside the predetermined good evaluation range, the calculating means calculates the next control amount by determining the change amount from the previous control amount based on a reference control amount other than that one among the plurality of reference control amounts.

Operating means for operating; and
calculation means for calculating the control amount from operation information obtained by operating the operating means,
are further provided, and
the control means controls the changing means such that the environment changes based on the control amount calculated by the calculation means.
The environment changing device according to claim 1.

The environment change device according to claim 7, wherein the reference control amount is a control amount calculated by the calculation unit from the operation information.

The environment changing device according to claim 1, wherein the environment indicates a motion state of the object.

The environment change apparatus according to claim 9, wherein the motion state indicates a stationary state of the object.

The environment changing device according to any one of claims 1 to 10, wherein the changing means is robot hardware.

Calculating means for calculating action guideline information serving as a guideline for action;
Presenting means for presenting the action guideline information calculated by the calculating means;
An action guideline information generation / presentation device comprising:
When the evaluation of the behavior changed based on the previous action guideline information is within a predetermined good evaluation range, the calculating means calculates the next action guideline information by determining the amount of change from the previous action guideline information based on predetermined reference action guideline information,
An action guideline information generation / presentation apparatus, characterized in that:

The calculating means calculates the next action guideline information by determining the change amount based on a difference between a value determined based on a predetermined probability from the reference action guideline information and the previous action guideline information. The action guideline information generation / presentation device according to claim 12, characterized in that:

The action guideline information generation / presentation device according to claim 13, wherein the calculating means determines the change amount such that the influence, on the value of the change amount, of the difference between the value determined based on a predetermined probability from the reference action guideline information and the previous action guideline information is larger than the influence, on the value of the change amount, of action guideline information determined based on a predetermined probability.

The action guideline information generation / presentation device according to claim 13, wherein the calculating means determines the change amount based only on the difference between a value determined based on a predetermined probability from the reference action guideline information and the previous action guideline information.

The action guideline information generation and presentation device according to any one of claims 12 to 15, wherein the previous action guideline information is action guideline information determined based on the reference action guideline information.

The action guideline information generation / presentation device according to claim 16, wherein a plurality of mutually different items of predetermined reference action guideline information are provided, and
when the evaluation of the environment changed based on previous action guideline information determined from one item of the plurality of reference action guideline information is outside the predetermined good evaluation range, the calculating means calculates the next action guideline information by determining the amount of change from the previous control amount based on an item of reference action guideline information other than that one among the plurality of items.

The action guideline information generation / presentation device according to claim 12, wherein the action is an action of an occupant operating a vehicle when the vehicle is in a predetermined driving state.