JP2018165693A

JP2018165693A - Driving support method and driving support device using the same, automatic driving control device, vehicle, program, and presentation system

Info

Publication number: JP2018165693A
Application number: JP2017063659A
Authority: JP
Inventors: Koichi Emura (江村 恒一); Hideto Motomura (本村 秀人)
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2017-03-28
Filing date: 2017-03-28
Publication date: 2018-10-25

Abstract

PROBLEM TO BE SOLVED: To provide a technology for deriving a driving behavior that reflects a driver's intention. SOLUTION: A creation part 90 selects a predetermined number of driving behaviors from a plurality of types of driving behaviors in descending order of reliability, and creates first presentation information indicating the selected driving behaviors. At a predetermined frequency, the creation part 90 creates, in place of the first presentation information, second presentation information in which some of the predetermined number of driving behaviors to be included in the first presentation information are replaced by driving behaviors with lower reliability. An operation signal input part receives a first operation signal indicating one driving behavior selected by a driver from the first presentation information, or a second operation signal indicating one driving behavior selected by the driver from the second presentation information. A learning part 74 executes positive weighting on the one driving behavior indicated in the first operation signal. The learning part 74 executes positive weighting on the one driving behavior indicated in the second operation signal, and executes negative weighting on the other driving behaviors included in the second presentation information. SELECTED DRAWING: Figure 3

Description

The present invention relates to a vehicle, a driving support method provided in a vehicle, a driving support device using the method, an automatic driving control device, a program, and a presentation system.

An autonomous vehicle travels by detecting the situation around the vehicle and automatically executing driving behaviors that an occupant would conventionally intend and perform. Such a vehicle is equipped with a driving support device that lets the occupant change the driving behavior, so that the driving behavior of the vehicle does not diverge from the driving behavior the occupant intends. The driving support device presents executable driving behaviors and has the occupant select one of them (see, for example, Patent Document 1).

International Publication No. 16/170763

If the presented driving behaviors do not include the driving behavior the occupant wants, the occupant cannot select it. It is therefore desirable to present driving behaviors that reflect the occupant's intention.

The present invention has been made in view of this situation, and its object is to provide a technique for deriving a driving behavior that reflects the occupant's intention.

To solve the above problem, a driving support device according to one aspect of the present invention includes: a generation unit that selects a predetermined number of driving behaviors, in descending order of reliability, from a plurality of types of driving behaviors that are estimation results obtained using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving behaviors; a presentation information output unit that outputs the first presentation information generated by the generation unit to a notification device; an operation signal input unit that receives a first operation signal indicating one driving behavior selected by the occupant from the first presentation information notified by the notification device; and a learning unit that updates the driving behavior model by executing learning while applying positive weighting to the one driving behavior indicated by the first operation signal input to the operation signal input unit. At a predetermined frequency, the generation unit generates, in place of the first presentation information, second presentation information that includes driving behaviors with low reliability in place of some of the predetermined number of driving behaviors that would be included in the first presentation information. The presentation information output unit outputs the second presentation information generated by the generation unit to the notification device. The operation signal input unit receives a second operation signal indicating one driving behavior selected by the occupant from the second presentation information notified by the notification device. The learning unit updates the driving behavior model by executing learning while applying positive weighting to the one driving behavior indicated by the second operation signal input to the operation signal input unit and applying negative weighting to the other driving behaviors included in the second presentation information.
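As an illustration only, the selection performed by the generation unit can be sketched in Python as follows. The function name `build_presentation`, the cycle counter used to realize the "predetermined frequency," the choice to replace only the last of the selected behaviors, and the random choice of the low-reliability behavior are all assumptions made for this sketch; the description does not fix any of them.

```python
import random
from typing import Dict, List, Optional


def build_presentation(
    reliability: Dict[str, float],
    count: int,
    explore_every: int,
    cycle: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the driving behaviors to present in this cycle.

    Normally the `count` most reliable behaviors are returned (first
    presentation information). On every `explore_every`-th cycle the last
    of them is replaced by a lower-reliability behavior (second
    presentation information). All names here are hypothetical.
    """
    rng = rng or random.Random()
    # Rank behaviors from highest to lowest reliability.
    ranked = sorted(reliability, key=lambda b: reliability[b], reverse=True)
    top, rest = ranked[:count], ranked[count:]
    is_exploration = explore_every > 0 and cycle % explore_every == 0
    if is_exploration and top and rest:
        # Assumption: swap one slot and pick the replacement at random
        # from the behaviors that did not make the top `count`.
        top = top[:-1] + [rng.choice(rest)]
    return top
```

With `count=2` and `explore_every=10`, cycle 1 yields the two most reliable behaviors, while cycle 10 keeps the most reliable one and fills the second slot with a behavior that would otherwise not have been shown.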

Another aspect of the present invention is an automatic driving control device. This device includes: a generation unit that selects a predetermined number of driving behaviors, in descending order of reliability, from a plurality of types of driving behaviors that are estimation results obtained using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving behaviors; a presentation information output unit that outputs the first presentation information generated by the generation unit to a notification device; an operation signal input unit that receives a first operation signal indicating one driving behavior selected by the occupant from the first presentation information notified by the notification device; a learning unit that updates the driving behavior model by executing learning while applying positive weighting to the one driving behavior indicated by the first operation signal input to the operation signal input unit; and an automatic driving control unit that controls automatic driving of the vehicle based on the one driving behavior. At a predetermined frequency, the generation unit generates, in place of the first presentation information, second presentation information that includes driving behaviors with low reliability in place of some of the predetermined number of driving behaviors that would be included in the first presentation information. The presentation information output unit outputs the second presentation information generated by the generation unit to the notification device. The operation signal input unit receives a second operation signal indicating one driving behavior selected by the occupant from the second presentation information notified by the notification device. The learning unit updates the driving behavior model by executing learning while applying positive weighting to the one driving behavior indicated by the second operation signal input to the operation signal input unit and applying negative weighting to the other driving behaviors included in the second presentation information.

Yet another aspect of the present invention is a vehicle. This vehicle includes a driving support device, and the driving support device includes: a generation unit that selects a predetermined number of driving behaviors, in descending order of reliability, from a plurality of types of driving behaviors that are estimation results obtained using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving behaviors; a presentation information output unit that outputs the first presentation information generated by the generation unit to a notification device; an operation signal input unit that receives a first operation signal indicating one driving behavior selected by the occupant from the first presentation information notified by the notification device; and a learning unit that updates the driving behavior model by executing learning while applying positive weighting to the one driving behavior indicated by the first operation signal input to the operation signal input unit. At a predetermined frequency, the generation unit generates, in place of the first presentation information, second presentation information that includes driving behaviors with low reliability in place of some of the predetermined number of driving behaviors that would be included in the first presentation information. The presentation information output unit outputs the second presentation information generated by the generation unit to the notification device. The operation signal input unit receives a second operation signal indicating one driving behavior selected by the occupant from the second presentation information notified by the notification device. The learning unit updates the driving behavior model by executing learning while applying positive weighting to the one driving behavior indicated by the second operation signal input to the operation signal input unit and applying negative weighting to the other driving behaviors included in the second presentation information.

Yet another aspect of the present invention is a driving support method. This method includes the steps of: selecting a predetermined number of driving behaviors, in descending order of reliability, from a plurality of types of driving behaviors that are estimation results obtained using a driving behavior model, and generating first presentation information indicating the selected predetermined number of driving behaviors; outputting the generated first presentation information to a notification device; receiving a first operation signal indicating one driving behavior selected by the occupant from the first presentation information notified by the notification device; updating the driving behavior model by executing learning while applying positive weighting to the one driving behavior indicated by the input first operation signal; generating, at a predetermined frequency and in place of the first presentation information, second presentation information that includes driving behaviors with low reliability in place of some of the predetermined number of driving behaviors that would be included in the first presentation information; outputting the generated second presentation information to the notification device; receiving a second operation signal indicating one driving behavior selected by the occupant from the second presentation information notified by the notification device; and updating the driving behavior model by executing learning while applying positive weighting to the one driving behavior indicated by the input second operation signal and applying negative weighting to the other driving behaviors included in the second presentation information.
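The two weighting rules in the updating steps can be sketched as follows. This is a minimal illustration that stores one scalar weight per driving behavior and uses a fixed step size; the function name `update_weights`, the `exploratory` flag, and the step size are hypothetical, and the actual learning performed on the driving behavior model is not specified at this level of detail.

```python
from typing import Dict, List


def update_weights(
    weights: Dict[str, float],
    presented: List[str],
    selected: str,
    exploratory: bool,
    step: float = 0.25,
) -> Dict[str, float]:
    """Return updated per-behavior weights after one occupant selection.

    exploratory=False: the selection came from the first presentation
    information, so only the selected behavior is weighted positively.
    exploratory=True: the selection came from the second presentation
    information, so the other presented behaviors are also weighted
    negatively. The scalar weights and step size are assumptions.
    """
    updated = dict(weights)
    updated[selected] = updated.get(selected, 0.0) + step
    if exploratory:
        for behavior in presented:
            if behavior != selected:
                updated[behavior] = updated.get(behavior, 0.0) - step
    return updated
```

Behaviors that were not presented are left unchanged in both cases, which matches the wording that negative weighting applies to the other behaviors included in the second presentation information.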

Yet another aspect of the present invention is a presentation system. This presentation system includes: a driving support device including a generation unit that selects a predetermined number of driving behaviors, in descending order of reliability, from a plurality of types of driving behaviors that are estimation results obtained using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving behaviors, and a presentation information output unit that outputs the first presentation information generated by the generation unit; and a notification device that notifies the presentation information output from the driving support device. The driving support device further includes: an operation signal input unit that receives a first operation signal indicating one driving behavior selected by the occupant from the first presentation information notified by the notification device; and a learning unit that updates the driving behavior model by executing learning while applying positive weighting to the one driving behavior indicated by the first operation signal input to the operation signal input unit. At a predetermined frequency, the generation unit generates, in place of the first presentation information, second presentation information that includes driving behaviors with low reliability in place of some of the predetermined number of driving behaviors that would be included in the first presentation information. The presentation information output unit outputs the second presentation information generated by the generation unit to the notification device. The operation signal input unit receives a second operation signal indicating one driving behavior selected by the occupant from the second presentation information notified by the notification device. The learning unit updates the driving behavior model by executing learning while applying positive weighting to the one driving behavior indicated by the second operation signal input to the operation signal input unit and applying negative weighting to the other driving behaviors included in the second presentation information.

Any combination of the above components; a configuration in which some of the components, for example the learning unit, are processed by a computer via a communication network, either sequentially or in batches covering a predetermined period such as one day; and conversions of the expression of the present invention among a device, a system, a method, a program, a recording medium on which the program is recorded, a vehicle equipped with the device, and the like are also effective as aspects of the present invention.

According to the present invention, a driving behavior that reflects the occupant's intention can be derived.

FIG. 1 is a diagram showing the configuration of a vehicle according to Embodiments 1 to 3.
FIG. 2 is a diagram schematically showing the interior of the vehicle according to Embodiments 1 to 3.
FIG. 3 is a diagram showing the configuration of a control unit according to Embodiments 1 to 3.
FIG. 4 is a diagram showing a histogram generated by a histogram generation unit according to Embodiments 1 to 3.
FIGS. 5(a) to 5(c) are diagrams showing an outline of processing of the display control unit of FIG. 3.
FIG. 6 is a flowchart showing a processing procedure of the control unit according to Embodiment 1.
FIGS. 7(a) to 7(e) are diagrams showing an outline of processing of the display control unit according to Embodiment 2.
FIG. 8 is a flowchart showing a generation procedure of the control unit according to Embodiment 2.
FIG. 9 is a flowchart showing a generation procedure of the control unit according to Embodiment 3.

(Embodiment 1)
Before the present invention is described in detail, an outline is given. The present embodiment relates to automatic driving of an automobile. In particular, the present embodiment relates to a device (hereinafter also referred to as a "driving support device") that controls an HMI (Human Machine Interface) for exchanging information about the driving behavior of the vehicle with an occupant of the vehicle (for example, the driver). The terms used in the present embodiment are defined as follows. "Driving behavior" includes operating states such as steering and braking while the vehicle is traveling or stopped, or control content related to automatic driving control; examples are constant-speed travel, acceleration, deceleration, temporary stop, stop, lane change, course change, right or left turn, and parking. Driving behavior may also be cruising (keeping the lane and maintaining vehicle speed), lane keeping, following a preceding vehicle, stop-and-go while following, overtaking, responding to merging vehicles, changing roads (interchange) including entering and exiting an expressway, merging, responding to construction zones, responding to emergency vehicles, responding to cut-in vehicles, handling right-turn-only and left-turn-only lanes, interacting with pedestrians and bicycles, avoiding obstacles other than vehicles, responding to signs, handling right-turn, left-turn, and U-turn restrictions, handling lane restrictions, handling one-way streets, responding to traffic signs, handling intersections and roundabouts, and the like.

As the "driving behavior estimation engine," any of DL (deep learning), ML (machine learning), a filter, and the like, or a combination thereof, is used. The deep learning is, for example, a CNN (convolutional neural network) or an RNN (recurrent neural network). The machine learning is, for example, an SVM (support vector machine). The filter is, for example, collaborative filtering.

The "driving behavior model" is uniquely determined according to the driving behavior estimation engine. In the case of DL, the driving behavior model is a trained neural network; in the case of an SVM, it is a trained prediction model; in the case of collaborative filtering, it is data that associates driving environment data with driving behavior data. In the case of rules, the driving behavior model is data that associates inputs with outputs.

Based on these definitions, the driving support device estimates a plurality of driving behaviors using a driving behavior model generated by machine learning or the like. The driving support device then selects the driving behavior with the highest reliability and causes automatic driving to be executed according to the selected driving behavior. Reliability indicates the likelihood of an estimated driving behavior; it corresponds to the cumulative value of estimation results in the case of DL, to the confidence value in the case of an SVM, to the degree of correlation in the case of collaborative filtering, and to the reliability of the rule in the case of rules. The driving behavior with the highest reliability is therefore a highly safe driving behavior. However, that driving behavior may not reflect the occupant's intention, and it is desirable to derive a driving behavior that does.

Therefore, in the present embodiment, the driving behavior model is updated by executing reinforcement learning based on the selected driving behavior. In doing so, the reward in the reinforcement learning for a driving behavior selected by the occupant is made larger than the reward for a driving behavior selected by the driving support device. As a result, the reliability of the driving behavior selected by the occupant tends to become higher thereafter, and the occupant's intention is more readily reflected. Embodiments of the present invention are described in detail below with reference to the drawings. Each embodiment described below is an example, and the present invention is not limited to these embodiments.
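The reward asymmetry described above can be illustrated with a small sketch. It is not the reinforcement learning of the embodiment, whose algorithm is not given here; it substitutes a simple bandit-style incremental update, and the reward values, the learning rate, and the names `reward_for` and `reinforce` are all assumptions chosen for illustration.

```python
from typing import Dict

# Hypothetical reward values: the only property taken from the text is
# that the occupant-selected reward is larger than the system-selected one.
OCCUPANT_REWARD = 1.0
SYSTEM_REWARD = 0.2


def reward_for(selected_by_occupant: bool) -> float:
    """Return the reward for a driving behavior that was just selected."""
    return OCCUPANT_REWARD if selected_by_occupant else SYSTEM_REWARD


def reinforce(
    reliability: Dict[str, float],
    behavior: str,
    selected_by_occupant: bool,
    learning_rate: float = 0.5,
) -> Dict[str, float]:
    """Move the reliability of `behavior` toward the received reward.

    Uses the incremental update r <- r + lr * (reward - r), a stand-in
    for the unspecified reinforcement learning step.
    """
    updated = dict(reliability)
    current = updated.get(behavior, 0.0)
    reward = reward_for(selected_by_occupant)
    updated[behavior] = current + learning_rate * (reward - current)
    return updated
```

Starting from the same reliability, a behavior chosen by the occupant ends up with a higher value than one chosen by the driving support device, which is the effect the paragraph above describes.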

FIG. 1 shows the configuration of a vehicle 100 according to Embodiment 1, in particular the configuration related to the autonomous vehicle. The vehicle 100 can travel in an automatic driving mode and includes a notification device 2, an input device 4, a wireless device 8, a driving operation unit 10, a detection unit 20, an automatic driving control device 30, and a driving support device 40. The devices shown in FIG. 1 may be connected by wired communication such as dedicated lines or a CAN (Controller Area Network). They may also be connected by wired or wireless communication such as USB (Universal Serial Bus), Ethernet (registered trademark), Wi-Fi (registered trademark), or Bluetooth (registered trademark).

The notification device 2 notifies the occupant of information related to the travel of the vehicle 100. The notification device 2 is, for example, a car navigation system, a head-up display, or a center display installed in the vehicle. The notification device 2 may also be a display unit that shows information by means of a light emitter such as an LED (Light Emitting Diode) installed on or around the steering wheel, a pillar, the dashboard, or the meter panel. The notification device 2 may also be a speaker that converts information into sound and notifies the occupant, or a vibrator provided at a position where the occupant can sense it (for example, the occupant's seat or the steering wheel). The notification device 2 may further be a combination of these.

The input device 4 is a user interface device that receives operation inputs from the occupant. For example, the input device 4 is a touch panel, a lever, a button, a switch, a controller such as a joystick or a volume knob, a sensor such as a camera that recognizes gestures without contact, a sensor such as a microphone that recognizes voice, or a combination of these, and receives information about the automatic driving of the host vehicle entered by the occupant. The input device 4 may also receive an operation signal for switching between automatic driving and manual driving. The input device 4 outputs the received information to the driving support device 40 as an operation signal.

FIG. 2 schematically shows the interior of the vehicle 100. The notification device 2 may be a head-up display (HUD) 2a or a center display 2b. The input device 4 may be a first operation unit 4a provided on the steering 11, a second operation unit 4b provided between the driver's seat and the front passenger seat, or a third operation unit 4c that is a sensor such as a camera that recognizes gestures. The notification device 2 and the input device 4 may be integrated, for example implemented as a touch panel display. The vehicle 100 may further be provided with a speaker 6 that presents information about automatic driving to the occupant by voice. In this case, the driving support device 40 may cause the notification device 2 to display an image indicating information about automatic driving and, together with or instead of this, cause the speaker 6 to output sound indicating information about automatic driving. The description now returns to FIG. 1.

The wireless device 8 supports a mobile phone communication system, a WMAN (Wireless Metropolitan Area Network), or the like, and performs wireless communication. Specifically, the wireless device 8 communicates with a server 300 via a network 302. The server 300 is a device outside the vehicle 100 and includes a driving behavior learning unit 310. The driving behavior learning unit 310 is described later. The server 300 and the driving support device 40 are included in a driving support system 500.

The driving operation unit 10 includes a steering 11, a brake pedal 12, an accelerator pedal 13, and a turn signal switch 14. The steering 11, the brake pedal 12, the accelerator pedal 13, and the turn signal switch 14 can be electronically controlled by a steering ECU, a brake ECU, an engine ECU and a motor ECU, and a turn signal controller. In the automatic driving mode, the steering ECU, the brake ECU, the engine ECU, and the motor ECU drive actuators according to control signals supplied from the automatic driving control device 30. The turn signal controller turns the turn signal lamps on or off according to a control signal supplied from the automatic driving control device 30.

The detection unit 20 detects the surroundings and the traveling state of the vehicle 100. The detection unit 20 detects, for example, the speed of the vehicle 100, the relative speed of a preceding vehicle with respect to the vehicle 100, the distance between the vehicle 100 and the preceding vehicle, the relative speed of a vehicle in an adjacent lane with respect to the vehicle 100, the distance between the vehicle 100 and the vehicle in the adjacent lane, and the position information of the vehicle 100. The detection unit 20 outputs the various kinds of detected information (hereinafter referred to as "detection information") to the automatic driving control device 30. The detection unit 20 may output the detection information to the driving support device 40 via the automatic driving control device 30, or may output it directly to the driving support device 40. The detection unit 20 includes a position information acquisition unit 21, a sensor 22, a speed information acquisition unit 23, and a map information acquisition unit 24.
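For illustration, the detection information listed above could be carried in a record such as the following. The class name `DetectionInfo`, every field name, the units, and the helper `has_preceding_vehicle` are hypothetical; the description names the quantities but does not define a data format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DetectionInfo:
    """One snapshot of detection information (hypothetical layout)."""

    speed_kmh: float  # speed of the vehicle (unit is an assumption)
    position: Tuple[float, float]  # assumed (latitude, longitude)
    # Values relative to a preceding vehicle; None when none is detected.
    lead_relative_speed_kmh: Optional[float] = None
    lead_distance_m: Optional[float] = None
    # Values relative to a vehicle in an adjacent lane; None when absent.
    side_relative_speed_kmh: Optional[float] = None
    side_distance_m: Optional[float] = None


def has_preceding_vehicle(info: DetectionInfo) -> bool:
    """Return True when a distance to a preceding vehicle was measured."""
    return info.lead_distance_m is not None
```

Optional fields are used because a preceding vehicle or an adjacent-lane vehicle is not always present, whereas the vehicle's own speed and position are assumed to be available in every snapshot.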

位置情報取得部２１は、ＧＮＳＳ（ＧｌｏｂａｌＮａｖｉｇａｔｉｏｎＳａｔｅｌｌｉｔｅＳｙｓｔｅｍ（ｓ））受信機から車両１００の現在位置を取得する。センサ２２は、車外の状況および車両１００の状態を検出するための各種センサの総称である。車外の状況を検出するためのセンサとして例えばカメラ、ミリ波レーダ、ＬＩＤＡＲ（ＬｉｇｈｔＤｅｔｅｃｔｉｏｎａｎｄＲａｎｇｉｎｇ、ＬａｓｅｒＩｍａｇｉｎｇＤｅｔｅｃｔｉｏｎａｎｄＲａｎｇｉｎｇ）、ソナー、気温センサ、気圧センサ、湿度センサ、照度センサ等が搭載される。車外の状況は、車線情報を含む自車の走行する道路状況、天候を含む環境、自車周辺状況、近傍位置にある他車両（隣接車線を走行する他車両等）を含む。なお、センサ２２が検出できる車外の情報であれば何でもよい。また車両１００の状態を検出するためのセンサ２２として例えば、加速度センサ、ジャイロセンサ、地磁気センサ、傾斜センサ等が搭載される。 The position information acquisition unit 21 acquires the current position of the vehicle 100 from a GNSS (Global Navigation Satellite System(s)) receiver. The sensor 22 is a generic name for the various sensors that detect the situation outside the vehicle and the state of the vehicle 100. As sensors for detecting the situation outside the vehicle, for example, a camera, a millimeter wave radar, a LIDAR (Light Detection and Ranging, Laser Imaging Detection and Ranging), a sonar, a temperature sensor, an atmospheric pressure sensor, a humidity sensor, and an illuminance sensor are mounted. The situation outside the vehicle includes the condition of the road on which the host vehicle travels, including lane information, the environment including the weather, the situation around the host vehicle, and other vehicles in the vicinity (such as other vehicles traveling in an adjacent lane). Any information outside the vehicle that the sensor 22 can detect may be used. As the sensor 22 for detecting the state of the vehicle 100, for example, an acceleration sensor, a gyro sensor, a geomagnetic sensor, and an inclination sensor are mounted.

速度情報取得部２３は、車速センサから車両１００の現在速度を取得する。地図情報取得部２４は、地図データベースから車両１００の現在位置周辺の地図情報を取得する。地図データベースは、車両１００内の記録媒体に記録されていてもよいし、使用時にネットワークを介して地図サーバからダウンロードしてもよい。なお、地図情報には、道路、交差点に関する情報が含まれている。 The speed information acquisition unit 23 acquires the current speed of the vehicle 100 from the vehicle speed sensor. The map information acquisition unit 24 acquires map information around the current position of the vehicle 100 from the map database. The map database may be recorded on a recording medium in the vehicle 100, or may be downloaded from a map server via a network when used. The map information includes information on roads and intersections.

自動運転制御装置３０は、自動運転制御機能を実装した自動運転コントローラであり、自動運転における車両１００の行動を決定する。自動運転制御装置３０は、制御部３１、記憶部３２、Ｉ／Ｏ（Ｉｎｐｕｔ／Ｏｕｔｐｕｔ、入出力）部３３を備える。制御部３１の構成はハードウェア資源とソフトウェア資源の協働、またはハードウェア資源のみにより実現できる。ハードウェア資源としてプロセッサ、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）、その他のＬＳＩを利用でき、ソフトウェア資源としてオペレーティングシステム、アプリケーション、ファームウェア等のプログラムを利用できる。記憶部３２は、フラッシュメモリ等の不揮発性記録媒体を備える。Ｉ／Ｏ部３３は、各種の通信フォーマットに応じた通信制御を実行する。例えば、Ｉ／Ｏ部３３は、自動運転に関する情報を運転支援装置４０に出力するとともに、制御コマンドを運転支援装置４０から入力する。また、Ｉ／Ｏ部３３は、検出情報を検出部２０から入力する。 The automatic driving control device 30 is an automatic driving controller that implements an automatic driving control function, and determines the behavior of the vehicle 100 in automatic driving. The automatic operation control device 30 includes a control unit 31, a storage unit 32, and an I / O (Input / Output, input / output) unit 33. The configuration of the control unit 31 can be realized by cooperation of hardware resources and software resources, or only by hardware resources. Processors, ROM (Read Only Memory), RAM (Random Access Memory), and other LSIs can be used as hardware resources, and programs such as operating systems, applications, and firmware can be used as software resources. The storage unit 32 includes a nonvolatile recording medium such as a flash memory. The I / O unit 33 executes communication control according to various communication formats. For example, the I / O unit 33 outputs information related to automatic driving to the driving support device 40 and inputs a control command from the driving support device 40. Further, the I / O unit 33 inputs detection information from the detection unit 20.

制御部３１は、運転支援装置４０から入力した制御コマンド、検出部２０あるいは各種ＥＣＵから収集した各種情報を自動運転アルゴリズムに適用して、車両１００のアクセルスロットル開度、ステアリング舵角等の自動制御対象を制御するための制御値を算出する。制御部３１は算出した制御値を、各制御対象のＥＣＵまたはコントローラに伝達する。本実施の形態ではステアリングＥＣＵ、ブレーキＥＣＵ、エンジンＥＣＵ、ウィンカコントローラに伝達する。なお電気自動車あるいはハイブリッドカーの場合、エンジンＥＣＵに代えてまたは加えてモータＥＣＵに制御値を伝達する。 The control unit 31 applies the control command input from the driving support device 40 and various information collected from the detection unit 20 or various ECUs to the automatic driving algorithm, and automatically controls the accelerator throttle opening, steering angle, etc. of the vehicle 100. A control value for controlling the object is calculated. The control unit 31 transmits the calculated control value to each control target ECU or controller. In this embodiment, it is transmitted to the steering ECU, the brake ECU, the engine ECU, and the winker controller. In the case of an electric vehicle or a hybrid car, the control value is transmitted to the motor ECU instead of or in addition to the engine ECU.

運転支援装置４０は、車両１００と乗員との間のインタフェース機能を実行するＨＭＩコントローラであり、制御部４１、記憶部４２、Ｉ／Ｏ部４３を備える。制御部４１は、ＨＭＩ制御等の各種データ処理を実行する。制御部４１は、ハードウェア資源とソフトウェア資源の協働、またはハードウェア資源のみにより実現できる。ハードウェア資源としてプロセッサ、ＲＯＭ、ＲＡＭ、その他のＬＳＩを利用でき、ソフトウェア資源としてオペレーティングシステム、アプリケーション、ファームウェア等のプログラムを利用できる。 The driving support device 40 is an HMI controller that executes an interface function between the vehicle 100 and an occupant, and includes a control unit 41, a storage unit 42, and an I / O unit 43. The control unit 41 executes various data processing such as HMI control. The control unit 41 can be realized by cooperation of hardware resources and software resources, or only by hardware resources. Processors, ROM, RAM, and other LSIs can be used as hardware resources, and programs such as an operating system, application, and firmware can be used as software resources.

記憶部４２は、制御部４１により参照され、または更新されるデータを記憶する記憶領域である。例えばフラッシュメモリ等の不揮発の記録媒体により実現される。Ｉ／Ｏ部４３は、各種の通信フォーマットに応じた各種の通信制御を実行する。Ｉ／Ｏ部４３は、操作信号入力部５０、画像・音声出力部５１、検出情報入力部５２、コマンドＩＦ（Ｉｎｔｅｒｆａｃｅ、インタフェース）５３、通信ＩＦ５６を備える。 The storage unit 42 is a storage area that stores data that is referred to or updated by the control unit 41. For example, it is realized by a non-volatile recording medium such as a flash memory. The I / O unit 43 executes various communication controls according to various communication formats. The I / O unit 43 includes an operation signal input unit 50, an image / audio output unit 51, a detection information input unit 52, a command IF (Interface) 53, and a communication IF 56.

操作信号入力部５０は、入力装置４に対してなされた乗員もしくは車外にいるユーザの操作による操作信号を入力装置４から受信し、制御部４１へ出力する。画像・音声出力部５１は、制御部４１が生成した画像データあるいは音声メッセージを報知装置２へ出力して表示させる。検出情報入力部５２は、検出部２０による検出処理の結果であり、車両１００の現在の周囲状況および走行状態を示す検出情報を検出部２０から受信し、制御部４１へ出力する。 The operation signal input unit 50 receives from the input device 4 an operation signal produced by an operation performed on the input device 4 by an occupant or by a user outside the vehicle, and outputs it to the control unit 41. The image / sound output unit 51 outputs the image data or voice message generated by the control unit 41 to the notification device 2 and causes it to be presented. The detection information input unit 52 receives from the detection unit 20 the detection information, which is the result of the detection processing by the detection unit 20 and indicates the current surrounding situation and running state of the vehicle 100, and outputs it to the control unit 41.

コマンドＩＦ５３は、自動運転制御装置３０とのインタフェース処理を実行し、行動情報入力部５４とコマンド出力部５５を含む。行動情報入力部５４は、自動運転制御装置３０から送信された車両１００の自動運転に関する情報を受信し、制御部４１へ出力する。コマンド出力部５５は、自動運転制御装置３０に対して自動運転の態様を指示する制御コマンドを、制御部４１から受けつけて自動運転制御装置３０へ送信する。 The command IF 53 executes interface processing with the automatic driving control device 30 and includes a behavior information input unit 54 and a command output unit 55. The behavior information input unit 54 receives information on the automatic driving of the vehicle 100 transmitted from the automatic driving control device 30 and outputs the information to the control unit 41. The command output unit 55 receives from the control unit 41 a control command that instructs the automatic driving control device 30 on the mode of automatic driving, and transmits the control command to the automatic driving control device 30.

通信ＩＦ５６は、無線装置８とのインタフェース処理を実行する。通信ＩＦ５６は、制御部４１から出力されたデータを無線装置８へ送信し、無線装置８から車外の装置へ送信させる。また、通信ＩＦ５６は、無線装置８により転送された、車外の装置からのデータを受信し、制御部４１へ出力する。 The communication IF 56 executes interface processing with the wireless device 8. The communication IF 56 transmits the data output from the control unit 41 to the wireless device 8 and causes the wireless device 8 to transmit to the device outside the vehicle. Further, the communication IF 56 receives data from a device outside the vehicle transferred by the wireless device 8 and outputs the data to the control unit 41.

なお、ここでは、自動運転制御装置３０と運転支援装置４０は別個の装置として構成される。変形例として、図１の破線で示すように、自動運転制御装置３０と運転支援装置４０を１つのコントローラに統合してもよい。言い換えれば、１つの自動運転制御装置が、図１の自動運転制御装置３０と運転支援装置４０の両方の機能を備える構成であってもよい。さらに、報知装置２、運転支援装置４０が組み合わされた提示システムとして構成されてもよい。 Here, the automatic driving control device 30 and the driving support device 40 are configured as separate devices. As a modified example, as shown by the broken line in FIG. 1, the automatic driving control device 30 and the driving support device 40 may be integrated into one controller. In other words, a single automatic driving control device may have the functions of both the automatic driving control device 30 and the driving support device 40 of FIG. 1. Furthermore, the notification device 2 and the driving support device 40 may be combined and configured as a presentation system.

図３は、制御部４１の構成を示す。制御部４１は、運転行動推定部７０、表示制御部７２、学習部７４を含む。運転行動推定部７０は、運転行動モデル８０、推定部８２、ヒストグラム生成部８４を含み、表示制御部７２は、生成部９０、処理部９２を含み、処理部９２は選択部９４を含む。 FIG. 3 shows the configuration of the control unit 41. The control unit 41 includes a driving action estimation unit 70, a display control unit 72, and a learning unit 74. The driving behavior estimation unit 70 includes a driving behavior model 80, an estimation unit 82, and a histogram generation unit 84, the display control unit 72 includes a generation unit 90 and a processing unit 92, and the processing unit 92 includes a selection unit 94.

運転行動推定部７０は、車両１００が実行しうる複数の運転行動の候補のうち、現在の状況において実現可能な運転行動を判定するために、予め学習により構築されたニューラルネットワーク（ＮＮ）を使用する。ここで、実現可能な運転行動は複数であってもよく、運転行動を判定することは運転行動を推定することともいえる。 The driving behavior estimation unit 70 uses a neural network (NN) constructed in advance by learning in order to determine, among a plurality of driving behavior candidates that the vehicle 100 can execute, the driving behaviors that are feasible in the current situation. There may be more than one feasible driving behavior, and determining a driving behavior can also be described as estimating a driving behavior.

運転行動推定部７０での処理には、図１のサーバ３００における運転行動学習部３１０も関連するので、ここでは、運転行動学習部３１０の処理をまず説明する。運転行動学習部３１０は、複数の運転者の運転履歴と走行履歴の少なくとも１つをパラメータとしてニューラルネットワークに入力する。また、運転行動学習部３１０は、ニューラルネットワークからの出力が、入力したパラメータに対応した教師付けデータに一致するように、ニューラルネットワークの重みを最適化する。運転行動学習部３１０は、このような処理を繰り返し実行することによって、運転行動モデル８０を生成する。つまり、運転行動モデル８０は、重みが最適化されたニューラルネットワークである。サーバ３００は、運転行動学習部３１０において生成した運転行動モデル８０をネットワーク３０２、無線装置８を介して運転支援装置４０に出力する。なお、運転行動学習部３１０は、新たなパラメータをもとに運転行動モデル８０を更新してもよい。その際、更新された運転行動モデル８０は、リアルタイムに運転支援装置４０へ出力されてもよいし、遅延をもって運転支援装置４０へ出力されてもよい。 Since the processing in the driving behavior estimation unit 70 is also related to the driving behavior learning unit 310 in the server 300 in FIG. 1, here, the processing in the driving behavior learning unit 310 will be described first. The driving behavior learning unit 310 inputs at least one of driving histories and traveling histories of a plurality of drivers as a parameter to the neural network. The driving behavior learning unit 310 optimizes the weight of the neural network so that the output from the neural network matches the supervised data corresponding to the input parameter. The driving behavior learning unit 310 generates the driving behavior model 80 by repeatedly executing such processing. That is, the driving behavior model 80 is a neural network with optimized weights. The server 300 outputs the driving behavior model 80 generated by the driving behavior learning unit 310 to the driving support device 40 via the network 302 and the wireless device 8. Note that the driving behavior learning unit 310 may update the driving behavior model 80 based on a new parameter. At that time, the updated driving behavior model 80 may be output to the driving support device 40 in real time, or may be output to the driving support device 40 with a delay.

運転行動学習部３１０によって生成され、かつ運転行動推定部７０に入力された運転行動モデル８０は、複数の運転者の運転履歴と走行履歴の少なくとも１つから構築したニューラルネットワークである。また、運転行動モデル８０は、複数の運転者の走行履歴と走行履歴から構築したニューラルネットワークを、特定の運転者の走行履歴と走行履歴を用いた転移学習により、構築し直したニューラルネットワークであってもよい。ニューラルネットワークの構築には公知の技術が使用されればよいので、ここでは説明を省略する。なお、図３の運転行動推定部７０には１つの運転行動モデル８０が含まれているが、運転者、乗員、走行シーン、天候、国ごとに複数の運転行動モデル８０が運転行動推定部７０に含まれていて、状況を自動的に判定するか、手動で変更することにより切りかえてもよい。 The driving behavior model 80 generated by the driving behavior learning unit 310 and input to the driving behavior estimation unit 70 is a neural network constructed from at least one of the driving histories and travel histories of a plurality of drivers. The driving behavior model 80 may instead be a neural network obtained by taking a neural network constructed from the driving histories and travel histories of a plurality of drivers and reconstructing it by transfer learning using the driving history and travel history of a specific driver. Known techniques may be used to construct the neural network, so the description is omitted here. Although the driving behavior estimation unit 70 in FIG. 3 includes one driving behavior model 80, it may include a plurality of driving behavior models 80, one for each driver, occupant, driving scene, weather condition, or country, and switch among them either by automatically determining the situation or by manual change.

推定部８２は、運転行動モデル８０を用いて、運転行動を推定する。ここで、運転履歴は、車両１００によって過去になされた複数の運転行動のそれぞれに対応した複数の特徴量（以下、「特徴量セット」という）を示す。運転行動に対応した複数の特徴量は、例えば、車両１００によって当該運転行動がなされた時点から所定時間前の時点における車両１００の走行状態を示す量である。特徴量は、例えば、同乗者数、車両１００の速さやその時系列、ハンドルの操舵量やその時系列、ブレーキの度合いやその時系列、アクセルの度合いやその時系列などである。運転履歴は、運転特性モデルといわれてもよい。そのため、特徴量は、例えば、速度に関する特徴量、ステアリングに関する特徴量、操作タイミングに関する特徴量、車外センシングに関する特徴量、または車内センシングに関する特徴量等である。これらの特徴量は、図１の検出部２０によって検出されて、Ｉ／Ｏ部４３経由で推定部８２に入力される。また、これらの特徴量は、複数の運転者の走行履歴と走行履歴に加えられ、新たにニューラルネットワークの再構築に用いてもよい。さらに、これらの特徴量は、特定の運転者の走行履歴と走行履歴に加えられ、新たにニューラルネットワークの再構築に用いてもよい。 The estimation unit 82 estimates driving behaviors using the driving behavior model 80. Here, the driving history indicates a plurality of feature amounts (hereinafter referred to as a “feature amount set”) corresponding to each of a plurality of driving behaviors performed by the vehicle 100 in the past. The plurality of feature amounts corresponding to a driving behavior are, for example, quantities indicating the running state of the vehicle 100 at a point a predetermined time before the vehicle 100 performed that driving behavior. The feature amounts are, for example, the number of passengers, the speed of the vehicle 100 and its time series, the steering amount of the steering wheel and its time series, the degree of braking and its time series, and the degree of accelerator operation and its time series. The driving history may also be called a driving characteristic model. The feature amounts are therefore, for example, feature amounts related to speed, to steering, to operation timing, to sensing outside the vehicle, or to sensing inside the vehicle. These feature amounts are detected by the detection unit 20 in FIG. 1 and input to the estimation unit 82 via the I / O unit 43. These feature amounts may also be added to the driving histories and travel histories of a plurality of drivers and used to reconstruct the neural network anew. Furthermore, these feature amounts may be added to the driving history and travel history of a specific driver and used to reconstruct the neural network anew.

走行履歴は、車両１００によって過去になされた複数の運転行動のそれぞれに対応した複数の環境パラメータ（以下、「環境パラメータセット」という）を示す。運転行動に対応した複数の環境パラメータは、例えば、車両１００によって当該運転行動がなされた時点から所定時間前の時点やその時点以前の所定範囲の時系列における車両１００の環境（周囲の状況）を示すパラメータである。環境パラメータは、例えば、自車両の速度、自車両に対する先行車両の相対速度、先行車をセンサがとらえる大きさ、および先行車両と自車両との車間距離などである。また、これらの環境パラメータは、図１の検出部２０によって検出されて、Ｉ／Ｏ部４３経由で推定部８２に入力される。また、これらの環境パラメータは、複数の運転者の走行履歴と走行履歴に加えられ、新たにニューラルネットワークの再構築に用いてもよい。さらに、これらの環境パラメータは、特定の運転者の走行履歴と走行履歴に加えられ、新たにニューラルネットワークの再構築に用いてもよい。 The travel history indicates a plurality of environmental parameters (hereinafter referred to as an “environmental parameter set”) corresponding to each of a plurality of driving behaviors performed by the vehicle 100 in the past. The plurality of environmental parameters corresponding to a driving behavior are, for example, parameters indicating the environment (surrounding situation) of the vehicle 100 at a point a predetermined time before the vehicle 100 performed that driving behavior, or over a time series covering a predetermined range before that point. The environmental parameters are, for example, the speed of the host vehicle, the relative speed of the preceding vehicle with respect to the host vehicle, the size at which the sensor captures the preceding vehicle, and the inter-vehicle distance between the preceding vehicle and the host vehicle. These environmental parameters are detected by the detection unit 20 of FIG. 1 and input to the estimation unit 82 via the I / O unit 43. These environmental parameters may also be added to the driving histories and travel histories of a plurality of drivers and used to reconstruct the neural network anew. Furthermore, these environmental parameters may be added to the driving history and travel history of a specific driver and used to reconstruct the neural network anew.

推定部８２は、運転履歴あるいは走行履歴に含まれる特徴量セットあるいは／および環境パラメータを取得する。推定部８２は、運転行動モデル８０のニューラルネットワークに特徴量セットあるいは／および環境パラメータを入力し、ニューラルネットワークからの出力を推定結果としてヒストグラム生成部８４に出力する。 The estimation unit 82 acquires the feature amount set and/or the environmental parameters included in the driving history or the travel history. The estimation unit 82 inputs the feature amount set and/or the environmental parameters to the neural network of the driving behavior model 80, and outputs the output of the neural network to the histogram generation unit 84 as the estimation result.

ヒストグラム生成部８４は、推定部８２から、複数種類の運転行動と、各運転行動に対応する推定結果とを取得し、その運転行動に対する推定結果の累積値を示すヒストグラムを生成する。そのため、ヒストグラムには、複数種類の運転行動と、各運転行動に対応した累積値とが含まれる。ここで、累積値とは、運転行動に対する推定結果が導出された回数を累積した値である。 The histogram generation unit 84 acquires a plurality of types of driving behaviors and estimation results corresponding to the driving behaviors from the estimation unit 82, and generates a histogram indicating the cumulative value of the estimation results for the driving behaviors. Therefore, the histogram includes a plurality of types of driving behaviors and cumulative values corresponding to the driving behaviors. Here, the cumulative value is a value obtained by accumulating the number of times the estimation result for the driving action is derived.

図４は、ヒストグラム生成部８４において生成されるヒストグラムを示す。ヒストグラムには、一例として５種類の運転行動である運転行動Ａ〜Ｅが含まれる。また、運転行動Ａ〜Ｅのそれぞれに対する累積値が含まれる。ここでは、累積値が大きい順に、運転行動Ｃ、運転行動Ｅ、運転行動Ｂ、運転行動Ｄ、運転行動Ａであるとする。図３に戻る。ヒストグラム生成部８４は、生成したヒストグラムを生成部９０に出力する。 FIG. 4 shows a histogram generated by the histogram generator 84. The histogram includes driving actions A to E which are five kinds of driving actions as an example. In addition, a cumulative value for each of the driving actions A to E is included. Here, it is assumed that the driving action C, the driving action E, the driving action B, the driving action D, and the driving action A are in descending order of the cumulative value. Returning to FIG. The histogram generation unit 84 outputs the generated histogram to the generation unit 90.
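
The accumulation performed by the histogram generation unit 84 can be sketched as follows. This is only an illustrative sketch: the function name `build_histogram` and the sample stream of estimation results are assumptions made for the example, chosen so that the resulting order matches FIG. 4 (C, E, B, D, A). The source does not specify an implementation.

```python
from collections import Counter


def build_histogram(estimates):
    """Count how many times each driving behavior was derived as an estimation result."""
    return Counter(estimates)


# Hypothetical stream of estimation results from the estimation unit 82.
estimates = ["C", "E", "C", "B", "C", "E", "D", "B",
             "C", "E", "B", "D", "A", "C", "E"]

histogram = build_histogram(estimates)

# Driving behaviors in descending order of cumulative value.
ranking = [behavior for behavior, _ in histogram.most_common()]
print(ranking)
```

With this sample input, driving behavior C has the largest cumulative value and driving behavior A the smallest.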

生成部９０は、ヒストグラム生成部８４からヒストグラム、つまり複数種類の運転行動と、各運転行動に対応した累積値とを入力する。生成部９０は、複数種類の運転行動のそれぞれの累積値が大きい順に所定数の運転行動を選択する。例えば、生成部９０は、「５」の運転行動のうちから「３」の運転行動を選択する。なお、選択される運転行動の数は「３」に限定されない。選択された運転行動は、ヒストグラム生成部８４からの複数種類の運転行動のうち、一部の運転行動ともいえる。なお、先行車が減速した場合に、加速して車間距離を詰めるといった危険を及ぼす運転行動を除外するなど、交通安全に沿ったルールベースに基づいて推定された安全な運転行動に限ってもよい。生成部９０は、選択した運転行動が示された提示情報を生成する。図４の場合、提示情報には、運転行動Ｃ、運転行動Ｅ、運転行動Ｂが含まれており、提示情報では、累積値が大きい順にこれらの運転行動が並べられているものとする。生成部９０は、生成した提示情報を処理部９２に出力する。 The generation unit 90 receives the histogram from the histogram generation unit 84, that is, the plurality of types of driving behaviors and the cumulative value corresponding to each driving behavior. The generation unit 90 selects a predetermined number of driving behaviors in descending order of cumulative value. For example, the generation unit 90 selects three driving behaviors out of five. The number of selected driving behaviors is not limited to three. The selected driving behaviors can be regarded as a subset of the plurality of types of driving behaviors from the histogram generation unit 84. The selection may also be limited to safe driving behaviors estimated on the basis of a rule base in line with traffic safety, for example by excluding dangerous driving behaviors such as accelerating and closing the inter-vehicle distance when the preceding vehicle decelerates. The generation unit 90 generates presentation information indicating the selected driving behaviors. In the case of FIG. 4, the presentation information includes driving behavior C, driving behavior E, and driving behavior B, arranged in descending order of cumulative value. The generation unit 90 outputs the generated presentation information to the processing unit 92.
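
The selection performed by the generation unit 90 can be sketched as follows, under stated assumptions. The function `make_presentation`, its `is_safe` predicate (standing in for the rule base in line with traffic safety), and the cumulative values are hypothetical; only the ordering C, E, B, D, A and the choice of three out of five behaviors come from the description.

```python
def make_presentation(histogram, count=3, is_safe=lambda behavior: True):
    """Return up to `count` driving behaviors in descending order of cumulative value.

    `is_safe` is a hypothetical stand-in for the traffic-safety rule base;
    behaviors for which it returns False are excluded before ranking.
    """
    safe = {b: v for b, v in histogram.items() if is_safe(b)}
    ranked = sorted(safe, key=lambda b: safe[b], reverse=True)
    return ranked[:count]


# Hypothetical cumulative values consistent with the order shown in FIG. 4.
histogram = {"A": 1, "B": 3, "C": 5, "D": 2, "E": 4}

presentation = make_presentation(histogram)

# Example of the optional safety filter: suppose behavior E were judged unsafe.
filtered = make_presentation(histogram, is_safe=lambda b: b != "E")
print(presentation, filtered)
```

Filtering before ranking means an excluded behavior frees its slot for the next candidate, so the presentation still contains the predetermined number of behaviors when enough safe candidates exist.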

処理部９２は、生成部９０からの提示情報を受けつける。処理部９２は、図１の画像・音声出力部５１を介して、図２のヘッドアップディスプレイ２ａあるいはセンタディスプレイ２ｂに提示情報を出力する。ヘッドアップディスプレイ２ａあるいはセンタディスプレイ２ｂは、提示情報の画像を表示する。なお、処理部９２は、図１の画像・音声出力部５１を介して、図２のスピーカ６に提示情報を出力してもよい。その際、スピーカ６は、提示情報の音声メッセージを出力する。 The processing unit 92 receives presentation information from the generation unit 90. The processing unit 92 outputs the presentation information to the head-up display 2a or the center display 2b in FIG. 2 via the image / sound output unit 51 in FIG. The head-up display 2a or the center display 2b displays an image of presentation information. The processing unit 92 may output the presentation information to the speaker 6 in FIG. 2 via the image / sound output unit 51 in FIG. At that time, the speaker 6 outputs a voice message of the presentation information.

図５（ａ）−（ｃ）は、表示制御部７２の処理概要を示す。図５（ａ）は、センタディスプレイ２ｂにおいて表示される提示情報の画像を示す。この提示情報の画像は図４をもとに生成されており、累積値が大きい順番に、運転行動Ｃ、運転行動Ｅ、運転行動Ｂが上から下に並んで配置される。また、累積値が大きいほど、文字のサイズが大きくされる。つまり、センタディスプレイ２ｂに表示される画像では、提示情報において前方に配置された運転行動ほど、画面の上方に配置されるとともに、文字のサイズが大きくされる。これらは、累積値である信頼度が高い運転行動ほど、乗員に選択されやすくするためである。 FIGS. 5A to 5C show an outline of the processing of the display control unit 72. FIG. 5A shows an image of presentation information displayed on the center display 2b. The image of the presentation information is generated based on FIG. 4, and driving behavior C, driving behavior E, and driving behavior B are arranged from top to bottom in descending order of cumulative value. In addition, the larger the cumulative value, the larger the character size. That is, in the image displayed on the center display 2b, a driving behavior placed earlier in the presentation information is placed higher on the screen and shown in larger characters. This makes a driving behavior with a higher reliability, that is, a larger cumulative value, easier for the occupant to select.

なお、ヘッドアップディスプレイ２ａに提示情報の画像が表示される場合、当該画像は、図５（ａ）と同様である。図５（ｂ）は、センタディスプレイ２ｂにおいて表示される提示情報の画像であって、かつ運転行動Ｃ、運転行動Ｅ、運転行動Ｂを実際の運転行動に対応付けた場合の提示情報の画像を示す。ここでは、運転行動Ｃは「直進」に対応し、運転行動Ｅは「右折」に対応し、運転行動Ｂは「左側に車線変更」に対応するとする。以下では、説明を明瞭にするために、図５（ａ）を説明の対象とする。図５（ｃ）の説明は後述し、図３に戻る。 When the image of the presentation information is displayed on the head-up display 2a, the image is the same as that of FIG. 5A. FIG. 5B shows an image of presentation information displayed on the center display 2b in which driving behavior C, driving behavior E, and driving behavior B are associated with actual driving behaviors. Here, driving behavior C corresponds to “go straight”, driving behavior E corresponds to “turn right”, and driving behavior B corresponds to “change lane to the left”. In the following, FIG. 5A is used as the subject of the explanation for clarity. FIG. 5C is described later, and the description now returns to FIG. 3.

乗員、例えば運転手は、センタディスプレイ２ｂに表示された提示情報の画像において示された複数種類の運転行動から１つの運転行動を選択する場合、入力装置４に対して選択結果を入力する。例えば、乗員は、第１操作部４ａあるいは第２操作部４ｂを操作して１つの運転行動を選択する。また、センタディスプレイ２ｂがタッチパネルである場合、乗員は、センタディスプレイ２ｂに表示された提示情報の画像の中から、１つの運転行動の表示部分をタッチすることによって、１つの運転行動を選択する。さらに、ヘッドアップディスプレイ２ａに提示情報の画像が表示されている場合、乗員は、画像の中から、１つの運転行動の表示部分を選択するようなジェスチャーを実行すると、第３操作部４ｃはジェスチャーに応じた１つの運転行動を選択する。操作信号入力部５０には、入力装置４からの操作信号であって、かつ乗員によって選択された１つの運転行動を示す操作信号が入力される。このように操作信号入力部５０に操作信号が入力される場合は、「手動選択状態」と呼ばれる。 An occupant, for example the driver, enters a selection result on the input device 4 when selecting one driving behavior from the plurality of types of driving behaviors shown in the image of the presentation information displayed on the center display 2b. For example, the occupant selects one driving behavior by operating the first operation unit 4a or the second operation unit 4b. When the center display 2b is a touch panel, the occupant selects one driving behavior by touching the portion of the displayed image of the presentation information that shows that driving behavior. Furthermore, when the image of the presentation information is displayed on the head-up display 2a and the occupant makes a gesture that selects the displayed portion of one driving behavior in the image, the third operation unit 4c selects the one driving behavior corresponding to the gesture. The operation signal input unit 50 receives from the input device 4 an operation signal indicating the one driving behavior selected by the occupant. The case in which an operation signal is input to the operation signal input unit 50 in this way is called the “manual selection state”.

乗員は、センタディスプレイ２ｂに表示された提示情報の画像において示された複数種類の運転行動から１つの運転行動を選択しない場合、入力装置４に対して選択結果を入力しない。そのため、操作信号入力部５０には、提示行動を出力してから所定期間において、入力装置４からの操作信号が入力されない。操作信号入力部５０に操作信号が入力されない場合、処理部９２は、選択部９４に対して１つの運転行動の選択の実行を指示する。選択部９４は、処理部９２からの指示を受けつけた場合、提示情報に含まれた複数種類の運転行動のうちの１つの運転行動を選択する。ここでは、例えば、最も累積値の大きい運動行動が選択される。これは、提示情報に配置された複数種類の運転行動のうち、先頭の運転行動が選択されることに相当する。このように操作信号入力部５０に操作信号が入力されない場合は、「自動選択状態」と呼ばれる。 When the occupant does not select one driving behavior from the plurality of types of driving behaviors shown in the image of the presentation information displayed on the center display 2b, the occupant does not enter a selection result on the input device 4. The operation signal input unit 50 therefore receives no operation signal from the input device 4 during a predetermined period after the presentation information is output. When no operation signal is input to the operation signal input unit 50, the processing unit 92 instructs the selection unit 94 to select one driving behavior. On receiving the instruction from the processing unit 92, the selection unit 94 selects one of the plurality of types of driving behaviors included in the presentation information. Here, for example, the driving behavior with the largest cumulative value is selected. This is equivalent to selecting the first of the driving behaviors arranged in the presentation information. The case in which no operation signal is input to the operation signal input unit 50 in this way is called the “automatic selection state”.

処理部９２は、自動選択状態の場合、選択した運転行動に対応した制御コマンドをコマンド出力部５５経由で自動運転制御装置３０に出力する。一方、処理部９２は、手動選択状態の場合、操作信号で示された運転行動に対応した制御コマンドをコマンド出力部５５経由で自動運転制御装置３０に出力する。図１の自動運転制御装置３０は、制御コマンドに対応した運転行動をもとに、車両１００の自動運転を制御する。 In the automatic selection state, the processing unit 92 outputs a control command corresponding to the selected driving action to the automatic driving control device 30 via the command output unit 55. On the other hand, in the manual selection state, the processing unit 92 outputs a control command corresponding to the driving behavior indicated by the operation signal to the automatic driving control device 30 via the command output unit 55. The automatic driving control device 30 in FIG. 1 controls the automatic driving of the vehicle 100 based on the driving behavior corresponding to the control command.
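
The branching between the manual selection state and the automatic selection state can be sketched as follows. This is a minimal sketch under assumptions: the function `resolve_selection` is hypothetical, and the absence of an operation signal within the predetermined period is modeled simply as `None`. The timer itself and the control command format are not specified in the description and are omitted.

```python
def resolve_selection(presentation, operation_signal=None):
    """Return (driving_behavior, state) for one presentation cycle.

    `presentation` lists driving behaviors in descending order of cumulative value.
    `operation_signal` is the behavior chosen by the occupant, or None when no
    operation signal arrived within the predetermined period (assumed encoding).
    """
    if operation_signal is not None:
        if operation_signal not in presentation:
            raise ValueError("operation signal names a behavior that was not presented")
        return operation_signal, "manual"
    # Automatic selection: the first entry has the largest cumulative value.
    return presentation[0], "automatic"


presentation = ["C", "E", "B"]
print(resolve_selection(presentation))        # no input within the period
print(resolve_selection(presentation, "E"))   # occupant selected behavior E
```

In either state, the returned driving behavior is the one for which a control command would be sent to the automatic driving control device 30 via the command output unit 55.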

学習部７４は、自動選択状態の場合、選択部９４において選択した１つの運転行動に正の報酬「α」を付与する。一方、学習部７４は、手動選択状態の場合、操作信号において示された１つの運転行動に正の報酬「β」を付与する。ここで、自動選択状態の場合において付与する正の報酬「α」の値よりも、手動選択状態の場合において付与される正の報酬「β」の値を大きくする。例えば、前者が「＋０．５」とされ、後者が「＋０．７」とされる。学習部７４は、１つの運転行動に報酬を付与しながら強化学習を実行することによって運転行動モデル８０を更新する。強化学習については公知の技術が使用されればよいので、ここでは説明を省略するが、報酬が反映されるように、運転行動モデル８０におけるニューラルネットワークの重みが調節されることに相当する。 In the automatic selection state, the learning unit 74 gives a positive reward “α” to one driving action selected by the selection unit 94. On the other hand, in the manual selection state, the learning unit 74 gives a positive reward “β” to one driving action indicated in the operation signal. Here, the value of the positive reward “β” given in the case of the manual selection state is made larger than the value of the positive reward “α” given in the case of the automatic selection state. For example, the former is “+0.5” and the latter is “+0.7”. The learning unit 74 updates the driving behavior model 80 by executing reinforcement learning while giving a reward to one driving behavior. Since a known technique may be used for reinforcement learning, a description thereof is omitted here, but this corresponds to adjustment of the weight of the neural network in the driving behavior model 80 so that the reward is reflected.

ここで、学習部７４は、手動選択状態の場合、提示情報に含まれた複数種類の運転行動のうち、操作信号において示された１つの運転行動以外の運転行動、つまり乗員によって選択されなかった運転行動に負の報酬を付与してもよい。その際、提示情報に含まれなかった運転行動に負の報酬が付与されなくてもよく、付与されてもよい。ここで、負の報酬は「−β」と示されるが、他の値であってもよい。学習部７４は、負の報酬も付与しながら強化学習を実行する。一方、自動選択状態の場合、学習部７４は、選択部９４において選択した１つの運転行動以外の運転行動に負の報酬を付与しない。 In the manual selection state, the learning unit 74 may give a negative reward to the driving behaviors, among the plurality of types of driving behaviors included in the presentation information, other than the one indicated in the operation signal, that is, to the driving behaviors that the occupant did not select. In that case, a negative reward may or may not be given to driving behaviors that were not included in the presentation information. The negative reward is denoted “−β” here, but may take another value. The learning unit 74 performs reinforcement learning while also giving the negative reward. In the automatic selection state, on the other hand, the learning unit 74 does not give a negative reward to driving behaviors other than the one selected by the selection unit 94.

ここでは、学習部７４において付与される報酬について、図５（ｃ）を使用しながらさらに詳細に説明する。ここでは、前提として、図５（ａ）に示すような提示情報の画像が表示されているとする。図５（ｃ）のパターン「１」は、自動選択状態に相当する。乗員が運転行動を選択しなければ、選択部９４は運転行動Ｃを選択する。その結果、学習部７４は、運転行動Ｃに正の報酬「＋α」を付与する。一方、図５（ｃ）のパターン「２」から「４」は、手動選択状態に相当する。パターン「２」において、乗員が運転行動Ｃを選択した場合、学習部７４は、運転行動Ｃに正の報酬「＋β」を付与する。パターン「３」において、乗員が運転行動Ｅを選択した場合、学習部７４は、運転行動Ｅに正の報酬「＋β」を付与し、運転行動Ｃに負の報酬「−β」を付与する。パターン「４」において、乗員が運転行動Ｂを選択した場合、学習部７４は、運転行動Ｂに正の報酬「＋β」を付与し、運転行動Ｃ、Ｅに負の報酬「−β」を付与する。 The rewards given by the learning unit 74 are now described in more detail with reference to FIG. 5C. As a premise, assume that an image of presentation information such as that shown in FIG. 5A is displayed. Pattern “1” in FIG. 5C corresponds to the automatic selection state. If the occupant does not select a driving behavior, the selection unit 94 selects driving behavior C. As a result, the learning unit 74 gives a positive reward “+α” to driving behavior C. Patterns “2” to “4” in FIG. 5C correspond to the manual selection state. In pattern “2”, when the occupant selects driving behavior C, the learning unit 74 gives a positive reward “+β” to driving behavior C. In pattern “3”, when the occupant selects driving behavior E, the learning unit 74 gives a positive reward “+β” to driving behavior E and a negative reward “−β” to driving behavior C. In pattern “4”, when the occupant selects driving behavior B, the learning unit 74 gives a positive reward “+β” to driving behavior B and a negative reward “−β” to driving behaviors C and E.
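
The four reward patterns of FIG. 5C can be reproduced with a short sketch. The function `assign_rewards` and its dictionary return format are assumptions made for illustration. The example values α = 0.5 and β = 0.7 are the ones given in the description, and the negative reward is applied only to presented behaviors ranked above the selected one, which is how patterns “3” and “4” read. How the rewards then adjust the neural network weights is left to known reinforcement learning techniques and is not shown.

```python
ALPHA = 0.5  # example positive reward in the automatic selection state
BETA = 0.7   # example positive reward in the manual selection state


def assign_rewards(presentation, operation_signal=None, alpha=ALPHA, beta=BETA):
    """Return a {driving_behavior: reward} mapping for one presentation cycle.

    `presentation` is ordered by descending cumulative value; `operation_signal`
    is the occupant's choice, or None in the automatic selection state.
    """
    if operation_signal is None:
        # Pattern 1: only the automatically selected (first) behavior is rewarded.
        return {presentation[0]: alpha}
    rank = presentation.index(operation_signal)
    # Patterns 2 to 4: behaviors ranked above the occupant's choice are penalized.
    rewards = {behavior: -beta for behavior in presentation[:rank]}
    rewards[operation_signal] = beta
    return rewards


presentation = ["C", "E", "B"]
for choice in (None, "C", "E", "B"):
    print(choice, assign_rewards(presentation, choice))
```

Because β is larger than α, an explicit choice by the occupant moves the model more strongly than a default that was merely left unchanged.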

The operation of the driving support device 40 configured as described above will now be described. FIG. 6 is a flowchart showing a processing procedure performed by the control unit 41. The estimation unit 82 estimates a plurality of driving behaviors (S10). The generation unit 90 selects a predetermined number of driving behaviors in descending order of cumulative value (S12). The notification device 2 displays the presentation information (S14). When an operation signal is input to the operation signal input unit 50 (Y in S16), the learning unit 74 gives a positive reward “+β” to the driving behavior indicated in the operation signal (S18) and gives a negative reward “−β” to any other driving behavior in the presentation information whose cumulative value is larger than that of the driving behavior indicated in the operation signal (S20). When no operation signal is input to the operation signal input unit 50 (N in S16), the selection unit 94 selects the driving behavior with the largest cumulative value (S22), and the learning unit 74 gives a positive reward “+α” to the selected driving behavior (S24).
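The reward assignment of steps S12 and S16 to S24 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names, the dictionary representation of the histogram, the cumulative values, and the numeric values of `ALPHA` and `BETA` are all assumptions. The text only requires that the reward for a manual selection be larger than the reward for an automatic selection.

```python
# Hypothetical reward magnitudes; the description only requires BETA > ALPHA.
ALPHA = 1.0  # reward "+alpha" for an automatically selected behavior
BETA = 2.0   # reward "+beta" / "-beta" used in the manual selection state


def build_presentation(histogram, count=3):
    """S12: pick `count` behaviors in descending order of cumulative value."""
    return sorted(histogram, key=histogram.get, reverse=True)[:count]


def assign_rewards(histogram, presented, chosen=None):
    """S16-S24: return {behavior: reward} for one presentation cycle.

    `chosen` is the behavior named in the operation signal, or None when
    no operation signal was input (automatic selection state).
    """
    if chosen is None:
        # S22/S24: the behavior with the largest cumulative value gets +alpha.
        auto = max(presented, key=histogram.get)
        return {auto: ALPHA}
    if chosen not in presented:
        raise ValueError("only presented behaviors can be selected")
    rewards = {chosen: BETA}  # S18
    for behavior in presented:
        # S20: penalize presented behaviors ranked above the chosen one.
        if behavior != chosen and histogram[behavior] > histogram[chosen]:
            rewards[behavior] = -BETA
    return rewards


# Hypothetical cumulative values consistent with the order C > E > B.
histogram = {"A": 1, "B": 3, "C": 5, "D": 2, "E": 4}
presented = build_presentation(histogram)
print(assign_rewards(histogram, presented))       # pattern "1"
print(assign_rewards(histogram, presented, "E"))  # pattern "3"
```

With these assumed values, the four calls corresponding to patterns “1” to “4” of FIG. 5(c) reproduce the rewards listed in the description.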

According to the present embodiment, the reward value used when an operation signal is input is made larger than the reward value used when no operation signal is input, so the reliability of a driving behavior that the occupant has actively selected can be raised. Because the reliability of an actively selected driving behavior is raised, a driving behavior that reflects the occupant's intention can be derived. Because a negative reward is given to driving behaviors other than the one selected by the occupant, the reliability of driving behaviors the occupant did not select can be lowered. Because a negative reward is given to driving behaviors that the system had estimated to be more reliable than the one selected by the occupant, the difference between the reliability of the driving behavior the occupant selected and the reliability of driving behaviors that the system estimated to be highly reliable but the occupant did not select can be increased. Because this difference is increased, the occupant's will can be reflected further.

In addition, because the negative reward is given to the unselected driving behaviors among the subset of driving behaviors included in the presentation information, no reward needs to be given to driving behaviors that were not included in the presentation information. Because no reward is given to driving behaviors not included in the presentation information, changes in reliability unrelated to the occupant's intention can be suppressed. Because the positive reward value used when an operation signal is input is made larger than the positive reward value used when no operation signal is input, a driving behavior that reflects the occupant's intention can be executed.

(Embodiment 2)
Next, Embodiment 2 will be described. Like Embodiment 1, Embodiment 2 relates to a driving support device that displays an image of presentation information and performs reinforcement learning while rewarding the selected driving behavior. In Embodiment 1, reinforcement learning is performed with a reward value that depends on whether the device is in the automatic selection state or the manual selection state. Embodiment 2, in contrast, deals with the manual selection state. As described above, the presentation information includes a predetermined number of driving behaviors out of the plurality of types of estimated driving behaviors. The occupant can select a driving behavior included in the presentation information but cannot select one that is not included. Because the reliability of such a driving behavior never improves, that driving behavior becomes less likely to be estimated. In other words, such a driving behavior ends up being excluded even though the occupant has not intentionally excluded it, so the occupant's intention is not reflected. Embodiment 2 describes the generation of presentation information that makes it easier to reflect the occupant's intention. The vehicle 100 according to Embodiment 2 is of the same type as in FIGS. 1 and 2, and the control unit 41 according to Embodiment 2 is of the same type as in FIG. 3.

As in Embodiment 1, the driving behavior estimation unit 70 in FIG. 3 outputs a histogram containing a plurality of types of driving behaviors and a cumulative value for each driving behavior. The generation unit 90 selects a predetermined number of driving behaviors in descending order of the cumulative values of the driving behaviors in the histogram, and generates presentation information indicating the selected driving behaviors. This presentation information is the same as in Embodiment 1, but here it is called the “first presentation information”. For example, out of five driving behaviors, the first presentation information includes the three with the largest cumulative values.

On the other hand, the generation unit 90 generates second presentation information instead of the first presentation information at a predetermined frequency, for example once every ten times. In the second presentation information, a driving behavior with a small cumulative value is included in place of some of the predetermined number of driving behaviors that would be included in the first presentation information. Specifically, when the predetermined number is “3”, the three driving behaviors to be included in the first presentation information are those with the largest, second-largest, and third-largest cumulative values. The generation unit 90 generates the second presentation information so that one of these, for example the driving behavior with the third-largest cumulative value, is replaced by the driving behavior with the fourth-largest or the fifth-largest cumulative value. For example, the driving behaviors with the fourth-largest and fifth-largest cumulative values may be included alternately, at a predetermined ratio, each time the second presentation information is generated. The generation unit 90 outputs the generated first presentation information or second presentation information to the processing unit 92.
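The substitution described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the dictionary form of the histogram, and the cumulative values are hypothetical, and simple alternation with `itertools.cycle` is only one way to realize "alternately at a predetermined ratio".

```python
from itertools import cycle


def ranked_behaviors(histogram):
    """Behaviors sorted in descending order of cumulative value."""
    return sorted(histogram, key=histogram.get, reverse=True)


def first_presentation(histogram, count=3):
    """First presentation information: the top `count` behaviors."""
    return ranked_behaviors(histogram)[:count]


def second_presentation(histogram, substitute_rank, count=3):
    """Second presentation information.

    The behavior ranked `count`-th is replaced by the behavior ranked
    `substitute_rank`-th (1-based), which must lie outside the top `count`.
    """
    ranked = ranked_behaviors(histogram)
    if not count < substitute_rank <= len(ranked):
        raise ValueError("substitute_rank must be below the top `count`")
    return ranked[:count - 1] + [ranked[substitute_rank - 1]]


# Hypothetical cumulative values: C > E > B > D > A.
histogram = {"A": 1, "B": 3, "C": 5, "D": 2, "E": 4}
substitute_ranks = cycle([4, 5])  # alternate the 4th and 5th behaviors

print(first_presentation(histogram))
print(second_presentation(histogram, next(substitute_ranks)))
print(second_presentation(histogram, next(substitute_ranks)))
```

Here the substituted behavior is appended last, which corresponds to placing it at the bottom of the displayed list.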

The processing unit 92 outputs the first presentation information or the second presentation information to the image/sound output unit 51 in FIG. 1, and the image/sound output unit 51 outputs the first presentation information or the second presentation information to the head-up display 2a or the center display 2b in FIG. 2. The head-up display 2a or the center display 2b displays an image of the first presentation information or the second presentation information.

FIGS. 7(a) to 7(e) show an outline of the processing of the display control unit 72 according to Embodiment 2. FIG. 7(a) shows an image of the first presentation information displayed on the center display 2b; it is identical to FIG. 5(a). FIG. 7(b) shows an image of the second presentation information displayed on the center display 2b. To make comparison with the first presentation information easier, this image of the second presentation information is generated based on FIG. 4. Driving behavior C and driving behavior E are arranged from top to bottom in descending order of cumulative value. Driving behavior A, which has the smallest cumulative value, is placed at the bottom and shown in the smallest character size. FIGS. 7(c) to 7(e) are described later; the description now returns to FIG. 3.

When the occupant does not select one driving behavior from the plurality of types of driving behaviors shown in the image of the first presentation information or the second presentation information displayed on the center display 2b, the occupant does not input a selection result to the input device 4. This is the automatic selection state described above, and no operation signal from the input device 4 is input to the operation signal input unit 50. In that case, the selection unit 94 selects one of the plurality of types of driving behaviors included in the first presentation information or the second presentation information. Specifically, from the predetermined number of driving behaviors included in the first presentation information or the second presentation information, the selection unit 94 selects the one at a predetermined position in the order, for example the one placed at the top of the image. This processing of the selection unit 94 is the same as in Embodiment 1. In view of this operation of the selection unit 94, it can be said that, when generating the second presentation information, the generation unit 90 places the driving behavior with a small cumulative value at a position other than the one selected by the selection unit 94. In FIG. 7(b), the driving behavior with the smallest cumulative value is placed at the bottom.

On the other hand, when the occupant selects one driving behavior from the plurality of types of driving behaviors shown in the image of the first presentation information or the second presentation information displayed on the center display 2b, the occupant inputs the selection result to the input device 4. This is the manual selection state described above, and an operation signal from the input device 4 is input to the operation signal input unit 50. The operation signal input while the first presentation information is displayed may be called the “first operation signal”, and the operation signal input while the second presentation information is displayed may be called the “second operation signal”. Subsequently, the processing unit 92 outputs a control command to the automatic driving control device 30 via the command output unit 55; this is the same as in Embodiment 1, so its description is omitted here.

In the automatic selection state, the learning unit 74 performs the same processing as in Embodiment 1 regardless of whether the first presentation information or the second presentation information was output. In the manual selection state, when the first operation signal is input, the learning unit 74 gives a reward to the one driving behavior selected by the occupant. This reward is given in the same way as in Embodiment 1: a positive reward is given to the one driving behavior indicated in the first operation signal. However, in the manual selection state, the learning unit 74 does not give a negative reward to the unselected driving behaviors among the plurality of types of driving behaviors included in the first presentation information.

In the manual selection state, when the second operation signal is input, the learning unit 74 gives a reward to the one driving behavior selected by the occupant. This reward is also given in the same way as in Embodiment 1: a positive reward is given to the one driving behavior indicated in the second operation signal. In addition, in the manual selection state, the learning unit 74 gives a negative reward to the unselected driving behaviors among the plurality of types of driving behaviors included in the second presentation information. Here, the negative reward is denoted “−β”, but it may be another value.

The rewards given by the learning unit 74 are now described in more detail with reference to FIGS. 7(c) and 7(d). Only the manual selection state is considered here. For FIG. 7(c), it is assumed that an image of the first presentation information such as that shown in FIG. 7(a) is displayed. In pattern “1”, when the occupant selects driving behavior C, the learning unit 74 gives a positive reward “+β” to driving behavior C. In pattern “2”, when the occupant selects driving behavior E, the learning unit 74 gives a positive reward “+β” to driving behavior E. In pattern “3”, when the occupant selects driving behavior B, the learning unit 74 gives a positive reward “+β” to driving behavior B.

For FIG. 7(d), it is assumed that an image of the second presentation information such as that shown in FIG. 7(b) is displayed. In pattern “4”, when the occupant selects driving behavior C, the learning unit 74 gives a positive reward “+β” to driving behavior C. In pattern “5”, when the occupant selects driving behavior E, the learning unit 74 gives a positive reward “+β” to driving behavior E and a negative reward “−β” to driving behaviors C and A. In pattern “6”, when the occupant selects driving behavior A, the learning unit 74 gives a positive reward “+β” to driving behavior A and a negative reward “−β” to driving behaviors C and E. The description now returns to FIG. 3.

In the description so far, when generating the second presentation information, the generation unit 90 places the driving behavior with a small cumulative value at a position other than the one selected by the selection unit 94. One example of the position selected by the selection unit 94 is the top of the image. However, when generating the second presentation information, the generation unit 90 may instead place the driving behavior with a small cumulative value at the position selected by the selection unit 94. FIG. 7(e) shows an image of the second presentation information displayed on the center display 2b in such a case. To make comparison with the preceding examples easier, this image of the second presentation information is generated based on FIG. 4. Driving behavior A, which has the smallest cumulative value, is placed at the top and shown in the largest character size. Below it, driving behavior C and driving behavior E are arranged from top to bottom in descending order of cumulative value.

The operation of the driving support device 40 configured as described above will now be described. FIG. 8 is a flowchart showing a generation procedure performed by the control unit 41 according to Embodiment 2. The generation unit 90 sets i = 1 (S100). If i is not 10 (N in S102), the generation unit 90 generates the first presentation information (S104) and increments i (S106). If i = 10 (Y in S102), the generation unit 90 generates the second presentation information (S108) and sets i = 1 (S110). If the processing is not finished (N in S112), the procedure returns to step S102. If the processing is finished (Y in S112), the procedure ends.
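The counter logic of FIG. 8 can be sketched as a small generator. This is an illustrative sketch only: the function name, the `cycles` argument that stands in for the end condition of S112, and the string labels are assumptions.

```python
def presentation_kinds(cycles, period=10):
    """Yield which presentation information is generated in each cycle.

    Follows S100-S112 of FIG. 8: the second presentation information is
    generated when the counter i reaches `period`, otherwise the first.
    """
    i = 1  # S100
    for _ in range(cycles):  # loop until finished (S112)
        if i == period:  # S102
            yield "second"  # S108
            i = 1  # S110
        else:
            yield "first"  # S104
            i += 1  # S106


kinds = list(presentation_kinds(20))
print(kinds.count("second"))  # number of second presentations in 20 cycles
```

With the counter starting at 1 and `period` set to 10, nine first presentations are followed by one second presentation, which matches the "once every ten times" example.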

FIG. 9 is a flowchart showing a procedure performed by the control unit 41 according to Embodiment 3. The first operation signal or the second operation signal is input to the operation signal input unit 50 (S150). When the first operation signal is input (Y in S152), the learning unit 74 gives a positive reward “+β” to the selected driving behavior (S154). When the first operation signal is not input, that is, when the second operation signal is input (N in S152), the learning unit 74 gives a positive reward “+β” to the selected driving behavior (S156) and gives a negative reward “−β” to those other driving behaviors that the system estimated with higher reliability than the selected driving behavior (S158).
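The branch of FIG. 9 can be sketched as follows. This is a hedged sketch: the function name, the boolean flag that distinguishes the two operation signals, the value of `BETA`, and the cumulative values are assumptions. The sketch follows the flowchart wording of S158 (penalize only presented behaviors estimated with higher reliability than the selected one); the description of pattern “5” in FIG. 7(d) additionally penalizes driving behavior A, which this rule does not reproduce.

```python
BETA = 2.0  # hypothetical magnitude of the reward "+beta" / "-beta"


def rewards_for_manual_selection(histogram, presented, chosen, is_second_signal):
    """Return {behavior: reward} for S150-S158 of FIG. 9.

    `is_second_signal` is True when the selection was made while the
    second presentation information was displayed.
    """
    if chosen not in presented:
        raise ValueError("only presented behaviors can be selected")
    rewards = {chosen: BETA}  # S154 or S156
    if is_second_signal:
        for behavior in presented:
            # S158: penalize behaviors the system ranked above the chosen one.
            if behavior != chosen and histogram[behavior] > histogram[chosen]:
                rewards[behavior] = -BETA
    return rewards


# Hypothetical cumulative values: C > E > B > D > A.
histogram = {"A": 1, "B": 3, "C": 5, "D": 2, "E": 4}
print(rewards_for_manual_selection(histogram, ["C", "E", "B"], "B", False))
print(rewards_for_manual_selection(histogram, ["C", "E", "A"], "A", True))
```

Under these assumptions the first call corresponds to pattern “3” of FIG. 7(c) and the second to pattern “6” of FIG. 7(d).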

According to the present embodiment, second presentation information that includes a low-reliability driving behavior is output at a predetermined frequency, so the opportunities to select low-reliability driving behaviors can be increased. Because these opportunities increase, it becomes possible to tell whether or not such a driving behavior went unselected because of the occupant's intention. Because this distinction can be made, the occupant's intention can be recognized, and because the occupant's intention is recognized, a driving behavior that reflects it can be derived. When the low-reliability driving behavior is placed at the position selected by the selection unit, the occupant's active will can be recognized from whether or not the occupant selects that driving behavior. When the low-reliability driving behavior is placed at a position other than the one selected by the selection unit, the selection unit can be made to select a high-reliability driving behavior. Because the second presentation information including a low-reliability driving behavior is output at a predetermined frequency, a driving behavior that reflects the occupant's intention can be executed.

The embodiments of the present invention have been described in detail above with reference to the drawings. The functions of the devices and processing units described above can be realized by a computer program. A computer that realizes these functions by a program includes input devices such as a keyboard, a mouse, and a touch pad; output devices such as a display and a speaker; a CPU (Central Processing Unit); a ROM; a RAM; storage devices such as a hard disk drive and an SSD (Solid State Drive); a reading device that reads information from recording media such as a DVD-ROM (Digital Versatile Disk Read Only Memory) and a USB memory; and a network card that communicates over a network. These components are connected by a bus.

The reading device reads the program from a recording medium on which the program is recorded and stores it in the storage device. Alternatively, the network card communicates with a server device connected to the network and stores, in the storage device, the program for realizing the functions of the above devices downloaded from the server device. The CPU then copies the program stored in the storage device to the RAM and sequentially reads and executes the instructions contained in the program from the RAM, whereby the functions of the above devices are realized.

An outline of one aspect of the present invention is as follows.
(Item 1-1)
A driving support device comprising:
a generation unit that generates presentation information indicating a plurality of types of driving behaviors that are estimation results obtained using a driving behavior model;
a presentation information output unit that outputs the presentation information generated by the generation unit to a notification device;
an operation signal input unit to which an operation signal is input, the operation signal indicating one driving behavior selected by an occupant from the presentation information notified by the notification device;
a selection unit that selects one of the plurality of types of driving behaviors when no operation signal is input to the operation signal input unit; and
a learning unit that, when an operation signal is input to the operation signal input unit, updates the driving behavior model by performing learning while weighting the one driving behavior indicated in the operation signal and that, when no operation signal is input to the operation signal input unit, updates the driving behavior model by performing learning while weighting the one driving behavior selected by the selection unit,
wherein the learning unit makes the weighting value used when an operation signal is input to the operation signal input unit larger than the weighting value used when no operation signal is input to the operation signal input unit.

According to this aspect, the weighting value used when an operation signal is input is made larger than the weighting value used when no operation signal is input, so a driving behavior that reflects the occupant's intention can be derived.

(Item 1-2)
The driving support device according to item 1-1, wherein, when an operation signal is input to the operation signal input unit, the learning unit performs learning while applying negative weighting to driving behaviors other than the one driving behavior indicated in the operation signal.
In this case, negative weighting is applied to driving behaviors other than the one driving behavior selected by the occupant, so the reliability of driving behaviors the occupant did not select can be lowered.

(Item 1-3)
The driving support device according to item 1-2, wherein the generation unit generates presentation information indicating a subset of the plurality of types of driving behaviors, and
wherein, when an operation signal is input to the operation signal input unit, the learning unit applies negative weighting to the driving behaviors, among the subset of driving behaviors included in the presentation information generated by the generation unit, other than the one driving behavior indicated in the operation signal.
In this case, negative weighting is applied to the unselected driving behaviors among the subset of driving behaviors included in the presentation information, so the reliability of driving behaviors the occupant did not select can be lowered.

(Item 1-4)
An automatic driving control device comprising:
a generation unit that generates presentation information indicating a plurality of types of driving behaviors that are estimation results obtained using a driving behavior model;
a presentation information output unit that outputs the presentation information generated by the generation unit to a notification device;
an operation signal input unit to which an operation signal is input, the operation signal indicating one driving behavior selected by an occupant from the presentation information notified by the notification device;
a selection unit that selects one of the plurality of types of driving behaviors when no operation signal is input to the operation signal input unit;
an automatic driving control unit that, when an operation signal is input to the operation signal input unit, controls automatic driving of a vehicle based on the one driving behavior indicated in the operation signal and that, when no operation signal is input to the operation signal input unit, controls automatic driving of the vehicle based on the one driving behavior selected by the selection unit; and
a learning unit that, when an operation signal is input to the operation signal input unit, updates the driving behavior model by performing learning while weighting the one driving behavior indicated in the operation signal and that, when no operation signal is input to the operation signal input unit, updates the driving behavior model by performing learning while weighting the one driving behavior selected by the selection unit,
wherein the learning unit makes the weighting value used when an operation signal is input to the operation signal input unit larger than the weighting value used when no operation signal is input to the operation signal input unit.

According to this aspect, the weighting value used when an operation signal is input is made larger than the weighting value used when no operation signal is input, so a driving behavior that reflects the occupant's intention can be executed.

（項目１−５）
運転支援装置を備える車両であって、
前記運転支援装置は、
運転行動モデルを用いた推定結果である複数種類の運転行動が示された提示情報を生成する生成部と、
前記生成部において生成した提示情報を報知装置に出力する提示情報出力部と、
前記報知装置から報知された提示情報に対して乗員が選択した１つの運転行動を示す操作信号が入力される操作信号入力部と、
前記操作信号入力部に操作信号が未入力である場合、複数種類の運転行動のうちの１つの運転行動を選択する選択部と、
前記操作信号入力部に操作信号が入力された場合、当該操作信号において示された１つの運転行動に重み付けを実行しながら学習を実行することによって運転行動モデルを更新し、前記操作信号入力部に操作信号が未入力である場合、前記選択部において選択した１つの運転行動に重み付けを実行しながら学習を実行することによって運転行動モデルを更新する学習部とを備え、
前記学習部は、前記操作信号入力部に操作信号が未入力である場合の重み付けの値よりも、前記操作信号入力部に操作信号が入力された場合の重み付けの値を大きくすることを特徴とする車両。
(Item 1-5)
A vehicle equipped with a driving support device,
The driving support device includes:
A generation unit that generates presentation information indicating a plurality of types of driving behavior, which is an estimation result using the driving behavior model;
A presentation information output unit that outputs the presentation information generated in the generation unit to a notification device;
An operation signal input unit that receives an operation signal indicating one driving action selected by the occupant with respect to the presentation information notified from the notification device;
A selection unit that selects one driving action from among the plurality of types of driving actions when no operation signal is input to the operation signal input unit; and
A learning unit that updates the driving behavior model by performing learning while weighting the one driving action indicated in the operation signal when an operation signal is input to the operation signal input unit, and by performing learning while weighting the one driving action selected by the selection unit when no operation signal is input to the operation signal input unit,
A vehicle, characterized in that the learning unit makes the weighting value used when an operation signal is input to the operation signal input unit larger than the weighting value used when no operation signal is input to the operation signal input unit.

（項目１−６）
運転行動モデルを用いた推定結果である複数種類の運転行動が示された提示情報を生成するステップと、
生成した提示情報を報知装置に出力するステップと、
前記報知装置から報知された提示情報に対して乗員が選択した１つの運転行動を示す操作信号が入力されるステップと、
操作信号が未入力である場合、複数種類の運転行動のうちの１つの運転行動を選択するステップと、
操作信号が入力された場合、当該操作信号において示された１つの運転行動に重み付けを実行しながら学習を実行することによって運転行動モデルを更新するステップと、
操作信号が未入力である場合、前記選択するステップにおいて選択した１つの運転行動に重み付けを実行しながら学習を実行することによって運転行動モデルを更新するステップとを備え、
操作信号が未入力である場合の重み付けの値よりも、操作信号が入力された場合の重み付けの値を大きくすることを特徴とする運転支援方法。
(Item 1-6)
Generating presentation information indicating a plurality of types of driving behavior that is an estimation result using the driving behavior model;
Outputting the generated presentation information to a notification device;
Receiving an input of an operation signal indicating one driving action selected by the occupant with respect to the presentation information notified from the notification device;
Selecting one driving action from among the plurality of types of driving actions when no operation signal is input;
Updating the driving behavior model by performing learning while weighting the one driving action indicated in the operation signal when an operation signal is input; and
Updating the driving behavior model by performing learning while weighting the one driving action selected in the selecting step when no operation signal is input,
A driving support method, characterized in that the weighting value used when an operation signal is input is made larger than the weighting value used when no operation signal is input.

（項目１−７）
運転行動モデルを用いた推定結果である複数種類の運転行動が示された提示情報を生成するステップと、
生成した提示情報を報知装置に出力するステップと、
前記報知装置から報知された提示情報に対して乗員が選択した１つの運転行動を示す操作信号が入力されるステップと、
操作信号が未入力である場合、複数種類の運転行動のうちの１つの運転行動を選択するステップと、
操作信号が入力された場合、当該操作信号において示された１つの運転行動に重み付けを実行しながら学習を実行することによって運転行動モデルを更新するステップと、
操作信号が未入力である場合、前記選択するステップにおいて選択した１つの運転行動に重み付けを実行しながら学習を実行することによって運転行動モデルを更新するステップとを備え、
操作信号が未入力である場合の重み付けの値よりも、操作信号が入力された場合の重み付けの値を大きくすることをコンピュータに実行させるためのプログラム。
(Item 1-7)
Generating presentation information indicating a plurality of types of driving behavior that is an estimation result using the driving behavior model;
Outputting the generated presentation information to a notification device;
Receiving an input of an operation signal indicating one driving action selected by the occupant with respect to the presentation information notified from the notification device;
Selecting one driving action from among the plurality of types of driving actions when no operation signal is input;
Updating the driving behavior model by performing learning while weighting the one driving action indicated in the operation signal when an operation signal is input; and
Updating the driving behavior model by performing learning while weighting the one driving action selected in the selecting step when no operation signal is input,
A program for causing a computer to execute the above steps, wherein the weighting value used when an operation signal is input is made larger than the weighting value used when no operation signal is input.

（項目１−８）
運転行動モデルを用いた推定結果である複数種類の運転行動が示された提示情報を生成する生成部と、前記生成部において生成した提示情報を出力する提示情報出力部とを備える運転支援装置と、
前記運転支援装置から出力された提示情報を報知する報知装置とを備え、
前記運転支援装置は、
前記報知装置から報知された提示情報に対して乗員が選択した１つの運転行動を示す操作信号が入力される操作信号入力部と、
前記操作信号入力部に操作信号が未入力である場合、複数種類の運転行動のうちの１つの運転行動を選択する選択部と、
前記操作信号入力部に操作信号が入力された場合、当該操作信号において示された１つの運転行動に重み付けを実行しながら学習を実行することによって運転行動モデルを更新し、前記操作信号入力部に操作信号が未入力である場合、前記選択部において選択した１つの運転行動に重み付けを実行しながら学習を実行することによって運転行動モデルを更新する学習部とをさらに備え、
前記学習部は、前記操作信号入力部に操作信号が未入力である場合の重み付けの値よりも、前記操作信号入力部に操作信号が入力された場合の重み付けの値を大きくすることを特徴とする提示システム。
(Item 1-8)
A driving support device comprising a generation unit that generates presentation information indicating a plurality of types of driving actions, which are estimation results obtained using a driving behavior model, and a presentation information output unit that outputs the presentation information generated by the generation unit; and
A notification device that notifies the presentation information output from the driving support device;
The driving support device includes:
An operation signal input unit that receives an operation signal indicating one driving action selected by the occupant with respect to the presentation information notified from the notification device;
A selection unit that selects one driving action from among the plurality of types of driving actions when no operation signal is input to the operation signal input unit; and
A learning unit that updates the driving behavior model by performing learning while weighting the one driving action indicated in the operation signal when an operation signal is input to the operation signal input unit, and by performing learning while weighting the one driving action selected by the selection unit when no operation signal is input to the operation signal input unit,
A presentation system, characterized in that the learning unit makes the weighting value used when an operation signal is input to the operation signal input unit larger than the weighting value used when no operation signal is input to the operation signal input unit.

（項目２−１）
運転行動モデルを用いた推定結果である複数種類の運転行動のそれぞれの信頼度が高い順に所定数の運転行動を選択するとともに、選択した所定数の運転行動が示された第１提示情報を生成する生成部と、
前記生成部において生成した第１提示情報を報知装置に出力する提示情報出力部と、
前記報知装置から報知された第１提示情報に対して乗員が選択した１つの運転行動を示す第１操作信号が入力される操作信号入力部と、
前記操作信号入力部に入力された第１操作信号において示された１つの運転行動に正の重み付けを実行しながら学習を実行することによって運転行動モデルを更新する学習部とを備え、
前記生成部は、第１提示情報に含めるべき所定数の運転行動の一部の代わりに、信頼度の低い運転行動を含めた第２提示情報を、所定の頻度で第１提示情報の代わりに生成し、
前記提示情報出力部は、前記生成部において生成した第２提示情報を前記報知装置に出力し、
前記操作信号入力部には、前記報知装置から報知された第２提示情報に対して乗員が選択した１つの運転行動を示す第２操作信号が入力され、
前記学習部は、前記操作信号入力部に入力された第２操作信号において示された１つの運転行動に正の重み付けを実行するとともに、第２提示情報に含まれた他の運転行動に負の重み付けを実行しながら学習を実行することによって運転行動モデルを更新することを特徴とする運転支援装置。
(Item 2-1)
A generation unit that selects a predetermined number of driving actions in descending order of reliability from among a plurality of types of driving actions, which are estimation results obtained using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving actions;
A presentation information output unit that outputs the first presentation information generated in the generation unit to a notification device;
An operation signal input unit that receives a first operation signal indicating one driving action selected by the occupant with respect to the first presentation information notified from the notification device;
A learning unit that updates a driving behavior model by performing learning while performing positive weighting on one driving behavior indicated in the first operation signal input to the operation signal input unit;
The generation unit generates, at a predetermined frequency and in place of the first presentation information, second presentation information in which some of the predetermined number of driving actions to be included in the first presentation information are replaced with driving actions having low reliability,
The presentation information output unit outputs the second presentation information generated by the generation unit to the notification device,
The operation signal input unit receives a second operation signal indicating one driving action selected by the occupant with respect to the second presentation information notified from the notification device,
A driving support device, characterized in that the learning unit updates the driving behavior model by performing learning while applying positive weighting to the one driving action indicated in the second operation signal input to the operation signal input unit and negative weighting to the other driving actions included in the second presentation information.

この態様によると、信頼度の低い運転行動を含めた第２提示情報を所定の頻度で出力するので、乗員の意図を反映するような運転行動を導出できる。 According to this aspect, because second presentation information including a driving action with low reliability is output at a predetermined frequency, a driving action that reflects the occupant's intention can be derived.
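The flow of Item 2-1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: a flat reliability table stands in for the driving behavior model, the "predetermined frequency" is assumed to be a fixed period counted in presentation cycles, and the lowest-ranked actions are assumed to be the substitutes. All function names, parameters (`n_swap`, `period`, `positive`, `negative`), and values are hypothetical.

```python
from __future__ import annotations


def first_presentation(reliability: dict[str, float], n: int) -> list[str]:
    """Select the n driving actions with the highest reliability."""
    ranked = sorted(reliability, key=reliability.get, reverse=True)
    return ranked[:n]


def second_presentation(reliability: dict[str, float], n: int, n_swap: int = 1) -> list[str]:
    """Replace the last n_swap entries of the top-n list with low-reliability actions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(reliability, key=reliability.get, reverse=True)
    outside = ranked[n:]  # actions that the first presentation would omit
    if not outside:
        return ranked[:n]  # nothing to substitute
    n_swap = max(1, min(n_swap, len(outside), n))
    return ranked[: n - n_swap] + outside[-n_swap:]


def presentation_for_cycle(reliability: dict[str, float], n: int, cycle: int, period: int) -> list[str]:
    """Assumed schedule: every period-th cycle shows the second presentation."""
    if period > 0 and cycle % period == period - 1:
        return second_presentation(reliability, n)
    return first_presentation(reliability, n)


def update_from_second(
    reliability: dict[str, float],
    presented: list[str],
    chosen: str,
    positive: float = 0.1,
    negative: float = 0.05,
) -> dict[str, float]:
    """Positive weighting for the chosen action, negative for the other presented ones."""
    updated = dict(reliability)
    for action in presented:
        updated[action] += positive if action == chosen else -negative
    return updated
```

Actions that were not presented are left unchanged in this sketch, since the text only specifies weighting for the actions included in the second presentation information.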

（項目２−２）
前記操作信号入力部に第１操作信号が未入力である場合、第１提示情報に含まれた１つの運転行動を選択し、前記操作信号入力部に第２操作信号が未入力である場合、第２提示情報に含まれた１つの運転行動を選択する選択部をさらに備え、
前記選択部は、第１提示情報あるいは第２提示情報に含まれた所定数の運転行動のうち、予め定められた順番に配置された１つの運転行動を選択し、
前記生成部は、第２提示情報を生成する際、前記予め定められた順番に、信頼度の低い運転行動を配置させることを特徴とする項目２−１に記載の運転支援装置。
この場合、選択部において選択される運転行動として、信頼度の低い運転行動を配置させるので、乗員の積極的な意志を認識できる。
(Item 2-2)
A selection unit is further provided that selects one driving action included in the first presentation information when the first operation signal is not input to the operation signal input unit, and selects one driving action included in the second presentation information when the second operation signal is not input to the operation signal input unit,
The selection unit selects, from among the predetermined number of driving actions included in the first presentation information or the second presentation information, the one driving action arranged at a predetermined position in the order,
The driving support device according to Item 2-1, wherein, when generating the second presentation information, the generation unit arranges a driving action with low reliability at the predetermined position in the order.
In this case, because a driving action with low reliability is arranged as the driving action that the selection unit selects, the occupant's active intention can be recognized.

（項目２−３）
前記操作信号入力部に第１操作信号が未入力である場合、第１提示情報に含まれた１つの運転行動を選択し、前記操作信号入力部に第２操作信号が未入力である場合、第２提示情報に含まれた１つの運転行動を選択する選択部をさらに備え、
前記選択部は、第１提示情報あるいは第２提示情報に含まれた所定数の運転行動のうち、予め定められた順番に配置された１つの運転行動を選択し、
前記生成部は、第２提示情報を生成する際、前記予め定められた順番以外の順番に、信頼度の低い運転行動を配置させることを特徴とする項目２−１に記載の運転支援装置。
この場合、選択部において選択される運転行動以外として、信頼度の低い運転行動を配置させるので、信頼度の高い運転行動を選択部に選択させることができる。
(Item 2-3)
A selection unit is further provided that selects one driving action included in the first presentation information when the first operation signal is not input to the operation signal input unit, and selects one driving action included in the second presentation information when the second operation signal is not input to the operation signal input unit,
The selection unit selects, from among the predetermined number of driving actions included in the first presentation information or the second presentation information, the one driving action arranged at a predetermined position in the order,
The driving support device according to Item 2-1, wherein, when generating the second presentation information, the generation unit arranges a driving action with low reliability at a position in the order other than the predetermined position.
In this case, because the driving action with low reliability is arranged at a position other than the one the selection unit selects, the selection unit can be made to select a driving action with high reliability.
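The difference between Items 2-2 and 2-3 lies only in where the low-reliability driving action is placed relative to the position that the selection unit picks when no operation signal is input. The sketch below illustrates both placements; the function names, the `default_index` parameter, and the rule used to pick a non-default position are assumptions made for illustration.

```python
from __future__ import annotations


def arrange_second_presentation(
    high: list[str],
    low: str,
    default_index: int,
    low_in_default_slot: bool,
) -> list[str]:
    """Arrange high-reliability actions plus one low-reliability action.

    default_index is the position the selection unit picks when no
    operation signal is input (assumed to satisfy 0 <= default_index <= len(high)).
    low_in_default_slot=True corresponds to Item 2-2, False to Item 2-3.
    """
    if not high:
        raise ValueError("at least one high-reliability action is required")
    if not 0 <= default_index <= len(high):
        raise ValueError("default_index out of range")
    arranged = list(high)
    if low_in_default_slot:
        arranged.insert(default_index, low)
    else:
        # Any position other than default_index would do; use the end,
        # or the front when the default position is the last one.
        other = len(arranged) if default_index != len(arranged) else 0
        arranged.insert(other, low)
    return arranged


def default_selection(arranged: list[str], default_index: int) -> str:
    """Action chosen by the selection unit when no operation signal is input."""
    return arranged[default_index]
```

Under the Item 2-2 placement, any occupant input that differs from the default is a deliberate choice against a low-reliability action; under the Item 2-3 placement, the default remains a high-reliability action.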

（項目２−４）
運転行動モデルを用いた推定結果である複数種類の運転行動のそれぞれの信頼度が高い順に所定数の運転行動を選択するとともに、選択した所定数の運転行動が示された第１提示情報を生成する生成部と、
前記生成部において生成した第１提示情報を報知装置に出力する提示情報出力部と、
前記報知装置から報知された第１提示情報に対して乗員が選択した１つの運転行動を示す第１操作信号が入力される操作信号入力部と、
前記操作信号入力部に入力された第１操作信号において示された１つの運転行動に正の重み付けを実行しながら学習を実行することによって運転行動モデルを更新する学習部と、
１つの運転行動をもとに、車両の自動運転を制御する自動運転制御部とを備え、
前記生成部は、第１提示情報に含めるべき所定数の運転行動の一部の代わりに、信頼度の低い運転行動を含めた第２提示情報を、所定の頻度で第１提示情報の代わりに生成し、
前記提示情報出力部は、前記生成部において生成した第２提示情報を前記報知装置に出力し、
前記操作信号入力部には、前記報知装置から報知された第２提示情報に対して乗員が選択した１つの運転行動を示す第２操作信号が入力され、
前記学習部は、前記操作信号入力部に入力された第２操作信号において示された１つの運転行動に正の重み付けを実行するとともに、第２提示情報に含まれた他の運転行動に負の重み付けを実行しながら学習を実行することによって運転行動モデルを更新することを特徴とする自動運転制御装置。
(Item 2-4)
A generation unit that selects a predetermined number of driving actions in descending order of reliability from among a plurality of types of driving actions, which are estimation results obtained using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving actions;
A presentation information output unit that outputs the first presentation information generated in the generation unit to a notification device;
An operation signal input unit that receives a first operation signal indicating one driving action selected by the occupant with respect to the first presentation information notified from the notification device;
A learning unit that updates a driving behavior model by performing learning while performing positive weighting on one driving behavior indicated in the first operation signal input to the operation signal input unit;
An automatic driving control unit that controls automatic driving of the vehicle based on one driving action,
The generation unit generates, at a predetermined frequency and in place of the first presentation information, second presentation information in which some of the predetermined number of driving actions to be included in the first presentation information are replaced with driving actions having low reliability,
The presentation information output unit outputs the second presentation information generated by the generation unit to the notification device,
The operation signal input unit receives a second operation signal indicating one driving action selected by the occupant with respect to the second presentation information notified from the notification device,
An automatic driving control device, characterized in that the learning unit updates the driving behavior model by performing learning while applying positive weighting to the one driving action indicated in the second operation signal input to the operation signal input unit and negative weighting to the other driving actions included in the second presentation information.

この態様によると、信頼度の低い運転行動を含めた第２提示情報を所定の頻度で出力するので、乗員の意図を反映するような運転行動を実行できる。 According to this aspect, because second presentation information including a driving action with low reliability is output at a predetermined frequency, a driving action that reflects the occupant's intention can be executed.

（項目２−５）
運転支援装置を備える車両であって、
前記運転支援装置は、
運転行動モデルを用いた推定結果である複数種類の運転行動のそれぞれの信頼度が高い順に所定数の運転行動を選択するとともに、選択した所定数の運転行動が示された第１提示情報を生成する生成部と、
前記生成部において生成した第１提示情報を報知装置に出力する提示情報出力部と、
前記報知装置から報知された第１提示情報に対して乗員が選択した１つの運転行動を示す第１操作信号が入力される操作信号入力部と、
前記操作信号入力部に入力された第１操作信号において示された１つの運転行動に正の重み付けを実行しながら学習を実行することによって運転行動モデルを更新する学習部とを備え、
前記生成部は、第１提示情報に含めるべき所定数の運転行動の一部の代わりに、信頼度の低い運転行動を含めた第２提示情報を、所定の頻度で第１提示情報の代わりに生成し、
前記提示情報出力部は、前記生成部において生成した第２提示情報を前記報知装置に出力し、
前記操作信号入力部には、前記報知装置から報知された第２提示情報に対して乗員が選択した１つの運転行動を示す第２操作信号が入力され、
前記学習部は、前記操作信号入力部に入力された第２操作信号において示された１つの運転行動に正の重み付けを実行するとともに、第２提示情報に含まれた他の運転行動に負の重み付けを実行しながら学習を実行することによって運転行動モデルを更新することを特徴とする車両。
(Item 2-5)
A vehicle equipped with a driving support device,
The driving support device includes:
A generation unit that selects a predetermined number of driving actions in descending order of reliability from among a plurality of types of driving actions, which are estimation results obtained using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving actions;
A presentation information output unit that outputs the first presentation information generated in the generation unit to a notification device;
An operation signal input unit that receives a first operation signal indicating one driving action selected by the occupant with respect to the first presentation information notified from the notification device;
A learning unit that updates a driving behavior model by performing learning while performing positive weighting on one driving behavior indicated in the first operation signal input to the operation signal input unit;
The generation unit generates, at a predetermined frequency and in place of the first presentation information, second presentation information in which some of the predetermined number of driving actions to be included in the first presentation information are replaced with driving actions having low reliability,
The presentation information output unit outputs the second presentation information generated by the generation unit to the notification device,
The operation signal input unit receives a second operation signal indicating one driving action selected by the occupant with respect to the second presentation information notified from the notification device,
A vehicle, characterized in that the learning unit updates the driving behavior model by performing learning while applying positive weighting to the one driving action indicated in the second operation signal input to the operation signal input unit and negative weighting to the other driving actions included in the second presentation information.

（項目２−６）
運転行動モデルを用いた推定結果である複数種類の運転行動のそれぞれの信頼度が高い順に所定数の運転行動を選択するとともに、選択した所定数の運転行動が示された第１提示情報を生成するステップと、
生成した第１提示情報を報知装置に出力するステップと、
前記報知装置から報知された第１提示情報に対して乗員が選択した１つの運転行動を示す第１操作信号が入力されるステップと、
入力された第１操作信号において示された１つの運転行動に正の重み付けを実行しながら学習を実行することによって運転行動モデルを更新するステップと、
第１提示情報に含めるべき所定数の運転行動の一部の代わりに、信頼度の低い運転行動を含めた第２提示情報を、所定の頻度で第１提示情報の代わりに生成するステップと、
生成した第２提示情報を前記報知装置に出力するステップと、
前記報知装置から報知された第２提示情報に対して乗員が選択した１つの運転行動を示す第２操作信号が入力されるステップと、
入力された第２操作信号において示された１つの運転行動に正の重み付けを実行するとともに、第２提示情報に含まれた他の運転行動に負の重み付けを実行しながら学習を実行することによって運転行動モデルを更新するステップと、
を備えることを特徴とする運転支援方法。
(Item 2-6)
Selecting a predetermined number of driving actions in descending order of reliability from among a plurality of types of driving actions, which are estimation results obtained using a driving behavior model, and generating first presentation information indicating the selected predetermined number of driving actions;
Outputting the generated first presentation information to a notification device;
A step of inputting a first operation signal indicating one driving action selected by the occupant with respect to the first presentation information notified from the notification device;
Updating the driving behavior model by performing learning while performing positive weighting on one driving behavior indicated in the input first operation signal;
Generating, at a predetermined frequency and in place of the first presentation information, second presentation information in which some of the predetermined number of driving actions to be included in the first presentation information are replaced with driving actions having low reliability;
Outputting the generated second presentation information to the notification device;
A step of inputting a second operation signal indicating one driving action selected by the occupant with respect to the second presentation information notified from the notification device;
Updating the driving behavior model by performing learning while applying positive weighting to the one driving action indicated in the input second operation signal and negative weighting to the other driving actions included in the second presentation information;
A driving support method comprising the above steps.

（項目２−７）
運転行動モデルを用いた推定結果である複数種類の運転行動のそれぞれの信頼度が高い順に所定数の運転行動を選択するとともに、選択した所定数の運転行動が示された第１提示情報を生成するステップと、
生成した第１提示情報を報知装置に出力するステップと、
前記報知装置から報知された第１提示情報に対して乗員が選択した１つの運転行動を示す第１操作信号が入力されるステップと、
入力された第１操作信号において示された１つの運転行動に正の重み付けを実行しながら学習を実行することによって運転行動モデルを更新するステップと、
第１提示情報に含めるべき所定数の運転行動の一部の代わりに、信頼度の低い運転行動を含めた第２提示情報を、所定の頻度で第１提示情報の代わりに生成するステップと、
生成した第２提示情報を前記報知装置に出力するステップと、
前記報知装置から報知された第２提示情報に対して乗員が選択した１つの運転行動を示す第２操作信号が入力されるステップと、
入力された第２操作信号において示された１つの運転行動に正の重み付けを実行するとともに、第２提示情報に含まれた他の運転行動に負の重み付けを実行しながら学習を実行することによって運転行動モデルを更新するステップとをコンピュータに実行させるためのプログラム。
(Item 2-7)
Selecting a predetermined number of driving actions in descending order of reliability from among a plurality of types of driving actions, which are estimation results obtained using a driving behavior model, and generating first presentation information indicating the selected predetermined number of driving actions;
Outputting the generated first presentation information to a notification device;
A step of inputting a first operation signal indicating one driving action selected by the occupant with respect to the first presentation information notified from the notification device;
Updating the driving behavior model by performing learning while performing positive weighting on one driving behavior indicated in the input first operation signal;
Generating, at a predetermined frequency and in place of the first presentation information, second presentation information in which some of the predetermined number of driving actions to be included in the first presentation information are replaced with driving actions having low reliability;
Outputting the generated second presentation information to the notification device;
A step of inputting a second operation signal indicating one driving action selected by the occupant with respect to the second presentation information notified from the notification device;
A program for causing a computer to execute the above steps and a step of updating the driving behavior model by performing learning while applying positive weighting to the one driving action indicated in the input second operation signal and negative weighting to the other driving actions included in the second presentation information.

（項目２−８）
運転行動モデルを用いた推定結果である複数種類の運転行動のそれぞれの信頼度が高い順に所定数の運転行動を選択するとともに、選択した所定数の運転行動が示された第１提示情報を生成する生成部と、前記生成部において生成した第１提示情報を出力する提示情報出力部とを備える運転支援装置と、
前記運転支援装置から出力された提示情報を報知する報知装置とを備え、
前記運転支援装置は、
前記報知装置から報知された第１提示情報に対して乗員が選択した１つの運転行動を示す第１操作信号が入力される操作信号入力部と、
前記操作信号入力部に入力された第１操作信号において示された１つの運転行動に正の重み付けを実行しながら学習を実行することによって運転行動モデルを更新する学習部とをさらに備え、
前記生成部は、第１提示情報に含めるべき所定数の運転行動の一部の代わりに、信頼度の低い運転行動を含めた第２提示情報を、所定の頻度で第１提示情報の代わりに生成し、
前記提示情報出力部は、前記生成部において生成した第２提示情報を前記報知装置に出力し、
前記操作信号入力部には、前記報知装置から報知された第２提示情報に対して乗員が選択した１つの運転行動を示す第２操作信号が入力され、
前記学習部は、前記操作信号入力部に入力された第２操作信号において示された１つの運転行動に正の重み付けを実行するとともに、第２提示情報に含まれた他の運転行動に負の重み付けを実行しながら学習を実行することによって運転行動モデルを更新することを特徴とする提示システム。
(Item 2-8)
A driving support device comprising a generation unit that selects a predetermined number of driving actions in descending order of reliability from among a plurality of types of driving actions, which are estimation results obtained using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving actions, and a presentation information output unit that outputs the first presentation information generated by the generation unit; and
A notification device that notifies the presentation information output from the driving support device;
The driving support device includes:
An operation signal input unit that receives a first operation signal indicating one driving action selected by the occupant with respect to the first presentation information notified from the notification device;
A learning unit that updates the driving behavior model by performing learning while performing positive weighting on one driving behavior indicated in the first operation signal input to the operation signal input unit;
The generation unit generates, at a predetermined frequency and in place of the first presentation information, second presentation information in which some of the predetermined number of driving actions to be included in the first presentation information are replaced with driving actions having low reliability,
The presentation information output unit outputs the second presentation information generated by the generation unit to the notification device,
The operation signal input unit receives a second operation signal indicating one driving action selected by the occupant with respect to the second presentation information notified from the notification device,
A presentation system, characterized in that the learning unit updates the driving behavior model by performing learning while applying positive weighting to the one driving action indicated in the second operation signal input to the operation signal input unit and negative weighting to the other driving actions included in the second presentation information.

以上、本発明を実施の形態をもとに説明した。これらの実施の形態は例示であり、それらの各構成要素や各処理プロセスの組合せにいろいろな変形例が可能なこと、またそうした変形例も本発明の範囲にあることは当業者に理解されるところである。 The present invention has been described above based on the embodiments. Those skilled in the art will understand that these embodiments are examples, that various modifications can be made to the combinations of their constituent elements and processes, and that such modifications also fall within the scope of the present invention.

実施の形態１、２において、運転行動推定部７０は運転支援装置４０の制御部４１に含まれる。しかしながらこれに限らず例えば、運転行動推定部７０は、自動運転制御装置３０の制御部３１に含まれてもよい。本変形例によれば、構成の自由度を向上できる。 In the first and second embodiments, the driving behavior estimation unit 70 is included in the control unit 41 of the driving support device 40. However, the present invention is not limited thereto, and for example, the driving behavior estimation unit 70 may be included in the control unit 31 of the automatic driving control device 30. According to this modification, the degree of freedom of configuration can be improved.

実施の形態１、２において、運転行動モデル８０は、運転行動学習部３１０において生成され、運転行動推定部７０に送信されている。しかしながらこれに限らず例えば、運転行動モデル８０は運転行動推定部７０にプリインストールされていてもよい。本変形例によれば、構成を簡易にできる。 In the first and second embodiments, the driving behavior model 80 is generated by the driving behavior learning unit 310 and transmitted to the driving behavior estimation unit 70. However, the present invention is not limited to this. For example, the driving behavior model 80 may be preinstalled in the driving behavior estimation unit 70. According to this modification, the configuration can be simplified.

実施の形態１、２において、運転行動学習部３１０は、運転支援装置４０に含まれてもよい。 In the first and second embodiments, the driving behavior learning unit 310 may be included in the driving support device 40.

実施の形態１、２において、運転行動推定部７０は、推定として、ニューラルネットワークを使用する深層学習により生成した運転行動モデルを用いている。しかしながらこれに限らず例えば、運転行動推定部７０は、深層学習以外の機械学習を用いた運転行動モデルを用いてもよい。深層学習以外の機械学習の一例は、ＳＶＭである。さらに、運転行動推定部７０は、統計処理により生成したフィルタを用いてもよい。フィルタの一例は、協調フィルタリングである。協調フィルタリングでは、各運転行動に対応した運転履歴あるいは走行履歴と、テストデータとの相関値を算出することによって、相関値の高い運転行動が選択される。相関値によって確からしさが示されているので、相関値は尤度ともいえ、信頼度に相当する。学習部７４は、信頼度として相関値に対する報酬を付与する。本変形例によれば、構成の自由度を向上できる。 In the first and second embodiments, the driving behavior estimation unit 70 performs estimation using a driving behavior model generated by deep learning with a neural network. However, the present invention is not limited to this; for example, the driving behavior estimation unit 70 may use a driving behavior model based on machine learning other than deep learning. One example of machine learning other than deep learning is an SVM. Furthermore, the driving behavior estimation unit 70 may use a filter generated by statistical processing. One example of such a filter is collaborative filtering. In collaborative filtering, a correlation value between test data and the driving history or travel history corresponding to each driving action is calculated, and a driving action with a high correlation value is selected. Because the correlation value indicates certainty, it can also be regarded as a likelihood and corresponds to the reliability. The learning unit 74 applies a reward to the correlation value, treating it as the reliability. According to this modification, the degree of freedom of configuration can be improved.
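The correlation-based selection described for the collaborative filtering modification can be sketched as follows. The specification does not state which correlation measure is used, so the Pearson coefficient here is an assumption, as is the representation of each history and of the test data as an equal-length list of numeric features. The names `pearson` and `estimate_by_correlation` are hypothetical.

```python
from __future__ import annotations

from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def estimate_by_correlation(
    histories: dict[str, list[float]],
    test_data: list[float],
) -> tuple[str, dict[str, float]]:
    """Score each driving action by correlation with the test data.

    The correlation value is treated as the reliability of the action,
    and the action with the highest value is selected.
    """
    scores = {action: pearson(history, test_data) for action, history in histories.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

The returned score table plays the role of the reliabilities, so a reward from the learning unit could be applied to these values in the same way as to the outputs of the neural network model.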

実施の形態１において、学習部７４は、手動選択状態である場合、選択されなかった運転行動に対して負の報酬を付与している。しかしながらこれに限らず例えば、学習部７４は、手動選択状態である場合、選択されなかった運転行動に対して負の報酬を付与しなくてもよい。本変形例によれば、選択された運転行動の信頼度と、選択されなかった運転行動の信頼度との差の増大を抑制できる。 In the first embodiment, the learning unit 74 gives a negative reward to the driving actions that were not selected in the manual selection state. However, the present invention is not limited to this; for example, the learning unit 74 need not give a negative reward to the driving actions that were not selected in the manual selection state. According to this modification, an increase in the difference between the reliability of the selected driving action and the reliability of the driving actions that were not selected can be suppressed.

実施の形態１、２の組合せも有効である。本変形例によれば、実施の形態１、２の組合せによる効果を得ることができる。 The combination of the first and second embodiments is also effective. According to this modification, the effect of the combination of the first and second embodiments can be obtained.

２報知装置、２ａヘッドアップディスプレイ、２ｂセンタディスプレイ、４入力装置、４ａ第１操作部、４ｂ第２操作部、４ｃ第３操作部、６スピーカ、８無線装置、１０運転操作部、１１ステアリング、１２ブレーキペダル、１３アクセルペダル、１４ウィンカスイッチ、２０検出部、２１位置情報取得部、２２センサ、２３速度情報取得部、２４地図情報取得部、３０自動運転制御装置、３１制御部、３２記憶部、３３Ｉ／Ｏ部、４０運転支援装置、４１制御部、４２記憶部、４３Ｉ／Ｏ部、５０操作信号入力部、５１画像・音声出力部、５２検出情報入力部、５３コマンドＩＦ、５４行動情報入力部、５５コマンド出力部、５６通信ＩＦ、７０運転行動推定部、７２表示制御部、７４学習部、８０運転行動モデル、８２推定部、８４ヒストグラム生成部、９０生成部、９２処理部、９４選択部、１００車両、３００サーバ、３０２ネットワーク、３１０運転行動学習部、５００運転支援システム。 2 notification device, 2a head-up display, 2b center display, 4 input device, 4a first operation unit, 4b second operation unit, 4c third operation unit, 6 speaker, 8 wireless device, 10 driving operation unit, 11 steering, 12 brake pedal, 13 accelerator pedal, 14 blinker switch, 20 detection unit, 21 position information acquisition unit, 22 sensor, 23 speed information acquisition unit, 24 map information acquisition unit, 30 automatic driving control device, 31 control unit, 32 storage unit, 33 I/O unit, 40 driving support device, 41 control unit, 42 storage unit, 43 I/O unit, 50 operation signal input unit, 51 image/sound output unit, 52 detection information input unit, 53 command IF, 54 action information input unit, 55 command output unit, 56 communication IF, 70 driving behavior estimation unit, 72 display control unit, 74 learning unit, 80 driving behavior model, 82 estimation unit, 84 histogram generation unit, 90 generation unit, 92 processing unit, 94 selection unit, 100 vehicle, 300 server, 302 network, 310 driving behavior learning unit, 500 driving support system.

Claims

A generation unit that selects a predetermined number of driving actions in descending order of reliability from among a plurality of types of driving actions, which are estimation results obtained using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving actions;
A presentation information output unit that outputs the first presentation information generated in the generation unit to a notification device;
An operation signal input unit that receives a first operation signal indicating one driving action selected by the occupant with respect to the first presentation information notified from the notification device;
A learning unit that updates a driving behavior model by performing learning while performing positive weighting on one driving behavior indicated in the first operation signal input to the operation signal input unit;
The generating unit, instead of a part of the predetermined number of driving actions to be included in the first presentation information, substitutes the second presentation information including driving actions with low reliability in place of the first presentation information at a predetermined frequency. Generate and
The presentation information output unit outputs the second presentation information generated by the generation unit to the notification device,
The operation signal input unit receives a second operation signal indicating one driving action selected by the occupant with respect to the second presentation information notified from the notification device,
The learning unit performs positive weighting on one driving action indicated in the second operation signal input to the operation signal input unit, and is negative with respect to other driving actions included in the second presentation information. A driving support apparatus that updates a driving behavior model by performing learning while performing weighting.
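
The generation of the first and second presentation information recited above can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names, the behavior labels, the reading of "low reliability" as the lowest-ranked behaviors outside the top list, and the modeling of the "predetermined frequency" as every `period`-th presentation are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def rank(reliability: Dict[str, float]) -> List[str]:
    """Order driving behaviors by descending reliability."""
    return sorted(reliability, key=reliability.get, reverse=True)


def first_presentation(reliability: Dict[str, float], n: int) -> List[str]:
    """First presentation information: the n most reliable behaviors."""
    return rank(reliability)[:n]


def second_presentation(reliability: Dict[str, float], n: int,
                        n_swap: int = 1) -> List[str]:
    """Second presentation information: the top-n list with its last
    n_swap entries replaced by low-reliability behaviors.

    Assumption: "low reliability" means the lowest-ranked behaviors
    that are not already in the top-n list.
    """
    ranked = rank(reliability)
    if n_swap <= 0:
        return ranked[:n]
    kept = ranked[:n - n_swap]
    low = ranked[n:][::-1][:n_swap]  # lowest first; empty if nothing is left
    return kept + low


def presentation(reliability: Dict[str, float], n: int, count: int,
                 period: int = 10) -> Tuple[List[str], bool]:
    """Return (behaviors, is_second).

    Assumption: the "predetermined frequency" is modeled as every
    period-th presentation, with count being a zero-based counter.
    """
    if period > 0 and count % period == period - 1:
        return second_presentation(reliability, n), True
    return first_presentation(reliability, n), False


if __name__ == "__main__":
    rel = {"decelerate": 0.5, "lane_change_right": 0.3,
           "accelerate": 0.15, "lane_change_left": 0.05}
    print(presentation(rel, 2, count=0))
    print(presentation(rel, 2, count=9))
```

With a period of 10, nine presentations out of ten show the most reliable behaviors, and the tenth swaps in a behavior the model currently rates low, which gives the occupant a chance to select it.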

The driving support device according to claim 1, further comprising a selection unit that selects one driving behavior included in the first presentation information when the first operation signal is not input to the operation signal input unit, and selects one driving behavior included in the second presentation information when the second operation signal is not input to the operation signal input unit, wherein
the selection unit selects the one driving behavior placed at a predetermined position in the order of the predetermined number of driving behaviors included in the first presentation information or the second presentation information, and
when generating the second presentation information, the generation unit places the driving behavior with low reliability at the predetermined position.

The driving support device according to claim 1, further comprising a selection unit that selects one driving behavior included in the first presentation information when the first operation signal is not input to the operation signal input unit, and selects one driving behavior included in the second presentation information when the second operation signal is not input to the operation signal input unit, wherein
the selection unit selects the one driving behavior placed at a predetermined position in the order of the predetermined number of driving behaviors included in the first presentation information or the second presentation information, and
when generating the second presentation information, the generation unit places the driving behavior with low reliability at a position other than the predetermined position.
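
The two dependent claims above differ only in where the low-reliability behavior sits relative to the slot that the selection unit falls back to when no operation signal arrives. A minimal Python sketch of that difference follows. The names `DEFAULT_INDEX`, `arrange`, and `select` are hypothetical, and treating the predetermined position as the first slot is an assumption, not something the claims state.

```python
from typing import List, Optional

# Assumption: the "predetermined" slot used for the fallback is the first one.
DEFAULT_INDEX = 0


def arrange(kept: List[str], low: str, low_at_default: bool) -> List[str]:
    """Build the second presentation list from the kept high-reliability
    behaviors and one low-reliability behavior.

    low_at_default=True  puts the low-reliability behavior in the default slot.
    low_at_default=False puts it after the kept behaviors, away from that slot.
    """
    if low_at_default:
        items = list(kept)
        items.insert(DEFAULT_INDEX, low)
        return items
    if not kept:
        # With nothing else to show, the low-reliability behavior would
        # unavoidably occupy the default slot.
        raise ValueError("need at least one kept behavior")
    return list(kept) + [low]


def select(items: List[str], chosen: Optional[str]) -> str:
    """Return the occupant's choice, or the behavior in the default slot
    when no operation signal was received (chosen is None)."""
    if chosen is not None:
        return chosen
    return items[DEFAULT_INDEX]


if __name__ == "__main__":
    trial = arrange(["decelerate"], "lane_change_left", low_at_default=True)
    print(trial, "->", select(trial, None))
    safe = arrange(["decelerate"], "lane_change_left", low_at_default=False)
    print(safe, "->", select(safe, None))
```

In the first arrangement the low-reliability behavior is executed unless the occupant intervenes; in the second it is executed only if the occupant actively chooses it.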

An automatic driving control device comprising:
a generation unit that selects a predetermined number of driving behaviors from among a plurality of types of driving behaviors, in descending order of the reliability of each driving behavior as estimated using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving behaviors;
a presentation information output unit that outputs the first presentation information generated by the generation unit to a notification device;
an operation signal input unit that receives a first operation signal indicating one driving behavior selected by an occupant from the first presentation information notified by the notification device;
a learning unit that updates the driving behavior model by performing learning with positive weighting applied to the one driving behavior indicated by the first operation signal input to the operation signal input unit; and
an automatic driving control unit that controls automatic driving of a vehicle based on one driving behavior, wherein
the generation unit generates, at a predetermined frequency and in place of the first presentation information, second presentation information that includes a driving behavior with low reliability in place of some of the predetermined number of driving behaviors to be included in the first presentation information,
the presentation information output unit outputs the second presentation information generated by the generation unit to the notification device,
the operation signal input unit receives a second operation signal indicating one driving behavior selected by the occupant from the second presentation information notified by the notification device, and
the learning unit updates the driving behavior model by performing learning with positive weighting applied to the one driving behavior indicated by the second operation signal input to the operation signal input unit and negative weighting applied to the other driving behaviors included in the second presentation information.

A vehicle comprising a driving support device, wherein
the driving support device includes:
a generation unit that selects a predetermined number of driving behaviors from among a plurality of types of driving behaviors, in descending order of the reliability of each driving behavior as estimated using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving behaviors;
a presentation information output unit that outputs the first presentation information generated by the generation unit to a notification device;
an operation signal input unit that receives a first operation signal indicating one driving behavior selected by an occupant from the first presentation information notified by the notification device; and
a learning unit that updates the driving behavior model by performing learning with positive weighting applied to the one driving behavior indicated by the first operation signal input to the operation signal input unit,
the generation unit generates, at a predetermined frequency and in place of the first presentation information, second presentation information that includes a driving behavior with low reliability in place of some of the predetermined number of driving behaviors to be included in the first presentation information,
the presentation information output unit outputs the second presentation information generated by the generation unit to the notification device,
the operation signal input unit receives a second operation signal indicating one driving behavior selected by the occupant from the second presentation information notified by the notification device, and
the learning unit updates the driving behavior model by performing learning with positive weighting applied to the one driving behavior indicated by the second operation signal input to the operation signal input unit and negative weighting applied to the other driving behaviors included in the second presentation information.

A driving support method comprising:
a step of selecting a predetermined number of driving behaviors from among a plurality of types of driving behaviors, in descending order of the reliability of each driving behavior as estimated using a driving behavior model, and generating first presentation information indicating the selected predetermined number of driving behaviors;
a step of outputting the generated first presentation information to a notification device;
a step of receiving a first operation signal indicating one driving behavior selected by an occupant from the first presentation information notified by the notification device;
a step of updating the driving behavior model by performing learning with positive weighting applied to the one driving behavior indicated by the received first operation signal;
a step of generating, at a predetermined frequency and in place of the first presentation information, second presentation information that includes a driving behavior with low reliability in place of some of the predetermined number of driving behaviors to be included in the first presentation information;
a step of outputting the generated second presentation information to the notification device;
a step of receiving a second operation signal indicating one driving behavior selected by the occupant from the second presentation information notified by the notification device; and
a step of updating the driving behavior model by performing learning with positive weighting applied to the one driving behavior indicated by the received second operation signal and negative weighting applied to the other driving behaviors included in the second presentation information.
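
The two learning steps of the method treat the occupant's feedback differently depending on which presentation it answers. The Python sketch below shows one way to express that asymmetry. It is an illustration under stated assumptions only: the claims do not say how the weights enter the learning, so the per-behavior score table, the weight values of +1 and -1, and the learning rate `lr` are all hypothetical stand-ins for the driving behavior model and its update.

```python
from typing import Dict, List, Tuple


def feedback_weights(presented: List[str], chosen: str,
                     is_second: bool) -> List[Tuple[str, float]]:
    """Turn one occupant selection into (behavior, weight) pairs.

    First presentation:  positive weight on the chosen behavior only.
    Second presentation: positive weight on the chosen behavior and
    negative weight on every other behavior that was presented.
    """
    weights = [(chosen, 1.0)]
    if is_second:
        weights += [(b, -1.0) for b in presented if b != chosen]
    return weights


def update_scores(scores: Dict[str, float],
                  weights: List[Tuple[str, float]],
                  lr: float = 0.1) -> Dict[str, float]:
    """Assumed toy update: shift each behavior's score by lr * weight.

    A real driving behavior model would instead use the weights as
    sample weights or labels in its own training procedure.
    """
    updated = dict(scores)
    for behavior, weight in weights:
        updated[behavior] = updated.get(behavior, 0.0) + lr * weight
    return updated


if __name__ == "__main__":
    scores = {"decelerate": 0.5, "lane_change_left": 0.05}
    shown = ["decelerate", "lane_change_left"]
    w = feedback_weights(shown, "lane_change_left", is_second=True)
    print(w)
    print(update_scores(scores, w))
```

Choosing a low-reliability behavior from the second presentation is thus a stronger signal than an ordinary selection, because it also pushes down the behaviors the occupant passed over.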

A program for causing a computer to execute:
a step of selecting a predetermined number of driving behaviors from among a plurality of types of driving behaviors, in descending order of the reliability of each driving behavior as estimated using a driving behavior model, and generating first presentation information indicating the selected predetermined number of driving behaviors;
a step of outputting the generated first presentation information to a notification device;
a step of receiving a first operation signal indicating one driving behavior selected by an occupant from the first presentation information notified by the notification device;
a step of updating the driving behavior model by performing learning with positive weighting applied to the one driving behavior indicated by the received first operation signal;
a step of generating, at a predetermined frequency and in place of the first presentation information, second presentation information that includes a driving behavior with low reliability in place of some of the predetermined number of driving behaviors to be included in the first presentation information;
a step of outputting the generated second presentation information to the notification device;
a step of receiving a second operation signal indicating one driving behavior selected by the occupant from the second presentation information notified by the notification device; and
a step of updating the driving behavior model by performing learning with positive weighting applied to the one driving behavior indicated by the received second operation signal and negative weighting applied to the other driving behaviors included in the second presentation information.

A presentation system comprising:
a driving support device including a generation unit that selects a predetermined number of driving behaviors from among a plurality of types of driving behaviors, in descending order of the reliability of each driving behavior as estimated using a driving behavior model, and generates first presentation information indicating the selected predetermined number of driving behaviors, and a presentation information output unit that outputs the first presentation information generated by the generation unit; and
a notification device that notifies the presentation information output from the driving support device, wherein
the driving support device further includes:
an operation signal input unit that receives a first operation signal indicating one driving behavior selected by an occupant from the first presentation information notified by the notification device; and
a learning unit that updates the driving behavior model by performing learning with positive weighting applied to the one driving behavior indicated by the first operation signal input to the operation signal input unit,
the generation unit generates, at a predetermined frequency and in place of the first presentation information, second presentation information that includes a driving behavior with low reliability in place of some of the predetermined number of driving behaviors to be included in the first presentation information,
the presentation information output unit outputs the second presentation information generated by the generation unit to the notification device,
the operation signal input unit receives a second operation signal indicating one driving behavior selected by the occupant from the second presentation information notified by the notification device, and
the learning unit updates the driving behavior model by performing learning with positive weighting applied to the one driving behavior indicated by the second operation signal input to the operation signal input unit and negative weighting applied to the other driving behaviors included in the second presentation information.