JP2019106049A

JP2019106049A - Vehicle control device, risk map generation device, and program

Info

Publication number: JP2019106049A
Application number: JP2017238692A
Authority: JP
Inventors: 春昭郭; Chunzhao Guo; 清澄城殿; Kiyosumi Shirodono
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2017-12-13
Filing date: 2017-12-13
Publication date: 2019-06-27
Anticipated expiration: 2037-12-13
Also published as: JP7020097B2

Abstract

To provide a vehicle control device capable of controlling a vehicle with human-like behavior. SOLUTION: In a driving support control device 10, a traffic participant classification part 30 classifies traffic participants around a host vehicle according to attribute and state. A traffic environment risk map construction part 44 applies, to each traffic participant around the host vehicle, an actualized risk potential and a risk prevention potential according to its classification, and generates an integrated risk map that integrates an actualized risk map and a latent risk map. A human-like route determination part 46 determines, as the best action, transitioning to or stopping at a state that yields a large reward under a reward function using the integrated risk map, from among the states corresponding to positions of the host vehicle on a plurality of route candidates. SELECTED DRAWING: Figure 2

Description

The present invention relates to a vehicle control device, a risk map generation device, and a program.

Conventionally, a route generation device is known that comprises: target classification means for classifying targets into right-side targets and left-side targets; right-side target risk potential map creation means for creating a risk potential map MR in which the risk potential is expressed as a positive number whose absolute value increases as the risk potential increases; left-side target risk potential map creation means for creating a risk potential map ML in which the risk potential is expressed as a negative number whose absolute value increases as the risk potential increases; integrated risk potential map creation means for creating an integrated risk potential map; and route generation means for generating a route along which the risk potential in the integrated risk potential map takes a preset value (Patent Document 1).

Also known is a driving support device that comprises: obstacle detection means for detecting information on an obstacle; margin time calculation means for calculating the margin time until the host vehicle comes closest to the current position of the obstacle; future position estimation means for estimating the future position of the obstacle when the margin time has elapsed; avoidance target area setting means for setting an avoidance target area that the host vehicle should avoid; and travel route setting means for setting a travel route that avoids the avoidance target area. The avoidance target area setting means evaluates the collision risk potential with the obstacle at the future position and at positions around the future position, and sets the avoidance target area based on that collision risk potential. This device can reduce unnecessary avoidance support (Patent Document 2).

Also known is a vehicle braking support device that applies a braking force to the host vehicle MM when it determines that the risk potential of the host vehicle MM with respect to an obstacle XM ahead of the host vehicle is higher than a preset first threshold Th1 and that the accelerator pedal is not being operated, and that applies a braking force to the host vehicle MM regardless of the operation state of the accelerator pedal when it determines that the risk potential is higher than a second threshold Th2, which corresponds to a higher risk potential than the first threshold Th1. The device thereby assists the driver with respect to the obstacle XM ahead of the host vehicle more appropriately, in accordance with the driver's intention (Patent Document 3).

Patent Document 1: JP 2015-232866 A
Patent Document 2: JP 2012-173786 A
Patent Document 3: JP 2015-71425 A

The technology described in Patent Document 1 classifies objects by focusing only on those in the left and right adjacent lanes, so it cannot cope with truly complex situations in which different traffic participants are in different driving states. In addition, human/data-driven data is not used to generate human-like behavior.

The technology described in Patent Document 2 uses the time to collision only to evaluate obvious risks, so it cannot cope with truly complex situations in which different traffic participants are in different driving states. In addition, human/data-driven data is not used to generate human-like behavior.

The technology described in Patent Document 3 attends only to objects ahead of the host vehicle, so it cannot cope with truly complex situations in which different traffic participants are in different driving states. In addition, human/data-driven data is not used to generate human-like behavior.

The present invention has been made in view of the above circumstances, and its object is to provide a vehicle control device, a risk map generation device, and a program capable of controlling a vehicle with human-like behavior.

To achieve the above object, the vehicle control device according to the first invention includes: a traffic participant classification unit that classifies traffic participants around the host vehicle according to attribute and state, based on the trajectories and positions of the host vehicle and the traffic participants and on lane information; an actualized risk map generation unit that generates an actualized risk map by applying, to each traffic participant around the host vehicle, the actualized risk corresponding to its classification, based on the classification results for the traffic participants around the host vehicle and on actualized risks learned in advance for each classification; an action determination unit that determines, as the optimal action, transitioning to or stopping at a state that yields a large reward under a reward function using the actualized risk map, from among the states corresponding to positions of the host vehicle on a plurality of route candidates; and a vehicle control unit that controls the host vehicle according to the determined action.

The program according to the second invention causes a computer to function as: a traffic participant classification unit that classifies traffic participants around the host vehicle according to attribute and state, based on the trajectories and positions of the host vehicle and the traffic participants and on lane information; an actualized risk map generation unit that generates an actualized risk map by applying, to each traffic participant around the host vehicle, the actualized risk corresponding to its classification, based on the classification results for the traffic participants around the host vehicle and on actualized risks learned in advance for each classification; an action determination unit that determines, as the optimal action, transitioning to or stopping at a state that yields a large reward under a reward function using the actualized risk map, from among the states corresponding to positions of the host vehicle on a plurality of route candidates; and a vehicle control unit that controls the host vehicle according to the determined action.

According to the first and second inventions, the traffic participant classification unit classifies the traffic participants around the host vehicle according to attribute and state, based on the trajectories and positions of the host vehicle and the traffic participants and on lane information. The actualized risk map generation unit generates an actualized risk map by applying, to each traffic participant around the host vehicle, the actualized risk corresponding to its classification, based on the classification results for the traffic participants around the host vehicle and on the actualized risks learned in advance for each classification.

The action determination unit then determines, as the optimal action, transitioning to or stopping at a state that yields a large reward under the reward function using the actualized risk map, from among the states corresponding to positions of the host vehicle on the plurality of route candidates. The vehicle control unit controls the host vehicle according to the determined action.

In this way, an actualized risk map is generated by applying, to each traffic participant around the host vehicle, the actualized risk corresponding to its classification, and the optimal action is determined to be transitioning to or stopping at a state that yields a large reward under the reward function using the actualized risk map, from among the states corresponding to positions of the host vehicle on the plurality of route candidates. The vehicle can thereby be controlled with human-like behavior.

The risk map generation device according to the third invention includes: a traffic participant classification unit that classifies traffic participants around the host vehicle according to attribute and state, based on the trajectories and positions of the host vehicle and the traffic participants and on lane information; and an actualized risk learning unit that generates an actualized risk for each classification, based on the classification results for the traffic participants around the host vehicle, and stores it in a database.

The program according to the fourth invention causes a computer to function as: a traffic participant classification unit that classifies traffic participants around the host vehicle according to attribute and state, based on the trajectories and positions of the host vehicle and the traffic participants and on lane information; and an actualized risk learning unit that generates an actualized risk for each classification, based on the classification results for the traffic participants around the host vehicle, and stores it in a database.

According to the third and fourth inventions, the traffic participant classification unit classifies the traffic participants around the host vehicle according to attribute and state, based on the trajectories and positions of the host vehicle and the traffic participants and on lane information. The actualized risk learning unit then generates an actualized risk for each classification, based on the classification results for the traffic participants around the host vehicle, and stores it in the database.

In this way, the traffic participants around the host vehicle are classified according to attribute and state, and an actualized risk is generated for each classification and stored in the database. The actualized risks needed to control the vehicle with human-like behavior can thereby be learned.

The storage medium that stores the program of the present invention is not particularly limited. It may be a hard disk or a ROM, or it may be a CD-ROM, a DVD disc, a magneto-optical disc, or an IC card. The program may also be downloaded from a server or the like connected to a network.

As described above, the vehicle control device and program of the present invention generate an actualized risk map by applying, to each traffic participant around the host vehicle, the actualized risk corresponding to its classification, and determine, as the optimal action, transitioning to or stopping at a state that yields a large reward under a reward function using the actualized risk map, from among the states corresponding to positions of the host vehicle on a plurality of route candidates. This provides the effect that the vehicle can be controlled with human-like behavior.

The risk map generation device and program of the present invention classify the traffic participants around the host vehicle according to attribute and state, generate an actualized risk for each classification, and store it in a database. This provides the effect that the actualized risks needed to control the vehicle with human-like behavior can be generated.

FIG. 1 is a block diagram showing a driving support control system according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing a driving support control device according to the first embodiment of the present invention.
FIG. 3 is a diagram showing an example of the classification of pedestrians.
FIG. 4 is a diagram showing an example of the classification of bicycles.
FIG. 5 is a diagram showing an example of a Bayesian network model.
FIG. 6 is a diagram showing an example of a trajectory induction potential.
FIG. 7 is a diagram showing an example of a potential in a conventional method.
FIG. 8 is a diagram showing an example of the trajectory induction potential for each classification.
FIG. 9 is a diagram showing an example of the risk prevention potential for each classification.
FIG. 10 is a diagram showing an example of an MDP model.
FIG. 11 is a diagram showing an example of the optimal series of actions that is determined.
FIG. 12 is a diagram showing an example of an actualized risk map constructed by integrating the learned trajectory induction potentials.
FIG. 13 is a diagram showing an example of a latent risk map constructed by integrating the learned risk prevention potentials.
FIG. 14 is a flowchart showing the data collection processing routine in the computer of the driving support control device according to the first embodiment of the present invention.
FIG. 15 is a flowchart showing the online processing routine in the computer of the driving support control device according to the first embodiment of the present invention.
FIG. 16 is a block diagram showing a driving support control device according to a second embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Overview of the embodiment>
Because the situations that arise in urban traffic are endless, it is impossible to learn all of them. The embodiment of the present invention therefore decomposes the complexity of the road into combinations of individual events of different classes for all dynamic traffic participants. Specifically, each traffic participant is classified as a car, a pedestrian, a motorcycle, or another movable object. Cars are classified as preceding vehicles, parked vehicles, tail vehicles, exiting vehicles, merging vehicles, obstructing vehicles, and other vehicles. Pedestrians are classified by age (child, elderly, other) and by state (stopped, walking, running). Bicycles are classified by age (child, elderly, other) and by state (stopped, low speed, high speed). Motorcycles are classified as preceding, parked, tail, exiting, merging, obstructing, and other motorcycles.
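The classification scheme in this overview can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the names `Kind`, `Role`, `Age`, `Motion`, `ParticipantClass`, and `all_classes` are hypothetical and do not appear in the specification, and the bicycle is modeled as its own kind because the text gives it a separate two-axis classification.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import List, Optional


class Kind(Enum):
    CAR = "car"
    PEDESTRIAN = "pedestrian"
    BICYCLE = "bicycle"
    MOTORCYCLE = "motorcycle"
    OTHER = "other movable object"


class Role(Enum):
    """Role-based classes used for cars and motorcycles."""
    PRECEDING = "preceding"
    PARKED = "parked"
    TAIL = "tail"
    EXITING = "exiting"
    MERGING = "merging"
    OBSTRUCTING = "obstructing"
    OTHER = "other"


class Age(Enum):
    CHILD = "child"
    ELDERLY = "elderly"
    OTHER = "other"


class Motion(Enum):
    """State axis: stopped / walking or low speed / running or high speed."""
    STOPPED = "stopped"
    SLOW = "slow"
    FAST = "fast"


@dataclass(frozen=True)
class ParticipantClass:
    kind: Kind
    role: Optional[Role] = None      # cars and motorcycles only
    age: Optional[Age] = None        # pedestrians and bicycles only
    motion: Optional[Motion] = None  # pedestrians and bicycles only

    def __post_init__(self) -> None:
        by_role = self.kind in (Kind.CAR, Kind.MOTORCYCLE)
        by_age_state = self.kind in (Kind.PEDESTRIAN, Kind.BICYCLE)
        has_role = self.role is not None
        has_age_state = self.age is not None and self.motion is not None
        has_any_age_state = self.age is not None or self.motion is not None
        ok = (
            (by_role and has_role and not has_any_age_state)
            or (by_age_state and has_age_state and not has_role)
            or (self.kind is Kind.OTHER and not has_role and not has_any_age_state)
        )
        if not ok:
            raise ValueError(f"inconsistent class for {self.kind.value}")


def all_classes() -> List[ParticipantClass]:
    """Enumerate every class for which a potential would be learned."""
    classes = [ParticipantClass(k, role=r)
               for k in (Kind.CAR, Kind.MOTORCYCLE) for r in Role]
    classes += [ParticipantClass(k, age=a, motion=m)
                for k in (Kind.PEDESTRIAN, Kind.BICYCLE)
                for a, m in product(Age, Motion)]
    classes.append(ParticipantClass(Kind.OTHER))
    return classes
```

Enumerating the classes up front mirrors the idea of a database with one entry per class, so that an unbounded variety of scenes reduces to combinations of a finite set of entries.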

A database covering all classifications is also created, and the trajectory induction potential and the risk prevention potential are learned for each classification. This serves as knowledge, semantic models, and information in a cloud-based ITS map/database.

The data and data sources are roadside sensors, on-board sensors, communication units, trajectories, positions, images, and point clouds.

The trajectory induction potential is used to calculate the desired local trajectory and target speed profile, and the risk prevention potential is used to adjust the speed so as to increase the safety margin (or the time to collision). When the potentials of multiple events (including chain effects) are integrated, the interactions between traffic participants are taken into account in order to calculate the influence of the potentials.

Human-like behavior is generated with a Markov decision process (MDP) model whose reward function is defined using the learned potentials. The MDP model defines states, actions, and rewards. A state represents a position on a route candidate that follows the route, a route candidate that detours within the lane, or a route candidate that avoids a collision. The actions are moving forward, changing path, and stopping. The reward is determined according to time and the integrated risk map.
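To make the state, action, and reward roles concrete, the following Python sketch plans over a tiny grid of route candidates. It is a minimal illustration under several assumptions that are not in the specification: states are `(path, position)` index pairs, `risk[path][position]` stands in for the integrated risk map, every step costs a fixed `time_cost`, reaching the last position of any path ends the episode, and the solver is plain finite-horizon dynamic programming. All names and weights are hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple


def best_plan(
    risk: List[List[float]],
    start: Tuple[int, int],
    horizon: int,
    time_cost: float = 1.0,
    risk_weight: float = 10.0,
) -> Tuple[float, Tuple[str, ...]]:
    """Return (total reward, action sequence) maximizing reward over `horizon` steps.

    risk[p][q] is the risk at position q of route candidate p; all candidates
    are assumed to have the same number of positions.
    """
    n_paths = len(risk)
    last = len(risk[0]) - 1

    @lru_cache(maxsize=None)
    def value(path: int, pos: int, steps_left: int) -> Tuple[float, Tuple[str, ...]]:
        if pos == last or steps_left == 0:
            return 0.0, ()
        # Candidate transitions: forward on the same path, change to another
        # path while advancing, or stop (remain in the current state).
        moves = [("forward", path, pos + 1)]
        moves += [(f"change_to_{p}", p, pos + 1) for p in range(n_paths) if p != path]
        moves.append(("stop", path, pos))
        best_total, best_actions = None, ()
        for name, p, q in moves:
            # Reward depends on elapsed time and on the risk of the next state.
            reward = -time_cost - risk_weight * risk[p][q]
            future, actions = value(p, q, steps_left - 1)
            total = reward + future
            if best_total is None or total > best_total:
                best_total, best_actions = total, (name,) + actions
        return best_total, best_actions

    return value(start[0], start[1], horizon)
```

With `risk = [[0, 0, 0.9, 0, 0], [0, 0, 0, 0, 0]]` and a start at `(0, 0)`, the plan switches to the second candidate before the high-risk cell; when the only way forward is a very high-risk cell and the horizon is short, the plan is to stop.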

[First Embodiment]
<System configuration>
As shown in FIG. 1, a driving support control system 100 according to the first embodiment of the present invention includes a driving support control device 10 mounted on each vehicle and a server 50. The driving support control device 10 and the server 50 are connected via a network 60 such as the Internet.

As shown in FIG. 2, the driving support control device 10 includes a traffic participant trajectory acquisition unit 12, a host vehicle position information acquisition unit 14, a lane information acquisition unit 16, an image acquisition unit 18, a computer 20, and a vehicle control unit 22.

The vehicle control unit 22 performs steering control, brake control, or accelerator control based on a command calculated by the computer 20, the command containing a yaw angle and either a throttle value or a brake value.
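The command described above carries a yaw angle plus exactly one of throttle or brake. A minimal Python sketch of such a command record follows; the class name `ControlCommand`, the field names, and the units are assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlCommand:
    """Hypothetical command passed from the computer to the vehicle control unit."""
    yaw_angle: float                  # assumed to be in radians
    throttle: Optional[float] = None  # set when accelerating
    brake: Optional[float] = None     # set when braking

    def __post_init__(self) -> None:
        # The command carries either a throttle value or a brake value, never both.
        if (self.throttle is None) == (self.brake is None):
            raise ValueError("exactly one of throttle or brake must be set")
```

Rejecting a command that sets both fields, or neither, keeps the "either throttle or brake" condition explicit at the point where the command is built.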

The traffic participant trajectory acquisition unit 12 acquires the trajectory of each traffic participant around the host vehicle. Specifically, the trajectories of traffic participants in the road environment, including cars, pedestrians, bicycles, motorcycles, and other moving objects, are acquired by the following two methods. In the first method, during the data collection processing, the host vehicle acts as a probe car, detects and tracks traffic participants with its on-board sensors, and accumulates the resulting trajectories. In the second method, the trajectories of traffic participants are acquired through a communication unit between the host vehicle and roadside sensors such as roadside cameras and LIDAR/RADAR. The trajectory information contains position, speed, and direction together with a time stamp.
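Since the trajectory information consists of position, speed, and direction with a time stamp, it can be pictured as a list of time-stamped samples. The Python sketch below is an assumption-laden illustration: the names `TrajectorySample` and `Trajectory`, the planar `x`/`y` coordinates, and the units are hypothetical and not specified in the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class TrajectorySample:
    timestamp: float  # seconds (assumed unit)
    x: float          # position in a planar map frame (assumed)
    y: float
    speed: float      # metres per second (assumed unit)
    heading: float    # direction of travel in radians (assumed unit)


@dataclass
class Trajectory:
    participant_id: int
    samples: List[TrajectorySample] = field(default_factory=list)

    def append(self, sample: TrajectorySample) -> None:
        # Samples from on-board or roadside sensors are kept in time order.
        if self.samples and sample.timestamp <= self.samples[-1].timestamp:
            raise ValueError("timestamps must be strictly increasing")
        self.samples.append(sample)

    def duration(self) -> float:
        """Time spanned by the trajectory, or 0.0 with fewer than two samples."""
        if len(self.samples) < 2:
            return 0.0
        return self.samples[-1].timestamp - self.samples[0].timestamp
```

Keeping samples strictly ordered by time stamp lets trajectories from the probe car and from roadside sensors be stored in the same form.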

During the data collection processing, the host vehicle position information acquisition unit 14 acquires the position and trajectory of the host vehicle, with the host vehicle acting as a probe car. The host vehicle position information acquisition unit 14 also acquires the position and trajectory of the host vehicle during the online processing. These can be acquired by sensors of the host vehicle or by roadside sensors and are transmitted to the host vehicle via communication.

More specifically, the position of the host vehicle used in the embodiment of the present invention is acquired by the method described in Non-Patent Document 1 below, which can obtain a highly accurate position with ordinary inexpensive sensors.

[Non-Patent Document 1] Kojima, Yoshiko et al., "Proposal of a new positioning method using tight coupling integration based on high precision trajectory estimation by GPS Doppler method", Vehicle System Dynamics 50, no. 6, pp. 987-1000, 2012.

The lane information acquisition unit 16 acquires travel lane information from a digital map. More specifically, the lane information used in the embodiment of the present invention is acquired by the method described in Non-Patent Document 2 below, which can obtain a highly accurate lane-level digital map with ordinary inexpensive sensors.

[Non-Patent Document 2] Guo, Chunzhao et al., "A low cost automatic lane level map creation method using a normal in-vehicle sensor", IEEE Transactions on Intelligent Transportation Systems 17.8 (2016): 2355-2366.

The image acquisition unit 18 acquires images using an on-board camera. The camera may be a stereo camera or a monocular camera. The image and the host vehicle position are acquired at the same time.

Represented as functional blocks, the computer 20 includes, as shown in FIG. 2, a traffic participant classification unit 30, a traffic environment actualized risk map learning unit 32, a traffic environment latent risk map learning unit 34, a traffic participant interaction learning unit 36, a human-like decision-making learning unit 38, and a communication unit 40. The traffic environment actualized risk map learning unit 32 is an example of an actualized risk learning unit, the traffic environment latent risk map learning unit 34 is an example of a latent risk learning unit, and the traffic participant interaction learning unit 36 is an example of an interaction estimation unit.

The traffic participant classification unit 30 classifies the traffic participants around the host vehicle according to attribute and state, based on the trajectories and positions of the host vehicle and the traffic participants and on lane information.

A human driver looks not only at where the other traffic participants are but also at what driving state, condition, and class they are in, and treats participants of different classes differently. The present embodiment attempts to imitate the driving ability and methods of a human driver. This is a key component of the embodiment: the complexity of the road is decomposed into combinations of individual events of different classes for all dynamic traffic participants. Specifically, surrounding traffic participants are classified as vehicles, pedestrians, motorcycles, or other movable objects, as follows.

Pedestrians are classified by age (child, elderly, other) and by state (stopped, walking, running). Bicycles are classified by age (child, elderly, other) and by state (stopped, low speed, high speed). Motorcycles are classified as preceding, parked, tail, exiting, merging, obstructing, and other motorcycles.

Vehicles are classified as preceding vehicles, parked vehicles, tail vehicles, exiting vehicles, merging vehicles, obstructing vehicles, and other vehicles.

Here, a preceding vehicle is a vehicle moving ahead in the own lane with a trajectory and heading similar to those of the host vehicle. Under safe conditions, the host vehicle should follow or imitate it.

A parked vehicle is a stopped vehicle ahead that is parked at the roadside of the own lane. The host vehicle must bypass it smoothly under safe conditions.

A tail vehicle is a stopped or slow-moving vehicle ahead that is at the tail of the traffic in the own lane or at a traffic signal position. The host vehicle must stop behind it and, if it starts to move, follow it under safe conditions.

An exiting vehicle is a stopped or slow-moving vehicle ahead in the own lane that is about to turn out of the own lane. The host vehicle must either follow it or wait behind it until there is enough room to bypass it, under safe conditions, on a feasible path near the own lane.

A merging vehicle is a stopped or moving vehicle in an adjacent lane that is about to enter the own lane in front of the host vehicle. If and only if the magnitude of the required deceleration is at or below a threshold, the host vehicle must slow down to give that vehicle enough room to merge. Otherwise, the host vehicle must continue to travel normally and react to the merging vehicle's next movement under safe conditions.
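The yielding rule for a merging vehicle compares a deceleration magnitude with a threshold. The Python sketch below shows one way such a check could look. How the deceleration is obtained is not stated in the source; here it is assumed to be the constant deceleration needed to slow from the current speed to a target speed over a given distance, a = (v² − v_t²) / (2d). The function names and parameters are hypothetical.

```python
def required_deceleration(speed: float, target_speed: float, distance: float) -> float:
    """Constant deceleration (m/s^2) needed to reach target_speed within distance.

    Uses v_t^2 = v^2 - 2*a*d; returns 0.0 if the host vehicle is already slow enough.
    """
    if distance <= 0.0:
        raise ValueError("distance must be positive")
    if speed <= target_speed:
        return 0.0
    return (speed ** 2 - target_speed ** 2) / (2.0 * distance)


def should_yield_to_merging(
    speed: float, target_speed: float, distance: float, max_deceleration: float
) -> bool:
    """Yield only when the required deceleration does not exceed the threshold."""
    return required_deceleration(speed, target_speed, distance) <= max_deceleration
```

For example, slowing from 10 m/s to 5 m/s over 25 m requires 1.5 m/s², so the host vehicle would yield with a threshold of 2.0 m/s² and keep travelling normally with a threshold of 1.0 m/s².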

An oncoming vehicle is a moving vehicle approaching from the opposite direction in the lane next to the own lane. If that vehicle is likely to touch the host vehicle, or is likely to come so close to the host vehicle that the possibility of a collision becomes high, the host vehicle must plan a path that avoids it.

Other vehicles are vehicles that do not fall into any of the above classes. The host vehicle should travel normally while avoiding a physical collision with them.

歩行者は、図３に示すように２次元的に分類される。縦軸は歩行者の年齢を、横軸は歩行者の状態を表す。縦軸のリスクポテンシャルの大小関係は、子供＞老人＞その他、である。横軸のリスクポテンシャルの大小関係は、立ち止まり中＜歩行中＜ランニング中、である。さらに、これらの人が自レーンを横断又は跳び込んでくる確率は、その姿勢、顔／目の方向及びその動きの軌跡を検出することで評価される。 Pedestrians are classified in two dimensions as shown in FIG. 3. The vertical axis represents the pedestrian's age, and the horizontal axis represents the pedestrian's state. On the vertical axis, the risk potential is highest for children, followed by the elderly, and then others. On the horizontal axis, the risk potential is lowest for standing still, higher for walking, and highest for running. In addition, the probability that the person will cross or dart into the own lane is evaluated by detecting their posture, face/eye direction, and movement trajectory.

自転車は、図４に示すように２次元的に分類される。縦軸は自転車に乗っている人の年齢を、横軸は自転車の状態を表す。縦軸のリスクポテンシャルの大小関係は、子供＞老人＞その他、である。横軸のリスクポテンシャルの大小関係は、停止中＜低速走行＜高速走行、である。さらに、これらの人が自レーンを横断又は跳び込んでくる確率は、その姿勢、顔／目の方向及びその動きの軌跡を検出することで評価される。 Bicycles are classified in two dimensions as shown in FIG. 4. The vertical axis represents the rider's age, and the horizontal axis represents the bicycle's state. On the vertical axis, the risk potential is highest for children, followed by the elderly, and then others. On the horizontal axis, the risk potential is lowest when stopped, higher at low speed, and highest at high speed. In addition, the probability that the rider will cross or dart into the own lane is evaluated by detecting their posture, face/eye direction, and movement trajectory.

バイクは、先行バイク、駐車中のバイク、末尾バイク、流出するバイク、合流するバイク、対向してくるバイク、その他のバイクに分類される。 Motorcycles (referred to below as bikes) are classified into preceding bikes, parked bikes, tail bikes, outflow bikes, merging bikes, oncoming bikes, and other bikes.

先行バイクは、自車両の前方を移動中のバイクである。ただし、自車両は先行バイクの模倣はしないで、そこから安全距離を維持するだけである。 The preceding bike is a bike moving in front of the host vehicle. However, the host vehicle does not imitate the preceding bike; it only maintains a safe distance from it.

駐車中のバイクは、自レーンの路側に駐車している先行のバイクである。駐車車両と同様に、自車両は安全条件下でそれを滑らかに迂回しなければならない。 The parked bike is a bike ahead that is parked at the roadside of the own lane. As with a parked vehicle, the host vehicle must smoothly steer around it under safe conditions.

末尾バイクは、自レーンの交通の末尾又は交通信号位置にある、先行する停止／徐行バイクである。末尾車両と同様に、自車両はその後ろに停止し、それが移動し始めた場合には安全条件下で追跡しなければならない。 The tail bike is a stopped or slow-moving bike ahead that is at the tail of the traffic in the own lane or at a traffic signal position. As with the tail vehicle, the host vehicle must stop behind it and, if it starts to move, follow it under safe conditions.

流出するバイクは、ターンして自レーンから出ようとする、自レーン内の先行の停止／徐行バイクである。流出車両と同様に、自車両は、追随するか、又は安全条件下で自レーンの近傍の実行可能経路でそれを迂回する十分な余地ができるまで後ろで待たなければならない。 The outflow bike is a stopped or slow-moving bike ahead in the own lane that is turning to leave the own lane. As with the outflow vehicle, the host vehicle must either follow it or wait behind it until, under safe conditions, there is enough room to steer around it on a feasible path near the own lane.

合流するバイクは、自レーンの自車両の前にまさに入ろうとする、隣接レーンで停止／移動中のバイクである。合流車両と同様に、自車両は、減速度の大きさが閾値以下の場合、またその時に限って、徐行してその車の合流動作に十分な余地を与えなければならない。そうでなければ、自車両は正常に走行し、安全条件下で合流バイクの次の動作に反応しなければならない。 The merging bike is a stopped or moving bike in an adjacent lane that is about to enter the own lane just in front of the host vehicle. As with the merging vehicle, the host vehicle must slow down to leave sufficient room for the merge if, and only if, the magnitude of the deceleration is at or below a threshold. Otherwise, the host vehicle must keep driving normally and react to the merging bike's next movement under safe conditions.

対向してくるバイクは、自レーンの隣のレーンを反対方向から来る移動バイクである。対向車両と同様に、自車両は、その車両が自車両に接触しそうな場合、又は自車両に非常に近接して衝突の可能性が高くなりそうな場合、そのようなバイクを回避する経路を計画しなければならない。 The oncoming bike is a moving bike approaching from the opposite direction in the lane next to the own lane. As with the oncoming vehicle, the host vehicle must plan a path that avoids such a bike if it is likely to contact the host vehicle, or if it is likely to come so close to the host vehicle that the probability of a collision becomes high.

その他のバイクは、上記の分類のいずれにも入らないバイクである。自車両は、それとの物理的衝突を回避しつつ、正常に走行すべきである。 Other bikes are bikes that do not fall into any of the above categories. The host vehicle should travel normally while avoiding a physical collision with it.

また、その他の移動可能オブジェクトは、物理的な衝突やその他の危険を回避するように対処しなければならない、交通環境内のその他のオブジェクトである。 Also, other movable objects are other objects in the traffic environment that must be addressed to avoid physical collisions and other hazards.

分類器は機械学習に基づいて取得される。本発明の実施の形態において、複数の特徴を有するベイジアンネットワーク（BN）モデルを構築する。例えば図５に示すような車の分類を例にとると、BNモデルにおける特徴変数は、運転速度D、交差点への至近性I、レーン内のオフセットO、軌跡T、信号S、前方自由空間F、レーンL、次のレーンＮ、及び一致度Aを含む。 The classifier is obtained by machine learning. In the embodiment of the present invention, a Bayesian network (BN) model with a plurality of features is constructed. Taking the vehicle classification shown in FIG. 5 as an example, the feature variables of the BN model include driving speed D, proximity to an intersection I, in-lane offset O, trajectory T, signal S, forward free space F, lane L, next lane N, and degree of agreement A.

ここで、運転速度Dは、ガウス分布でモデル化される。交差点への至近性Iの値は｛真、偽｝のいずれかであり、これは車両と最近接交差点との間の距離で判定される。レーン内のオフセットOは、車両と自レーンの中心線との間の横方向距離で判定される。軌跡Tの値は｛長、短｝のいずれかであり、車両の軌跡の長さで判定される。信号Sの値は｛ハザード、ターン、なし｝のうちの１つであり、検出車両においてどのシグナルが点灯しているかで判定される。前方自由空間Fの値は｛真、偽、不明｝のうちの１つであり、検出車両の位置と自レーンに関する自由空間境界を比較して判定される。レーンLの値は｛自レーン、隣接レーン、その他｝のうちの１つであり、検出車両が走行しているレーンで判定される。次のレーンＮの値は｛同じ、異なる｝のいずれかであり、ある時間後に、検出車両が自車両と同じレーンを走行しているか否かで判定される。一致度Aの値は｛高、低｝のいずれかであり、検出車両と自車両の軌跡の相互の一致度で判定される。 Here, the driving speed D is modeled by a Gaussian distribution. The proximity to an intersection I takes the value {true, false} and is determined by the distance between the vehicle and the nearest intersection. The in-lane offset O is determined by the lateral distance between the vehicle and the center line of the own lane. The trajectory T takes the value {long, short} and is determined by the length of the vehicle's trajectory. The signal S takes one of {hazard, turn, none} and is determined by which signal lamp is lit on the detected vehicle. The forward free space F takes one of {true, false, unknown} and is determined by comparing the position of the detected vehicle with the free-space boundary of the own lane. The lane L takes one of {own lane, adjacent lane, other} and is determined by the lane in which the detected vehicle is traveling. The next lane N takes the value {same, different} and is determined by whether the detected vehicle will be traveling in the same lane as the host vehicle after a certain time. The degree of agreement A takes the value {high, low} and is determined by how closely the trajectories of the detected vehicle and the host vehicle agree with each other.

したがって、車両分類の、本実施の形態のBNモデルの合同確率関数は次の因子形式で定義可能である。 Therefore, the joint probability function of the BN model of the present embodiment for vehicle classification can be defined in the following factored form.

上記の式における、各種類の交通参加者の各分類に対する各特徴の確率分布は、分類が既知の、同一状況における交通参加者の軌跡データから学習される。 The probability distribution of each feature for each classification of each type of traffic participant in the above equation is learned from the trajectory data of the traffic participant in the same situation where the classification is known.
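For illustration only, suppose that each feature depends only on the class variable, written C here. Both the symbol C and this naive-Bayes structure are assumptions; the actual dependencies are those of the network in FIG. 5. Under that assumption a factored joint probability, and the classification rule built on it, would take the form:

```latex
P(C, D, I, O, T, S, F, L, N, A)
  = P(C)\, P(D \mid C)\, P(I \mid C)\, P(O \mid C)\, P(T \mid C)\, P(S \mid C)\,
    P(F \mid C)\, P(L \mid C)\, P(N \mid C)\, P(A \mid C),
\qquad
\hat{C} = \arg\max_{C}\, P(C, D, I, O, T, S, F, L, N, A)
```

Each conditional factor is one of the per-class feature distributions learned from trajectory data, for example a Gaussian for D and a categorical table for S.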

交通参加者分類部３０は、検出された交通参加者のオンライン分類を行う際に、上記の特徴を画像から検出／計算し、それを用いて各分類の学習されたBNモデルよりスコアを計算し、最高スコアを有する分類を、検出された交通参加者の分類として決定する。 When classifying detected traffic participants online, the traffic participant classification unit 30 detects or calculates the above features from the image, uses them to calculate a score from the learned BN model of each class, and determines the class with the highest score as the class of the detected traffic participant.
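The score-and-select step can be sketched as below. This is a simplified illustration under stated assumptions: features are treated as conditionally independent given the class (the real structure is the network of FIG. 5), only three of the nine features (speed D, signal S, lane L) are used, and every parameter value in `MODELS` is invented for the example rather than learned from data.

```python
import math


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log-density of a Gaussian, used for the driving speed D."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


# Hypothetical per-class parameters; a real system would learn these
# from labeled trajectory data.
MODELS = {
    "parked": {
        "prior": 0.5,
        "speed": (0.0, 0.5),  # (mean, std) in m/s
        "signal": {"hazard": 0.5, "turn": 0.1, "none": 0.4},
        "lane": {"own": 0.7, "adjacent": 0.2, "other": 0.1},
    },
    "preceding": {
        "prior": 0.5,
        "speed": (10.0, 3.0),
        "signal": {"hazard": 0.05, "turn": 0.15, "none": 0.8},
        "lane": {"own": 0.9, "adjacent": 0.05, "other": 0.05},
    },
}


def class_score(model: dict, features: dict) -> float:
    """Log of prior times per-feature likelihoods for one class."""
    score = math.log(model["prior"])
    score += gaussian_logpdf(features["speed"], *model["speed"])
    for name in ("signal", "lane"):
        score += math.log(model[name][features[name]])
    return score


def classify(features: dict) -> str:
    """Return the class whose model gives the highest score."""
    return max(MODELS, key=lambda c: class_score(MODELS[c], features))
```

Working in log space keeps the product of many small probabilities numerically stable; the class with the highest log score is the same one that maximizes the product.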

交通環境顕在リスクマップ学習部３２は、以下に説明するように、顕在リスクマップを生成するための軌跡誘導ポテンシャルを生成する。 The traffic environment actualized risk map learning unit 32 generates a trajectory induction potential for generating the actualized risk map as described below.

人間の運転者が、交通環境内に駐車中の車などのオブジェクトを見た場合には、それからどれくらい距離を取るべきかということではなく、それを処理するのに何をすべきか又はどの経路を通るべきかということを考える。そのような機構を模倣するために、運転データから、顕在リスクマップを表示するための軌跡誘導ポテンシャルを生成する。検出された（目視できる）各オブジェクト（例えば車両、歩行者、自転車、、、）のそれぞれの分類に対して、軌跡誘導ポテンシャルが生成される。これには、図６に示すように、衝突防止のための反発空間ポテンシャル、所望の軌跡を誘導するための吸引空間ポテンシャル、及び適切な目標速度を誘導するための速度ポテンシャルが含まれる。さらに、プローブカーが各分類の各種の交通参加者に対処するときの、図６の実線で示すような、自然な運転データでのその軌跡を学習する。そのようなポテンシャルは、周辺車両を安全かつ合理的に扱うことができるように、所望の決定、経路、及び速度を符号化する。 When a human driver sees an object in the traffic environment, such as a parked car, the driver thinks not about how much distance to keep from it but about what to do to handle it or which path to take. To mimic this mechanism, a trajectory induction potential for representing the actualized risk map is generated from driving data. A trajectory induction potential is generated for each class of each detected (visible) object (for example, vehicles, pedestrians, and bicycles). As shown in FIG. 6, it includes a repulsive spatial potential for collision prevention, an attractive spatial potential for guiding the desired trajectory, and a speed potential for guiding an appropriate target speed. In addition, the trajectory that the probe car follows in natural driving data when handling each kind of traffic participant in each class, shown by the solid line in FIG. 6, is learned. Such potentials encode the desired decision, path, and speed so that surrounding vehicles can be handled safely and rationally.

従来方法では、反発力Ｕ_v ^repをオブジェクトへ、吸引力Ｕ_r ^attをルート又はレーンへ、そして、フィードバック力Ｕ^sを所望の速度に達するために割り当てる（図７参照）。それに対し本実施の形態では、交通環境顕在リスクマップ学習部３２は、検出車両の全てからその分類に従って合力を生成する。図８に示すように、反発力は衝突しないためであり、吸引力は所望の決定、経路、及び速度を符号化して所望の挙動を誘導するためのものである。これは日々の都市交通において人間の運転者により収集された、同一状況の実際の軌跡から学習される。 In the conventional method, a repulsive force U_v^rep is assigned to objects, an attractive force U_r^att to the route or lane, and a feedback force U^s for reaching the desired speed (see FIG. 7). In contrast, in the present embodiment, the traffic environment actualized risk map learning unit 32 generates a resultant force from all detected vehicles according to their classes. As shown in FIG. 8, the repulsive force serves to avoid collisions, and the attractive force encodes the desired decision, path, and speed in order to induce the desired behavior. It is learned from actual trajectories in the same situation, collected from human drivers in everyday urban traffic.

オンライン処理においては、軌跡誘導ポテンシャルを使用して、所望のローカル軌跡及び目標速度プロファイルが計算される。 In on-line processing, the trajectory induction potential is used to calculate the desired local trajectory and target velocity profile.
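A toy version of combining a repulsive and an attractive spatial potential is sketched below. It is an assumption-laden illustration: the Gaussian shape of the repulsive term, the quadratic attractive term, all numeric parameters, and the grid search over lateral offsets are choices made here, and the speed potential is left out. In the embodiment the attractive term would come from learned human trajectories rather than a fixed `desired_y`.

```python
import math


def repulsive(x: float, y: float, obs_x: float, obs_y: float,
              strength: float = 10.0, sigma_x: float = 4.0, sigma_y: float = 1.0) -> float:
    """Repulsive spatial potential: a Gaussian bump centered on the obstacle."""
    dx, dy = x - obs_x, y - obs_y
    return strength * math.exp(-(dx ** 2 / (2 * sigma_x ** 2) + dy ** 2 / (2 * sigma_y ** 2)))


def attractive(y: float, desired_y: float, gain: float = 1.0) -> float:
    """Attractive spatial potential: a quadratic well around the desired lateral offset."""
    return 0.5 * gain * (y - desired_y) ** 2


def total_potential(x: float, y: float, obstacle: tuple, desired_y: float) -> float:
    return repulsive(x, y, *obstacle) + attractive(y, desired_y)


def best_lateral_offset(x: float, obstacle: tuple, desired_y: float, candidates: list) -> float:
    """Pick the candidate lateral offset with the lowest total potential at position x."""
    return min(candidates, key=lambda y: total_potential(x, y, obstacle, desired_y))


# Example: an obstacle parked 1 m to the right of the lane center, 20 m ahead.
OFFSETS = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
beside = best_lateral_offset(20.0, (20.0, -1.0), 0.0, OFFSETS)   # next to the obstacle
far = best_lateral_offset(100.0, (20.0, -1.0), 0.0, OFFSETS)     # well past it
```

Next to the obstacle the minimum shifts away from it, and far from the obstacle it returns to the lane center, which gives a local trajectory that bulges around the parked object.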

交通環境潜在リスクマップ学習部３４は、以下に説明するように、潜在リスクマップを生成するためのリスク予防ポテンシャルを生成する。 The traffic environment potential risk map learning unit 34 generates a risk preventive potential for generating a potential risk map, as described below.

世の中には極めて多数の人間の運転者がおり、これを単純に２つの分類、すなわち良い運転者と悪い運転者に区分することができる。悪い運転者も大部分の場合には安全運転を守ることができるが、良い運転者は衝突を起こす確率がはるかに低い。両者の間のキーとなる違いの１つは、交通環境に潜むリスクを予測して、事前に回避不能の衝突を防止するために早期のアクションを取る能力である。本発明の実施の形態では、どんなリスクが予期されるか、どこで用心すべきか、そしてどうやって衝突を防止するか、ということに関して、各オブジェクト分類に対する主要な潜在リスクのすべてを、複数の熟練した人間の運転者に要請して洗い出した。その例を次の図９に示す。 There are a great many human drivers in the world, and they can be divided simply into two classes: good drivers and bad drivers. Bad drivers can also drive safely most of the time, but good drivers are far less likely to cause a collision. One key difference between the two is the ability to anticipate risks hidden in the traffic environment and to take early action so that a collision does not become unavoidable. In the embodiment of the present invention, several experienced human drivers were asked to identify all of the major latent risks for each object class in terms of what risk is expected, where caution is needed, and how to prevent a collision. An example is shown in FIG. 9.

リスク予防ポテンシャルは、洗い出された潜在（不可視）リスクに対処するために生成される。さらに、生成されたリスク予防ポテンシャルは、オンライン処理において自車両の速度制御の調整にのみ適用される。一方で、オンライン処理において、軌跡は軌跡誘導ポテンシャルのみで決定される。その理由は次の２つである。第１に、交通環境内には非常に多くの潜在リスク又は不確定因子があり、それらをいちいち空間的に回避することはできない。そうでなければ、軌跡は不自然になりすぎて、多くの新たなリスクを生じるであろう。第２に、そしてより重要には、軌跡誘導ポテンシャルは、同一状況下での過去の人間による自然な運転データから学習され、これらは同一状況下での顕在リスクと潜在リスクの両方に直面したときに人間の運転者が取る軌跡である。より具体的には、リスク予防ポテンシャルは、各潜在リスクに対応する複数の速度ポテンシャルからなる。リスク予防ポテンシャルの実効的範囲、強度及び分布（例えば、ガウス分布）は、自然な運転データから学習される。 The risk prevention potential is generated to deal with the identified latent (invisible) risks. The generated risk prevention potential is applied only to adjusting the speed control of the host vehicle in online processing, whereas the trajectory in online processing is determined by the trajectory induction potential alone. There are two reasons for this. First, the traffic environment contains a very large number of latent risks and uncertain factors, and they cannot each be avoided spatially; attempting to do so would make the trajectory too unnatural and create many new risks. Second, and more importantly, the trajectory induction potential is learned from past natural human driving data in the same situation, that is, from the trajectories human drivers take when facing both the actualized and the latent risks of that situation. More specifically, the risk prevention potential consists of a plurality of speed potentials, one for each latent risk. The effective range, strength, and distribution (for example, a Gaussian distribution) of the risk prevention potential are learned from natural driving data.
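How a set of speed potentials might adjust the target speed along a path is sketched below. The class name `SpeedPotential`, the interpretation of strength as a fractional speed reduction, and the choice to apply the strongest potential rather than a sum are all assumptions made for this example; in the embodiment the range, strength, and distribution are learned from natural driving data.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeedPotential:
    """One Gaussian-shaped speed potential for a single latent risk (hypothetical model)."""
    center_m: float   # position along the path where the latent risk is located
    strength: float   # fractional speed reduction at the center, between 0 and 1
    sigma_m: float    # effective range of the potential in metres

    def value(self, s_m: float) -> float:
        return self.strength * math.exp(-(s_m - self.center_m) ** 2 / (2 * self.sigma_m ** 2))


def target_speed(s_m: float, nominal_mps: float, potentials: list) -> float:
    """Scale the nominal speed by the strongest potential acting at position s_m.
    Only the speed changes; the path itself is not altered."""
    reduction = max((p.value(s_m) for p in potentials), default=0.0)
    return nominal_mps * (1.0 - reduction)


# Example: a blind spot behind a parked vehicle 30 m ahead halves the speed there.
RISKS = [SpeedPotential(center_m=30.0, strength=0.5, sigma_m=5.0)]
speed_at_risk = target_speed(30.0, 10.0, RISKS)
speed_far_away = target_speed(100.0, 10.0, RISKS)
```

Because the potentials act only on speed, the vehicle slows near a latent risk and recovers afterward while following the same trajectory.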

交通参加者相互作用学習部３６は、以下に説明するように、交通参加者間の相互作用を表すモデルのパラメータを推定する。 The traffic participant interaction learning unit 36 estimates parameters of a model representing the interaction between traffic participants, as described below.

自車両が周囲の交通参加者と相互作用するばかりでなく、交通参加者同士も相互に作用しあっている。本実施の形態では、例えば車−車、車−歩行者、車−自転車などの各交通参加者ペアの相互作用が、交通規則、熟練者定義、及び自然交通データ（交通監視カメラデータなど）に基づいてモデル化される。そうしてこのモデルのパラメータが、自然交通データから学習される。これは次の2通りで使用される。第１の使用方法では、各交通参加者のポテンシャルの統合は、単純に合計するのではなく、このモジュールで学習されたポテンシャルの相互作用／影響に基づいて結合される。第２の使用方法では、学習された相互作用は、ヒューマンライクマルコフ決定過程での交通参加者予測に使用される。 Not only does the host vehicle interact with the surrounding traffic participants, but the traffic participants also interact with one another. In the present embodiment, the interaction of each pair of traffic participants, such as car-car, car-pedestrian, and car-bicycle, is modeled on the basis of traffic rules, expert definitions, and natural traffic data (such as traffic surveillance camera data). The parameters of this model are then learned from the natural traffic data. The model is used in two ways. First, the potentials of the individual traffic participants are not simply summed when they are integrated but are combined on the basis of the interactions and influences between potentials learned in this module. Second, the learned interactions are used to predict traffic participants in the human-like Markov decision process.

ヒューマンライク意思決定学習部３８は、以下に説明するように、行動を決定するためのモデルと報酬関数を学習する。 The human-like decision-making learning unit 38 learns a model for determining behavior and a reward function, as described below.

周辺オブジェクトとのヒューマンライク相互作用は、マルコフ決定過程（MDP）のフレームワークで実行される。ただし、行動を決定するためのモデルと報酬関数の学習は新規の方法で行われる。モデル化されたMDPは、対向車両のある場合に駐車車両を通過するタスクを一例として、 {S, A, P, R, r}で表される。 Human-like interactions with peripheral objects are implemented in the framework of Markov Decision Process (MDP). However, learning models and reward functions to determine behavior is done in a new way. The modeled MDP is represented by {S, A, P, R, r} as an example of the task of passing a parked vehicle in the presence of an oncoming vehicle.

Ｓは経路候補上の位置に対応する状態を表す。ここで、自車両は、過去の車両軌跡及びオンライン経路計画から複数の経路候補を生成する。複数の経路候補は、ルート追従（RF）、レーン内迂回（IC）、及び障害回避（OA）の3つのグループに区分される。 S represents the states, each corresponding to a position on a path candidate. The host vehicle generates a plurality of path candidates from past vehicle trajectories and online path planning. The path candidates are divided into three groups: route following (RF), in-lane bypass (IC), and obstacle avoidance (OA).

ルート追従（RF）に区分される経路候補は、自車両を所定のレーンレベルのルートを追従するように導くためのものである。 Path candidates classified as route following (RF) guide the host vehicle to follow a predetermined lane-level route.

レーン内迂回（IC）に区分される経路候補は、対向車両を無視して、駐車車両を円滑かつ安全に迂回するように自車両を導くためのものである。 Path candidates classified as in-lane bypass (IC) guide the host vehicle to steer around the parked vehicle smoothly and safely, ignoring the oncoming vehicle.

障害回避（OA）に区分される経路候補は、駐車車両と対向車両を含む近隣オブジェクトとの衝突を回避するように自車両を導くためのものである。 Path candidates classified as obstacle avoidance (OA) guide the host vehicle to avoid collisions with nearby objects, including the parked vehicle and the oncoming vehicle.

MDPの状態（図１０の丸印参照）は、対向車両の位置／速度に応じた、標準的速度での経路候補上の各位置に対応している。 The state of the MDP (see the circle in FIG. 10) corresponds to each position on the route candidate at the standard speed according to the position / speed of the oncoming vehicle.

Ａは、決定される行動を表し、現在の経路候補を前進、経路切り替え、停止の何れかである。 A represents the action to be decided, which is one of: advance along the current path candidate, switch paths, or stop.

Ｐは、状態遷移確率P(s_t+1|s_t, a)を表す。自車両の状態遷移は、現在の決定、経路、及び速度に従う車両運動により計算される。具体的には、対向車両の状態遷移は、過去の自然運転データから学習された車両軌跡と対向車両のモデルによって経路候補として生成される。これは自車両の運動が対向車両に与える影響を考慮するために行われる。それは対向車両の運動が自車両の決定／経路に影響するばかりでなく、自車両の運動も対向車両の運動に影響するからである。 P represents the state transition probability P(s_{t+1} | s_t, a). The state transition of the host vehicle is calculated from the vehicle motion that follows the current decision, path, and speed. Specifically, the state transition of the oncoming vehicle is generated as path candidates from vehicle trajectories learned from past natural driving data and a model of the oncoming vehicle. This is done to take into account the influence of the host vehicle's motion on the oncoming vehicle: not only does the oncoming vehicle's motion affect the host vehicle's decision and path, but the host vehicle's motion also affects the oncoming vehicle's motion.

Ｒは、報酬を表し、時間＋満足＋安全の組合せである。すなわち、報酬＝w₁*時間スコア + w₂* 満足スコア + w₃*安全スコアである。 R represents the reward, which is a combination of time, satisfaction, and safety. That is, reward = w₁ * time score + w₂ * satisfaction score + w₃ * safety score.

時間スコアは、各ステップに対して−１、停止に対して−３となる。したがって最短／最速の決定が選択される。 The time score is -1 for each step and -3 for the stop. Therefore, the shortest / fastest decision is chosen.

満足スコアは、現在の状態遷移にどのスコアを使用するかの判定閾値が与えられているとして、経路切り替えアクション時の進行方向角度の小変化に対して−２、経路切り替えアクション時の進行方向角度の大変化に対して−５となる。 Given a threshold for deciding which score applies to the current state transition, the satisfaction score is -2 for a small change in heading angle during a path-switching action and -5 for a large change in heading angle during a path-switching action.

安全スコアは、統合された軌跡誘導ポテンシャルとリスク予防ポテンシャルによって安全スコアが与えられる。 The safety score is given by integrated trajectory induction potential and risk prevention potential.

重みｗ_１、ｗ_２、ｗ_３は、Markov Decision Processによる強化学習に従って自然運転データを用いて学習するか、事前にエンジニアが設定しておけばよい。 The weights w ₁ , w ₂ and w ₃ may be learned using natural driving data according to reinforcement learning by Markov Decision Process, or may be set in advance by an engineer.

ｒは、割引因子を表し、更なるステップ（未来）からの報酬を考慮するために、０〜１の間の実数として定義される。 r represents the discount factor and is defined as a real number between 0 and 1 to take into account the rewards from the further steps (future).
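The reward described above can be sketched as a small function. The scores -1, -3, -2, and -5 come from the description; the action names, the 0.1 rad heading threshold, the zero satisfaction score for actions other than path switching, the unit weights, and passing the safety score in as a precomputed number are assumptions made for this example.

```python
def time_score(action: str) -> float:
    """-1 for each step, -3 for a stop."""
    return -3.0 if action == "stop" else -1.0


def satisfaction_score(action: str, heading_change_rad: float,
                       threshold_rad: float = 0.1) -> float:
    """-2 for a small heading change when switching paths, -5 for a large one.
    Non-switching actions are assumed to score 0; the threshold is hypothetical."""
    if action != "switch":
        return 0.0
    return -2.0 if abs(heading_change_rad) < threshold_rad else -5.0


def reward(action: str, heading_change_rad: float, safety_score: float,
           weights: tuple = (1.0, 1.0, 1.0)) -> float:
    """reward = w1 * time score + w2 * satisfaction score + w3 * safety score.
    safety_score is assumed to be derived elsewhere from the integrated potentials."""
    w1, w2, w3 = weights
    return (w1 * time_score(action)
            + w2 * satisfaction_score(action, heading_change_rad)
            + w3 * safety_score)
```

For example, switching paths with a small heading change at a location whose safety score is -1 gives -1 - 2 - 1 = -4 under unit weights.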

オンライン処理においてモデルに従って決定される行動の解は次の２つの方法で得ることができる。１つはQマトリックスが事前に学習されており、Qマトリックスにより行動を決定する。Qマトリックスは自然運転データでの強化学習に基づくすべての状態→アクションマッピングを含む。 In online processing, the action determined according to the model can be obtained in two ways. In the first, a Q matrix is learned in advance and the action is determined from it. The Q matrix contains every state-to-action mapping, obtained by reinforcement learning on natural driving data.

もう一つは、ダイナミックプログラミングアルゴリズムを用いて現状況での状態に対してオンラインで行動を決定するものである。本実施の形態では、ダイナミックプログラミングアルゴリズムを用いてオンラインで行動を決定する場合を例に説明する。 The other is to use a dynamic programming algorithm to determine the behavior online for the current situation. In the present embodiment, the case of determining an action online using a dynamic programming algorithm will be described as an example.
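The description does not say which dynamic programming algorithm is used. As one common choice, value iteration over a toy MDP is sketched below; the three-state path, the deterministic transitions, and the function names are assumptions made for this example, while the step and stop rewards reuse the time scores given above.

```python
def value_iteration(states, actions, transition, reward, discount=0.9, tol=1e-9):
    """Value iteration. transition(s, a) returns {next_state: probability};
    states for which actions(s) is empty are treated as terminal."""
    values = {s: 0.0 for s in states}

    def q(s, a):
        # Expected immediate reward plus discounted value of the successor state.
        return sum(p * (reward(s, a, s2) + discount * values[s2])
                   for s2, p in transition(s, a).items())

    while True:
        delta = 0.0
        for s in states:
            if not actions(s):
                continue
            best = max(q(s, a) for a in actions(s))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {s: max(actions(s), key=lambda a: q(s, a)) for s in states if actions(s)}
    return values, policy


# Toy example: positions 0, 1, 2 along one path candidate; position 2 is the goal.
STATES = [0, 1, 2]


def toy_actions(s):
    return [] if s == 2 else ["forward", "stop"]


def toy_transition(s, a):
    return {s + 1: 1.0} if a == "forward" else {s: 1.0}


def toy_reward(s, a, s2):
    return -1.0 if a == "forward" else -3.0


VALUES, POLICY = value_iteration(STATES, toy_actions, toy_transition, toy_reward)
```

In this toy case the resulting policy advances at every non-terminal state, since stopping costs more per step and makes no progress. A full model would add path-switching actions, the safety term of the reward, and transition probabilities that account for the oncoming vehicle.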

こうして図１１に示すように、ヒューマンライク挙動パターンが学習MDPによって生成可能となる。 In this way, as shown in FIG. 11, human-like behavior patterns can be generated by the learned MDP.

通信部４０は、交通参加者分類部３０、交通環境顕在リスクマップ学習部３２、交通環境潜在リスクマップ学習部３４、交通参加者相互作用学習部３６、及びヒューマンライク意思決定学習部３８による処理結果を、サーバ５０へ送信する。 The communication unit 40 transmits to the server 50 the processing results of the traffic participant classification unit 30, the traffic environment actualized risk map learning unit 32, the traffic environment potential risk map learning unit 34, the traffic participant interaction learning unit 36, and the human-like decision-making learning unit 38.

サーバ５０は、各運転支援制御装置１０から送信された、すべての学習モデルとポテンシャルを記憶する。すべての学習モデルとポテンシャルはナリッジデータベース／レイヤー生成に使用される。これは、自律運転又はADAS／運転支援操作のときにオンライン処理のためのクラウドベース／サーバベースサービスとして利用可能である。 The server 50 stores all learning models and potentials transmitted from each driving support control device 10. All learning models and potentials are used for knowledge database / layer generation. This is available as a cloud-based / server-based service for on-line processing during autonomous driving or ADAS / driver assistance operations.

コンピュータ２０は、更に、交通環境ダイナミック情報生成部４２と、交通環境リスクマップ構築部４４と、ヒューマンライク進路決定部４６とを備えている。なお、交通環境リスクマップ構築部４４は、顕在リスクマップ生成部及び潜在リスクマップ生成部の一例であり、ヒューマンライク進路決定部４６は、行動決定部の一例である。 The computer 20 further includes a traffic environment dynamic information generation unit 42, a traffic environment risk map construction unit 44, and a human-like course determination unit 46. The traffic environment risk map constructing unit 44 is an example of an actualized risk map generating unit and a latent risk map generating unit, and the human-like course determining unit 46 is an example of an action determining unit.

交通環境ダイナミック情報生成部４２は、オンライン処理時に、サーバ５０から得られる、検出されたオブジェクトとその分類を含む交通シーンの全情報を統合して現状交通シーンの現時点動的マップを生成する。なお、検出されたオブジェクトとその分類を含む交通シーンの情報は、自車両を含む各車両の運転支援制御装置１０によって検出されたものだけでなく、車車間通信又は路側との通信により検出されたものを用いても良い。 During online processing, the traffic environment dynamic information generation unit 42 integrates all the information on the traffic scene obtained from the server 50, including the detected objects and their classes, to generate a dynamic map of the current traffic scene at the present time. The traffic scene information, including the detected objects and their classes, is not limited to what is detected by the driving support control device 10 of each vehicle, including the host vehicle; information detected through vehicle-to-vehicle communication or communication with the roadside may also be used.

交通環境リスクマップ構築部４４は、サーバ５０から得られる、自車両の周囲の交通参加者の分類結果と、分類毎に予め学習された軌跡誘導ポテンシャルとに基づいて、自車両の周囲の交通参加者毎に、分類に応じた軌跡誘導ポテンシャルを当てはめて、顕在リスクマップを生成し、サーバ５０から得られる、自車両の周囲の交通参加者の分類結果と、分類毎に予め学習されたリスク予防ポテンシャルとに基づいて、自車両の周囲の交通参加者毎に、分類に応じたリスク予防ポテンシャルを当てはめて、潜在リスクマップを生成し、顕在リスクマップ及び潜在リスクマップを統合した統合リスクマップを生成する。 The traffic environment risk map construction unit 44 generates an actualized risk map by applying, to each traffic participant around the host vehicle, the trajectory induction potential for its class, based on the classification results for the traffic participants around the host vehicle obtained from the server 50 and the trajectory induction potentials learned in advance for each class. Likewise, it generates a potential risk map by applying, to each traffic participant around the host vehicle, the risk prevention potential for its class, based on the classification results obtained from the server 50 and the risk prevention potentials learned in advance for each class. It then generates an integrated risk map that integrates the actualized risk map and the potential risk map.

具体的には、動的交通シーンのオブジェクトの各々に対して、分類毎に予め学習された軌跡誘導ポテンシャル及びリスク予防ポテンシャルに従って、１つずつ対応する軌跡誘導ポテンシャル及びリスク予防ポテンシャルを当てはめて、意思決定と経路計画のための統合リスクマップに統合する。駐車車両が存在する場合の統合リスクマップの例を図１２〜図１３に示す。 Specifically, for each object in the dynamic traffic scene, the corresponding trajectory induction potential and risk prevention potential are applied one by one, according to the potentials learned in advance for each class, and are integrated into an integrated risk map for decision making and path planning. Examples of the integrated risk map when a parked vehicle is present are shown in FIGS. 12 and 13.

また、統合リスクマップに、交通参加者のペア間の相互作用が反映される。 In addition, integrated risk maps reflect interactions between pairs of traffic participants.

ヒューマンライク進路決定部４６は、現在の状態と、状態遷移確率と、構築された統合リスクマップを用いた報酬関数とに基づいて、自車両の複数の経路候補上の位置に対応する状態のうち、統合リスクマップを用いた報酬関数により求められる報酬が多く得られるように、現在の経路候補に沿って進行するように状態を遷移するか、異なる経路候補上の状態に遷移するか、停止するか、のいずれかを、最適な行動として決定する。このとき、顕在リスクポテンシャル及びリスク予防ポテンシャルに基づいて調整される速度に応じて遷移先となる状態までの距離が決まる。 The human-like course determination unit 46 selects one of the states corresponding to the positions of the host vehicle on the plurality of path candidates, based on the current state, the state transition probability, and the reward function using the integrated risk map constructed. Transition the state to progress along the current path candidate, transition to a state on a different path candidate, or stop so as to obtain much reward determined by the reward function using the integrated risk map Or either is determined as the best action. At this time, the distance to the transition destination state is determined according to the speed adjusted based on the actual risk potential and the risk prevention potential.

具体的には、統合リスクマップ上の顕在リスクポテンシャルに基づいて、ルート追従に区分される経路候補、レーン内迂回に区分される経路候補、及び障害回避に区分される経路候補を生成し、統合リスクマップ上の顕在リスクポテンシャル及びリスク予防ポテンシャルに基づいて、車両速度が調整される。そして、調整された速度に応じて、ルート追従に区分される経路候補、レーン内迂回に区分される経路候補、及び障害回避に区分される経路候補の各経路候補上に、各位置に対応する状態を生成し、自車両と対向車両の入力状態と学習MDPモデルによるダイナミックプログラミングを用いて、将来的な報酬を考慮した報酬が得られる最適な行動を決定する。さらに、決定された行動から、ヨー角と、スロットル及びブレーキの何れか一方とを有する命令が計算されて、車両制御部２２へ出力される。なお、学習されたQマトリックスから最適な行動を決定しても良い。 Specifically, path candidates classified as route following, in-lane bypass, and obstacle avoidance are generated based on the actualized risk potential on the integrated risk map, and the vehicle speed is adjusted based on the actualized risk potential and the risk prevention potential on the integrated risk map. Then, according to the adjusted speed, states corresponding to positions are generated on each of the route-following, in-lane bypass, and obstacle-avoidance path candidates, and the optimal action, the one that yields the best reward when future rewards are taken into account, is determined by dynamic programming using the input states of the host vehicle and the oncoming vehicle and the learned MDP model. Furthermore, a command comprising a yaw angle and either a throttle or a brake value is calculated from the determined action and output to the vehicle control unit 22. The optimal action may instead be determined from the learned Q matrix.

＜運転支援制御装置１０の作用＞ <Operation of Driving Support Control Device 10>
次に、本実施の形態の作用について説明する。 Next, the operation of the present embodiment will be described.

運転支援制御装置１０は、図１４に示すデータ収集処理ルーチンを繰り返し実行する。 The driving support control device 10 repeatedly executes the data collection processing routine shown in FIG. 14.

まず、ステップＳ１００において、交通参加者軌跡取得部１２は、自車両周辺の交通参加者の各々の軌跡を取得する。 First, in step S100, the traffic participant trajectory acquisition unit 12 acquires trajectories of traffic participants around the host vehicle.

ステップＳ１０２では、自車位置情報取得部１４は、データ収集プロセス中に、自車両がプローブカーとなって、自車両の位置と軌跡を取得する。また、車線情報取得部１６は、デジタルマップから走行レーン情報を取得する。画像取得部１８は、車載カメラを用いて画像を取得する。 In step S102, the vehicle position information acquisition unit 14 acquires the position and the trajectory of the vehicle as the vehicle becomes a probe car during the data collection process. In addition, the lane information acquisition unit 16 acquires traveling lane information from the digital map. The image acquisition unit 18 acquires an image using an on-vehicle camera.

ステップＳ１０４では、交通参加者分類部３０は、自車両および交通参加者の軌跡、位置、及び車線情報に基づいて、自車両の周囲の交通参加者を、属性及び状態に応じて分類する。 In step S104, the traffic participant classification unit 30 classifies the traffic participants around the host vehicle according to the attribute and the state based on the trajectory, the position, and the lane information of the host vehicle and the traffic participant.

そして、ステップＳ１０６において、交通環境顕在リスクマップ学習部３２は、検出された交通参加者のそれぞれの分類に対して、顕在リスクマップを生成するための軌跡誘導ポテンシャルを生成する。 Then, in step S106, the traffic environment actualized risk map learning unit 32 generates a trajectory induction potential for generating an actualized risk map for each classification of the detected traffic participants.

ステップＳ１０８では、交通環境潜在リスクマップ学習部３４は、検出された交通参加者のそれぞれの分類に対して、潜在リスクマップを生成するためのリスク予防ポテンシャルを生成する。 In step S108, the traffic environment potential risk map learning unit 34 generates a risk prevention potential for generating a potential risk map for each classification of the detected traffic participants.

そして、ステップＳ１１０では、交通参加者相互作用学習部３６は、交通参加者間の相互作用を表すモデルのパラメータを推定する。 Then, in step S110, the traffic participant interaction learning unit 36 estimates parameters of a model representing the interaction between traffic participants.

ステップＳ１１２では、通信部４０は、交通参加者分類部３０、交通環境顕在リスクマップ学習部３２、交通環境潜在リスクマップ学習部３４、および交通参加者相互作用学習部３６による処理結果を、サーバ５０へ送信して、データ収集処理ルーチンを終了する。 In step S112, the communication unit 40 transmits to the server 50 the processing results of the traffic participant classification unit 30, the traffic environment actualized risk map learning unit 32, the traffic environment potential risk map learning unit 34, and the traffic participant interaction learning unit 36, and the data collection processing routine ends.

次に、運転支援制御装置１０は、図１５に示すオンライン処理ルーチンを実行する。 Next, the driving support control device 10 executes the online processing routine shown in FIG. 15.

まず、ステップＳ１２０において、車線情報取得部１６は、デジタルマップから走行レーン情報を取得する。画像取得部１８は、車載カメラを用いて画像を取得する。 First, in step S120, the lane information acquisition unit 16 acquires traveling lane information from the digital map. The image acquisition unit 18 acquires an image using an on-vehicle camera.

ステップＳ１２２において、交通環境ダイナミック情報生成部４２は、サーバ５０から得られる、検出されたオブジェクトとその分類を含む交通シーンの全情報を統合して現状交通シーンの現時点動的マップを生成する。 In step S122, the traffic environment dynamic information generation unit 42 integrates all the information of the traffic scene including the detected object and its classification obtained from the server 50 to generate a current dynamic map of the current traffic scene.

そして、ステップＳ１２４において、交通環境リスクマップ構築部４４は、サーバ５０から得られる、自車両の周囲の交通参加者の分類結果と、分類毎に予め学習された軌跡誘導ポテンシャルとに基づいて、自車両の周囲の交通参加者毎に、分類に応じた軌跡誘導ポテンシャルを当てはめて、顕在リスクマップを生成し、自車両の周囲の交通参加者の分類結果と、分類毎に予め学習されたリスク予防ポテンシャルとに基づいて、自車両の周囲の交通参加者毎に、分類に応じたリスク予防ポテンシャルを当てはめて、潜在リスクマップを生成し、顕在リスクマップ及び前記潜在リスクマップを統合した統合リスクマップを生成する。 Then, in step S124, the traffic environment risk map construction unit 44 generates an actualized risk map by applying, to each traffic participant around the host vehicle, the trajectory induction potential for its class, based on the classification results for the traffic participants around the host vehicle obtained from the server 50 and the trajectory induction potentials learned in advance for each class. It generates a potential risk map by applying, to each traffic participant around the host vehicle, the risk prevention potential for its class, based on the classification results and the risk prevention potentials learned in advance for each class. It then generates an integrated risk map that integrates the actualized risk map and the potential risk map.

ステップＳ１２６では、ヒューマンライク進路決定部４６は、自車両の複数の経路候補上の位置に対応する状態のうち、統合リスクマップを用いた報酬関数により求められる報酬が多く得られる状態に遷移又は停止することを、最適な行動として決定する In step S126, the human-like course determination unit 46 transitions or stops to a state in which a lot of rewards obtained by the reward function using the integrated risk map can be obtained among the states corresponding to the positions of the host vehicle on the plurality of route candidates. Decide what to do as the best action

Then, in step S128, a command containing a yaw angle and one of throttle and brake is calculated from the determined action and output to the vehicle control unit 22, and the routine returns to step S120.
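The source does not state how the command in step S128 is derived from the determined action. As one possible illustration, the sketch below points the yaw angle at the target state and picks throttle or brake from the speed difference. The function name `make_command`, its parameters, and the proportional gain are all hypothetical.

```python
import math


def make_command(x, y, speed, target_x, target_y, target_speed, gain=0.5):
    """Return (yaw_angle_rad, pedal, value) for reaching the target state.

    pedal is "throttle" or "brake"; value is clamped to the range [0, 1].
    The proportional rule used here is an assumption, not the patented method.
    """
    yaw = math.atan2(target_y - y, target_x - x)
    effort = gain * (target_speed - speed)
    pedal = "throttle" if effort >= 0.0 else "brake"
    return yaw, pedal, min(abs(effort), 1.0)


# Target state straight ahead at a higher speed: accelerate.
accelerate = make_command(0.0, 0.0, 5.0, 10.0, 0.0, 6.0)
# Determined action is "stop": brake fully.
stop = make_command(0.0, 0.0, 10.0, 1.0, 1.0, 0.0)
```

The command carries exactly one pedal value, matching the text's "one of throttle and brake".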

By repeatedly executing the above online processing routine, the successively calculated commands, each containing a yaw angle and one of throttle and brake, are output to the vehicle control unit 22, which performs vehicle control accordingly.

As described above, the driving support control system according to the first embodiment of the present invention generates an actualized risk map by applying, to each traffic participant around the host vehicle, the actualized risk potential corresponding to its classification, and generates a latent risk map by applying the risk prevention potential corresponding to its classification. Among the states corresponding to positions of the host vehicle on the plurality of route candidates, it determines, as the optimal action, transitioning to or stopping at a state that yields a large reward, as computed by the reward function using the integrated risk map that integrates the actualized risk map and the latent risk map. The vehicle can thereby be controlled with behavior close to human driving.

In addition, the traffic participants around the host vehicle are classified according to attribute and state, and the actualized risk potential and the risk prevention potential are learned for each classification and transmitted to the server. This makes it possible to learn the actualized risk and the latent risk used to control the vehicle with human-like behavior.

Furthermore, the complexity of the road is decomposed into a combination of individual events of different classifications for all dynamic traffic participants. A model that classifies traffic participants is learned, and the actualized risk potential and the risk prevention potential are learned for each classification. A Markov decision process model that explains human-like behavior is then learned using a reward function based on the learned actualized risk potential and risk prevention potential. By decomposing a complex traffic environment into individual events and modeling them in this way, human-like driving behavior can be derived from diverse driving data.
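How a reward function turns into driving behavior inside a Markov decision process can be shown on a toy problem. The source does not specify the solver, so value iteration is used here purely as a stand-in. The four-state chain along one route, the two actions ("advance" and "stop"), the reward values, and the discount factor are all assumptions of this sketch.

```python
def value_iteration(rewards, gamma=0.9, sweeps=100):
    """Deterministic chain MDP: 'stop' stays put, 'advance' moves one state on.

    rewards[s] is the reward received on arriving at (or remaining in) state s.
    """
    n = len(rewards)
    values = [0.0] * n
    for _ in range(sweeps):
        new_values = []
        for s in range(n):
            nxt = min(s + 1, n - 1)
            stay = rewards[s] + gamma * values[s]
            advance = rewards[nxt] + gamma * values[nxt]
            new_values.append(max(stay, advance))
        values = new_values
    return values


def greedy_policy(values, rewards, gamma=0.9):
    """Pick, for each state, the action with the larger one-step lookahead."""
    n = len(values)
    policy = []
    for s in range(n):
        nxt = min(s + 1, n - 1)
        stay = rewards[s] + gamma * values[s]
        advance = rewards[nxt] + gamma * values[nxt]
        policy.append("advance" if advance > stay else "stop")
    return policy


# State 1 is risky (negative reward); state 3 is the goal.
rewards = [0.0, -1.0, 0.5, 1.0]
values = value_iteration(rewards)
policy = greedy_policy(values, rewards)
```

In this toy setting the resulting policy advances through the risky state because the discounted reward beyond it is larger than the reward for staying put, and it stops once the final state is reached.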

[Second Embodiment]
Next, a driving support control device according to the second embodiment will be described. Parts having the same configuration as in the first embodiment are given the same reference signs, and their description is omitted.

The second embodiment differs from the first embodiment in that online processing is performed using the learning models and potentials obtained within the device itself, without using a server.

<System configuration>

As shown in FIG. 16, the computer 220 of the driving support control device 210 according to the second embodiment includes a traffic participant classification unit 30, a traffic environment actualized risk map learning unit 32, a traffic environment latent risk map learning unit 34, a traffic participant interaction learning unit 36, a human-like decision-making learning unit 38, a traffic environment dynamic information generation unit 42, a traffic environment risk map construction unit 44, and a human-like course determination unit 46.

The processing result of the traffic participant classification unit 30 is output to the traffic environment dynamic information generation unit 42.

The processing results of the traffic environment actualized risk map learning unit 32, the traffic environment latent risk map learning unit 34, and the traffic participant interaction learning unit 36 are output to the traffic environment risk map construction unit 44.

The processing result of the human-like decision-making learning unit 38 is output to the human-like course determination unit 46.

The other configurations and operations of the driving support control device 210 according to the second embodiment are the same as those of the first embodiment, and their description is therefore omitted.

As described above, the driving support control device according to the second embodiment generates an actualized risk map by applying, to each traffic participant around the host vehicle, the actualized risk potential corresponding to its classification, and generates a latent risk map by applying the risk prevention potential corresponding to its classification. Among the states corresponding to positions of the host vehicle on the plurality of route candidates, it determines, as the optimal action, transitioning to or stopping at a state that yields a large reward, as computed by the reward function using the integrated risk map that integrates the actualized risk map and the latent risk map. The vehicle can thereby be controlled with behavior close to human driving.

In the first and second embodiments described above, the case where traffic participants are detected from an image captured by the on-vehicle camera during online processing has been described as an example, but the present invention is not limited to this. The surrounding traffic participants may instead be detected based on information on objects around the host vehicle observed by a laser radar.

10, 210 Driving support control device
12 Traffic participant trajectory acquisition unit
14 Host vehicle position information acquisition unit
16 Lane information acquisition unit
18 Image acquisition unit
20, 220 Computer
22 Vehicle control unit
30 Traffic participant classification unit
32 Traffic environment actualized risk map learning unit
34 Traffic environment latent risk map learning unit
36 Traffic participant interaction learning unit
38 Human-like decision-making learning unit
40 Communication unit
42 Traffic environment dynamic information generation unit
44 Traffic environment risk map construction unit
46 Human-like course determination unit
50 Server
100 Driving support control system

Claims

A vehicle control device comprising:
a traffic participant classification unit that classifies traffic participants around the host vehicle according to attribute and state, based on the trajectory, position, and lane information of the host vehicle and the traffic participants;
an actualized risk map generation unit that generates an actualized risk map by applying, to each traffic participant around the host vehicle, the actualized risk corresponding to its classification, based on the classification results of the traffic participants around the host vehicle and the actualized risk learned in advance for each classification;
an action determination unit that determines, as the optimal action, transitioning to or stopping at a state that yields a large reward, as computed by a reward function using the actualized risk map, among states corresponding to positions of the host vehicle on a plurality of route candidates; and
a vehicle control unit that controls the host vehicle according to the determined action.

The vehicle control device according to claim 1, further comprising a latent risk map generation unit that generates a latent risk map by applying, to each traffic participant around the host vehicle, the latent risk corresponding to its classification, based on the classification results of the traffic participants around the host vehicle and the latent risk learned in advance for each classification,
wherein the action determination unit determines, as the optimal action, transitioning to or stopping at a state that yields a large reward, as computed by a reward function using an integrated risk map that integrates the actualized risk map and the latent risk map, among the states corresponding to positions of the host vehicle on the plurality of route candidates.

The vehicle control device according to claim 2, further comprising an interaction estimation unit that estimates the influence between pairs of traffic participants having different attributes and states, based on the classification results of the traffic participants,
wherein the estimated influence between the traffic participants is reflected in the integrated risk map.

The vehicle control device according to any one of claims 1 to 3, wherein the reward function is learned in advance by reinforcement learning based on a Markov decision process, and
the plurality of route candidates include a route candidate for route following, a route candidate for in-lane avoidance, and a route candidate for collision avoidance.

The vehicle control device according to any one of claims 1 to 4, wherein the reward function is expressed using a score related to time, a score related to switching of a route candidate, and a score related to safety.

The vehicle control device according to any one of claims 1 to 5, wherein the classification of the traffic participant includes a vehicle, a pedestrian, a bicycle, and a motorcycle.

A risk map generation device comprising:
a traffic participant classification unit that classifies traffic participants around the host vehicle according to attribute and state, based on the trajectory, position, and lane information of the host vehicle and the traffic participants; and
an actualized risk learning unit that generates an actualized risk for each classification based on the classification results of the traffic participants around the host vehicle and stores the actualized risk in a database.

8. The risk map generation device according to claim 7, further comprising a latent risk learning unit that generates a latent risk for each classification based on the classification results of the traffic participants around the host vehicle, integrates it with the actualized risk, and stores it in the database.

A program for causing a computer to function as:
a traffic participant classification unit that classifies traffic participants around the host vehicle according to attribute and state, based on the trajectory, position, and lane information of the host vehicle and the traffic participants;
an actualized risk map generation unit that generates an actualized risk map by applying, to each traffic participant around the host vehicle, the actualized risk corresponding to its classification, based on the classification results of the traffic participants around the host vehicle and the actualized risk learned in advance for each classification;
an action determination unit that determines, as the optimal action, transitioning to or stopping at a state that yields a large reward, as computed by a reward function using the actualized risk map, among states corresponding to positions of the host vehicle on a plurality of route candidates; and
a vehicle control unit that controls the host vehicle according to the determined action.

A program for causing a computer to function as:
a traffic participant classification unit that classifies traffic participants around the host vehicle according to attribute and state, based on the trajectory, position, and lane information of the host vehicle and the traffic participants; and
an actualized risk learning unit that generates an actualized risk for each classification based on the classification results of the traffic participants around the host vehicle and stores the actualized risk in a database.