JP2022096232A

JP2022096232A - Vehicle control device, vehicle control method and program

Info

Publication number: JP2022096232A
Application number: JP2020209225A
Authority: JP
Inventors: 建後藤; Ken Goto; 政宣武田; Masanori Takeda
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2022-06-29
Anticipated expiration: 2040-12-17
Also published as: JP7433205B2

Abstract

To properly determine whether to make a vehicle overtake a preceding vehicle while suppressing an increase in processing load.

SOLUTION: A vehicle control device comprises: a recognition unit which recognizes the surrounding situation of a vehicle; and a driving control unit which controls acceleration/deceleration and steering of the vehicle independently of operation by an occupant of the vehicle. The driving control unit comprises an overtaking control unit which determines whether to overtake a preceding vehicle traveling ahead of the vehicle in the same lane by comparing, with a threshold value, a value obtained by integrating a plurality of evaluation values output by a plurality of evaluation functions. Each of the plurality of evaluation functions outputs an evaluation value when one of a plurality of index values indicating the surrounding situation of the vehicle recognized by the recognition unit is input, and its output tendency has been adjusted by reinforcement learning in a pre-environment.

SELECTED DRAWING: Figure 5

Description

The present invention relates to a vehicle control device, a vehicle control method, and a program.

Technologies for driving vehicles automatically are being researched and put into practical use. Patent Document 1 describes determining the position of the lane change destination when a vehicle is made to change lanes automatically. It also describes selecting a lane change destination close to the driver's own sense by learning the driver's driving behavior.

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-217828

The invention described in Patent Document 1 does not address a suitable timing for overtaking a preceding vehicle while traveling at a maintained target speed. If the decision whether to overtake the preceding vehicle were made with a simple If-Then rule, the relationship with surrounding vehicles would not necessarily be kept favorable. As a typical example, when the vehicle changes lanes once and then returns to the original lane, the gap to the following vehicle at the lane change destination may become too short.

Meanwhile, research is advancing on making more sophisticated decisions using reinforcement learning. Reinforcement learning offers a high degree of freedom in model construction and enables complex decisions, but its processing load is high for an in-vehicle computer, and problems such as processing delay and increased power consumption may arise.

The present invention has been made in view of such circumstances, and one of its objects is to provide a vehicle control device, a vehicle control method, and a program capable of appropriately determining whether to make a vehicle overtake a preceding vehicle while suppressing an increase in processing load.

The vehicle control device, the vehicle control method, and the program according to the present invention adopt the following configurations.
(1): A vehicle control device according to one aspect of the present invention includes a recognition unit that recognizes the surrounding situation of a vehicle, and a driving control unit that controls acceleration/deceleration and steering of the vehicle independently of operation by an occupant of the vehicle. The driving control unit includes an overtaking control unit that determines whether to overtake a preceding vehicle traveling ahead of the vehicle in the same lane as the vehicle by comparing, with a threshold value, a value obtained by integrating a plurality of evaluation values output by a plurality of evaluation functions. Each of the plurality of evaluation functions outputs an evaluation value when one of a plurality of index values indicating the surrounding situation of the vehicle recognized by the recognition unit is input, and its output tendency has been adjusted by reinforcement learning in a pre-environment.

(2): In the aspect of (1) above, the overtaking control unit determines to overtake the preceding vehicle when the product of the plurality of evaluation values is equal to or greater than the threshold value, and causes the vehicle to overtake the preceding vehicle.

(3): In the aspect of (1) or (2) above, the plurality of index values include the difference between the target speed of the vehicle and the speed of the preceding vehicle.

(4): In any of the aspects (1) to (3) above, the plurality of index values include the distance between the vehicle and the preceding vehicle.

(5): In any of the aspects (1) to (4) above, the plurality of index values include the distance between the vehicle and an adjacent rear vehicle located behind the vehicle in the adjacent lane to which the vehicle changes lanes when overtaking.

(6): In any of the aspects (1) to (5) above, the output tendency of each of the plurality of evaluation functions has been adjusted by the Actor-Critic method.

(7): In a vehicle control method according to another aspect of the present invention, a vehicle control device recognizes the surrounding situation of a vehicle and controls acceleration/deceleration and steering of the vehicle independently of operation by an occupant of the vehicle. The controlling includes determining whether to overtake a preceding vehicle traveling ahead of the vehicle in the same lane as the vehicle by comparing, with a threshold value, a value obtained by integrating a plurality of evaluation values output by a plurality of evaluation functions. Each of the plurality of evaluation functions outputs an evaluation value when one of a plurality of index values indicating the recognized surrounding situation of the vehicle is input, and its output tendency has been adjusted by reinforcement learning in a pre-environment.

(8): A program according to another aspect of the present invention causes a computer mounted on a vehicle to recognize the surrounding situation of the vehicle and to control acceleration/deceleration and steering of the vehicle independently of operation by an occupant of the vehicle. The controlling includes determining whether to overtake a preceding vehicle traveling ahead of the vehicle in the same lane as the vehicle by comparing, with a threshold value, a value obtained by integrating a plurality of evaluation values output by a plurality of evaluation functions. Each of the plurality of evaluation functions outputs an evaluation value when one of a plurality of index values indicating the recognized surrounding situation of the vehicle is input, and its output tendency has been adjusted by reinforcement learning in a pre-environment.

According to aspects (1) to (8) above, control for making the vehicle overtake a preceding vehicle can be performed suitably while suppressing an increase in processing load.
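As a non-limiting illustration of aspects (1) to (5), the determination can be sketched as follows. Each evaluation function receives exactly one index value, the evaluation values are integrated as a product, and the product is compared with a threshold value. The sigmoid shapes, their constants, the key names, and the threshold of 0.5 are hypothetical placeholders; in the described device the output tendencies would instead be adjusted in advance by reinforcement learning.

```python
import math
from typing import Callable, Dict


def sigmoid(x: float, center: float, slope: float) -> float:
    """Smooth step from 0 to 1 centered at `center` (illustrative shape only)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


# Hypothetical evaluation functions, one per index value. The shapes and
# constants are invented for this sketch and are not taken from the source.
EVALUATION_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    # target speed minus preceding-vehicle speed [m/s]: larger favors overtaking
    "speed_gap_mps": lambda v: sigmoid(v, center=3.0, slope=1.0),
    # distance to the preceding vehicle [m]: smaller favors overtaking
    "gap_to_preceding_m": lambda d: 1.0 - sigmoid(d, center=40.0, slope=0.1),
    # distance to the adjacent rear vehicle [m]: larger favors overtaking
    "gap_to_adjacent_rear_m": lambda d: sigmoid(d, center=30.0, slope=0.2),
}


def integrated_evaluation(index_values: Dict[str, float]) -> float:
    """Integrate the evaluation values as their product, as in aspect (2)."""
    product = 1.0
    for name, evaluate in EVALUATION_FUNCTIONS.items():
        product *= evaluate(index_values[name])
    return product


def should_overtake(index_values: Dict[str, float], threshold: float = 0.5) -> bool:
    """Decide to overtake when the integrated value reaches the threshold."""
    return integrated_evaluation(index_values) >= threshold
```

Because the evaluation values are multiplied, a single unfavorable index value (for example, an adjacent rear vehicle that is too close) pulls the integrated value toward zero regardless of the other inputs, and the run-time cost is only a few scalar function evaluations.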

FIG. 1 is a configuration diagram of a vehicle system 1 using the vehicle control device according to the embodiment.
FIG. 2 is a functional configuration diagram of the first control unit 120 and the second control unit 160.
FIG. 3 is a diagram for explaining a scene in which overtaking control is performed and various definitions.
FIG. 4 is a diagram for explaining the operation in which the own vehicle M overtakes the preceding vehicle mA.
FIG. 5 is a diagram showing an outline of the overtaking determination processing by the overtaking control unit 142.
FIG. 6 is a diagram for explaining the learning processing of the evaluation functions in the pre-environment.
FIG. 7 is a flowchart showing an example of the flow of processing executed by the overtaking control unit 142.

Hereinafter, embodiments of the vehicle control device, the vehicle control method, and the program of the present invention will be described with reference to the drawings.

[Overall configuration]
FIG. 1 is a configuration diagram of a vehicle system 1 using the vehicle control device according to the embodiment. The vehicle on which the vehicle system 1 is mounted is, for example, a two-wheeled, three-wheeled, or four-wheeled vehicle, and its drive source is an internal combustion engine such as a diesel engine or a gasoline engine, an electric motor, or a combination thereof. The electric motor operates using power generated by a generator coupled to the internal combustion engine, or power discharged from a secondary battery or a fuel cell.

The vehicle system 1 includes, for example, a camera 10, a radar device 12, a LIDAR (Light Detection and Ranging) 14, an object recognition device 16, a communication device 20, an HMI (Human Machine Interface) 30, a vehicle sensor 40, a navigation device 50, an MPU (Map Positioning Unit) 60, a driving operator 80, an automatic driving control device 100, a traveling driving force output device 200, a brake device 210, and a steering device 220. These devices and equipment are connected to one another by a multiplex communication line such as a CAN (Controller Area Network) communication line, a serial communication line, a wireless communication network, or the like. The configuration shown in FIG. 1 is merely an example; part of the configuration may be omitted, and other components may be added.

The camera 10 is, for example, a digital camera using a solid-state image sensor such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor). The camera 10 is attached to an arbitrary position on the vehicle on which the vehicle system 1 is mounted (hereinafter, the own vehicle M). When imaging the area ahead, the camera 10 is attached to the upper part of the front windshield, the back of the rear-view mirror, or the like. The camera 10, for example, periodically and repeatedly images the surroundings of the own vehicle M. The camera 10 may be a stereo camera.

The radar device 12 radiates radio waves such as millimeter waves around the own vehicle M and detects the radio waves reflected by an object (reflected waves) to detect at least the position (distance and direction) of the object. The radar device 12 is attached to an arbitrary position on the own vehicle M. The radar device 12 may detect the position and speed of an object by the FM-CW (Frequency Modulated Continuous Wave) method.

The LIDAR 14 irradiates the surroundings of the own vehicle M with light (or an electromagnetic wave with a wavelength close to that of light) and measures the scattered light. The LIDAR 14 detects the distance to a target based on the time from light emission to light reception. The emitted light is, for example, pulsed laser light. The LIDAR 14 is attached to an arbitrary position on the own vehicle M.
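The distance detection from the emission-to-reception time can be illustrated with the standard time-of-flight relation d = c·t/2, where the factor of 2 accounts for the round trip. The function name below is hypothetical, and the sketch ignores effects such as receiver latency that an actual sensor would have to calibrate out.

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0  # speed of light in vacuum [m/s]


def distance_from_round_trip(elapsed_s: float) -> float:
    """Distance to the target [m] from the emission-to-reception time [s]."""
    if elapsed_s < 0.0:
        raise ValueError("elapsed time must be non-negative")
    # The pulse travels to the target and back, so halve the path length.
    return SPEED_OF_LIGHT_MPS * elapsed_s / 2.0
```

For example, a round-trip time of one microsecond corresponds to a target roughly 150 m away.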

The object recognition device 16 performs sensor fusion processing on the detection results of some or all of the camera 10, the radar device 12, and the LIDAR 14 to recognize the position, type, speed, and the like of an object. The object recognition device 16 outputs the recognition result to the automatic driving control device 100. The object recognition device 16 may output the detection results of the camera 10, the radar device 12, and the LIDAR 14 to the automatic driving control device 100 as they are. The object recognition device 16 may be omitted from the vehicle system 1.

The communication device 20 communicates with other vehicles present around the own vehicle M using, for example, a cellular network, a Wi-Fi network, Bluetooth (registered trademark), or DSRC (Dedicated Short Range Communication), or communicates with various server devices via a wireless base station.

The HMI 30 presents various information to the occupants of the own vehicle M and accepts input operations by the occupants. The HMI 30 includes various display devices, speakers, buzzers, touch panels, switches, keys, and the like.

The vehicle sensor 40 includes a vehicle speed sensor that detects the speed of the own vehicle M, an acceleration sensor that detects acceleration, a yaw rate sensor that detects the angular velocity around the vertical axis, an orientation sensor that detects the heading of the own vehicle M, and the like.

The navigation device 50 includes, for example, a GNSS (Global Navigation Satellite System) receiver 51, a navigation HMI 52, and a route determination unit 53. The navigation device 50 holds first map information 54 in a storage device such as an HDD (Hard Disk Drive) or a flash memory. The GNSS receiver 51 identifies the position of the own vehicle M based on signals received from GNSS satellites. The position of the own vehicle M may be identified or supplemented by an INS (Inertial Navigation System) using the output of the vehicle sensor 40. The navigation HMI 52 includes a display device, a speaker, a touch panel, keys, and the like. The navigation HMI 52 may be partially or wholly shared with the HMI 30 described above. The route determination unit 53 determines, for example, a route (hereinafter, the route on the map) from the position of the own vehicle M identified by the GNSS receiver 51 (or any input position) to the destination input by an occupant using the navigation HMI 52, with reference to the first map information 54. The first map information 54 is, for example, information in which road shapes are expressed by links indicating roads and nodes connected by the links. The first map information 54 may include road curvature, POI (Point Of Interest) information, and the like. The route on the map is output to the MPU 60. The navigation device 50 may provide route guidance using the navigation HMI 52 based on the route on the map. The navigation device 50 may be realized, for example, by the functions of a terminal device such as a smartphone or a tablet terminal owned by an occupant. The navigation device 50 may transmit the current position and the destination to a navigation server via the communication device 20 and acquire a route equivalent to the route on the map from the navigation server.

The MPU 60 includes, for example, a recommended lane determination unit 61, and holds second map information 62 in a storage device such as an HDD or a flash memory. The recommended lane determination unit 61 divides the route on the map provided by the navigation device 50 into a plurality of blocks (for example, every 100 [m] in the vehicle traveling direction) and determines a recommended lane for each block with reference to the second map information 62. The recommended lane determination unit 61 makes a determination such as which lane, counted from the left, the vehicle should travel in. When a branch point exists on the route on the map, the recommended lane determination unit 61 determines the recommended lane so that the own vehicle M can travel on a reasonable route for proceeding to the branch destination.
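The block division and per-block lane determination can be sketched as below. The function names, the 0-based lane index counted from the left, and the `lane_for_block` callback (standing in for a lookup against the second map information 62) are assumptions made for this illustration only.

```python
from typing import Callable, List, Tuple


def split_into_blocks(route_length_m: float,
                      block_length_m: float = 100.0) -> List[Tuple[float, float]]:
    """Divide a route into (start, end) blocks along the traveling direction."""
    if block_length_m <= 0.0:
        raise ValueError("block length must be positive")
    blocks = []
    start = 0.0
    while start < route_length_m:
        # The final block may be shorter than block_length_m.
        end = min(start + block_length_m, route_length_m)
        blocks.append((start, end))
        start = end
    return blocks


def recommend_lanes(route_length_m: float,
                    lane_for_block: Callable[[float, float], int],
                    block_length_m: float = 100.0) -> List[Tuple[float, float, int]]:
    """Attach a recommended lane index (hypothetically, 0 = leftmost) to each block."""
    return [(start, end, lane_for_block(start, end))
            for start, end in split_into_blocks(route_length_m, block_length_m)]
```

A callback such as `lambda start, end: 0 if end > 200.0 else 1` would, for instance, move the recommendation to the leftmost lane for the blocks approaching a branch located past the 200 m mark.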

The second map information 62 is map information with higher accuracy than the first map information 54. The second map information 62 includes, for example, information on the center of each lane or information on lane boundaries. The second map information 62 may also include road information, traffic regulation information, address information (address and postal code), facility information, telephone number information, and the like. The second map information 62 may be updated at any time by the communication device 20 communicating with other devices.

The driving operator 80 includes, for example, an accelerator pedal, a brake pedal, a shift lever, a steering wheel, an irregularly shaped steering wheel, a joystick, and other operators. A sensor that detects the amount of operation or the presence or absence of operation is attached to the driving operator 80, and its detection result is output to the automatic driving control device 100, or to some or all of the traveling driving force output device 200, the brake device 210, and the steering device 220.

The automatic driving control device 100 includes, for example, a first control unit 120 and a second control unit 160. Each of the first control unit 120 and the second control unit 160 is realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may be realized by hardware (a circuit unit; including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by cooperation of software and hardware. The program may be stored in advance in a storage device (a storage device including a non-transitory storage medium) such as an HDD or a flash memory of the automatic driving control device 100, or may be stored in a removable storage medium such as a DVD or a CD-ROM and installed in the HDD or flash memory of the automatic driving control device 100 when the storage medium (non-transitory storage medium) is mounted in a drive device. The automatic driving control device 100 is an example of the "vehicle control device", and the combination of the action plan generation unit 140 and the second control unit 160 is an example of the "driving control unit".

FIG. 2 is a functional configuration diagram of the first control unit 120 and the second control unit 160. The first control unit 120 includes, for example, a recognition unit 130 and an action plan generation unit 140. The recognition unit 130 includes an index value derivation unit 132. The action plan generation unit 140 includes an overtaking control unit 142. The functions of the index value derivation unit 132 and the overtaking control unit 142 will be described later. The first control unit 120 realizes, for example, a function based on AI (Artificial Intelligence) and a function based on a model given in advance in parallel. For example, the function of "recognizing an intersection" may be realized by executing, in parallel, recognition of the intersection by deep learning or the like and recognition based on conditions given in advance (such as signals and road markings that can be pattern-matched), scoring both, and evaluating them comprehensively. This ensures the reliability of automatic driving.

The recognition unit 130 recognizes the position of objects around the own vehicle M and their states such as speed and acceleration, based on information input from the camera 10, the radar device 12, and the LIDAR 14 via the object recognition device 16. The position of an object is recognized, for example, as a position in absolute coordinates whose origin is a representative point of the own vehicle M (such as the center of gravity or the center of the drive axle), and is used for control. The position of an object may be represented by a representative point such as the center of gravity or a corner of the object, or by a region. The "state" of an object may include its acceleration or jerk, or its "behavioral state" (for example, whether it is changing lanes or about to change lanes).

The recognition unit 130 also recognizes, for example, the lane in which the own vehicle M is traveling (the traveling lane). For example, the recognition unit 130 recognizes the traveling lane by comparing the pattern of road lane markings obtained from the second map information 62 (for example, the arrangement of solid and broken lines) with the pattern of road lane markings around the own vehicle M recognized from images captured by the camera 10. The recognition unit 130 may recognize the traveling lane by recognizing not only road lane markings but also travel path boundaries (road boundaries) including road lane markings, road shoulders, curbs, median strips, guardrails, and the like. In this recognition, the position of the own vehicle M acquired from the navigation device 50 and the processing result of the INS may be taken into account. The recognition unit 130 also recognizes stop lines, obstacles, red lights, toll gates, and other road events.

When recognizing the traveling lane, the recognition unit 130 recognizes the position and attitude of the own vehicle M with respect to the traveling lane. The recognition unit 130 may recognize, for example, the deviation of a reference point of the own vehicle M from the lane center and the angle that the traveling direction of the own vehicle M forms with a line connecting the lane centers, as the relative position and attitude of the own vehicle M with respect to the traveling lane. Alternatively, the recognition unit 130 may recognize the position of the reference point of the own vehicle M with respect to either side edge of the traveling lane (a road lane marking or a road boundary) as the relative position of the own vehicle M with respect to the traveling lane.
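One possible way to compute the deviation from the lane center and the angle relative to the lane center line is sketched below for a single straight center-line segment. The function name, the planar coordinate frame, and the sign conventions (positive to the left, counterclockwise angles) are assumptions for this example and are not specified in the source.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def lane_relative_pose(center_start: Point, center_end: Point,
                       reference_point: Point, heading_rad: float) -> Tuple[float, float]:
    """Return (lateral deviation [m], relative angle [rad]) w.r.t. a center-line segment."""
    dx = center_end[0] - center_start[0]
    dy = center_end[1] - center_start[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("center-line segment must have non-zero length")
    rx = reference_point[0] - center_start[0]
    ry = reference_point[1] - center_start[1]
    # 2D cross product divided by the segment length gives the signed
    # perpendicular distance (positive when the point is left of the line).
    lateral = (dx * ry - dy * rx) / length
    lane_direction = math.atan2(dy, dx)
    diff = heading_rad - lane_direction
    # Wrap the angle difference into [-pi, pi].
    relative_angle = math.atan2(math.sin(diff), math.cos(diff))
    return lateral, relative_angle
```

For a center line running along the x-axis, a reference point at (5.0, 0.5) with a heading of 0.1 rad yields a deviation of 0.5 m to the left and a relative angle of 0.1 rad.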

The action plan generation unit 140 generates a target trajectory along which the own vehicle M will travel in the future automatically (independently of the driver's operation) so that, in principle, the own vehicle M travels in the recommended lane determined by the recommended lane determination unit 61 and can also respond to the surrounding situation of the own vehicle M. The target trajectory includes, for example, a speed element. For example, the target trajectory is expressed as a sequence of points (track points) that the own vehicle M should reach, arranged in order. A track point is a point that the own vehicle M should reach at every predetermined traveling distance (for example, about several [m]) along the road; separately from this, a target speed and a target acceleration for every predetermined sampling time (for example, about a few tenths of a [sec]) are generated as part of the target trajectory. Alternatively, a track point may be the position that the own vehicle M should reach at each sampling time, for every predetermined sampling time. In this case, the information on the target speed and the target acceleration is expressed by the spacing of the track points.
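When track points are placed at a fixed sampling time, the target speed implied by their spacing can be recovered as distance divided by sampling time. The following sketch illustrates that relationship; the function name and the use of planar (x, y) coordinates are assumptions for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def speeds_from_track_points(track_points: List[Point],
                             sampling_time_s: float) -> List[float]:
    """Target speed [m/s] on each interval between consecutive track points."""
    if sampling_time_s <= 0.0:
        raise ValueError("sampling time must be positive")
    return [
        math.hypot(x1 - x0, y1 - y0) / sampling_time_s
        for (x0, y0), (x1, y1) in zip(track_points, track_points[1:])
    ]
```

Track points at (0, 0), (1, 0), and (3, 0) with a sampling time of 0.5 s therefore encode target speeds of 2 m/s and 4 m/s, that is, an accelerating trajectory expressed solely through point spacing.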

In generating the target trajectory, the action plan generation unit 140 may set an automatic driving event. Automatic driving events include a constant-speed traveling event, a low-speed following traveling event, a lane change event, a branching event, a merging event, a takeover event, and the like. The action plan generation unit 140 generates a target trajectory according to the activated event.

The second control unit 160 controls the traveling driving force output device 200, the brake device 210, and the steering device 220 so that the own vehicle M passes along the target trajectory generated by the action plan generation unit 140 at the scheduled times.

Returning to FIG. 2, the second control unit 160 includes, for example, an acquisition unit 162, a speed control unit 164, and a steering control unit 166. The acquisition unit 162 acquires information on the target trajectory (track points) generated by the action plan generation unit 140 and stores it in a memory (not shown). The speed control unit 164 controls the traveling driving force output device 200 or the brake device 210 based on the speed element associated with the target trajectory stored in the memory. The steering control unit 166 controls the steering device 220 according to the degree of curvature of the target trajectory stored in the memory. The processing of the speed control unit 164 and the steering control unit 166 is realized, for example, by a combination of feedforward control and feedback control. As an example, the steering control unit 166 executes a combination of feedforward control according to the curvature of the road ahead of the own vehicle M and feedback control based on the deviation from the target trajectory.
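The combination of feedforward and feedback control for steering can be sketched as follows. The source does not specify the control law; this example assumes a kinematic bicycle model for the feedforward term (steering angle = atan(wheelbase × curvature)) and a simple proportional feedback on lateral and heading error. The wheelbase, the gains, and the sign convention (positive = left) are hypothetical.

```python
import math


def steering_command(road_curvature_1pm: float,
                     lateral_error_m: float,
                     heading_error_rad: float,
                     wheelbase_m: float = 2.7,
                     k_lateral: float = 0.3,
                     k_heading: float = 1.0) -> float:
    """Steering angle [rad] as feedforward (road curvature) plus feedback (deviation)."""
    # Feedforward: steering angle that follows the curvature of the road ahead
    # under a kinematic bicycle model (an assumption of this sketch).
    feedforward = math.atan(wheelbase_m * road_curvature_1pm)
    # Feedback: steer against the deviation from the target trajectory.
    # Errors are positive when the vehicle is left of / rotated left from the track.
    feedback = -(k_lateral * lateral_error_m + k_heading * heading_error_rad)
    return feedforward + feedback
```

With no deviation, the command reduces to the feedforward term alone; a vehicle that has drifted to the left of the target trajectory receives an additional rightward correction.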

The traveling driving force output device 200 outputs, to the drive wheels, a traveling driving force (torque) for the vehicle to travel. The traveling driving force output device 200 includes, for example, a combination of an internal combustion engine, an electric motor, a transmission, and the like, and an ECU (Electronic Control Unit) that controls them. The ECU controls the above components according to information input from the second control unit 160 or information input from the driving operator 80.

The brake device 210 includes, for example, a brake caliper, a cylinder that transmits hydraulic pressure to the brake caliper, an electric motor that generates hydraulic pressure in the cylinder, and a brake ECU. The brake ECU controls the electric motor according to information input from the second control unit 160 or information input from the driving operator 80 so that a brake torque corresponding to the braking operation is output to each wheel. The brake device 210 may include, as a backup, a mechanism that transmits the hydraulic pressure generated by operation of the brake pedal included in the driving operator 80 to the cylinder via a master cylinder. The brake device 210 is not limited to the configuration described above, and may be an electronically controlled hydraulic brake device that controls an actuator according to information input from the second control unit 160 to transmit the hydraulic pressure of the master cylinder to the cylinder.

The steering device 220 includes, for example, a steering ECU and an electric motor. The electric motor, for example, applies a force to a rack and pinion mechanism to change the direction of the steered wheels. The steering ECU drives the electric motor according to information input from the second control unit 160 or information input from the driving operator 80, and changes the direction of the steered wheels.

[Overtaking control]
The overtaking control performed by the automatic driving control device 100 is described below. The overtaking control unit 142 of the action plan generation unit 140 operates, for example, while a constant-speed traveling event is being executed. To overtake the preceding vehicle traveling ahead of (immediately in front of) the own vehicle M in the same lane, the overtaking control unit 142 causes the own vehicle M to change lanes once to the adjacent lane, and returns the own vehicle M to the original lane once it is sufficiently ahead of the vehicle that was the preceding vehicle. "Immediately in front of" means that there is no other vehicle in between.

FIG. 3 is a diagram for explaining a scene in which overtaking control is performed and various definitions. In the figure, mA is the preceding vehicle, L1 is the lane in which the own vehicle M is located, and L2 is the adjacent lane, which is the lane the own vehicle M changes into when overtaking the preceding vehicle mA. V_M is the speed of the own vehicle M, and V_mA is the speed of the preceding vehicle mA. Further, mB is an adjacent-lane front vehicle located ahead of the own vehicle M in the adjacent lane L2, mC is an adjacent-lane rear vehicle located behind the own vehicle M in the adjacent lane L2, and mD is a following vehicle. "Behind" means, for example, that a vehicle is to the rear when the positions of representative points (such as centers of gravity) are compared with respect to the road extension direction (X direction in the figure). Alternatively, it may mean that the front end of the vehicle in question is behind the rear end of the own vehicle M with respect to the road extension direction (X direction in the figure). The same applies to "ahead". If there are adjacent lanes on both sides of the lane in which the own vehicle M is located, the lane change destination may be determined by some method, and the determined lane may be treated as the adjacent lane.

As a result of the recognition process described above, the index value derivation unit 132 of the recognition unit 130 derives index values such as (1) the speed difference ΔV (= V_M* − V_mA) obtained by subtracting the speed V_mA of the preceding vehicle mA from the target speed V_M* of the own vehicle M, (2) the distance D_MA between the own vehicle M and the preceding vehicle mA, and (3) the distance D_MC between the own vehicle M and the adjacent-lane rear vehicle mC. The target speed V_M* is determined based on the legal speed limit, a set speed chosen by the occupant, and the like. The distance D_MA is taken here to be the distance, in the road extension direction, between the front end of the own vehicle M and the rear end of the preceding vehicle mA, but it may instead be the distance in the road extension direction between a representative point of the own vehicle M and a representative point of the preceding vehicle mA. Likewise, the distance D_MC is taken here to be the distance, in the road extension direction, between the rear end of the own vehicle M and the front end of the adjacent-lane rear vehicle mC, but it may instead be the distance in the road extension direction between a representative point of the own vehicle M and a representative point of the adjacent-lane rear vehicle mC.
The index value derivation unit 132 may derive only some of the index values (1) to (3) described above, and may derive other index values as long as they indicate the environment of the own vehicle M at the time of overtaking. For example, the index value derivation unit 132 may derive, as index values, the distance between the own vehicle M and the adjacent-lane front vehicle mB, the degree of fluctuation (for example, the variance) of the speed of the preceding vehicle mA, or the position of the preceding vehicle mA in the road width direction (Y direction in the figure), that is, its lateral position.
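As an illustration of index values (1) to (3), the following sketch computes ΔV, D_MA, and D_MC from vehicle positions along the road extension direction, using the end-to-end distance definitions given above. The data layout (`VehicleState`, `front_x`, `rear_x`) and the function name `derive_index_values` are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    front_x: float  # position of the front end along the road (X direction) [m]
    rear_x: float   # position of the rear end along the road (X direction) [m]
    speed: float    # speed [m/s]


def derive_index_values(own, preceding, adjacent_rear, target_speed):
    """Return the three index values as a dict (hypothetical layout)."""
    return {
        # (1) target speed of M minus speed of preceding vehicle mA
        "delta_v": target_speed - preceding.speed,
        # (2) front end of M to rear end of mA
        "d_ma": preceding.rear_x - own.front_x,
        # (3) rear end of M to front end of adjacent-lane rear vehicle mC
        "d_mc": own.rear_x - adjacent_rear.front_x,
    }


own = VehicleState(front_x=4.5, rear_x=0.0, speed=25.0)
m_a = VehicleState(front_x=34.5, rear_x=30.0, speed=20.0)
m_c = VehicleState(front_x=-20.0, rear_x=-24.5, speed=27.0)
index_values = derive_index_values(own, m_a, m_c, target_speed=27.0)
```

With these made-up positions, the preceding vehicle is 25.5 m ahead, the adjacent-lane rear vehicle is 20.0 m behind, and the speed difference ΔV is 7.0 m/s.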

The overtaking control unit 142 obtains an evaluation value for each index value by inputting each of the plurality of index values into the evaluation function for that index value. The overtaking control unit 142 then determines whether or not to overtake the preceding vehicle mA by comparing a value obtained by integrating the plurality of evaluation values with a threshold value. More specifically, when the product of the plurality of evaluation values is equal to or greater than the threshold value, the overtaking control unit 142 decides to overtake the preceding vehicle mA and causes the own vehicle M to overtake it.

FIG. 4 is a diagram for explaining the operation in which the own vehicle M overtakes the preceding vehicle mA. The overtaking control unit 142 first causes the own vehicle M to change lanes to a position ahead of the adjacent-lane rear vehicle mC in the adjacent lane L2, and then causes it to change lanes to a position ahead of the preceding vehicle mA in the original lane L1. The speed control and steering control during this operation are not the core of the present invention, so their description is omitted. When deciding whether to execute this operation, the automatic driving control device 100 must make a decision that takes into account at least the positions and speeds of the preceding vehicle mA, the adjacent-lane front vehicle mB, and the adjacent-lane rear vehicle mC, as well as changes in them, and a sophisticated decision is required so that the occupants do not feel uncomfortable.

Therefore, the overtaking control unit 142 determines whether or not to overtake the preceding vehicle mA by using a plurality of evaluation functions whose output tendencies have been adjusted by reinforcement learning in a prior environment. The prior environment is a simulation environment, or a test environment using an actual vehicle, used before the software is applied to the own vehicle M. Each of the plurality of evaluation functions outputs an evaluation value when an index value is input. FIG. 5 is a diagram showing an outline of the overtaking determination process performed by the overtaking control unit 142. The overtaking control unit 142 inputs index values 1, 2, 3, ..., n, such as the speed difference ΔV, the distance D_MA, and the distance D_MC, into the corresponding evaluation functions 1, 2, 3, ..., n, and calculates the product of the obtained evaluation values E1 to En. When the product is equal to or greater than the threshold value Th, the overtaking control unit 142 decides to overtake the preceding vehicle mA. The value output by each evaluation function indicates the desire of the occupant of the own vehicle M to overtake, and is adjusted to take a value, for example, between zero and one.
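The determination described above (evaluate each index value, take the product, compare with Th) can be sketched as follows. The source does not disclose the form of the evaluation functions, only that they are tuned by reinforcement learning and output values between zero and one. The sigmoid shapes, their parameters, the threshold of 0.5, and the names `EVALUATION_FUNCTIONS` and `should_overtake` are therefore invented purely for illustration.

```python
import math


def sigmoid(x, center, slope):
    """Logistic curve with output in (0, 1); a stand-in for a learned function."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


# Hypothetical evaluation functions, one per index value.
# In the source, their parameters would come from reinforcement learning.
EVALUATION_FUNCTIONS = {
    "delta_v": lambda v: sigmoid(v, center=2.0, slope=1.0),
    "d_ma": lambda d: sigmoid(d, center=40.0, slope=-0.1),
    "d_mc": lambda d: sigmoid(d, center=15.0, slope=0.3),
}


def should_overtake(index_values, threshold=0.5):
    """Return (decision, product of evaluation values E1..En)."""
    product = math.prod(
        EVALUATION_FUNCTIONS[name](value) for name, value in index_values.items()
    )
    return product >= threshold, product


decision, product = should_overtake({"delta_v": 10.0, "d_ma": 10.0, "d_mc": 60.0})
slow_decision, _ = should_overtake({"delta_v": -5.0, "d_ma": 10.0, "d_mc": 60.0})
```

Because the values are multiplied rather than summed, a single evaluation value close to zero is enough to veto the overtake regardless of how favourable the other index values are.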

[Prior environment]
FIG. 6 is a diagram for explaining the learning process for the evaluation functions in the prior environment. The learning process is executed by any computer device. A real environment such as a test course may be used instead of the simulation environment. First, an Actor 300 is prepared that has a function equivalent to that of the overtaking control unit 142 (that is, it decides to overtake when the product of the evaluation values, obtained by inputting the index values included in the state into the evaluation functions, is equal to or greater than the threshold value). The state, which is the output of the simulation environment and reflects the decision of the Actor 300, is input to the Actor 300 and to the reward function 310.

The reward function 310 inputs the index values included in the state (these may overlap partly or entirely with the index values input to the Actor 300, or may be entirely different) into a reward function for each index value, and obtains a plurality of reward function values R1, R2, R3, ..., Rm. The index values used in the reward function 310 are, for example, the speed difference ΔV# (= V_M* − V_M) obtained by subtracting the actual speed V_M from the target speed V_M* of the own vehicle M, the distance D_MA between the own vehicle M and the preceding vehicle mA, and the distance D_MD between the own vehicle M and the following vehicle mD. Each reward function outputs a reward function value representing the cruising comfort of the own vehicle M. The product of the reward function values R1, R2, R3, ..., Rm is then output to the Critic 320 as the integrated reward function value Rtotal.
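The integration of per-index reward function values into Rtotal can be sketched in the same way. The individual reward shapes below (an exponential penalty on ΔV# and a clipped linear reward on the gaps) and the names `reward_speed`, `reward_gap`, and `integrated_reward` are assumptions for illustration; the source states only that each reward function expresses cruising comfort and that the values are multiplied.

```python
import math


def reward_speed(delta_v_sharp, scale=5.0):
    """Hypothetical R1: 1.0 when the actual speed equals the target speed."""
    return math.exp(-abs(delta_v_sharp) / scale)


def reward_gap(distance, comfortable=30.0):
    """Hypothetical R2/R3: grows linearly with the gap, clipped to [0, 1]."""
    return min(max(distance / comfortable, 0.0), 1.0)


def integrated_reward(delta_v_sharp, d_ma, d_md):
    """Rtotal as the product of the individual reward function values."""
    rewards = [
        reward_speed(delta_v_sharp),  # from dV# = V_M* - V_M
        reward_gap(d_ma),             # from D_MA (gap to preceding vehicle mA)
        reward_gap(d_md),             # from D_MD (gap to following vehicle mD)
    ]
    return math.prod(rewards)
```

For example, `integrated_reward(0.0, 30.0, 60.0)` is 1.0 under these assumed shapes, while a zero gap to the preceding vehicle drives Rtotal to 0.0 whatever the other values are.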

The Critic 320 estimates the future sum of rewards based on the input integrated reward function value Rtotal, evaluates the appropriateness of the behavior of the Actor 300, and adjusts the parameters of the evaluation functions of the Actor 300 based on the evaluation result. The evaluation functions of the Actor 300 at the stage where the adjustment has been sufficiently completed are then installed in the own vehicle M as the evaluation functions used by the overtaking control unit 142. The Critic 320 uses a DNN (Deep Neural Network) for this processing. The processing of the Critic 320 is based on, for example, the known Actor-Critic method, and a detailed description is omitted. Although this processing has a high processing load, it is performed in the prior environment before installation in the vehicle, so it does not cause problems such as processing load on the vehicle. In addition, performing advanced processing such as a DNN makes it possible to estimate the state value accurately. Another reinforcement learning method may be used instead of the Actor-Critic method.
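The source leaves the Critic's internals to the known Actor-Critic method. As background only, the sketch below shows two quantities that a critic in a standard actor-critic setup typically works with: a discounted sum of future rewards, and the one-step temporal-difference error used to judge whether an action turned out better or worse than expected. The discount factor `gamma` and both function names are assumptions, and the DNN that would produce the value estimates is not modelled here.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted sum of a sequence of rewards (e.g. successive Rtotal values)."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def td_error(reward, value_now, value_next, gamma=0.9):
    """One-step TD error: positive when the outcome beat the critic's estimate."""
    return reward + gamma * value_next - value_now
```

In a typical actor-critic scheme, a positive TD error would push the actor's parameters toward the action just taken and a negative one away from it; how the source applies this to the evaluation function parameters is not disclosed.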

[Processing flow]
FIG. 7 is a flowchart showing an example of the flow of processing executed by the overtaking control unit 142. The processing of this flowchart is executed repeatedly, for example, while the constant-speed traveling event is being executed.

First, the overtaking control unit 142 acquires a plurality of index values from the index value derivation unit 132 (step S100), and inputs each of the index values into the corresponding evaluation function (step S102). The overtaking control unit 142 then determines whether or not the product of the evaluation values is equal to or greater than the threshold value Th (step S104). When the product of the evaluation values is equal to or greater than the threshold value Th, the overtaking control unit 142 causes the own vehicle M to overtake the preceding vehicle mA (step S106).
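One pass through steps S100 to S106 can be written as a single function. This is a sketch under stated assumptions: the callables `get_index_values` and `start_overtaking` are hypothetical stand-ins for the index value derivation unit 132 and the actual overtaking maneuver, and the evaluation functions are passed in as a list in the same order as the index values.

```python
import math


def overtaking_step(get_index_values, evaluation_functions, threshold, start_overtaking):
    """Run one iteration of the flow in FIG. 7; return True if overtaking starts."""
    # S100: acquire the index values.
    index_values = get_index_values()
    # S102: input each index value into its corresponding evaluation function.
    evaluation_values = [f(v) for f, v in zip(evaluation_functions, index_values)]
    # S104: compare the product of the evaluation values with the threshold Th.
    if math.prod(evaluation_values) >= threshold:
        # S106: cause the own vehicle to overtake the preceding vehicle.
        start_overtaking()
        return True
    return False


calls = []
started = overtaking_step(
    lambda: [7.0, 25.5],
    [lambda x: 0.9, lambda x: 0.8],
    0.7,
    lambda: calls.append("overtake"),
)
```

The caller would invoke `overtaking_step` repeatedly while the constant-speed traveling event is active, matching the repetition described for the flowchart.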

According to the embodiment described above, the device includes a recognition unit (130, 132) that recognizes the surrounding situation of a vehicle (M), and a driving control unit (140, 160) that controls acceleration/deceleration and steering of the vehicle without depending on operations by an occupant of the vehicle. The driving control unit includes an overtaking control unit (142) that determines whether or not to overtake a preceding vehicle traveling ahead of the vehicle in the same lane by comparing a value obtained by integrating a plurality of evaluation values output by a plurality of evaluation functions with a threshold value. Each of the plurality of evaluation functions outputs an evaluation value when index values indicating the surrounding situation of the vehicle recognized by the recognition unit are input, and has an output tendency adjusted by reinforcement learning in a prior environment. It is therefore possible to appropriately determine whether or not to have the vehicle overtake the preceding vehicle while suppressing an increase in processing load.

The embodiment described above can be expressed as follows.
A vehicle control device comprising:
a storage device that stores a program; and
a hardware processor,
wherein the hardware processor, by executing the program stored in the storage device, is configured to:
recognize the surrounding situation of a vehicle; and
control acceleration/deceleration and steering of the vehicle without depending on operations by an occupant of the vehicle,
wherein the controlling includes determining whether or not to overtake a preceding vehicle traveling ahead of the vehicle in the same lane as the vehicle by comparing a value obtained by integrating a plurality of evaluation values output by a plurality of evaluation functions with a threshold value, and
each of the plurality of evaluation functions
outputs an evaluation value when a plurality of index values indicating the surrounding situation of the vehicle recognized by the recognition unit are input, and
has an output tendency adjusted by reinforcement learning in a prior environment.

Although modes for carrying out the present invention have been described above using an embodiment, the present invention is in no way limited to that embodiment, and various modifications and substitutions can be made without departing from the gist of the present invention.

100 Automatic driving control device
130 Recognition unit
132 Index value derivation unit
140 Action plan generation unit
142 Overtaking control unit

Claims

A vehicle control device comprising:
a recognition unit that recognizes the surrounding situation of a vehicle; and
a driving control unit that controls acceleration/deceleration and steering of the vehicle without depending on operations by an occupant of the vehicle,
wherein the driving control unit includes an overtaking control unit that determines whether or not to overtake a preceding vehicle traveling ahead of the vehicle in the same lane as the vehicle by comparing a value obtained by integrating a plurality of evaluation values output by a plurality of evaluation functions with a threshold value, and
each of the plurality of evaluation functions
outputs an evaluation value when one of a plurality of index values indicating the surrounding situation of the vehicle recognized by the recognition unit is input, and
has an output tendency adjusted by reinforcement learning in a prior environment.

The vehicle control device according to claim 1,
wherein the overtaking control unit decides to overtake the preceding vehicle when the product of the plurality of evaluation values is equal to or greater than the threshold value, and causes the vehicle to overtake the preceding vehicle.

The vehicle control device according to claim 1 or 2,
wherein the plurality of index values include the difference between the target speed of the vehicle and the speed of the preceding vehicle.

The vehicle control device according to any one of claims 1 to 3,
wherein the plurality of index values include the distance between the vehicle and the preceding vehicle.

The vehicle control device according to any one of claims 1 to 4,
wherein the plurality of index values include the distance between the vehicle and an adjacent-lane rear vehicle located behind the vehicle in the adjacent lane into which the vehicle changes lanes when overtaking.

The vehicle control device according to any one of claims 1 to 5,
wherein each of the plurality of evaluation functions has its output tendency adjusted by the Actor-Critic method.

A vehicle control method in which a vehicle control device:
recognizes the surrounding situation of a vehicle; and
controls acceleration/deceleration and steering of the vehicle without depending on operations by an occupant of the vehicle,
wherein the controlling includes determining whether or not to overtake a preceding vehicle traveling ahead of the vehicle in the same lane as the vehicle by comparing a value obtained by integrating a plurality of evaluation values output by a plurality of evaluation functions with a threshold value, and
each of the plurality of evaluation functions
outputs an evaluation value when one of a plurality of index values indicating the recognized surrounding situation of the vehicle is input, and
has an output tendency adjusted by reinforcement learning in a prior environment.

A program that causes a computer mounted on a vehicle to:
recognize the surrounding situation of the vehicle; and
control acceleration/deceleration and steering of the vehicle without depending on operations by an occupant of the vehicle,
wherein the controlling includes determining whether or not to overtake a preceding vehicle traveling ahead of the vehicle in the same lane as the vehicle by comparing a value obtained by integrating a plurality of evaluation values output by a plurality of evaluation functions with a threshold value, and
each of the plurality of evaluation functions
outputs an evaluation value when one of a plurality of index values indicating the recognized surrounding situation of the vehicle is input, and
has an output tendency adjusted by reinforcement learning in a prior environment.