JP2018147103A - Model learning device, controlled variable calculation device, and program

Info

Publication number: JP2018147103A
Application number: JP2017039566A
Authority: JP
Inventors: 優司伊藤; Yuji Ito
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2017-03-02
Filing date: 2017-03-02
Publication date: 2018-09-20

Abstract

PROBLEM TO BE SOLVED: To obtain a control law for controlling automatically controlled vehicles in a traffic environment that includes both manually controlled vehicles and automatically controlled vehicles. SOLUTION: For each state space representing the states of a plurality of vehicles including manually controlled vehicles and automatically controlled vehicles, a state space model learning unit 122 learns a model for predicting the amount of change in the state variables of the plurality of vehicles, based on a plurality of pieces of learning data that are collected in advance and represent the state variables of the manually controlled vehicles and of the automatically controlled vehicles. For each state space, a control law design unit 124 then generates, based on the model learned for that state space by the state space model learning unit 122, a control law for controlling the state variables of each automatically controlled vehicle among the plurality of vehicles. SELECTED DRAWING: Figure 1

Description

The present invention relates to a model learning device, a control amount calculation device, and a program.

Conventionally, a lane change optimization device is known that controls a host vehicle using model predictive control while predicting the behavior of surrounding vehicles (Patent Document 1).

A method is also known in which a state space equation based on the formation of a plurality of moving bodies is defined and used to design, offline, a control law that performs collision avoidance (Non-Patent Document 1).

In addition, a method is known in which, in a merging scene, control inputs that allow a plurality of vehicles to merge smoothly are obtained as an optimal control law and executed (Non-Patent Document 2).

Patent Document 1: JP 2006-209455 A

Non-Patent Document 1: Tatsuya Miyazaki and Kiyotsugu Takaba, "Formation Control of Mobile Robots Considering Obstacle Avoidance", Transactions of the Institute of Systems, Control and Information Engineers, 2015.
Non-Patent Document 2: J. Rios-Torres, A. A. Malikopoulos and P. Pisu, "Online Optimal Control of Connected Vehicles for Efficient Traffic Flow at Merging Roads," in Proc. of the IEEE 18th International Conference on Intelligent Transportation Systems, 2015.

However, because the technique of Patent Document 1 uses model predictive control, real-time performance is lost as the number of vehicles increases.

In Non-Patent Document 1, the offline control design gives high real-time performance, but the method presupposes that all moving bodies can be controlled. Control becomes difficult when the scene includes moving bodies that cannot be controlled and are hard to predict, such as manually driven vehicles.

Non-Patent Document 2 is based on an optimal control framework, so good control performance with respect to a desired target can be expected. However, it also presupposes that all vehicles can be controlled, and control becomes difficult when uncontrollable, hard-to-predict moving bodies such as manually driven vehicles are included.

The present invention has been made to solve the above problems, and an object thereof is to provide a model learning device and a program capable of obtaining a control law for controlling an automatically controlled moving body in a traffic environment that includes manually controlled moving bodies and automatically controlled moving bodies.

Another object is to provide a control amount calculation device and a program capable of obtaining a control amount for controlling an automatically controlled moving body in a traffic environment that includes manually controlled moving bodies and automatically controlled moving bodies.

To achieve the above object, the model learning device of the present invention includes: a learning unit that, for each state space representing the states of a plurality of moving bodies including a manually controlled moving body (a moving body under manual control) and an automatically controlled moving body (a moving body under automatic control), learns a model for predicting the amount of change in the state variables of the plurality of moving bodies, based on a plurality of pieces of learning data collected in advance that represent the state variables of the manually controlled moving body and the state variables of the automatically controlled moving body; and a control law design unit that, for each state space, generates a control law for controlling the state variables of each automatically controlled moving body among the plurality of moving bodies, based on the model learned for that state space by the learning unit.

The program of the present invention causes a computer to function as: a learning unit that, for each state space representing the states of a plurality of moving bodies including a manually controlled moving body and an automatically controlled moving body, learns a model for predicting the amount of change in the state variables of the plurality of moving bodies, based on a plurality of pieces of learning data collected in advance that represent the state variables of the manually controlled moving body and the state variables of the automatically controlled moving body; and a control law design unit that, for each state space, generates a control law for controlling the state variables of each automatically controlled moving body among the plurality of moving bodies, based on the model learned for that state space by the learning unit.

According to the present invention, the learning unit learns, for each state space representing the states of a plurality of moving bodies including a manually controlled moving body and an automatically controlled moving body, a model for predicting the amount of change in the state variables of the plurality of moving bodies, based on a plurality of pieces of learning data collected in advance that represent the state variables of the manually controlled moving body and of the automatically controlled moving body. The control law design unit then generates, for each state space, a control law for controlling the state variables of each automatically controlled moving body among the plurality of moving bodies, based on the model learned for that state space by the learning unit.

Learning the prediction model per state space from the pre-collected learning data, and generating the control law for the automatically controlled moving bodies per state space from the learned model, makes it possible to obtain a control law for controlling the automatically controlled moving bodies in a traffic environment that includes both manually controlled and automatically controlled moving bodies.

The learning unit of the present invention may learn the model using, as the learning data for each moving body, the state variables of that moving body, the state variables of the moving bodies around it, and the amount of change in the state variables of that moving body.

The control law design unit may generate the control law by obtaining the control law that optimizes an evaluation function expressed using the model learned by the learning unit and the control law, in accordance with an HJB (Hamilton-Jacobi-Bellman) equation whose basis functions are power polynomials of the state variables, Chebyshev polynomials of the state variables, or a neural network that takes the state variables as input.
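As an illustration of one of the basis families named above, the following minimal sketch evaluates Chebyshev polynomials of a single scalar state variable with the standard three-term recurrence and forms a weighted sum of them. The function names and the weights are hypothetical, and the sketch does not show how the weights would be determined from the HJB equation.

```python
from typing import List


def chebyshev_basis(x: float, degree: int) -> List[float]:
    """Return [T_0(x), ..., T_degree(x)] via T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)."""
    if degree < 0:
        raise ValueError("degree must be non-negative")
    values = [1.0]  # T_0(x) = 1
    if degree >= 1:
        values.append(x)  # T_1(x) = x
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values


def weighted_basis_sum(x: float, weights: List[float]) -> float:
    """Approximate a scalar function of x as sum_k weights[k] * T_k(x)."""
    basis = chebyshev_basis(x, len(weights) - 1)
    return sum(w * b for w, b in zip(weights, basis))
```

For a multi-dimensional state, one basis of this kind per state variable (or products of them) would be needed; that choice is left open here.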

The learning unit may learn, as the model, a Gaussian process model that takes the state variables of the plurality of moving bodies as input and outputs the mean and the variance of the amount of change in the state variables of the plurality of moving bodies.

The state variables may include the position and the speed of each moving body.

The control amount calculation device of the present invention includes: a state acquisition unit that acquires, for each of a plurality of moving bodies, its state variables and information indicating whether it is a manually controlled moving body or an automatically controlled moving body, and that identifies the state space to which the plurality of moving bodies belong; a control law acquisition unit that acquires the control law generated by the above model learning device in accordance with the state space identified by the state acquisition unit; and a control amount calculation unit that calculates the control amount of the state variables for each automatically controlled moving body among the plurality of moving bodies, based on the state variables of the plurality of moving bodies and the control law acquired by the control law acquisition unit. A control amount for controlling the automatically controlled moving bodies can thereby be obtained even in a traffic environment that includes both manually controlled and automatically controlled moving bodies.
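The flow described above (identify the state space, fetch the matching control law, evaluate it on the current state) can be sketched as follows. The dictionary key, made of the number of vehicles and the 1-based indices of the automatically controlled vehicles, is a simplifying assumption; in the description the state space also depends on the target steady state. All names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

ControlLaw = Callable[[Sequence[float]], List[float]]
# Hypothetical key: (number of vehicles N, 1-based indices of automatic vehicles).
StateSpaceKey = Tuple[int, Tuple[int, ...]]


def identify_state_space(is_automatic: Sequence[bool]) -> StateSpaceKey:
    """State acquisition step: derive the key from the manual/automatic flags."""
    automatic = tuple(i + 1 for i, flag in enumerate(is_automatic) if flag)
    return (len(is_automatic), automatic)


def compute_control_amount(
    x: Sequence[float],
    is_automatic: Sequence[bool],
    control_laws: Dict[StateSpaceKey, ControlLaw],
) -> List[float]:
    """Fetch the control law for the identified state space and evaluate it at x."""
    key = identify_state_space(is_automatic)
    if key not in control_laws:
        raise KeyError(f"no control law stored for state space {key}")
    return control_laws[key](x)
```

The returned list has one control amount per automatically controlled vehicle, in the order of their indices.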

As described above, according to the model learning device and the program of the present invention, a model for predicting the amount of change in the state variables of a plurality of moving bodies is learned for each state space representing the states of the plurality of moving bodies, based on a plurality of pieces of learning data collected in advance that represent the state variables of the manually controlled moving bodies and of the automatically controlled moving bodies, and a control law for the automatically controlled moving bodies is generated for each state space based on the model learned for that state space. This provides the effect that a control law for controlling the automatically controlled moving bodies can be obtained in a traffic environment that includes both manually controlled and automatically controlled moving bodies.

According to the control amount calculation device and the program of the present invention, the state variables of each of a plurality of moving bodies and information indicating whether each is a manually controlled or an automatically controlled moving body are acquired, the state space to which the plurality of moving bodies belong is identified, the control law generated by the control law generation device is acquired in accordance with the identified state space, and the control amount for each automatically controlled moving body is calculated based on the state variables of the plurality of moving bodies and the acquired control law. This provides the effect that a control amount for controlling the automatically controlled moving bodies can be obtained even in a traffic environment that includes both manually controlled and automatically controlled moving bodies.

A block diagram showing the configuration of the vehicle control system according to the embodiment of the present invention.
A flowchart showing the learning processing routine in the control law generation device according to the embodiment of the present invention.
A flowchart showing the control law design processing routine in the control law generation device according to the embodiment of the present invention.
A flowchart showing the control amount calculation processing routine in the control amount calculation device according to the embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The embodiments are described taking a vehicle, which is one example of a moving body, as the target.

<Configuration of vehicle control system>
A vehicle control system according to the present embodiment will be described. As shown in FIG. 1, the vehicle control system 10 according to the present embodiment includes a control law generation device 12 that generates a control law for controlling a vehicle based on learning data collected in advance, and a vehicle control amount calculation device 14 that calculates a control amount for the vehicle based on the control law generated by the control law generation device 12. The control law generation device 12 and the vehicle control amount calculation device 14 are connected by communication means such as the Internet. The control law generation device 12 is an example of the model learning device.

[Control law generation device]

The control law generation device 12 is configured as a computer including a CPU that governs overall control of the device, a ROM serving as a storage medium that stores the programs of the processing routines and the like, a RAM that temporarily stores data as a work area, and a bus connecting these. In this configuration, a program for realizing the function of each component is stored in a storage medium such as the ROM or an HDD, and each function is realized by the CPU executing the program.

Described in terms of functional blocks divided by function-realizing means determined by hardware and software, the control law generation device 12 includes, as shown in FIG. 1, a learning data collection unit 118, a learning database 120, a state space model learning unit 122, a control law design unit 124, a control law database 126, a control unit 128, and a communication unit 130. The state space model learning unit 122 is an example of the learning unit.

In the present embodiment, a model for predicting the amount of change in the state variables of a plurality of vehicles is learned for each state space representing the states of the plurality of vehicles, which include manually controlled vehicles (vehicles under manual control) and automatically controlled vehicles (vehicles under automatic control). The state space of the present embodiment is described below.

In the following, the $i$-th component of a vector $v \in \mathbb{R}^{n}$ is written $[v]_i$. The component in row $i$ and column $j$ of a matrix $M \in \mathbb{R}^{n \times m}$ is written $[M]_{i,j}$, and the $i$-th row vector and $j$-th column vector of $M$ are written $[M]_{i,\cdot}$ and $[M]_{\cdot,j}$, respectively.

The present embodiment is described taking as an example the case where the automatically controlled vehicles are controlled in the front-rear (longitudinal) direction. First, a state space is defined for $N$ vehicles. Let $p_{\mathrm{glo},i}(t)$ be the absolute position of vehicle $i$ in the front-rear direction at time $t$, and $v_{\mathrm{glo},i}(t)$ its absolute speed. Let $p_{\mathrm{ref},i}(t)$ be the steady state for the target positions of the $N$ vehicles and $v_{\mathrm{ref},i}(t)$ the steady state for the target speeds. Here $i$ is an index identifying the vehicle.

The relative position from the target steady state is set as $p_i(t) := p_{\mathrm{glo},i}(t) - p_{\mathrm{ref},i}(t)$, and the relative speed from the target steady state as $v_i(t) := v_{\mathrm{glo},i}(t) - v_{\mathrm{ref},i}(t)$. The state variable $x(t)$ is then defined by the following equation (1).

(1)

As shown in equation (1), in the present embodiment the state variables include the position and the speed of each vehicle. The state space is a $2N$-dimensional Euclidean space.
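The image for equation (1) is not present in this text. The sketch below assembles a state vector under the assumption that the components are interleaved as $[p_1, v_1, \dots, p_N, v_N]$, an ordering inferred from the later remarks that odd-numbered elements relate to speed and even-numbered elements to acceleration. The function name is hypothetical.

```python
from typing import List, Sequence


def build_state(
    p_glo: Sequence[float],
    v_glo: Sequence[float],
    p_ref: Sequence[float],
    v_ref: Sequence[float],
) -> List[float]:
    """Return x = [p_1, v_1, ..., p_N, v_N] relative to the target steady state."""
    if not (len(p_glo) == len(v_glo) == len(p_ref) == len(v_ref)):
        raise ValueError("all inputs must have one entry per vehicle")
    x: List[float] = []
    for pg, vg, pr, vr in zip(p_glo, v_glo, p_ref, v_ref):
        x.append(pg - pr)  # relative position p_i
        x.append(vg - vr)  # relative speed v_i
    return x
```

With two vehicles the result has four components, matching the $2N$-dimensional state space stated above.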

Next, when $M$ of the $N$ vehicles, namely vehicles $i_1, i_2, \dots, i_M$, are controllable as automatically controlled vehicles, the state space equation is given by the following equation (2).

(2)
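The image for equation (2) is not present in this text. A form consistent with the surrounding description (passive dynamics $f$, an input matrix $B$, and a control amount $u(t) \in \mathbb{R}^{M}$) would be the control-affine state equation below; the continuous-time form is an assumption.

```latex
\dot{x}(t) = f\bigl(x(t)\bigr) + B\,u(t),
\qquad x(t) \in \mathbb{R}^{2N},\quad
B \in \mathbb{R}^{2N \times M},\quad
u(t) \in \mathbb{R}^{M}
\tag{2}
```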

Here, $u(t) \in \mathbb{R}^{M}$ in equation (2) is the control amount for the $M$ automatically controlled vehicles (for example, the acceleration given to each automatically controlled vehicle). In the present embodiment, the passive dynamics $f(x(t)) \in \mathbb{R}^{2N}$ in equation (2) is modeled by a Gaussian process. In the matrix $B$ in equation (2), each element indicated by the following equation (A) is a nonzero constant (a coefficient for the acceleration), and all other elements are 0.

(A)

In equation (A), for example, when the second vehicle is a manually controlled vehicle and the first and third vehicles are automatically controlled vehicles, $i_1 = 1$ and $i_2 = 3$. The elements of the matrix $B$ in equation (2) hold the coefficients of the control amount with respect to the speed and the acceleration of each vehicle, arranged in order: the coefficient for the speed of the first vehicle, the coefficient for the acceleration of the first vehicle, the coefficient for the speed of the second vehicle, the coefficient for the acceleration of the second vehicle, and so on. Because the even-numbered elements correspond to acceleration, $i_j$ is multiplied by 2.
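A minimal sketch of building such a matrix follows. It assumes that equation (A), whose image is missing here, selects the entries $[B]_{2 i_j,\, j}$ for $j = 1, \dots, M$ with 1-based indices, which is what the remark about multiplying $i_j$ by 2 suggests. The function name and the default coefficient of 1.0 are hypothetical.

```python
from typing import List, Sequence


def build_input_matrix(
    n_vehicles: int, automatic_indices: Sequence[int], coefficient: float = 1.0
) -> List[List[float]]:
    """Return the 2N x M matrix B with a nonzero entry at (2*i_j, j), 1-based."""
    if coefficient == 0.0:
        raise ValueError("the acceleration coefficient must be nonzero")
    m = len(automatic_indices)
    b = [[0.0] * m for _ in range(2 * n_vehicles)]
    for j, i_j in enumerate(automatic_indices):
        if not 1 <= i_j <= n_vehicles:
            raise ValueError(f"vehicle index {i_j} out of range")
        # 1-based row 2*i_j (acceleration row of vehicle i_j) is 0-based row 2*i_j - 1.
        b[2 * i_j - 1][j] = coefficient
    return b
```

For the example in the text ($N = 3$, $i_1 = 1$, $i_2 = 3$), the nonzero entries land in the second and sixth rows.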

Because the behavior of an automatically controlled vehicle is relatively easy to model, the passive dynamics $f(x(t))$ may be defined in advance, for example as shown in the following equation (B).

(B)

In equation (B), the speed and the acceleration of each vehicle are arranged in $[f(x(t))]$ in the same order as in equation (A), and the odd-numbered elements represent speed, which gives the relationship shown in equation (B).

As described above, a state space and a state space equation are defined for each combination of $N$, $M$, $i_1, i_2, \dots, i_M$ and the target steady state.

For each state space representing the states of a plurality of vehicles including manually controlled vehicles and automatically controlled vehicles, the learning data collection unit 118 collects a plurality of pieces of learning data representing the state variables of the manually controlled vehicles, the state variables of the automatically controlled vehicles, and the amounts of change in the state variables of the manually controlled and automatically controlled vehicles.

Specifically, for each state space, the learning data collection unit 118 collects learning data $\{(x^{(i)}, f^{(i)})\}_{i=1}^{D}$ corresponding to that state space. Because the passive dynamics $f(x(t)) \in \mathbb{R}^{2N}$ is modeled by a Gaussian process in the present embodiment, $D$ sets of learning data $\{(x^{(i)}, f^{(i)})\}_{i=1}^{D}$ satisfying $f(x^{(i)}) = f^{(i)}$ are prepared in advance.

The learning data $\{(x^{(i)}, f^{(i)})\}_{i=1}^{D}$ can be collected, for example, by picking up groups of $N$ vehicles at given times from past traffic data. In this case, learning data containing the state variables and the passive dynamics of each of the plurality of vehicles are obtained, for example, from images captured by a fixed-point camera (not shown).
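One way such pairs could be derived from recorded trajectories is sketched below. It assumes the state is sampled at a fixed interval `dt` and that the recorded change approximates the passive dynamics (no control input, or its effect already removed); the forward difference and all names are assumptions, since the text only states that the data are picked up from past traffic data.

```python
from typing import List, Sequence, Tuple


def make_learning_data(
    trajectory: Sequence[Sequence[float]], dt: float
) -> List[Tuple[List[float], List[float]]]:
    """Return pairs (x, f), with f approximated by (x_next - x) / dt."""
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    data = []
    for x_now, x_next in zip(trajectory, trajectory[1:]):
        f = [(b - a) / dt for a, b in zip(x_now, x_next)]
        data.append((list(x_now), f))
    return data
```

A trajectory with $T$ samples yields $T - 1$ pairs; the last sample has no successor and is used only as a difference target.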

The learning data collection unit 118 may also prepare, as the learning data for learning the Gaussian process model, the state variables of each vehicle, the state variables of the vehicles around that vehicle, and the amount of change in the state variables of that vehicle.

In this case, for example, assuming that each of the $N$ vehicles interacts only with nearby vehicles, the approximation shown in the following equation (3) can be used.

(3)

To approximately decompose the group of $N$ vehicles as shown in equation (3), a plurality of pieces of data are used, each obtained by picking up a vehicle group of two or three vehicles as shown in the following equation (4). From these, the learning data collection unit 118 generates the learning data $(x^{(i)}, f^{(i)})$ for the group of $N$ vehicles.

(4)
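The images for equations (3) and (4) are missing, so the following is only a sketch of the idea of composing the dynamics of $N$ vehicles from a local model that sees a vehicle and its immediate neighbours. It assumes the vehicles are indexed in their order along the road, so that the end vehicles form groups of two and the interior vehicles groups of three. The `local_model` signature is hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vehicle = Tuple[float, float]  # (relative position, relative speed)
LocalModel = Callable[[Vehicle, Optional[Vehicle], Optional[Vehicle]], Tuple[float, float]]


def compose_dynamics(x: Sequence[float], local_model: LocalModel) -> List[float]:
    """Stack per-vehicle predictions into f(x) for x = [p_1, v_1, ..., p_N, v_N]."""
    if len(x) % 2 != 0:
        raise ValueError("state must hold a (position, speed) pair per vehicle")
    n = len(x) // 2
    vehicles = [(x[2 * i], x[2 * i + 1]) for i in range(n)]
    f: List[float] = []
    for i, own in enumerate(vehicles):
        rear = vehicles[i - 1] if i > 0 else None        # neighbour behind, if any
        front = vehicles[i + 1] if i < n - 1 else None   # neighbour ahead, if any
        dp, dv = local_model(own, front, rear)
        f.extend([dp, dv])
    return f
```

In practice `local_model` would be the learned per-vehicle predictor; any callable with this signature works for experimentation.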

The learning database 120 stores the plurality of pieces of learning data collected by the learning data collection unit 118.

Based on the learning data stored in the learning database 120, the state space model learning unit 122 learns a Gaussian process model as the model for predicting the amount of change in the state variables of the plurality of vehicles, which include manually controlled and automatically controlled vehicles. Given the state variables of the plurality of vehicles as input, the Gaussian process model yields the mean and the variance of the amount of change in those state variables.

Specifically, the state space model learning unit 122 learns the passive dynamics $f(x)$ using the $D$ sets of data $\{(x^{(i)}, f^{(i)})\}_{i=1}^{D}$ stored in the learning database 120.

For example, the state space model learning unit 122 models the passive dynamics $f(x)$ with a Gaussian process model based on the technique described in Reference 1.

Reference 1: C. E. Rasmussen and C. K. I. Williams, "Gaussian Processes for Machine Learning", MIT Press, 2014.

Features of Gaussian processes include: (1) high prediction performance without requiring knowledge of the modeling target; (2) a low tendency to overfit; and (3) availability of the prediction uncertainty as information, because the variance is obtained in addition to the expected value of the prediction.

The following notation is used for the learning data $\{(x^{(i)}, f^{(i)})\}_{i=1}^{D}$.

(5)

(6)

Using the learning data $(X, F)$ shown in equations (5) and (6), the state space model learning unit 122 models the passive dynamics $f(x)$ as a Gaussian process. Assuming that the passive dynamics $f(x)$ follows a Gaussian process, the following equation (7) holds.

(7)

Here, $K_{XX} \in \mathbb{R}^{D \times D}$, $K_{Xx}(x) \in \mathbb{R}^{D}$, and $K_{xx}(x) \in \mathbb{R}$ in equation (7) are, respectively, a matrix, a vector, and a scalar whose elements are values of the kernel function $K(x^{(i)}, x^{(j)})$, and are defined by the following equations (8) to (10).

(8)

(9)

(10)
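The images for equations (7) to (10) are missing from this text. Under the standard Gaussian process regression formulation of Reference 1, and assuming that $[F]_{\cdot,k}$ collects the $k$-th output component of all $D$ training samples, they would plausibly read as follows, with $\delta_{ij}$ the Kronecker delta and $\sigma_n$ a noise hyperparameter. This is a reconstruction, not the original equations.

```latex
\begin{align}
\begin{bmatrix} [F]_{\cdot,k} \\ [f(x)]_k \end{bmatrix}
&\sim \mathcal{N}\!\left( 0,\;
\begin{bmatrix} K_{XX} & K_{Xx}(x) \\ K_{Xx}(x)^{\top} & K_{xx}(x) \end{bmatrix}
\right) \tag{7} \\
[K_{XX}]_{i,j} &= K\bigl(x^{(i)}, x^{(j)}\bigr) + \sigma_n^{2}\,\delta_{ij} \tag{8} \\
[K_{Xx}(x)]_{i} &= K\bigl(x^{(i)}, x\bigr) \tag{9} \\
K_{xx}(x) &= K(x, x) \tag{10}
\end{align}
```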

In equation (8), $\delta_{ij}$ is the Kronecker delta and $\sigma_n \in \mathbb{R}$ is a hyperparameter. $K_{Xx}(x)$ in equation (9) and $K_{xx}(x)$ in equation (10) are functions of $x$. Various kernel functions $K(x^{(i)}, x^{(j)})$ exist; here, the Gaussian kernel shown in the following equation (11) is adopted.

(11)

Here, $\sigma_f \in \mathbb{R}$ and $\Sigma_w \in \mathbb{R}^{n \times n}$ with $\Sigma_w > 0$ in equation (11) are hyperparameters, and $\Sigma_w$ is a diagonal matrix. The state space model learning unit 122 estimates the hyperparameters $\theta := \{\sigma_n, \sigma_f, \Sigma_w\}$ so as to maximize the log likelihood of $[F]_{\cdot,k}$ given the states $X$, which is expressed by the following equation (12).
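A Gaussian kernel with a diagonal $\Sigma_w$ can be sketched as below. The exact form of equation (11) is missing from this text; the sketch assumes the common form $\sigma_f^{2} \exp\bigl(-\tfrac{1}{2}(x^{(i)} - x^{(j)})^{\top} \Sigma_w^{-1} (x^{(i)} - x^{(j)})\bigr)$, and whether $\Sigma_w$ enters through its inverse is an assumption. The function name is hypothetical.

```python
import math
from typing import Sequence


def gaussian_kernel(
    xa: Sequence[float],
    xb: Sequence[float],
    sigma_f: float,
    sigma_w_diag: Sequence[float],
) -> float:
    """sigma_f^2 * exp(-0.5 * (xa - xb)^T Sigma_w^{-1} (xa - xb)), Sigma_w diagonal."""
    if not (len(xa) == len(xb) == len(sigma_w_diag)):
        raise ValueError("inputs and diagonal must have the same length")
    if any(w <= 0.0 for w in sigma_w_diag):
        raise ValueError("diagonal entries of Sigma_w must be positive")
    quad = sum((a - b) ** 2 / w for a, b, w in zip(xa, xb, sigma_w_diag))
    return sigma_f ** 2 * math.exp(-0.5 * quad)
```

Because $\Sigma_w$ is diagonal, each state variable gets its own length scale, and the kernel value at identical inputs is $\sigma_f^{2}$.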

(12)

However, because the passive dynamics $f(x)$ modeled by the Gaussian process is $n$-dimensional, the maximum likelihood framework for scalar functions must be extended. In the present embodiment, the components of the passive dynamics $f(x)$ are assumed to be independent and to share the same hyperparameters $\theta$, and $\theta$ is estimated by the maximization problem shown in the following equation (13).

(13)
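The images for equations (12) and (13) are missing. Assuming the standard Gaussian process log marginal likelihood and, as the text states, independent output components that share $\theta$, they would plausibly take the following form. This is a reconstruction under those assumptions, not the original equations.

```latex
\begin{align}
\log p\bigl([F]_{\cdot,k} \mid X, \theta\bigr)
&= -\tfrac{1}{2}\,[F]_{\cdot,k}^{\top} K_{XX}^{-1} [F]_{\cdot,k}
   - \tfrac{1}{2} \log \det K_{XX}
   - \tfrac{D}{2} \log 2\pi \tag{12} \\
\hat{\theta}
&= \arg\max_{\theta} \sum_{k=1}^{n} \log p\bigl([F]_{\cdot,k} \mid X, \theta\bigr) \tag{13}
\end{align}
```

Since $K_{XX}$ does not depend on $k$, its factorization can be computed once per candidate $\theta$ and reused for all $n$ components.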

上記式（１３）によってハイパーパラメータθが推定された後、ある状態ｘに対するパッシブダイナミクスｆ（ｘ）の予測値ｆ＾（ｘ）は、以下の式（１４）〜（１６）に示される平均Ｅ［ｆ＾（ｘ）］と分散Ｖａｒ［f＾（ｘ）］とをもつガウス分布で与えられる。 After the hyperparameter θ is estimated by the above equation (13), the predicted value f ^ (x) of the passive dynamics f (x) for a certain state x is an average E expressed by the following equations (14) to (16). It is given by a Gaussian distribution with [f ^ (x)] and variance Var [f ^ (x)].

(14)

(15)

(16)
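For orientation, the textbook Gaussian process predictive mean and variance can be written as below. This is a sketch under the assumption that K_XX denotes the D × D kernel matrix of equation (8), K_Xx(x) the D-vector with entries K(x^(i), x), K_xx(x) = K(x, x), and F the D × n matrix whose k-th column is [F]_{·,k}; it is not claimed to match equations (14) to (16) term by term.

```latex
\mathrm{E}[\hat{f}(x)] = F^{\top} K_{XX}^{-1} K_{Xx}(x),
\qquad
\mathrm{Var}[\hat{f}_{k}(x)] = K_{xx}(x) - K_{Xx}(x)^{\top} K_{XX}^{-1} K_{Xx}(x)
\quad (k = 1, \dots, n)
```

With shared hyperparameters the variance expression is the same for every component k, while the mean differs only through the corresponding column of F.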

Here, in this embodiment, the variance is assumed to be small enough to be negligible, as shown in equation (C) below.

(C)

In this way, the passive dynamics f(x) is modeled as the predictive mean E[f^(x)] of the Gaussian process.

In this embodiment, f(x) is modeled as a whole. Alternatively, each f_i^~(x) in equation (3) above may be modeled separately, and the f_i^~(x) may then be combined in accordance with equation (3) to obtain f^(x).

For each state space, the control law design unit 124 generates a control law for controlling the state quantities of the automatically controlled vehicles among the plurality of vehicles, based on the passive dynamics learned for that state space by the state space model learning unit 122.

The control law design unit 124 designs a state feedback control law u_FB(x) for the matrix B and the mean of the change amount of the state variables represented by the passive dynamics f^(x) obtained by the state space model learning unit 122. For example, the control law design unit 124 may use offline model predictive control based on the technique described in Reference 2.

Reference 2: A. Grancharova, J. Kocijan and T. A. Johansen, "Explicit Stochastic Nonlinear Predictive Control Based on Gaussian Process Models", ECC, 2007.

The control law design unit 124 may also generate the control law based on the technique described in Reference 3.

Reference 3: R. W. Beard, G. Saridis and J. Wen, "Approximate Solutions to the Time-Invariant Hamilton-Jacobi-Bellman Equation", Journal of Optimization Theory and Applications, 1998.

When generating the control law based on the technique described in Reference 3, the control law design unit 124 uses the Generalized Hamilton-Jacobi-Bellman (GHJB) equation. It approximately solves the GHJB equation with certain basis functions so that the residual of the GHJB equation becomes small, and thereby obtains the control law together with the solution.

Therefore, in this embodiment, the control law design unit 124 generates the control law by obtaining, in accordance with the HJB equation and using predetermined basis functions, a control law that optimizes an evaluation function expressed using the passive dynamics learned by the state space model learning unit 122 and the control law. Examples of the basis functions include power polynomials of the state variables, Chebyshev polynomials of the state variables, and a neural network that takes the state variables as input. A specific description follows.

The method of designing u_FB(x) using the HJB equation is described below. Equation (17) below represents a continuous-time nonlinear system that is affine in the input. Consider the nonlinear optimal control problem for the evaluation function J(u, x(0)) shown in equations (18) and (19) below. The evaluation function J(u, x(0)) is an example of an evaluation function expressed using a control law.

(17)

(18)

(19)
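Since the images of equations (17) to (19) are not reproduced, a conventional form of an input-affine system with a state cost q(x) and an input weight R(x) is sketched here. The infinite horizon, the absence of a factor 1/2 in the cost, and a constant input matrix B are assumptions; only the symbols f, B, q, R, J, u, and x(0) come from the text.

```latex
\dot{x} = f(x) + B\,u,
\qquad
J(u, x(0)) = \int_{0}^{\infty} \left( q(x) + u^{\top} R(x)\, u \right) dt,
\qquad
u^{*} = \arg\min_{u} J(u, x(0))
```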

Here, q(x) > 0 ∈ R in equation (18) above is a design parameter specifying the cost on the state x at each time, and R(x) > 0 ∈ R^(m×m) is a design parameter specifying the cost on the input u at each time. q(x) and R(x) are set in advance by the designer.

In this case, the aim is to derive the optimal control input u(x) that minimizes the evaluation function J(u, x(0)), as shown in equation (19) above. It is known that the nonlinear optimal control law that minimizes equation (18) for the input-affine continuous-time nonlinear system of equation (17) obeys the HJB equation shown in equation (20) below.

(20)

The solution V(x) of equation (20) above is called the value function and must satisfy equations (21) to (24) below.

(21)

(22)

(23)

(24)

Once a value function V(x) satisfying equation (20) and equations (21), (22), (23), and (24) above has been found, the optimal control input u(x) of equation (19) above is given by equation (25) below.

(25)
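As a hedged illustration of how the HJB equation and the optimal input relate, assume an input-affine system ẋ = f(x) + B u and a running cost q(x) + uᵀR(x)u without a factor 1/2 (both assumptions, since the equation images are unavailable). Minimizing q + uᵀRu + ∇Vᵀ(f + Bu) over u then gives the following pair, which may differ from equations (20) and (25) by constant factors.

```latex
0 = q(x) + \nabla V(x)^{\top} f(x)
    - \tfrac{1}{4}\, \nabla V(x)^{\top} B\, R(x)^{-1} B^{\top} \nabla V(x),
\qquad
u(x) = -\tfrac{1}{2}\, R(x)^{-1} B^{\top} \nabla V(x)
```

The second expression follows from setting the derivative with respect to u to zero, 2R(x)u + Bᵀ∇V(x) = 0, and substituting it back yields the first.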

However, solving equation (20) above is generally difficult, and the value function V(x) may well not be an elementary function. Therefore, in this embodiment, a candidate value function V(x, v^~) is approximated using the basis functions φ(x), as shown in equation (26) below.

(26)

Here, v^~ ∈ R^(N_b) in equation (26) above is the coefficient vector for the basis functions φ(x) ∈ R^(N_b). Examples of the basis functions include power polynomials of the state variables, Chebyshev polynomials of the state variables, and a neural network that takes the state variables as input. Next, for the HJB equation of equation (20) above, the residual r_LSQ(V(v^~)), which accumulates the squared error over the states x ∈ S_x, is defined as shown in equation (27) below.

(27)

Here, S_x in equation (27) above is the control region for x and is set in advance by the designer. Then, as shown in equation (28) below, the residual r_LSQ(V(v^~)) is minimized with respect to v^~.

(28)

By solving the optimization problem of equation (28) above while updating v^~ with a gradient method or the like, starting from an appropriate initial value v^~{0}, V(x, v^~) is obtained with high approximation accuracy. Denoting the finally obtained parameters by v^~*, the control law u_FB(x) is given by equation (29) below in accordance with equation (25) above.

(29)
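The basis-function fitting procedure can be illustrated on a deliberately small case. The sketch below assumes a scalar system dx/dt = a·x + b·u, a running cost x² + r·u² (no factor 1/2), a single basis function φ(x) = x², and plain gradient descent; all of these, as well as the numerical values and function names, are hypothetical choices and not taken from the patent. For this linear-quadratic case the exact value function is V(x) = (√2 − 1)·x², which gives a reference for the fitted coefficient.

```python
import math

# Hypothetical scalar example: dx/dt = a*x + b*u, cost integrand q(x) + r*u^2.
a, b, r = -1.0, 1.0, 1.0

def f(x):
    return a * x              # stands in for the learned mean E[f^(x)]

def q(x):
    return x * x              # assumed state cost

def dphi(x):
    return 2.0 * x            # gradient of the single basis function phi(x) = x^2

def residual(v, x):
    """HJB residual at x for the candidate V(x, v) = v * phi(x)."""
    dV = v * dphi(x)
    return q(x) + dV * f(x) - 0.25 * dV * b * b * dV / r

def residual_dv(v, x):
    """Derivative of the residual with respect to the coefficient v."""
    return dphi(x) * f(x) - 0.5 * v * dphi(x) ** 2 * b * b / r

S_x = [i / 10.0 for i in range(-10, 11)]    # sampled control region

def r_lsq(v):
    # Mean squared residual; using the mean instead of the sum only rescales the step.
    return sum(residual(v, x) ** 2 for x in S_x) / len(S_x)

def fit(v0=1.0, step=0.05, iters=500):
    v = v0
    for _ in range(iters):
        grad = sum(2.0 * residual(v, x) * residual_dv(v, x) for x in S_x) / len(S_x)
        v -= step * grad      # plain gradient step
    return v

v_star = fit()

def u_fb(x):
    # Feedback law from the fitted value function: -(1/2) * R^-1 * B^T * dV/dx.
    return -0.5 * (b / r) * v_star * dphi(x)
```

Starting from v = 1, the iteration settles on the positive root of the residual, so the fitted V is positive definite. In higher dimensions the same loop would run over a coefficient vector and a grid or sample of S_x.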

The control law design unit 124 stores the control law u_FB(x) obtained by equation (29) above in the control law database 126.

The control law database 126 stores the control law u_FB(x) for each state space obtained by the control law design unit 124. The state space differs depending on N, M, i_1, i_2, ..., i_M. A target steady state is set in advance for each state space.

In response to a control signal transmitted from the vehicle control amount calculation device 14 described later, the control unit 128 reads the control law u_FB(x) stored in the control law database 126 and transmits it to the vehicle control amount calculation device 14 via the communication unit 130.

The communication unit 130 transmits the control law u_FB(x) read by the control unit 128 to the vehicle control amount calculation device 14.

[Vehicle control amount calculation device]

The vehicle control amount calculation device 14 is constituted by a computer including a CPU that governs overall control of the device, a ROM serving as a storage medium that stores the programs of the processing routines and the like, a RAM that temporarily stores data as a work area, and a bus connecting these. With such a configuration, programs for realizing the functions of the components are stored in a storage medium such as a ROM or an HDD, and the functions are realized by the CPU executing these programs.

Described in terms of functional blocks divided by function-realizing means determined by hardware and software, the vehicle control amount calculation device 14 includes, as shown in FIG. 1, a state acquisition unit 140, a control law acquisition unit 142, a communication unit 144, a control law storage unit 146, and a control amount calculation unit 148.

In the target traffic environment, the state acquisition unit 140 acquires the state variables of each of the plurality of vehicles and information on whether each vehicle is a manually controlled vehicle or an automatically controlled vehicle. The state acquisition unit 140 also identifies the state space to which the plurality of vehicles belongs, based on the number of vehicles, the number of automatically controlled vehicles, and the information on whether each vehicle is manually or automatically controlled.

Specifically, the state acquisition unit 140 successively detects the positions and speeds of the plurality of vehicles in the target traffic environment. The positions and speeds can be detected, for example, from images captured by a fixed-point camera (not shown). Alternatively, the state acquisition unit 140 collects the positions and speeds of the plurality of vehicles obtained through vehicle-to-vehicle communication. From the obtained vehicle positions and the information on whether each vehicle is manually or automatically controlled, the state acquisition unit 140 then identifies the state space represented by the appropriate N, M, i_1, i_2, ..., i_M and obtains the state variable x(t) at the current time.
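A minimal sketch of this identification step follows. It assumes that N is the number of vehicles, M the number of automatically controlled vehicles, and i_1, ..., i_M the positions of the automatically controlled vehicles in a front-to-back ordering, and that x(t) simply stacks position and speed per vehicle. These readings, together with the names `Vehicle` and `identify_state_space`, are assumptions, since the defining passages lie outside this section.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Vehicle:
    position: float      # longitudinal position
    speed: float         # longitudinal speed
    automated: bool      # True for an automatically controlled vehicle

StateSpaceKey = Tuple[int, int, Tuple[int, ...]]   # (N, M, (i_1, ..., i_M))

def identify_state_space(vehicles: List[Vehicle]) -> Tuple[StateSpaceKey, List[float]]:
    # Order vehicles front to back so the indices i_1..i_M are well defined (assumption).
    ordered = sorted(vehicles, key=lambda v: v.position, reverse=True)
    auto_idx = tuple(i + 1 for i, v in enumerate(ordered) if v.automated)
    key = (len(ordered), len(auto_idx), auto_idx)
    # Stack position and speed of every vehicle into one state vector x(t).
    x_t = [value for v in ordered for value in (v.position, v.speed)]
    return key, x_t

key, x_t = identify_state_space([
    Vehicle(10.0, 20.0, False),
    Vehicle(30.0, 22.0, True),
    Vehicle(0.0, 18.0, True),
])
```

A hashable key of this kind is convenient because it can index the stored control laws directly.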

The control law acquisition unit 142 acquires the control law generated by the control law generation device 12 in accordance with the state space identified by the state acquisition unit 140. Specifically, the control law acquisition unit 142 transmits a control signal containing the state space identified by the state acquisition unit 140 to the control law generation device 12 via the communication unit 144.

The communication unit 130 of the control law generation device 12 receives the control signal transmitted from the vehicle control amount calculation device 14. The control unit 128 of the control law generation device 12 reads the control law u_FB(x) stored in the control law database 126 in accordance with the information representing the state space contained in the control signal received by the communication unit 130. Specifically, the control unit 128 reads from the control law database 126 the control law u_FB(x) corresponding to the state space contained in the control signal. The control unit 128 then transmits the control law u_FB(x) to the vehicle control amount calculation device 14 via the communication unit 130.
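The request and lookup exchange described above might be sketched as a keyed store. The class `ControlLawDatabase`, the function `handle_control_signal`, the dictionary-shaped control signal with a `"state_space"` entry, and the example feedback gains are all hypothetical; the text only states that a control law and a target steady state exist per state space.

```python
from typing import Callable, Dict, List, Sequence, Tuple

StateSpaceKey = Tuple[int, int, Tuple[int, ...]]          # (N, M, (i_1, ..., i_M))
ControlLaw = Callable[[Sequence[float]], List[float]]

class ControlLawDatabase:
    """Stand-in for control law database 126: one law and steady state per state space."""

    def __init__(self) -> None:
        self._entries: Dict[StateSpaceKey, Tuple[ControlLaw, List[float]]] = {}

    def store(self, key: StateSpaceKey, law: ControlLaw, steady_state: List[float]) -> None:
        self._entries[key] = (law, steady_state)

    def lookup(self, key: StateSpaceKey) -> Tuple[ControlLaw, List[float]]:
        return self._entries[key]          # raises KeyError for an unknown state space

def handle_control_signal(db: ControlLawDatabase, signal: dict) -> Tuple[ControlLaw, List[float]]:
    """Role of control unit 128: read the law matching the state space in the signal."""
    return db.lookup(signal["state_space"])

db = ControlLawDatabase()
# Made-up linear feedback on the positions of the two automated vehicles.
db.store((3, 2, (1, 3)), lambda x: [-0.4 * x[0], -0.4 * x[4]], [0.0] * 6)
law, steady = handle_control_signal(db, {"state_space": (3, 2, (1, 3))})
```

Because the laws are designed offline, the online side of this exchange reduces to a dictionary lookup.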

The control law acquisition unit 142 acquires the control law u_FB(x) transmitted from the control law generation device 12 and stores it in the control law storage unit 146. The control law acquisition unit 142 also acquires the target steady state from the control law generation device 12 in accordance with the state space identified by the state acquisition unit 140.

The control law storage unit 146 stores the control law u_FB(x) acquired by the control law acquisition unit 142. The control law u_FB(x) stored in the control law storage unit 146 is the control law corresponding to the current state space.

The control amount calculation unit 148 calculates the control amount of the state variables for the automatically controlled vehicles among the plurality of vehicles, based on the state variables of the plurality of vehicles acquired by the state acquisition unit 140 and the control law u_FB(x) stored in the control law storage unit 146. Specifically, the control amount calculation unit 148 inputs the state variables x(t) of the plurality of vehicles acquired by the state acquisition unit 140 into the control law u_FB(x) corresponding to the current state space, which is stored in the control law storage unit 146, and calculates the control amount u(t) = u_FB(x(t)) of the state variables at the current time. The control amount calculation unit 148 then calculates and outputs the acceleration from the control amount u(t) of the state variables for the automatically controlled vehicles at the current time and the target steady state.
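The online computation amounts to a single function evaluation, u(t) = u_FB(x(t)). The sketch below shows only that step with a made-up linear control law; how u(t) and the target steady state are combined into an acceleration is not specified in this passage, so that conversion is intentionally left out. The name `control_amount` and the gains are assumptions.

```python
from typing import Callable, List, Sequence

def control_amount(u_fb: Callable[[Sequence[float]], List[float]],
                   x_t: Sequence[float]) -> List[float]:
    """u(t) = u_FB(x(t)): evaluate the stored control law at the current state."""
    return u_fb(x_t)

def example_law(x: Sequence[float]) -> List[float]:
    # Hypothetical stored law for one automated vehicle: linear state feedback.
    return [-0.5 * x[0] - 0.2 * x[1]]

u_t = control_amount(example_law, [2.0, 1.0])
```

No optimization is solved at run time, which is what makes the approach suitable for real-time use.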

The acceleration output by the control amount calculation unit 148 is supplied to the automatically controlled vehicles.

<Operation of the Control Law Generation Device 12>
Next, the operation of the control law generation device 12 according to this embodiment will be described. First, a plurality of learning data are collected by the learning data collection unit 118 and stored in the learning database 120. The state space model learning unit 122 then executes the learning processing routine shown in FIG. 2.

In step S100, the state space model learning unit 122 acquires, for each state space, the plurality of learning data for that state space stored in the learning database 120.

In step S102, for each state space, the state space model learning unit 122 learns a Gaussian process model as the model for predicting the amount of change in the state variables of the plurality of vehicles, based on the plurality of learning data for that state space acquired in step S100.

In step S104, the state space model learning unit 122 outputs the Gaussian process model learned for each state space as the result and ends the learning processing routine.

Next, the control law design unit 124 generates control laws for the automatically controlled vehicles among the plurality of vehicles, based on the Gaussian process model learned for each state space by the learning processing routine. When the learning processing routine ends, the control law design unit 124 executes the control law design processing routine shown in FIG. 3.

In step S200, the control law design unit 124 acquires the mean of the Gaussian process model learned for each state space by the learning processing routine and the coefficient matrix B for each state space.

In step S202, the control law design unit 124 obtains the control law u_FB(x) for each state space in accordance with equation (29) above, based on the mean of the Gaussian process model and the coefficient matrix B acquired in step S200.

In step S204, the control law design unit 124 stores the control law u_FB(x) for each state space obtained in step S202 in the control law database 126 and ends the control law design processing routine.
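The two offline routines (steps S100 to S104 and S200 to S204) can be summarized as one loop over state spaces. In the sketch below, `fit_model` and `design_law` are hypothetical placeholders for the Gaussian process learning and the HJB-based design, and the sample layout (state-space key, state, observed change) is an assumption.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple

Sample = Tuple[Hashable, Any, Any]   # (state-space key, state x, observed change of x)

def offline_design(samples: Iterable[Sample],
                   fit_model: Callable[[list, list], Any],
                   design_law: Callable[[Any], Any]) -> Dict[Hashable, Any]:
    # S100: group the learning data by state space.
    by_space = defaultdict(lambda: ([], []))
    for key, x, dx in samples:
        by_space[key][0].append(x)
        by_space[key][1].append(dx)
    database = {}
    for key, (X, F) in by_space.items():
        model = fit_model(X, F)             # S102, S104: learn one model per state space
        database[key] = design_law(model)   # S200 to S204: design and store one law per space
    return database

# Toy stand-ins: the "model" is the mean observed change, the "law" a proportional feedback.
samples = [("A", 0.0, 1.0), ("A", 1.0, 3.0), ("B", 0.0, 4.0)]
db = offline_design(samples,
                    fit_model=lambda X, F: sum(F) / len(F),
                    design_law=lambda m: (lambda x: -m * x))
```

Keeping the two callables as parameters makes it straightforward to swap in a different learner or design method for each stage.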

<Operation of the Vehicle Control Amount Calculation Device 14>
Next, the operation of the vehicle control amount calculation device 14 according to this embodiment will be described. In the target traffic environment, the vehicle control amount calculation device 14 executes the control amount calculation processing routine shown in FIG. 4.

In step S300, the state acquisition unit 140 acquires, in the target traffic environment, the state variables x(t) of each of the plurality of vehicles and information on whether each vehicle is a manually controlled vehicle or an automatically controlled vehicle. The state acquisition unit 140 also identifies the state space to which the plurality of vehicles belongs, based on the number of vehicles, the number of automatically controlled vehicles, and the information on whether each vehicle is manually or automatically controlled.

In step S302, the control law acquisition unit 142 acquires the control law stored in the control law database 126 of the control law generation device 12 in accordance with the state space identified in step S300. The control law acquisition unit 142 then stores the acquired control law u_FB(x) in the control law storage unit 146. The control law acquisition unit 142 also acquires the target steady state from the control law generation device 12 in accordance with the identified state space.

In step S304, the control amount calculation unit 148 calculates the control amount u(t) of the state variables for the automatically controlled vehicles among the plurality of vehicles, based on the state variables x(t) of the plurality of vehicles acquired by the state acquisition unit 140 and the control law u_FB(x) stored in the control law storage unit 146 in step S302. The control amount calculation unit 148 then calculates the acceleration for each automatically controlled vehicle from the control amount u(t) of the state variables for the automatically controlled vehicles at the current time and the target steady state.

In step S306, the control amount calculation unit 148 outputs, as the result, the acceleration for each automatically controlled vehicle calculated in step S304 and ends the control amount calculation processing routine.

As described above, the control law generation device of this embodiment learns, for each state space representing the states of a plurality of vehicles, a model for predicting the amount of change in the state variables of the plurality of vehicles, based on a plurality of pre-collected learning data representing the state variables of manually controlled vehicles and the state variables of automatically controlled vehicles. It then generates, for each state space, control laws for the automatically controlled vehicles among the plurality of vehicles based on the model learned for that state space. A control law for controlling automatically controlled vehicles can thereby be obtained in a traffic environment that includes both manually controlled and automatically controlled vehicles.

The control amount calculation device of this embodiment acquires the state variables of each of the plurality of vehicles and information on whether each vehicle is manually or automatically controlled, identifies the state space to which the plurality of vehicles belongs, acquires the control law generated by the control law generation device in accordance with the identified state space, and calculates the control amount for the automatically controlled vehicles among the plurality of vehicles based on the state variables of the plurality of vehicles and the acquired control law. A control amount for controlling automatically controlled vehicles can thereby be obtained even in a traffic environment that includes both manually controlled and automatically controlled vehicles.

In addition, in a traffic environment that includes manually controlled vehicles whose behavior is difficult to predict, safe and optimal control of automatically controlled vehicles in cooperation with the manually controlled vehicles can be executed in real time. This is expected to reduce traffic congestion and to improve fuel efficiency and travel time.

Furthermore, by regarding a plurality of vehicles that includes manually controlled vehicles as a single system and modeling it with a Gaussian process from learning data that includes behavior data of manually controlled vehicles, the whole system, including the manually controlled vehicles, can be predicted with high accuracy. A design that treats the plurality of vehicles as a single model also enables vehicle control in cooperation with manually controlled vehicles whose behavior is difficult to predict.

Moreover, because the control law is designed offline based on the learned model, the online processing only has to compute the control input according to the control law, which enables control with high real-time performance.

The above embodiment has been described using control of the automatically controlled vehicles in the longitudinal direction as an example, but the invention is not limited to this, and control that also includes the width direction (lateral direction) of the vehicle may be performed.

The program of the present invention can be provided stored on a recording medium.

Description of Symbols
10 Vehicle control system
12 Control law generation device
14 Vehicle control amount calculation device
118 Learning data collection unit
120 Learning database
122 State space model learning unit
124 Control law design unit
126 Control law database
128 Control unit
130 Communication unit
140 State acquisition unit
142 Control law acquisition unit
144 Communication unit
146 Control law storage unit
148 Control amount calculation unit

Claims

A model learning device comprising:
a learning unit that, for each state space representing the states of a plurality of moving bodies including a manually controlled moving body, which is a moving body controlled manually, and an automatically controlled moving body, which is a moving body controlled automatically, learns a model for predicting the amount of change in the state variables of the plurality of moving bodies, based on a plurality of pre-collected learning data representing the state variables of the manually controlled moving body and the state variables of the automatically controlled moving body; and
a control law design unit that, for each state space, generates a control law for controlling the state variables of each automatically controlled moving body among the plurality of moving bodies, based on the model learned for that state space by the learning unit.

The model learning device according to claim 1, wherein the learning unit learns the model using, as the learning data, for each of the moving bodies, the state variables of the moving body, the state variables of the moving bodies around the moving body, and the amount of change in the state variables of the moving body.

The model learning device according to claim 1, wherein the control law design unit generates the control law by obtaining, in accordance with an HJB equation, the control law that optimizes an evaluation function expressed using the model learned by the learning unit and the control law, using as basis functions power polynomials of the state variables, Chebyshev polynomials of the state variables, or a neural network that takes the state variables as input.

The model learning device according to any one of claims 1 to 3, wherein the learning unit learns, as the model, a Gaussian process model that takes the state variables of the plurality of moving bodies as input and outputs the mean of the amounts of change in the state variables of the moving bodies and the variance of the amounts of change in the state variables of the moving bodies.

The model learning device according to any one of claims 1 to 4, wherein the state variables include the position and speed of the moving body.

A control amount calculation device comprising:
a state acquisition unit that acquires the state variables of each of a plurality of moving bodies and information on whether each moving body is a manually controlled moving body or an automatically controlled moving body, and identifies the state space to which the plurality of moving bodies belongs;
a control law acquisition unit that acquires, in accordance with the state space identified by the state acquisition unit, the control law generated by the model learning device according to any one of claims 1 to 5; and
a control amount calculation unit that calculates the control amount of the state variables for the automatically controlled moving bodies among the plurality of moving bodies, based on the state variables of the plurality of moving bodies and the control law acquired by the control law acquisition unit.

A program for causing a computer to function as:
a learning unit that, for each state space representing the states of a plurality of moving bodies including a manually controlled moving body, which is a moving body controlled manually, and an automatically controlled moving body, which is a moving body controlled automatically, learns a model for predicting the amount of change in the state variables of the plurality of moving bodies, based on a plurality of pre-collected learning data representing the state variables of the manually controlled moving body and the state variables of the automatically controlled moving body; and
a control law design unit that, for each state space, generates a control law for controlling the state variables of each automatically controlled moving body among the plurality of moving bodies, based on the model learned for that state space by the learning unit.