JP7381804B1

JP7381804B1 - Route control device and route control method

Info

Publication number: JP7381804B1
Application number: JP2023154067A
Authority: JP
Inventors: 純柿島
Original assignee: Internet Initiative Japan Inc
Current assignee: Internet Initiative Japan Inc
Priority date: 2023-09-21
Filing date: 2023-09-21
Publication date: 2023-11-16
Anticipated expiration: 2043-09-21

Abstract

[Problem] To remotely control the route of a vehicle with a simpler configuration. [Solution] A route control device 1 controls the route of a vehicle 2 from the position of a communication area A1 at an initial point to the position of a communication area An at a destination point, in a movement space of the vehicle 2 defined by a plurality of communication areas A1 to An. The route control device 1 includes: a first acquisition unit 10 that acquires, as the current position of the vehicle 2, the position of the communication area A associated with a location registration signal transmitted from the vehicle 2 when it crosses between communication areas A; a determination unit 15 that determines the route the vehicle 2 should take next from the position of the current communication area A acquired by the first acquisition unit 10 as the current position of the vehicle 2, based on a policy, learned using a learning model, for the routes the vehicle 2 should take in sequence from the position of each communication area A; and a route control unit 16 that instructs the vehicle 2, via a core network 3 of a predetermined communication standard, on the next route determined by the determination unit 15. [Selected drawing] FIG. 1

Description

The present invention relates to a route control device and a route control method.

Conventionally, known technologies for estimating a vehicle's own position in automated driving use signals from positioning satellites such as GPS, supplemented by a six-axis inertial sensor (Inertial Measurement Unit: IMU) that detects the vehicle's behavior and a distance measuring instrument (DMI) that measures the distance traveled by counting tire rotations.

When communication conditions are poor, for example while a vehicle is traveling through a tunnel, GPS signals from positioning satellites may not be received. The IMU and DMI used to supplement GPS, however, may also fail to estimate the vehicle's position with sufficient accuracy in some situations. For example, position estimation using an IMU tends to accumulate error, and position estimation using a DMI may lose measurement accuracy when the vehicle's speed or heading changes.

Thus, in conventional position estimation for automated driving, poor reception of radio waves from GPS positioning satellites lowers the accuracy of position estimation, and it has sometimes been difficult for the vehicle's automated driving ECU to select a course or route accurately.

Patent Document 1 therefore discloses a technique for controlling automated driving with a combination of GPS and map information. When a communication delay occurs, the technique collects information on the area where the vehicle is located to create current-situation map information, and predicts the vehicle's travel area up to the destination point based on the set destination point and the current-situation map information.

However, the technique disclosed in Patent Document 1 collects various kinds of information, such as vehicle speed and surrounding information, and generates a map in order to predict the vehicle's travel area. The predictive calculation therefore becomes complicated and the computational load becomes very large.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2021-111329

Thus, the conventional techniques cannot remotely control the route of a vehicle with a simpler configuration.

The present invention has been made to solve the above problem, and its object is to remotely control the route of a vehicle with a simpler configuration.

To solve the above problem, a route control device according to the present invention controls the route of a vehicle from the position of a communication area at an initial point to the position of a communication area at a destination point, in a movement space of the vehicle defined by a plurality of communication areas. The route control device includes: a first acquisition unit configured to acquire, as the current position of the vehicle, the position of the communication area associated with a location registration signal that the vehicle transmits when it crosses between communication areas; a determination unit configured to determine the route the vehicle should take next from the position of the current communication area acquired by the first acquisition unit as the current position of the vehicle, based on a policy, learned using a learning model, for the routes the vehicle should take in sequence from the position of each communication area; and a route control unit configured to instruct the vehicle, via a core network of a predetermined communication standard, on the next route determined by the determination unit.

The route control device according to the present invention may further include: a learning unit configured to apply a reward function to estimation results obtained by calculating the routes the vehicle should take in sequence from the position of each communication area until the vehicle, starting from the position of the communication area at the initial point, reaches the position of the communication area at the destination point, to perform updates so that the reward for the vehicle reaching the position of the communication area at the destination point is maximized, and to learn, using the learning model, the policy for the routes the vehicle should take in sequence from the position of each communication area; and a storage unit configured to store the route policy learned by the learning unit. The determination unit may read the route policy from the storage unit and determine the route the vehicle should take next.

The route control device according to the present invention may further include a second acquisition unit configured to acquire the position of a communication area corresponding to a region of the movement space where congestion is occurring. The reward function may include, as variables, the degree to which the vehicle reaches the position of the communication area at the destination point and the degree to which the vehicle reaches the position of the communication area corresponding to the congested region.
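A reward function with these two variables could be sketched as follows. This is only an illustration: the numeric values, the small per-step cost, the reduction of each "degree of reaching" to a yes/no check, and the names `reward`, `GOAL_REWARD`, `CONGESTION_PENALTY`, and `STEP_COST` are all assumptions that the specification does not fix.

```python
from typing import FrozenSet

# All three constants are assumed values chosen for illustration only.
GOAL_REWARD = 1.0          # vehicle reached the destination communication area
CONGESTION_PENALTY = -1.0  # vehicle entered a congested communication area
STEP_COST = -0.01          # small cost for any other move


def reward(next_area: int, goal_area: int, congested_areas: FrozenSet[int]) -> float:
    """Return the reward for arriving in communication area `next_area`."""
    if next_area == goal_area:
        return GOAL_REWARD
    if next_area in congested_areas:
        return CONGESTION_PENALTY
    return STEP_COST
```

With the FIG. 2 layout (destination A16, congestion in A7 and A11), arriving in A16 yields the positive reward and arriving in A7 or A11 yields the penalty.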

In the route control device according to the present invention, the learning model may be a neural network model including an input layer, a hidden layer, and an output layer. The learning unit may give the position of the current communication area as input to the neural network model, perform the computation of the neural network model, and output a first estimate of an action-value function. The action-value function represents the expected value of the cumulative future reward obtained when the vehicle takes each action, including a right turn, a left turn, and going straight, as its next route from the position of the current communication area. The learning unit may further give the position of the communication area the vehicle reaches next as input to the neural network model, perform the computation of the neural network model, and output a second estimate of the action-value function. The learning unit may learn the weight parameters of the neural network model so that the first estimate becomes a target value calculated from the second estimate, and the storage unit may store the learned weight parameters.
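The relationship between the first and second estimates resembles the temporal-difference target used in Q-learning with a neural network. As a hedged illustration, let $s$ be the position of the current communication area, $a$ the chosen action, $r$ the reward, $s'$ the position of the communication area reached next, and $\theta$ the weight parameters. The discount factor $\gamma$, the squared-error loss, and all of these symbols are assumptions and do not appear in the specification.

```latex
% First estimate:  Q(s, a; \theta)
% Second estimate: Q(s', a'; \theta) for each action a' available at s'
y = r + \gamma \max_{a'} Q(s', a'; \theta),
\qquad
L(\theta) = \bigl( y - Q(s, a; \theta) \bigr)^{2}
```

Minimizing $L(\theta)$ moves the first estimate toward the target value $y$ calculated from the second estimate.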

In the route control device according to the present invention, the first acquisition unit may acquire the position of the current communication area of the vehicle from a Unified Data Repository that is included in the core network and manages subscriber information, and the route control unit may transmit the instruction on the next route to the vehicle via a User Plane Function included in the core network.

To solve the above problem, a route control method according to the present invention controls the route of a vehicle from the position of a communication area at an initial point to the position of a communication area at a destination point, in a movement space of the vehicle defined by a plurality of communication areas. The route control method includes: a first acquisition step of acquiring, as the current position of the vehicle, the position of the communication area associated with a location registration signal that the vehicle transmits when it crosses between communication areas; a determination step of determining the route the vehicle should take next from the position of the current communication area acquired in the first acquisition step as the current position of the vehicle, based on a policy, learned using a learning model, for the routes the vehicle should take in sequence from the position of each communication area; and a route control step of instructing the vehicle, via a core network of a predetermined communication standard, on the next route determined in the determination step.

The route control method according to the present invention may further include: a learning step of applying a reward function to estimation results obtained by calculating the routes the vehicle should take in sequence from the position of each communication area until the vehicle, starting from the position of the communication area at the initial point, reaches the position of the communication area at the destination point, performing updates so that the reward for the vehicle reaching the position of the communication area at the destination point is maximized, and learning, using the learning model, the policy for the routes the vehicle should take in sequence from the position of each communication area; and a storage step of storing, in a storage unit, the route policy learned in the learning step. The determination step may read the route policy from the storage unit and determine the route the vehicle should take next.

The route control method according to the present invention may further include a second acquisition step of acquiring the position of a communication area corresponding to a region of the movement space where congestion is occurring. The reward function may include, as variables, the degree to which the vehicle reaches the position of the communication area at the destination point and the degree to which the vehicle reaches the position of the communication area corresponding to the congested region.

In the route control method according to the present invention, the learning model may be a neural network model including an input layer, a hidden layer, and an output layer. The learning step may give the position of the current communication area as input to the neural network model, perform the computation of the neural network model, and output a first estimate of an action-value function. The action-value function represents the expected value of the cumulative future reward obtained when the vehicle takes each action, including a right turn, a left turn, and going straight, as its next route from the current communication area. The learning step may further give the position of the communication area the vehicle reaches next as input to the neural network model, perform the computation of the neural network model, and output a second estimate of the action-value function. The learning step may learn the weight parameters of the neural network model so that the first estimate becomes a target value calculated from the second estimate, and the storage step may store the learned weight parameters in the storage unit.

In the route control method according to the present invention, the first acquisition step may acquire the position of the current communication area of the vehicle from a Unified Data Repository that is included in the core network and manages subscriber information, and the route control step may transmit the instruction on the next route to the vehicle via a User Plane Function included in the core network.

According to the present invention, the route the vehicle should take next from the position of the current communication area is determined based on a policy, learned using a learning model, for the routes the vehicle should take in sequence from the position of each communication area. The route of the vehicle can therefore be controlled remotely with a simpler configuration.

FIG. 1 is a block diagram showing the configuration of a route control system including a route control device according to an embodiment of the present invention.

FIG. 2 is a diagram for explaining an overview of the route control system according to the embodiment.

FIG. 3 is a diagram for explaining the learning process performed by the learning unit according to the embodiment.

FIG. 4 is a block diagram showing the configuration of the learning unit according to the embodiment.

FIG. 5 is a block diagram showing the hardware configuration of the route control device according to the embodiment.

FIG. 6 is a flowchart showing the learning process of the route control device according to the embodiment.

FIG. 7 is a flowchart showing the learning process of the route control device according to the embodiment.

FIG. 8 is a flowchart showing the route control process of the route control device according to the embodiment.

Preferred embodiments of the present invention are described in detail below with reference to FIGS. 1 to 8.

FIG. 1 is a block diagram showing the configuration of a route control system including a route control device 1 according to an embodiment of the present invention. In a movement space of a vehicle 2 defined by a plurality of communication areas A1 to An, the route control system according to the embodiment uses the location registration signal that the vehicle 2 sends when it crosses between the communication areas A1 to An to control the route of the vehicle 2 from the position of the communication area A1 at the initial point to the position of the communication area An at the destination point.

[Configuration of the route control system]
First, an overview of the route control system including the route control device 1 according to the embodiment of the present invention is given. As shown in FIG. 1, the route control system includes the route control device 1, the vehicle 2, base stations BS1 to BSn, and a core network 3, which support, for example, a standalone (SA) 5G wireless communication system.

The base stations BS1 to BSn are 5G-compatible wireless base stations and relay communication between the core network 3 and the vehicle 2 located in the communication areas A1 to An. In the following, when the base stations BS1 to BSn and the communication areas A1 to An need not be distinguished individually, they may be referred to collectively as the base station BS and the communication area A, respectively.

As shown in FIG. 1, the communication areas A1 to An of the base stations BS1 to BSn define the movement space in which the vehicle 2 moves. The communication areas A1 to An are arranged to cover the sections of roads, such as main roads, along which the vehicle 2 travels from the initial point to the destination point. In this embodiment, the communication areas A1 to An are assumed to have cells of the same size.

FIG. 2 schematically shows the movement space controlled by the route control system. Each dotted circle in FIG. 2 represents one of the communication areas A1 to A16 covered by the base stations BS1 to BS16 arranged in the movement space. Using the positions of the communication areas A1 to A16 of the base stations BS1 to BS16 as waypoints, the vehicle 2 moves along the route indicated by the arrows from the initial point S to the position of the communication area A16 at the destination point G.

In this embodiment, the vehicle 2 turns right, turns left, or goes straight at each of the communication areas A1 to A16 in accordance with route instructions from the route control system, and travels from the initial point S to the destination point G. The route instructions (right turn, left turn, or straight) are relative to the direction in which the vehicle 2 is traveling. For example, when an instruction to go straight is given at the position of the communication area A1, the vehicle 2 goes straight from A1 and reaches the next communication area A2. When an instruction to turn left is then given at the position of A2, the vehicle 2 turns left and reaches the next communication area A6. In this way, the vehicle 2 moves through the movement space by following a route instruction at each of the communication areas A1 to A16.
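The heading-relative instructions can be made concrete with a small sketch. It assumes that the 16 communication areas form a 4 x 4 grid numbered four per row, that the vehicle starts at A1 heading toward A2, and that a left turn from that heading leads into the next row (which matches the A1 to A2 to A6 example). The grid layout and the names `turn`, `next_area`, and `EAST` are assumptions for illustration.

```python
from typing import Optional, Tuple

GRID_COLS = 4  # assumed: areas A1..A16 laid out four per row
GRID_ROWS = 4

Heading = Tuple[int, int]  # (column step, row step)
EAST: Heading = (1, 0)     # assumed initial heading: from A1 toward A2


def turn(heading: Heading, action: str) -> Heading:
    """Rotate the heading for a relative instruction: left, right, or straight."""
    dx, dy = heading
    if action == "left":
        return (-dy, dx)   # rotate 90 degrees counterclockwise
    if action == "right":
        return (dy, -dx)   # rotate 90 degrees clockwise
    if action == "straight":
        return heading
    raise ValueError(f"unknown action: {action}")


def next_area(area: int, heading: Heading, action: str) -> Optional[Tuple[int, Heading]]:
    """Return (next area number, new heading), or None if the move leaves the grid."""
    dx, dy = turn(heading, action)
    col = (area - 1) % GRID_COLS + dx
    row = (area - 1) // GRID_COLS + dy
    if not (0 <= col < GRID_COLS and 0 <= row < GRID_ROWS):
        return None
    return row * GRID_COLS + col + 1, (dx, dy)
```

Under these assumptions, `next_area(1, EAST, "straight")` gives area 2, and `next_area(2, EAST, "left")` gives area 6, as in the example above.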

The vehicle 2 is equipped with a communication terminal 20. The vehicle 2 may be an automobile, a motorized vehicle, a motorcycle, or the like. The communication terminal 20 includes a processor, a main storage device, an auxiliary storage device, a communication interface, and the like, and is implemented as a terminal device mounted on the vehicle 2, or as a mobile communication terminal such as a smartphone or a tablet computer belonging to a user of the vehicle 2.

Specifically, the communication terminal 20 includes a SIM 21. The vehicle 2 is uniquely identified by the IMSI (International Mobile Subscriber Identity) of the SIM 21 in the communication terminal 20.

When the communication terminal 20 crosses between the communication areas A1 to An as the vehicle 2 moves, the processor of the communication terminal 20 transmits a location registration signal (TAU) to the base station BS of the newly entered communication area.

The vehicle 2 also includes an ECU (Electronic Control Unit) 22, which processes route instructions from the route control device 1 as steering control, drive control, brake control, and automated driving control of the vehicle 2. The vehicle 2 may also include a GPS module (not shown) having a GPS function, a car navigation system, and various sensors such as cameras and LiDAR.

The route control device 1 and the core network 3 are connected via a network NW such as a LAN or WAN. The base stations BS1 to BSn, which constitute the radio access network, are connected to the core network 3 via a network L such as a backhaul link.

The core network 3 includes, as nodes in the C-plane, an AMF (Access and Mobility Management Function) 30, a UDM (Unified Data Management) 31, and a UDR (Unified Data Repository) 32. The core network 3 also includes a UPF (User Plane Function) 33 as a node in the U-plane. The route control device 1 acquires, via a communication interface 32a of the UDR 32, the position information of the communication area A associated with the location registration signal from the vehicle 2. The route control device 1 also sends the instruction for the determined route to the vehicle 2 via a communication interface 33a of the UPF 33.

For example, as shown in FIG. 1, when the vehicle 2 crosses from the communication area A1 of the base station BS1 into the communication area A2 of the base station BS2, the communication terminal 20 transmits a location registration signal, which requests location registration, to the UDM 31 via the base station BS2 and the AMF 30.

The AMF 30 forwards the received signal to the UDM 31, and the UDM 31 performs location registration using the terminal identification information of the communication terminal 20 of the vehicle 2. The location registration signal and terminal identification information transmitted by the communication terminal 20 are then stored in the UDR 32 together with identification information on the serving base station BS and communication area A and the transmission timestamp (date and time information) of the location registration signal.

The route control device 1 according to the embodiment acquires, via the network NW, the identification information of the base station BS that received the location registration signal, or the identification information of the communication area A, that is associated with the location registration signal stored in the UDR 32. The route control device 1 also stores, as setting information in a setting information storage unit 14 described later, position information such as the GPS coordinates (longitude and latitude) of the base stations BS and communication areas A arranged in the movement space. In this embodiment, the position of the vehicle 2 is treated as the position of the base station BS of the communication area A in which the vehicle 2 is located. In the following description, "the position of the communication area A" refers to the position of the corresponding base station BS.

In this way, by acquiring the location registration signal of the vehicle 2 from the UDR 32, the route control device 1 can obtain the position of the vehicle 2 at the time indicated by the timestamp of the location registration signal.

[Functional blocks of the route control device]
The route control device 1 includes a first acquisition unit 10, a second acquisition unit 11, a learning unit 12, a learning model storage unit 13, a setting information storage unit 14, a determination unit 15, and a route control unit 16. Based on the location registration signal from the vehicle 2, the route control device 1 determines the route the vehicle 2 should take next and instructs the vehicle 2 on that route.

The first acquisition unit 10 acquires, as the current position of the vehicle 2, the position of the communication area A associated with the location registration signal transmitted from the vehicle 2 when the vehicle 2 crosses between communication areas A. More specifically, the first acquisition unit 10 acquires, from the communication interface 32a of the UDR 32 via the network NW, the location registration signal transmitted by the communication terminal 20 of the vehicle 2 and the information associated with that signal. The information associated with the location registration signal includes the SIM 21 of the communication terminal 20, the identification information of the base station BS that received the location registration signal or the identification information of the communication area A, and a timestamp of the time at which the location registration signal was transmitted.

The first acquisition unit 10 refers to the GPS-coordinate position information of the communication areas A1 to An, which is stored in the setting information storage unit 14 (described later) in association with the identification information of the base stations BS1 to BSn or of the communication areas A1 to An, and can acquire that position as the position of the vehicle 2 at the time the location registration signal was transmitted.
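The lookup performed by the first acquisition unit 10 can be sketched as follows. The sketch assumes that the records read from the UDR 32 have already been fetched into memory. The record layout, the class name `RegistrationRecord`, and the function `current_position` are hypothetical, and no actual UDR interface is modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude)


@dataclass(frozen=True)
class RegistrationRecord:
    """Hypothetical view of one stored location registration entry."""
    imsi: str            # identifies the SIM 21, and therefore the vehicle 2
    area_id: str         # communication area (or base station) identifier
    timestamp: datetime  # time the location registration signal was sent


def current_position(
    records: Iterable[RegistrationRecord],
    imsi: str,
    area_positions: Dict[str, LatLon],
) -> Optional[Tuple[str, LatLon, datetime]]:
    """Return (area id, area position, timestamp) for the vehicle's latest record.

    `area_positions` plays the role of the setting information that maps each
    communication area to the GPS coordinates of its base station.
    """
    own = [r for r in records if r.imsi == imsi]
    if not own:
        return None  # no location registration seen for this vehicle
    latest = max(own, key=lambda r: r.timestamp)
    return latest.area_id, area_positions[latest.area_id], latest.timestamp
```

The returned timestamp matters because the position is only known as of the moment the vehicle crossed into the area, not continuously.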

The second acquisition unit 11 acquires the position of the communication area A corresponding to a region of the movement space of the vehicle 2 in which traffic congestion is occurring. The second acquisition unit 11 can acquire, from an external traffic information server (not shown) via the network NW, congestion information for the movement space and the position coordinates at which traffic restrictions are in effect. The second acquisition unit 11 refers to the position information of the communication areas A1 to An arranged in the movement space, which is stored in the setting information storage unit 14, and acquires the positions of the communication areas A1 to An corresponding to the coordinates of the congested region.

For example, as shown in FIG. 2, the second acquisition unit 11 acquires the positions of the communication areas A7 and A11 corresponding to the region J in which congestion is occurring. The communication areas A7 and A11 affected by congestion or traffic restrictions are set to be excluded from route selection when route control of the vehicle 2 is performed.
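The mapping from congestion coordinates to communication areas could, for example, be sketched as below. The area centres, the coverage radius, and the use of a simple distance test are assumptions for illustration; the embodiment does not prescribe how the correspondence is computed.

```python
import math

# Hypothetical centre coordinates (x, y) of some communication areas, in km.
AREA_CENTRES = {
    "A6": (1.0, 1.0),
    "A7": (2.0, 1.0),
    "A10": (1.0, 2.0),
    "A11": (2.0, 2.0),
}
COVERAGE_RADIUS_KM = 0.6  # assumed radius covered by each communication area


def congested_areas(congestion_points):
    """Return identifiers of areas whose assumed coverage contains a congestion point."""
    blocked = set()
    for px, py in congestion_points:
        for area_id, (cx, cy) in AREA_CENTRES.items():
            if math.hypot(px - cx, py - cy) <= COVERAGE_RADIUS_KM:
                blocked.add(area_id)
    return blocked


# Region J reported as two points, one lying near A7 and one near A11.
blocked = congested_areas([(2.1, 1.2), (2.0, 1.8)])
```

The resulting set can then be used to exclude those areas from route selection or to assign them a negative reward during learning.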

The second acquisition unit 11 can acquire the positions of the communication areas A affected by congestion or traffic restrictions at the time when the initial point and the destination point of the vehicle 2 have been set and route control starts. Alternatively, the second acquisition unit 11 can acquire those positions at a fixed interval.

The learning unit 12 applies a reward function to an estimation result obtained by calculating the route that the vehicle 2 should take in sequence from the position of each of the communication areas A1 to An until the vehicle 2 travels from the communication area A1 at the initial point to the communication area An at the destination point. The learning unit 12 updates the estimate so that the reward for the vehicle 2 reaching the position of the communication area An at the destination point is maximized, and thereby learns, using the learning model, the route policy that the vehicle 2 should follow in sequence from each of the communication areas A1 to An.

This embodiment illustrates a case in which three actions, namely a right turn, a left turn, and going straight, are adopted as the route policy that the vehicle 2 should follow in sequence from the position of each of the communication areas A1 to An. However, more detailed actions can be learned as the route policy depending on, for example, the area covered by each communication area A and the spacing of the communication areas A arranged along the shape of the road.

In this embodiment, the learning unit 12 uses, as the learning model, a neural network model including an input layer, a hidden layer, and an output layer as shown in FIG. 3. Specifically, the neural network model adopted is a Deep Q-Network (DQN), a neural network that receives the state s_t, which is the position of the vehicle 2, and outputs all of the action values Q(s_t, straight), Q(s_t, left turn), and Q(s_t, right turn).
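The forward computation of such a network might look like the following minimal sketch. The one-hot encoding of the communication area, the layer sizes, the ReLU activation, and the random initial weights are all assumptions made here; the embodiment only states that the network has an input layer, a hidden layer, and an output layer.

```python
import random


def one_hot(index, size):
    """Encode a communication-area index as a one-hot input vector (assumed encoding)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_params(n_areas, n_hidden, n_actions, seed=0):
    """Create small random weights for input->hidden and hidden->output layers."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": matrix(n_hidden, n_areas), "b1": [0.0] * n_hidden,
        "w2": matrix(n_actions, n_hidden), "b2": [0.0] * n_actions,
    }


def q_values(area_index, params, n_areas):
    """Return [Q(s, straight), Q(s, left turn), Q(s, right turn)] for one state."""
    hidden = relu(dense(one_hot(area_index, n_areas), params["w1"], params["b1"]))
    return dense(hidden, params["w2"], params["b2"])


params = init_params(n_areas=12, n_hidden=8, n_actions=3)
q = q_values(area_index=4, params=params, n_areas=12)
```

A single forward pass yields all three action values at once, which is what allows the action with the largest value to be chosen without evaluating the network once per action.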

More specifically, the learning unit 12 provides the position of the current communication area A, which indicates the current position of the vehicle 2, as an input to the neural network model and performs the computation of the neural network model. It then outputs a first estimated value Q1 of the action value function, which represents the expected cumulative future reward obtained when the vehicle 2 takes each of the actions, including a right turn, a left turn, and going straight, as the next route from the position of the current communication area A.

The reward is given by a reward function r = r(s, a, s') of the state s indicating the current position of the vehicle 2, the action a in which the vehicle 2 turns right, turns left, or goes straight, and the next position of the vehicle 2, that is, the next state s'. In this embodiment, the reward function includes as variables the degree to which the vehicle 2 has reached the position of the communication area A of the destination point and the degree to which the vehicle 2 has reached the position of a communication area A corresponding to a congested region. For example, when a right-turn, left-turn, or straight-ahead action brings the vehicle 2 closer to the destination point, or lets it reach the destination point by the shortest distance, the reward, which is a scalar quantity, is set to a larger value.

On the other hand, the design may give a negative reward value (for example, r = -1) when the vehicle 2 moves away from the destination point or reaches one of the communication areas A7 and A11 affected by congestion or traffic restrictions as shown in FIG. 2. Setting the reward for communication areas A affected by congestion or traffic restrictions to a negative value in this way allows the vehicle 2 to reach the destination point while avoiding those points.
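One possible reward function consistent with this description is sketched below. Apart from the example value r = -1, the numeric rewards, the grid representation of areas, and the Manhattan distance are assumptions chosen for illustration.

```python
def manhattan(u, v):
    """Grid distance between two areas given as (column, row) tuples (assumed metric)."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def reward(state, action, next_state, destination, congested):
    """Scalar reward r(s, a, s'); here it depends only on where the action led."""
    if next_state in congested:
        return -1.0   # reached an area affected by congestion or traffic restrictions
    if next_state == destination:
        return 1.0    # reached the destination area (assumed value)
    if manhattan(next_state, destination) < manhattan(state, destination):
        return 0.1    # moved closer to the destination (assumed value)
    return -1.0       # moved away from, or no closer to, the destination


congested = {(2, 1)}
r_closer = reward((0, 0), "straight", (1, 0), destination=(3, 3), congested=congested)
r_blocked = reward((1, 1), "right", (2, 1), destination=(3, 3), congested=congested)
r_goal = reward((3, 2), "straight", (3, 3), destination=(3, 3), congested=congested)
```

Because the congested set is an argument, the same function can reflect congestion information that is refreshed at a fixed interval.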

Furthermore, the learning unit 12 provides the position of the communication area A that the vehicle 2 reaches next as an input to the neural network model, performs the computation of the neural network model, and outputs a second estimated value Q2 of the action value function. The learning unit 12 learns the weight parameters of the neural network model so that the first estimated value Q1 approaches the target value calculated from the second estimated value Q2.

When the weight parameters of the neural network model are denoted by θ and the action value function is written as Q(s, a; θ), the loss function minimized in learning is given by the following equation (1).

$$L(\theta) = \frac{1}{2}\left\{ r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta) \right\}^{2} \tag{1}$$

In equation (1) above, r is the reward (immediate reward) and γ is the discount rate. Q(s, a; θ) corresponds to the first estimated value Q1, and Q(s', a'; θ) corresponds to the action value in the state s' one step ahead, that is, the second estimated value Q2. The target value is expressed as r + γ max_{a'} Q(s', a'; θ).
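As a worked example of equation (1), the target value and the loss can be computed for one transition as follows. The numeric values of the reward, the discount rate, and the action values are made up for illustration.

```python
def td_target(r, gamma, next_q_values):
    """Target value r + gamma * max over a' of Q(s', a')."""
    return r + gamma * max(next_q_values)


def dqn_loss(q_sa, target):
    """Loss of equation (1): half the squared difference between target and Q(s, a)."""
    return 0.5 * (target - q_sa) ** 2


# Second estimated value Q2: Q(s', a') for straight, left turn, right turn (made-up numbers).
q_next = [0.2, 0.5, 0.1]
target = td_target(r=1.0, gamma=0.9, next_q_values=q_next)  # 1.0 + 0.9 * 0.5
loss = dqn_loss(q_sa=0.4, target=target)                    # 0.5 * (1.45 - 0.4) ** 2
```

With these numbers the target is 1.45 and the loss is 0.55125, so minimizing the loss moves the first estimated value Q1 = 0.4 toward 1.45.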

The learning unit 12 can update the weight parameters of the neural network model by backpropagating the gradient of the loss function given by equation (1) above.

More specifically, as shown in FIG. 4, the learning unit 12 can adopt a Fixed Target Q-Network, which uses two neural networks: a main QN 121 and a target QN 123. The main QN 121 selects the optimal action and updates the action value function Q. The target QN 123, on the other hand, estimates and evaluates the value of the action a' to be taken in the next state s' that results from the action. The main QN 121 and the target QN 123 have neural networks with the same layer structure, but the parameters of the main QN 121 are denoted by θ and the parameters of the target QN 123 by θ⁻.

The main QN 121 receives the current position of the vehicle 2 from the environment 120 as the state s. The environment 120 is the system of the movement space in which the vehicle 2 is placed. In the environment 120, the vehicle 2 moves to another communication area A by taking an action a (a right turn, a left turn, or going straight), transitions to the next state s', and at the same time obtains a reward r from the environment 120.

The learning unit 12 inputs the state s relating to the current position of the vehicle 2 to the main QN 121 and obtains the action value function Q(s, a; θ). The learning unit 12 calculates the action a using, for example, the ε-greedy method, or obtains the currently optimal right-turn, left-turn, or straight-ahead action argmax_a Q(s, a; θ). In the environment 120, the vehicle 2 performs the currently optimal action argmax_a Q(s, a; θ). The environment 120 observes, as the next state s', the position of the communication area A to which the vehicle 2 has moved as a result of that action, and outputs the reward r. The experience data 124 stores the experience (s, a, r, s') output from the environment 120.
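The ε-greedy selection mentioned above can be sketched as follows. The action names, their ordering, and the function signature are assumptions for this illustration.

```python
import random

ACTIONS = ("straight", "left", "right")  # assumed ordering of the three actions


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore at random; otherwise take argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best_index = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best_index]


greedy_action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0)   # always the argmax
explored_action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=1.0)  # always a random action
```

Setting epsilon to 0 reduces the selection to the purely greedy choice argmax_a Q(s, a; θ), while larger values trade some immediate reward for exploration of other routes.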

In the DQN loss calculation 122, the learning unit 12 obtains the loss function L and updates the weights of the main QN 121 using the gradient of the loss function L.

The learning unit 12 periodically copies the weights of the main QN 121 to the target QN 123 to synchronize them. The target QN 123 is synchronized less frequently than the weights of the main QN 121 are updated. The learning unit 12 retrieves an experience from the experience data 124, inputs the past state to the target QN 123, and has it output the estimated value max_{a'} Q(s', a'; θ⁻). Using the target value r + γ max_{a'} Q(s', a'; θ⁻) based on the estimated value output by the target QN 123, the learning unit 12 trains the weights of the main QN 121 in the DQN loss calculation 122.
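The interplay of the main QN, the target QN, and the experience data can be illustrated with the toy sketch below. To stay dependency-free it replaces the neural networks with a table holding one weight per state-action pair, which is a simplification and not the network of the embodiment. The chain-shaped environment, the hyperparameters, and the terminal flag `done` are likewise assumptions.

```python
import copy
import random

N_STATES, N_ACTIONS = 4, 3           # toy sizes; actions: 0 straight, 1 left, 2 right
GAMMA, LR, SYNC_EVERY = 0.9, 0.5, 5  # illustrative hyperparameters


def train(experiences, steps, seed=0):
    """Fit a tabular Q with a separately held, periodically synchronized target copy."""
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # main QN parameters
    theta_minus = copy.deepcopy(theta)                     # target QN parameters
    for step in range(1, steps + 1):
        s, a, r, s_next, done = rng.choice(experiences)    # sample stored experience
        bootstrap = 0.0 if done else GAMMA * max(theta_minus[s_next])
        target = r + bootstrap                             # r + gamma * max_a' Q(s', a')
        td_error = target - theta[s][a]
        theta[s][a] += LR * td_error                       # gradient step on the squared loss
        if step % SYNC_EVERY == 0:
            theta_minus = copy.deepcopy(theta)             # low-frequency synchronization
    return theta


# Chain 0 -> 1 -> 2 -> 3 reached by going straight; a left turn ends in a penalized area.
experiences = [
    (0, 0, 0.0, 1, False), (1, 0, 0.0, 2, False), (2, 0, 1.0, 3, True),
    (0, 1, -1.0, 0, True), (1, 1, -1.0, 1, True), (2, 1, -1.0, 2, True),
]
theta = train(experiences, steps=2000)
```

Keeping the target parameters fixed between synchronizations means the target value does not shift with every update of the main parameters, which is the stabilizing effect the Fixed Target Q-Network is intended to provide.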

Returning to FIG. 1, the learning model storage unit 13 stores the weights of the trained neural network model.

The setting information storage unit 14 stores the identification information of the vehicle 2 and the communication terminal 20, the position information of the movement space in which route control of the vehicle 2 is performed, and the position information of each communication area A arranged in the movement space. The setting information storage unit 14 also stores position information of the initial point and the destination point of the vehicle 2 acquired in advance. The setting information storage unit 14 can store the positions of the communication areas A corresponding to the positions of the initial point and the destination point. In addition, the setting information storage unit 14 can store map information of the movement space.

The determination unit 15 determines the route that the vehicle 2 should take next from the position of the current communication area A acquired by the first acquisition unit 10, on the basis of the route policy, learned using the learning model, that the vehicle 2 should follow in sequence from the position of each communication area A. The determination unit 15 reads the trained weights stored in the learning model storage unit 13, provides the position of the current communication area A as an input to the trained neural network model, performs the computation of the trained neural network model, and determines the optimal action among a right turn, a left turn, and going straight as the next route.

The route control unit 16 instructs the vehicle 2, via the core network 3, to take the next route determined by the determination unit 15. Specifically, the route control unit 16 transmits the route instruction to the vehicle 2 via the UPF 33. The route control unit 16 continues to issue route instructions until the vehicle 2 reaches the communication area of the destination point.

[Hardware configuration of the route control device]
Next, an example of a hardware configuration that implements the route control device 1 having the functions described above will be described with reference to FIG. 5.

As shown in FIG. 5, the route control device 1 can be implemented by, for example, a computer including a processor 102, a main storage device 103, a communication interface 104, an auxiliary storage device 105, and an input/output I/O 106 connected via a bus 101, together with a program that controls these hardware resources.

The main storage device 103 stores in advance the programs with which the processor 102 performs various kinds of control and computation. The processor 102 and the main storage device 103 implement the functions of the route control device 1 shown in FIG. 1, such as the first acquisition unit 10, the second acquisition unit 11, the learning unit 12, the determination unit 15, and the route control unit 16.

The communication interface 104 is an interface circuit for connecting the route control device 1 to various external electronic devices over a network.

The auxiliary storage device 105 consists of a readable and writable storage medium and a drive device for reading and writing various kinds of information, such as programs and data, from and to the storage medium. A hard disk or a semiconductor memory such as a flash memory can be used as the storage medium of the auxiliary storage device 105.

The auxiliary storage device 105 has a program storage area that stores the route control program executed by the route control device 1. The auxiliary storage device 105 also has an area that stores a learning program for training the neural network model. The auxiliary storage device 105 implements the learning model storage unit 13 and the setting information storage unit 14 described with reference to FIG. 1. It may further have, for example, a backup area for backing up the data and programs described above.

The input/output I/O 106 is an input/output device that receives signals from external devices and outputs signals to external devices.

[Operation of the route control device]
Next, the operation of the route control device 1 having the configuration described above will be described with reference to the flowcharts of FIGS. 6 to 8.

First, the learning process performed by the route control device 1 will be described with reference to FIG. 6. The route control device 1 first acquires the positions of the initial point and the destination point of the vehicle 2 (step S1). For example, the route control device 1 can acquire the destination point entered into the car navigation system of the vehicle 2 and the current position of the vehicle 2.

Next, the second acquisition unit 11 acquires the position of the communication area corresponding to a region of the movement space in which congestion is occurring (step S2). For example, the second acquisition unit 11 can acquire, from an external traffic information server, congestion information and the position information of regions where traffic restrictions are in effect, and identify the positions of the corresponding communication areas. The second acquisition unit 11 may also be configured to acquire the positions of the communication areas affected by congestion or traffic restrictions at a fixed interval.

Next, the first acquisition unit 10 acquires, as the current position of the vehicle 2, the position of the current communication area A in which the vehicle 2 is located (step S3). Specifically, the first acquisition unit 10 acquires, from the UDR 32 of the core network 3, the identification information of the communication area A or the base station BS associated with the location registration signal that the vehicle 2 transmitted when it crossed into the communication area, together with the timestamp. The first acquisition unit 10 then acquires, as the current position of the vehicle 2, the position information of the communication area A that is stored in the setting information storage unit 14 in association with the identification information of the communication area A or the base station BS.

Next, the learning unit 12 provides the current state of the vehicle 2 acquired in step S3, that is, the position of the current communication area A in which the vehicle 2 is located, as an input to the neural network model and performs the computation of the neural network model. It outputs the first estimated value Q1 of the action value function, which represents the expected cumulative future reward obtained when the vehicle 2 takes each of the actions, including a right turn, a left turn, and going straight, as the next route from the current communication area A (step S4).

Furthermore, the learning unit 12 provides the position of the communication area A that the vehicle 2 reaches next as an input to the neural network model, performs the computation of the neural network model, and outputs the second estimated value Q2 of the action value function (step S6). The learning unit 12 calculates the target value from the second estimated value Q2 (step S7). The learning unit 12 then learns the weight parameters of the neural network model so that the first estimated value Q1 approaches the target value calculated from the second estimated value Q2 (step S8). Specifically, the learning unit 12 updates the weight parameters of the neural network model so as to minimize the loss function of equation (1) above.

The learning model storage unit 13 stores the trained weights obtained in step S8 (step S9).

Next, with reference to FIG. 7, the learning process performed by the learning unit 12 will be described for the case in which a Fixed Target Q-Network using two neural networks, the main QN 121 and the target QN 123, is adopted.

The processing from step S1 to step S3 is the same as in the learning process described with reference to FIG. 6. The learning unit 12 then provides the position of the communication area A in which the vehicle 2 is located, acquired in step S3, as an input to the main QN 121, performs the computation of the neural network, outputs the action value function Q, and calculates the next route a to take (step S120).

Next, the learning unit 12 returns the action of the vehicle 2 along the route a obtained in step S120 to the environment 120, and obtains the next state s' of the vehicle 2, that is, the position of the communication area A to which the vehicle 2 has advanced, and the reward r (step S121). The reward r given by the reward function can be configured to reflect, as needed, the degree to which the vehicle 2 has reached the position of a congested communication area A, as acquired at a fixed interval in step S2.

The learning unit 12 stores the experience (s, a, r, s') obtained in step S121 in the experience data 124 (step S122). Next, in the DQN loss calculation 122, the learning unit 12 obtains the loss function L and updates the weights of the main QN 121 using the gradient of the loss function L (step S123). The learning unit 12 repeats the processing from step S120 to step S123 a set number of times.

The learning unit 12 then periodically copies the weights of the main QN 121 to the target QN 123 to synchronize them (step S124). The target QN 123 is synchronized less frequently than the weights of the main QN 121 are updated. Next, the learning unit 12 retrieves an experience from the experience data 124, inputs the past state to the target QN 123, and has it output the estimated value max_{a'} Q(s', a'; θ⁻) (step S126).

Next, the learning unit 12 calculates the target value r + γ max_{a'} Q(s', a'; θ⁻) based on the estimated value max_{a'} Q(s', a'; θ⁻) output by the target QN 123 (step S127). Next, using the target value calculated in step S127, the learning unit 12 calculates the loss function L in the DQN loss calculation 122 (step S128). Next, the learning unit 12 trains the weights of the main QN 121 so as to minimize the loss given by the loss function L (step S129). The trained weights are then stored in the learning model storage unit 13 (step S9).

Next, the route control process performed by the route control device 1 will be described with reference to FIG. 8. First, the determination unit 15 loads the trained neural network model from the learning model storage unit 13 (step S40). In this embodiment, the determination unit 15 loads the trained DQN. Next, the first acquisition unit 10 acquires the current position of the vehicle 2, that is, the position of the communication area A in which the vehicle 2 is located (step S41).

Next, the determination unit 15 determines the route that the vehicle 2 should take next from the position of the current communication area on the basis of the trained neural network model loaded in step S40, that is, the route policy that the vehicle 2 should follow in sequence from the position of each communication area A (step S42). Specifically, the determination unit 15 provides the position of the current communication area A acquired in step S41 as an input to the trained neural network model, performs the computation of the trained neural network model, and determines which of a right turn, a left turn, or going straight the vehicle 2 takes next. The determination unit 15 selects the action with the highest value among the action values Q output by the trained neural network model for the right-turn, left-turn, and straight-ahead actions, and determines that action as the route.

The route control unit 16 then instructs the vehicle 2, via the UPF 33 of the core network 3, to take the route determined in step S42 (step S43). When the vehicle 2 receives the route instruction, the ECU 22 of the vehicle 2 outputs control commands to the respective control actuators in accordance with the instruction, and the vehicle 2 moves to the next communication area A.

Next, when the vehicle 2 has reached the destination point (step S44: YES), the process ends. When the vehicle 2 has not reached the destination point (step S44: NO), the processing from step S41 to step S43 is repeated. For example, whether the destination point has been reached can be determined on the basis of whether the position of the communication area A associated with the location registration signal, which is transmitted when the vehicle 2 crosses into the communication area A it has moved to in accordance with the route instruction, matches the position of the communication area A set as the destination point.
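The loop over steps S41 to S44 might be sketched as follows. The callables `policy` and `move`, the integer area identifiers, and the `max_steps` safety limit are hypothetical and introduced only for this illustration; in the embodiment the next area is reported by the location registration signal rather than computed locally.

```python
def control_route(start, destination, policy, move, max_steps=100):
    """Issue route instructions until the destination area is reached.

    policy(area) stands in for the trained model of the determination unit 15;
    move(area, action) stands in for the vehicle moving and re-registering.
    """
    area = start                         # S41: current communication area
    instructions = []
    for _ in range(max_steps):
        if area == destination:          # S44: destination reached
            return instructions, True
        action = policy(area)            # S42: decide the next route
        instructions.append(action)      # S43: instruction sent to the vehicle
        area = move(area, action)        # next location registration (S41 again)
    return instructions, area == destination


def toy_policy(area):
    return "straight"


def toy_move(area, action):
    return area + 1 if action == "straight" else area


instructions, reached = control_route(start=1, destination=4, policy=toy_policy, move=toy_move)
```

In this toy run the vehicle starts in area 1 and reaches area 4 after three "straight" instructions, at which point the loop ends.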

As described above, the route control device 1 according to this embodiment acquires, as the current position of the vehicle 2, the position of the base station BS, that is, the position of the communication area A, associated with the location registration signal sent when the vehicle 2 crosses into a communication area A, and determines the next route to take on the basis of the route policy acquired by the trained neural network. The route of the vehicle 2 can therefore be controlled remotely with a simpler configuration.

In addition, the route control device 1 according to this embodiment uses the location registration signal to determine the next route on the basis of the route policy acquired by the trained neural network. Therefore, even when radio waves from positioning satellites cannot be received, the position of the vehicle can be estimated and the optimal route can be selected.

In addition, because the route control device 1 according to this embodiment adopts a DQN as the learning model, it can select a route that avoids points affected by congestion or traffic restrictions.

Although the above embodiment illustrates a route control system conforming to 5G, the route control system may instead conform to LTE or 6G.

The embodiments of the route control device and the route control method of the present invention have been described above. However, the present invention is not limited to the described embodiments, and various modifications conceivable by those skilled in the art can be made within the scope of the invention set forth in the claims.

DESCRIPTION OF REFERENCE SIGNS: 1... route control device, 10... first acquisition unit, 11... second acquisition unit, 12... learning unit, 13... learning model storage unit, 14... setting information storage unit, 15... determination unit, 16... route control unit, 2... vehicle, 20... communication terminal, 21... SIM, 22... ECU, 3... core network, 30... AMF, 31... UDM, 32... UDR, 33... UPF, 101... bus, 102... processor, 103... main storage device, 32a, 33a, 104... communication interface, 105... auxiliary storage device, 106... input/output I/O, 120... environment, 121... main QN, 122... DQN loss calculation, 123... target QN, 124... experience data, BS1 to BSn... base station, A1 to An... communication area, L, NW... network.

Claims

A route control device that controls the route of a vehicle from the position of a communication area at an initial point to the position of a communication area at a destination point in a movement space of the vehicle defined by a plurality of communication areas, the route control device comprising:
a first acquisition unit configured to acquire, as the current position of the vehicle, the position of the communication area associated with a location registration signal transmitted from the vehicle when the vehicle crosses into the communication area;
a determining unit configured to determine the next route the vehicle should take from the position of the current communication area, acquired as the current position of the vehicle by the first acquisition unit, based on a policy, learned using a learning model, for the route the vehicle should take sequentially from the position of each communication area; and
a route control unit configured to instruct the vehicle, via a core network of a predetermined communication standard, on the next route the vehicle should take as determined by the determining unit.

The route control device according to claim 1, further comprising:
a learning unit configured to perform learning using the learning model by applying a reward function to an estimation result obtained by computing the route the vehicle should take sequentially from the position of each communication area until the vehicle, starting from the position of the communication area at the initial point, reaches the position of the communication area at the destination point, and by updating the policy for the route the vehicle should take sequentially from the position of each communication area so that the reward for the vehicle reaching the position of the communication area at the destination point is maximized; and
a storage unit configured to store the route policy learned by the learning unit,
wherein the determining unit reads the route policy from the storage unit and determines the next route the vehicle should take.

The route control device according to claim 2, further comprising:
a second acquisition unit configured to acquire the position of a communication area corresponding to an area in the movement space where traffic congestion is occurring,
wherein the reward function includes, as variables, the degree to which the vehicle has reached the position of the communication area at the destination point and the degree to which the vehicle has reached the position of the communication area corresponding to the area where the traffic congestion is occurring.

The route control device according to claim 3, wherein
the learning model is a neural network model including an input layer, a hidden layer, and an output layer,
the learning unit provides the position of the current communication area as an input to the neural network model, performs the computation of the neural network model, and outputs a first estimated value of an action value function representing the expected value of the future cumulative reward obtained when each action, including turning right, turning left, and going straight, is taken as the next route the vehicle should take from the position of the current communication area,
the learning unit further provides the position of the communication area at which the vehicle next arrives as an input to the neural network model, performs the computation of the neural network model, and outputs a second estimated value of the action value function,
the learning unit learns weight parameters of the neural network model so that the first estimated value becomes a target value calculated from the second estimated value, and
the storage unit stores the learned weight parameters.

The route control device according to any one of claims 1 to 4, wherein
the first acquisition unit acquires the position of the current communication area of the vehicle from a unified data repository included in the core network, and
the route control unit transmits an instruction regarding the next route the vehicle should take to the vehicle via a user plane function included in the core network.

A route control method for controlling the route of a vehicle from the position of a communication area at an initial point to the position of a communication area at a destination point in a movement space of the vehicle defined by a plurality of communication areas, the method comprising:
a first acquisition step of acquiring, as the current position of the vehicle, the position of the communication area associated with a location registration signal transmitted from the vehicle when the vehicle crosses into the communication area;
a determining step of determining the next route the vehicle should take from the position of the current communication area, acquired as the current position of the vehicle in the first acquisition step, based on a policy, learned using a learning model, for the route the vehicle should take sequentially from the position of each communication area; and
a route control step of instructing the vehicle, via a core network of a predetermined communication standard, on the next route the vehicle should take as determined in the determining step.

The route control method according to claim 6, further comprising:
a learning step of performing learning using the learning model by applying a reward function to an estimation result obtained by computing the route the vehicle should take sequentially from the position of each communication area until the vehicle, starting from the position of the communication area at the initial point, reaches the position of the communication area at the destination point, and by updating the policy for the route the vehicle should take sequentially from the position of each communication area so that the reward for the vehicle reaching the position of the communication area at the destination point is maximized; and
a storage step of storing the route policy learned in the learning step in a storage unit,
wherein the determining step reads the route policy from the storage unit and determines the next route the vehicle should take.

The route control method according to claim 7, further comprising:
a second acquisition step of acquiring the position of a communication area corresponding to an area in the movement space where traffic congestion is occurring,
wherein the reward function includes, as variables, the degree to which the vehicle has reached the position of the communication area at the destination point and the degree to which the vehicle has reached the position of the communication area corresponding to the area where the traffic congestion is occurring.

The route control method according to claim 8, wherein
the learning model is a neural network model including an input layer, a hidden layer, and an output layer,
the learning step provides the position of the current communication area as an input to the neural network model, performs the computation of the neural network model, and outputs a first estimated value of an action value function representing the expected value of the future cumulative reward obtained when each action, including turning right, turning left, and going straight, is taken as the next route the vehicle should take from the position of the current communication area,
the learning step further provides the position of the communication area at which the vehicle next arrives as an input to the neural network model, performs the computation of the neural network model, and outputs a second estimated value of the action value function,
the learning step learns weight parameters of the neural network model so that the first estimated value becomes a target value calculated from the second estimated value, and
the storage step stores the learned weight parameters in the storage unit.

The route control method according to any one of claims 6 to 9, wherein
the first acquisition step acquires the position of the current communication area of the vehicle from a unified data repository included in the core network, and
the route control step transmits an instruction regarding the next route the vehicle should take to the vehicle via a user plane function included in the core network.
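
For illustration only, the relationship between the first estimated value, the second estimated value, and the target value recited in the claims that define the neural network model can be sketched as below. The sketch assumes the standard DQN form of the target (the reward plus a discounted maximum over the second estimated values) and a squared-error loss. The discount factor `gamma`, the `done` flag for the destination area, and the function names are assumptions, not details taken from the claims. The neural network itself and the gradient step that updates its weight parameters are omitted.

```python
from typing import Sequence


def td_target(
    reward: float,
    next_q: Sequence[float],  # second estimated values, one per action
    done: bool,               # assumed flag: True once the destination area is reached
    gamma: float = 0.9,       # assumed discount factor
) -> float:
    """Target value computed from the second estimated values."""
    if done:
        # No future reward is expected after the destination area.
        return reward
    return reward + gamma * max(next_q)


def td_loss(first_estimate: float, target: float) -> float:
    """Squared error that learning would reduce by adjusting the weights."""
    return 0.5 * (target - first_estimate) ** 2


# Example: second estimated values for (right turn, left turn, straight).
y = td_target(reward=-0.01, next_q=[0.2, 0.5, 0.1], done=False)
loss = td_loss(first_estimate=0.4, target=y)
```

In a full implementation the first estimated value would typically come from the main network and the second estimated values from a separate target network (cf. the main QN 121 and target QN 123 in the description of symbols), with the loss minimized by gradient descent over stored experience data.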