KR102461362B1

KR102461362B1 - Control server that generates route guidance data through traffic prediction based on reinforcement learning

Info

Publication number: KR102461362B1
Application number: KR1020220044085A
Authority: KR
Inventors: 탁세현; 이동훈; 이종덕
Original assignee: 한국교통연구원 (The Korea Transport Institute)
Priority date: 2021-12-28
Filing date: 2022-04-08
Publication date: 2022-11-01
Also published as: KR102497716B1; WO2023128406A1; KR102508367B1

Abstract

According to an embodiment of the present invention, a control server receives, in real time, vehicle location data and travel data from a vehicle module and traffic data for each road section from a traffic detector. The control server, which can provide a reliable origin-to-destination travel route by taking into account travel times under uncertain traffic conditions, comprises: a database that stores the vehicle location data and travel data, and stores the traffic data as real-time traffic data and past traffic data; a traffic prediction module that generates a first weight through deep learning on the past traffic data and predicts a future speed for each road section by applying the first weight to the real-time traffic data; and a traffic prediction routing module that generates a second weight by applying reinforcement learning to the past traffic data and generates route guidance data by applying the second weight to the vehicle location data, the travel data, and the future speed for each road section.

Description

{CONTROL SERVER THAT GENERATES ROUTE GUIDANCE DATA THROUGH TRAFFIC PREDICTION BASED ON REINFORCEMENT LEARNING}

The present invention relates to a vehicle navigation system and, more particularly, to a control server that generates route guidance data through traffic prediction based on reinforcement learning and provides the data to a vehicle.

A vehicle navigation system is a route guidance system designed to provide a global route from an origin to a destination. The origin-to-destination (hereinafter, OD) route is determined by a route planning algorithm built into the vehicle navigation system. Over the past few decades, considerable effort has gone into developing new routing algorithms that provide an optimal OD route minimizing the travel time from origin to destination. However, problems remain with the uncertainty of future traffic conditions during non-recurring congestion caused by unforeseen events.

For example, conventional route-scheduling techniques that use predicted traffic information assume that the prediction model's error is low, so the resulting route can be strongly affected by prediction accuracy. In addition, although many traffic prediction models have been developed using deep learning, their predictions degrade under the influence of unexpected events, particularly in non-recurring congestion, so determining the optimal route remains difficult.

(1) Korean Registered Patent Publication No. 10-1843683 (Mar. 23, 2018)
(2) Korean Laid-open Patent Publication No. 10-2022-0019204 (Feb. 16, 2022)

SUMMARY OF THE INVENTION

An object of the present invention is to provide a vehicle navigation system that provides a reliable origin-to-destination travel route by taking into account travel time under uncertain traffic conditions.

A control server according to an embodiment of the present invention, which receives in real time vehicle location data and travel data from a vehicle module and traffic data for each road section from a traffic detector, includes: a database that stores the vehicle location data and travel data, and stores the traffic data as real-time traffic data and past traffic data; a traffic prediction module that generates a first weight through deep learning on the past traffic data and predicts a future speed for each road section by applying the first weight to the real-time traffic data; and a traffic prediction routing module that generates a second weight by applying reinforcement learning to the past traffic data and generates route guidance data by applying the second weight to the vehicle location data, the travel data, and the future speed for each road section.

In this embodiment, the traffic detector is a hybrid traffic detector comprising a C-ITS detector and an ITS detector.

In this embodiment, the traffic prediction module includes a first batch process, which preprocesses the past traffic data into a data set and then trains a deep learning model to generate the first weight, and a first real-time process, which predicts the future speed for each road section over a prediction horizon based on the first weight. The first real-time process uses Graph WaveNet as its prediction model.
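The split between a batch process that produces weights and a real-time process that applies them can be illustrated with a deliberately tiny model. The sketch below is not Graph WaveNet: as an assumption for illustration only, the "first weight" is replaced by a per-section pair `(a, b)` of a one-step linear model `v[t+1] = a + b * v[t]`, fitted by least squares. The names `batch_fit` and `realtime_predict` are hypothetical, and each section's history is assumed to contain at least two readings.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the "first weight": per-section (intercept, slope) of
# a one-step linear model v[t+1] = a + b * v[t]. The patent uses Graph WaveNet;
# this toy model only illustrates the batch / real-time split.
Weights = Dict[str, Tuple[float, float]]


def batch_fit(history: Dict[str, List[float]]) -> Weights:
    """First batch process (sketch): fit (a, b) per road section by least squares."""
    weights: Weights = {}
    for section, speeds in history.items():
        xs, ys = speeds[:-1], speeds[1:]  # (speed now, speed one interval later)
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = cov_xy / var_x if var_x > 0 else 0.0  # constant history -> flat model
        weights[section] = (mean_y - slope * mean_x, slope)
    return weights


def realtime_predict(weights: Weights, latest: Dict[str, float],
                     horizon: int) -> Dict[str, List[float]]:
    """First real-time process (sketch): roll the model forward over the horizon."""
    forecast: Dict[str, List[float]] = {}
    for section, speed in latest.items():
        a, b = weights[section]
        steps: List[float] = []
        for _ in range(horizon):
            speed = max(0.0, a + b * speed)  # predicted speeds cannot go negative
            steps.append(speed)
        forecast[section] = steps
    return forecast


w = batch_fit({"s1": [60.0, 50.0, 40.0, 30.0, 20.0]})
print(w)                                      # {'s1': (-10.0, 1.0)}
print(realtime_predict(w, {"s1": 30.0}, 2))  # {'s1': [20.0, 10.0]}
```

The batch step can run off-line on the full history, while the real-time step only needs the stored weights and the latest reading per section, which is the property the two-process design relies on.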

In this embodiment, the traffic prediction routing module includes a second batch process, which configures the past traffic data as a simulation data set and then trains a reinforcement learning model to generate the second weight, and a second real-time process, which generates the route guidance data by feeding the future speed for each road section, the vehicle location data, and the travel data into a reinforcement learning-based traffic prediction vehicle routing algorithm that uses the second weight.
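The text does not give the network architecture or state encoding of the routing algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: tabular Q-learning on a hypothetical toy graph, where the state is the current node, the action is the next link, and the reward is minus the link travel time computed from a predicted speed. Here the Q-table plays the role of the "second weight"; the names `train_router`, `greedy_route`, and the graph are invented for illustration, and the graph is assumed acyclic.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy network: node -> [(next_node, link_length_km), ...].
# Assumed acyclic, with at least one outgoing link from every non-destination node.
Graph = Dict[str, List[Tuple[str, float]]]
Link = Tuple[str, str]


def link_minutes(length_km: float, speed_kmh: float) -> float:
    """Link travel time in minutes at the predicted speed."""
    return 60.0 * length_km / speed_kmh


def train_router(graph: Graph, speeds: Dict[Link, float], origin: str, dest: str,
                 episodes: int = 500, alpha: float = 0.5, epsilon: float = 0.2,
                 seed: int = 0) -> Dict[Link, float]:
    """Tabular Q-learning: state = current node, action = outgoing link,
    reward = minus the link travel time (undiscounted, episodic)."""
    rng = random.Random(seed)
    q = {(u, v): 0.0 for u, links in graph.items() for v, _ in links}
    for _ in range(episodes):
        node = origin
        while node != dest:
            links = graph[node]
            if rng.random() < epsilon:  # explore
                nxt, length = rng.choice(links)
            else:  # exploit the current estimate
                nxt, length = max(links, key=lambda link: q[(node, link[0])])
            reward = -link_minutes(length, speeds[(node, nxt)])
            best_next = 0.0 if nxt == dest else max(q[(nxt, v)] for v, _ in graph[nxt])
            q[(node, nxt)] += alpha * (reward + best_next - q[(node, nxt)])
            node = nxt
    return q


def greedy_route(q: Dict[Link, float], graph: Graph, origin: str, dest: str) -> List[str]:
    """Follow the highest-valued link from each node until the destination."""
    route, node = [origin], origin
    while node != dest:
        here = node
        node = max(graph[here], key=lambda link: q[(here, link[0])])[0]
        route.append(node)
    return route


graph = {"A": [("B", 2.0), ("C", 1.0)], "B": [("D", 2.0)], "C": [("D", 5.0)]}
speeds = {("A", "B"): 60.0, ("A", "C"): 60.0, ("B", "D"): 60.0, ("C", "D"): 60.0}
q = train_router(graph, speeds, "A", "D")
print(greedy_route(q, graph, "A", "D"))  # ['A', 'B', 'D']: 4 min vs 6 min via C
```

Speeds are held fixed here for brevity; a routing agent that actually exploits traffic prediction would also need the departure time in its state so that the predicted speed of each link can vary with the time the vehicle reaches it.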

In this embodiment, the traffic prediction vehicle routing algorithm that generates the route guidance data includes an agent, an action, a state, and a reward function, and the reward function includes a prediction reward for determining whether the gap between the expected travel time and the actual travel time is acceptable.
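The exact form of the prediction reward is not specified here. One plausible shape, written only as an assumption, is a terminal bonus when the expected OD travel time falls within a relative tolerance of the realized one and a penalty otherwise, added to per-step rewards equal to minus each link's travel time. The function names and the `tolerance`, `bonus`, and `penalty` values below are hypothetical.

```python
from typing import List


def step_reward(link_minutes: float) -> float:
    """Per-step reward (assumed): minus the travel time of the link just traversed."""
    return -link_minutes


def prediction_reward(expected_min: float, actual_min: float, tolerance: float = 0.1,
                      bonus: float = 1.0, penalty: float = -1.0) -> float:
    """Hypothetical prediction reward: `bonus` if the gap between expected and
    actual OD travel time is within `tolerance` x actual, else `penalty`."""
    gap = abs(expected_min - actual_min)
    return bonus if gap <= tolerance * actual_min else penalty


def episode_return(link_times: List[float], expected_min: float) -> float:
    """Undiscounted return: step rewards plus the terminal prediction reward."""
    actual = sum(link_times)
    return sum(step_reward(t) for t in link_times) + prediction_reward(expected_min, actual)


print(prediction_reward(30.0, 32.0))             # 1.0  (gap 2 <= 3.2)
print(prediction_reward(30.0, 40.0))             # -1.0 (gap 10 > 4.0)
print(episode_return([10.0, 12.0, 10.0], 30.0))  # -31.0
```

The intent of such a term is to favor routes whose travel time the predictor estimates reliably, not just routes that look fastest under a possibly wrong forecast.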

According to the embodiments of the present invention described above, a vehicle navigation system can be implemented that provides a reliable origin-to-destination route capable of minimizing travel time even under uncertain traffic conditions.

FIG. 1 is a diagram schematically illustrating a vehicle navigation system according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically illustrating the main components of the vehicle navigation system of FIG. 1 and the flow of data among them.
FIG. 3 is a flowchart schematically illustrating a method by which the control server shown in FIG. 1 generates route prediction data.
FIG. 4 is a block diagram showing the configuration and operation of the traffic prediction module and the traffic prediction routing module that generate route prediction data in the control server of the present invention.
FIG. 5 shows the concept of traffic prediction routing in the reinforcement learning-based traffic prediction vehicle routing algorithm (RL-TPVR).
FIG. 6 is a diagram showing the road network and traffic demand pattern used in the simulation for evaluating the performance of the traffic prediction routing algorithm of the present invention.
FIG. 7 shows, for each scenario, examples of recurring and non-recurring congestion in the road network and traffic demand pattern of FIG. 6.
FIG. 8 is a table detailing the hyperparameter values used in the traffic prediction model of the RL-TPVR of the present invention.
FIG. 9 is a table showing the hyperparameter values used in the traffic prediction routing model of the RL-TPVR of the present invention.
FIG. 10 is a graph showing the average reward, computed as a moving average over five consecutive measurements, while training the traffic prediction routing model of RL-TPVR.
FIG. 11 is a graph comparing OD travel times between RL-TPVR algorithms with and without the prediction reward across various scenarios.
FIG. 12 is a graph showing, for each scenario, the performance gap between RL-TPVR variants with different prediction errors.
FIG. 13 shows examples of route guidance produced by various algorithms for each scenario.
FIG. 14 is a table showing the mean and standard deviation of the OD travel time of each algorithm in various scenarios.
FIG. 15 is a table showing p-values of a one-sided Wilcoxon signed-rank test on the OD travel times for each scenario.

Hereinafter, some embodiments of the present invention are described in detail with reference to the exemplary drawings. When reference numerals are assigned to components in each drawing, the same components are given the same reference numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted when they would obscure the gist of the present invention.

FIG. 1 is a diagram schematically illustrating a vehicle navigation system according to an embodiment of the present invention. Referring to FIG. 1, the vehicle navigation system 1000 includes a vehicle module 1100, a traffic detector 1200, a roadside base station 1300, a cloud server 1400, and a control server 1500.

The vehicle module 1100 may include a location sensor and a navigation device. The vehicle module 1100 is mounted on a vehicle or an autonomous vehicle, and its location sensor transmits vehicle location information to the roadside base station 1300 or the cloud server 1400. Here, the vehicle location information represents the precise location of the service user. The location sensor may be, for example, a Global Positioning System (GPS) sensor. The navigation device is used to input travel information, namely the origin, destination, and departure time of the autonomous vehicle. The control server 1500 processes this travel information to generate the route guidance data provided by embodiments of the present invention. The route guidance data generated by the control server 1500 is delivered to the navigation device of the vehicle module 1100 through the cloud server 1400. Using the received route guidance data, the navigation device may provide an origin-to-destination (OD) route for the autonomous vehicle based on route planning.

The traffic detector 1200 detects the traffic condition of the road and transmits it to the roadside base station 1300. The traffic detector 1200 may include cameras, radar, and lidar sensors installed along the roadside, as well as sensors that monitor abnormal situations on the road, such as accidents or illegal parking. In particular, the traffic detector 1200 may be, for example, a hybrid intelligent transportation system that combines a conventional intelligent transport system (hereinafter, ITS) with a next-generation cooperative intelligent transportation system (Cooperative-ITS, hereinafter C-ITS). Because traffic data obtained from conventional inductive loop detectors is derived from point measurements such as speed, flow, and occupancy, ITS detectors often provide underestimated or overestimated traffic data. In contrast, C-ITS detectors can provide more accurate traffic data using traffic monitoring cameras with broad detection capabilities built on advanced computing techniques such as deep learning-based object detection and tracking, and edge computing. However, C-ITS has not yet been deployed across the entire road network. The traffic detector 1200 of the present invention therefore operates on a hybrid intelligent transportation system combining ITS and C-ITS so that more stable traffic data can be used. By using the C-ITS/ITS detector as the traffic detector 1200, the present invention can derive more accurate per-road-section traffic information from section traffic information. The traffic data is transmitted to the roadside base station 1300 and used by the control server 1500.
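How the ITS and C-ITS readings are combined is not specified. The following is a minimal sketch of one possible rule, offered purely as an assumption: blend the two sources with a hypothetical weight `w_cits` (favoring the more accurate C-ITS data) where a section is covered by both, and fall back to whichever detector covers it otherwise. The function name and weight are invented for illustration.

```python
from typing import Dict


def fuse_section_speeds(its: Dict[str, float], cits: Dict[str, float],
                        w_cits: float = 0.8) -> Dict[str, float]:
    """Hypothetical fusion rule: blend both sources where a section has both,
    otherwise fall back to whichever detector covers the section."""
    fused: Dict[str, float] = {}
    for section in sorted(set(its) | set(cits)):
        its_v, cits_v = its.get(section), cits.get(section)
        if its_v is not None and cits_v is not None:
            fused[section] = w_cits * cits_v + (1.0 - w_cits) * its_v
        else:
            fused[section] = cits_v if cits_v is not None else its_v
    return fused


speeds = fuse_section_speeds({"s1": 50.0, "s2": 40.0}, {"s1": 60.0, "s3": 30.0})
print(speeds)  # s1 ~ 58.0 (blend), s2 = 40.0 (ITS only), s3 = 30.0 (C-ITS only)
```

The fallback branch reflects the partial C-ITS deployment described above: sections without C-ITS coverage still receive an ITS-based value.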

The roadside base station 1300 (roadside equipment) is installed along roads on which autonomous vehicles operate and collects the traffic data transmitted from the vehicle module 1100 and the traffic detector 1200. The roadside base station 1300 transmits the collected traffic data to the control server 1500. The traffic data may include section traffic data indicating the traffic condition of each road section and location information provided by the vehicle module 1100 of the autonomous vehicle. Section traffic data is mainly collected by integrating section traffic information for individual links, and the roadside base station 1300 transmits the collected section traffic information to the control server 1500. The roadside base station 1300 also transmits vehicle location information to the control server 1500 in the same manner as the section traffic information.
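The integration of link-level information into section-level data is not detailed. If the goal is a section speed that reproduces the section's travel time, one standard choice (assumed here, not confirmed by the source) is total length divided by total travel time, which is a length-weighted harmonic mean of the link speeds. The function name `section_speed` is hypothetical.

```python
from typing import List, Tuple


def section_speed(links: List[Tuple[float, float]]) -> float:
    """Travel-time-consistent section speed from (length_km, speed_kmh) per link:
    total length divided by total travel time (length-weighted harmonic mean)."""
    total_len = sum(length for length, _ in links)
    total_hours = sum(length / speed for length, speed in links)
    return total_len / total_hours


# Two 1 km links at 60 and 30 km/h take 1 + 2 = 3 minutes for 2 km, i.e. 40 km/h,
# whereas a plain arithmetic mean of the link speeds would report 45 km/h.
print(section_speed([(1.0, 60.0), (1.0, 30.0)]))
```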

The cloud server 1400 receives location information or travel information transmitted from the moving vehicle module 1100 and forwards it to the control server 1500, and delivers the route guidance data transmitted from the control server 1500 to the vehicle module 1100. Here, the cloud server 1400 may be regarded as supporting a mobile communication service based on Long-Term Evolution (LTE) or a 5G communication system. However, the communication standards supported by the cloud server 1400 may include Vehicle-to-Everything (V2X), 3GPP (3rd Generation Partnership Project) standards, WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, 2G, 3G, 4G, 5G, 6G, and the like, and the communication standards of the present invention are not limited thereto.

The control server 1500 collects the location information and travel information transmitted from the vehicle module 1100 through the cloud server 1400 or the roadside base station 1300. The control server 1500 also collects, through the roadside base station 1300, the traffic data transmitted in real time from the traffic detector 1200. The control server 1500 generates route guidance data based on the collected travel information, location information, and traffic data of the autonomous vehicle, and may provide the generated route guidance data to the vehicle module 1100 of that autonomous vehicle via the cloud server 1400.

The control server 1500 according to an embodiment of the present invention performs three main functions. First, the control server 1500 collects traffic data and vehicle data in real time. The real-time traffic data is stored in the historical traffic database together with the corresponding road section information and is also used to predict future traffic conditions for each road section. The vehicle data is likewise stored in the historical traffic database and, at the same time, is used to generate OD travel routes. Second, the control server 1500 generates route guidance data using the traffic prediction module and the traffic prediction routing module. The traffic prediction module includes a deep learning model that predicts the future speed of each road section. Based on historical traffic data, the traffic prediction module trains and validates this deep learning model, which is in turn used to update the reinforcement learning model of the traffic prediction routing module. Third, the control server 1500 generates route guidance information through a batch process that involves the real-time traffic data, the deep learning model, the traffic prediction module, and the traffic prediction routing module. These functions of the control server 1500 are described in more detail with reference to the following drawings.

The vehicle navigation system 1000 according to the embodiment described above uses a traffic prediction module and a traffic prediction routing module based on a deep learning model to generate route guidance data from future traffic conditions. Route calculation based only on current traffic conditions degrades in non-recurring congestion and unexpected events; by reflecting future traffic conditions, the system addresses this problem and can provide highly reliable route guidance data.

The vehicle navigation system 1000 of the present invention may be applied to various roads 100, such as highways, arterial roads, general roads, industrial roads, and port areas. The roads 100 are not limited to these types; the system may be applied to any road on which C-ITS/ITS detectors and roadside base stations are installed.

FIG. 2 is a block diagram schematically illustrating the main components of the vehicle navigation system of FIG. 1 and the flow of data among them. Referring to FIG. 2, the vehicle navigation system 1000, which generates route guidance data by predicting future traffic conditions, includes a vehicle module 1100, a traffic detector 1200, a roadside base station 1300, a cloud server 1400, and a control server 1500.

The vehicle module 1100 may include a location sensor 1120 and a navigation device 1140. The location sensor 1120 detects the current position of the autonomous vehicle and generates vehicle location data from the detection result. The vehicle location data generated by the location sensor 1120 is delivered to the control server 1500 through the roadside base station 1300. The location sensor 1120 may be a GPS sensor, or a sensor that receives location information from the GLONASS or Galileo system. The navigation device 1140 receives travel information such as the origin, destination, and departure time of the autonomous vehicle and provides it to the control server 1500 as travel data. The control server 1500 uses the travel data to set the origin-destination (OD) pair and the departure time when generating route guidance data. The navigation device 1140 may set the route of the autonomous vehicle using the route guidance data transmitted from the control server 1500.
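The two payloads described above, vehicle location data and travel data, can be modeled as small records. The sketch below is illustrative only: the field names, the ISO 8601 time strings, the validation rule, and the JSON message layout are assumptions, not a message format defined by the patent.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class VehicleLocation:  # hypothetical payload from the location sensor 1120
    vehicle_id: str
    latitude: float
    longitude: float
    timestamp: str  # ISO 8601


@dataclass(frozen=True)
class TripData:  # hypothetical payload from the navigation device 1140
    vehicle_id: str
    origin: str
    destination: str
    departure_time: str  # ISO 8601

    def __post_init__(self) -> None:
        datetime.fromisoformat(self.departure_time)  # raises ValueError if malformed
        if self.origin == self.destination:
            raise ValueError("origin and destination must differ")


def to_message(record) -> str:
    """Serialize a record for upload; the 'type' tag lets the server dispatch it."""
    return json.dumps({"type": type(record).__name__, **asdict(record)}, sort_keys=True)


trip_msg = to_message(TripData("car-1", "node-A", "node-D", "2022-04-08T08:30:00"))
loc_msg = to_message(VehicleLocation("car-1", 37.5, 127.0, "2022-04-08T08:29:55"))
print(trip_msg)
print(loc_msg)
```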

The traffic detector 1200 detects the traffic condition of the road and transmits it to the roadside base station 1300 as traffic information that can be separated by road section. The traffic detector 1200 may be implemented as a hybrid intelligent transportation system combining an intelligent transportation system (ITS) with a next-generation cooperative intelligent transportation system (C-ITS). Accordingly, the traffic detector 1200 of the present invention can obtain more accurate per-road-section traffic information from the section traffic information collected by its sensors. The traffic information is delivered to the control server 1500 via the roadside base station 1300.

The roadside base station 1300 forwards the vehicle location data provided by the vehicle module 1100 and the traffic information provided by the traffic detector 1200 to the control server 1500.

The control server 1500 may include a traffic prediction module 1520 and a traffic prediction routing module 1540. The traffic prediction module 1520 and the traffic prediction routing module 1540 use the travel data and traffic information to generate route guidance data that reflects future traffic conditions. The traffic prediction module 1520 builds a data set containing the average speed of each road section from past traffic data, and then generates a weight set by training a deep learning model on that data set. The traffic prediction routing module 1540 calculates a future speed for each road section based on the weight set generated from past traffic data and on real-time traffic data. The traffic prediction routing module 1540 may also train a reinforcement learning model on simulation data and apply the trained weight set to the vehicle location data and travel data to generate route guidance data, that is, optimal routing information.

FIG. 3 is a flowchart schematically illustrating a method by which the control server shown in FIG. 1 generates route guidance data. Referring to FIG. 3, the traffic prediction module 1520 and the traffic prediction routing module 1540 of the control server 1500 predict future traffic conditions and generate route guidance data based on the vehicle location data and travel data provided by the vehicle module 1100 and the traffic data from the traffic detector 1200.

In step S110, the control server 1500 receives and collects real-time traffic data 1010, vehicle location data 1030, and travel (trip) data 1040.

In step S120, the real-time traffic data 1010 is stored in the database together with the information of each road section. The real-time traffic data 1010 is later used to predict future traffic conditions for each road section.

In step S130, the traffic prediction module 1520 trains and validates a deep learning model for predicting the future speed of each road section based on past traffic data.

In step S140, the traffic prediction module 1520 uses the trained and validated weights of the deep learning model to predict the future speed, or future traffic state, of each road section.

In step S150, the traffic prediction routing module 1540 generates simulation data using the trained and validated weights of the deep learning model. The traffic prediction routing module 1540 then trains the reinforcement learning model on the simulation data and is updated with the resulting learned weights.

In step S160, the traffic prediction routing module 1540 receives and stores the vehicle location data and travel data provided in real time.

In step S170, the traffic prediction routing module 1540 generates route prediction data by applying the trained weights of the reinforcement learning model to the real-time vehicle location data and travel data.

In step S180, when a request to use the navigation service occurs, the traffic prediction routing module 1540 transmits the generated route prediction data to the autonomous vehicle.
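Steps S110 to S180 can be read as two loops: an ingest-and-serve path that runs per request and a training path that, in practice, would run periodically. The sketch below is one hypothetical way to organize them; the class name, method names, and injected callables are assumptions, and the models are passed in as plain functions so that any predictor or router can be plugged in. Because S130 and S150 are grouped into a batch method, the log order differs from the flowchart numbering.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ControlServerPipeline:
    """Hypothetical orchestration of steps S110-S180 with injected model functions."""
    train_predictor: Callable[[Dict[str, List[float]]], Any]                    # S130
    predict_speeds: Callable[[Any, Dict[str, float]], Dict[str, float]]         # S140
    train_router: Callable[[Any], Any]                                          # S150
    plan_route: Callable[[Any, Dict[str, float], Dict[str, str]], List[str]]    # S170
    history: Dict[str, List[float]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)
    pred_weights: Any = None
    rl_weights: Any = None

    def ingest(self, realtime: Dict[str, float]) -> None:
        """S110-S120: collect real-time section speeds and store them by section."""
        for section, speed in realtime.items():
            self.history.setdefault(section, []).append(speed)
        self.log += ["S110", "S120"]

    def run_batch(self) -> None:
        """S130 and S150: (re)train both models, e.g. on a periodic schedule."""
        self.pred_weights = self.train_predictor(self.history)
        self.rl_weights = self.train_router(self.pred_weights)
        self.log += ["S130", "S150"]

    def serve(self, realtime: Dict[str, float], trip: Dict[str, str]) -> List[str]:
        """S140, S160-S180: predict speeds, take the trip request, return a route."""
        future = self.predict_speeds(self.pred_weights, realtime)
        route = self.plan_route(self.rl_weights, future, trip)
        self.log += ["S140", "S160", "S170", "S180"]
        return route


pipeline = ControlServerPipeline(
    train_predictor=lambda hist: {s: v[-1] for s, v in hist.items()},
    predict_speeds=lambda w, rt: {s: 0.9 * v for s, v in rt.items()},
    train_router=lambda w: "rl-weights",
    plan_route=lambda w, fut, trip: [trip["origin"], trip["destination"]],
)
pipeline.ingest({"s1": 50.0})
pipeline.run_batch()
print(pipeline.serve({"s1": 48.0}, {"origin": "A", "destination": "D"}))  # ['A', 'D']
```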

FIG. 4 is a block diagram showing the configuration and operation of the traffic prediction module and the traffic prediction routing module that generate route prediction data in the control server of the present invention. Referring to FIG. 4, the traffic prediction routing module 1540 may predict the future speed of each road section with reference to the future traffic state predicted by the traffic prediction module 1520.
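Using per-section future speeds for routing means a vehicle should see each section's predicted speed for the time it actually reaches that section, not the speed at departure. The sketch below computes an expected route travel time that way. It assumes, for simplicity, that the speed is read once at the moment the vehicle enters a section and that the last forecast step is reused beyond the horizon; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def expected_travel_minutes(route: List[Tuple[str, float]],
                            forecast: Dict[str, List[float]],
                            interval_min: float, depart_min: float = 0.0) -> float:
    """Walk the route, reading each section's predicted speed (km/h) for the
    forecast interval in which the vehicle is assumed to enter it."""
    t = depart_min
    for section, length_km in route:
        speeds = forecast[section]
        idx = min(int(t // interval_min), len(speeds) - 1)  # reuse last step past horizon
        t += 60.0 * length_km / speeds[idx]
    return t - depart_min


forecast = {"s1": [60.0, 30.0], "s2": [60.0, 30.0]}
print(expected_travel_minutes([("s1", 5.0), ("s2", 5.0)], forecast, interval_min=5.0))  # 15.0
```

In the example the vehicle covers s1 in 5 minutes at 60 km/h, then enters s2 in the second forecast interval, where the predicted drop to 30 km/h doubles that section's time.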

The traffic prediction module 1520 may train a deep learning model using the real-time traffic data 1010 and the past traffic data 1020 and, from the result, predict the future traffic state of each road section. To this end, the traffic prediction module 1520 includes a batch process, which generates learned weight values using the deep learning model, and a real-time process, which predicts the expected future speed of each road section based on the output of the batch process and the real-time traffic data.

The batch process of the traffic prediction module 1520 generates a set of trained weight values of the deep learning model from a data set containing the average speed data of each road section. The batch process consists of pre-processing the past traffic data 1020 (S210) to construct a data set (S220), training the deep learning model on the data set and validating the weights (S230), and producing the learned weights (S240). The data set for training the deep learning model is derived from raw historical traffic data aggregated over a unit time interval. The unit time interval may be determined according to the characteristics of the given road network; for example, a shorter interval may be appropriate for arterial roads and a longer interval for highways.
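As a rough illustration of the aggregation step, the Python sketch below averages the raw speed samples of a single road section into fixed unit time intervals. The function name `aggregate_speeds`, the seconds-based timestamps, and the use of a plain arithmetic mean are assumptions for illustration; the source does not describe the pre-processing (S210) in this detail.

```python
import numpy as np


def aggregate_speeds(timestamps_s, speeds_kmh, interval_s):
    """Average raw speed samples of one road section into unit time intervals.

    timestamps_s: non-negative sample times in seconds (e.g. from midnight)
    speeds_kmh:   observed speeds at those times
    interval_s:   unit time interval (hypothetical; shorter for arterials,
                  longer for highways, as the text suggests)
    Returns one mean speed per interval; intervals with no samples are NaN.
    """
    timestamps_s = np.asarray(timestamps_s, dtype=float)
    speeds_kmh = np.asarray(speeds_kmh, dtype=float)
    n_bins = int(timestamps_s.max() // interval_s) + 1
    bins = (timestamps_s // interval_s).astype(int)
    sums = np.bincount(bins, weights=speeds_kmh, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # 0/0 -> NaN marks an empty interval
```

For example, `aggregate_speeds([0, 10, 70, 130], [40, 50, 30, 20], 60)` yields one mean per 60 s interval: 45, 30 and 20 km/h.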

The real-time process of the traffic prediction module 1520 predicts the future speed of each road section over the prediction horizon (S260), based on the real-time traffic data 1010 and the pre-learned weights (S250) output by the batch process. The prediction horizon may be determined according to the characteristics of each trip and should be much larger than the expected travel time of each trip.

Like the traffic prediction module 1520, the traffic prediction routing module 1540 also consists of a batch process and a real-time process. In the batch process of the traffic prediction routing module 1540, simulation data for reinforcement learning is generated from the past traffic data 1020 (S310). The batch process then trains the reinforcement learning model on the generated simulation data (S320) and validates its weights (S330). The learned weights (S340) are transferred to the real-time process (S350), where they are combined with the future speed of each road section generated by the traffic prediction module 1520, the vehicle location data 1030, and the travel data 1040 to generate route guidance data. The set of trained weight values obtained from the simulation data (S310) serves as the optimal solution that maximizes the expected cumulative reward of the reinforcement learning model. Using the pre-learned weights (S350) obtained from the batch process, the real-time process of the traffic prediction routing module 1540 can provide optimal route information for the origin-destination (OD) trip when a navigation service request is made.

Here, the Reinforcement-Learning-Based Traffic-Predictive Vehicle Routing Algorithm (hereinafter, RL-TPVR) performs two functions: traffic prediction and vehicle route generation. These functions correspond directly to the traffic prediction module 1520 and the traffic prediction routing module 1540, respectively. The traffic prediction module 1520 requires a prediction model suitable for practical application. Various deep learning-based prediction models consider the spatial and temporal relationships of traffic to achieve better feature extraction; in the present invention, RL-TPVR adopts Graph WaveNet (Wu et al., 2019) as one of the most effective methods for predicting the future speed of each road section.

One of the most notable features of Graph WaveNet is its greatly reduced inference time. Compared with other prediction models such as the Diffusion Convolutional Recurrent Neural Network (hereinafter, DCRNN) (Li et al., 2017) and the Spatio-Temporal Graph Convolutional Network (STGCN) (Yu et al., 2017), it can produce multiple predictions in a single run with much shorter computing time. Therefore, even though long-term prediction is required for the entire road network, this method can provide the predicted speed of each road section within a short time. Accordingly, Graph WaveNet is used for the traffic prediction of RL-TPVR, as described below.

The spatial distribution of C-ITS/ITS detectors in the road network is represented by the graph G = (V, E, A). Here, V is the set of C-ITS/ITS detectors, E is the set of edges between detectors, and A ∈ ℝ^{N×N} is the adjacency matrix indicating the proximity between detectors. The dynamic feature matrix representing the traffic pattern at time step t can be expressed as X^t ∈ ℝ^{N×F}, where N is the number of C-ITS/ITS detectors in the road network and F is the number of features of each detector. The traffic prediction model of RL-TPVR aims to find the mapping function ζ(·) that predicts the future graph signals over the prediction horizon T from the past H graph signals. This relationship can be expressed as Equation 1 below.

$$[X^{(t-H):t},\ G] \xrightarrow{\ \zeta(\cdot)\ } X^{(t+1):(t+T)} \qquad \text{(Equation 1)}$$

where X^{(t-H):t} ∈ ℝ^{N×F×H} and X^{(t+1):(t+T)} ∈ ℝ^{N×F×T}.
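The input and output shapes in Equation 1 can be made concrete with a sliding-window sketch. It assumes a history stored as an array of shape (time steps, N, F), takes the H steps ending at t as the input and the T steps after t as the target, and puts time on the last axis to match the N×F×H layout. The helper name `make_windows` is hypothetical.

```python
import numpy as np


def make_windows(series, H, T):
    """Build (input, target) pairs for the mapping zeta in Equation 1.

    series: array of shape (num_steps, N, F)
    Returns:
      inputs  (S, N, F, H): the H past steps ending at time t
      targets (S, N, F, T): the T future steps t+1 .. t+T
    """
    num_steps = series.shape[0]
    inputs, targets = [], []
    for t in range(H - 1, num_steps - T):
        past = series[t - H + 1 : t + 1]    # (H, N, F)
        future = series[t + 1 : t + 1 + T]  # (T, N, F)
        inputs.append(np.moveaxis(past, 0, -1))    # -> (N, F, H)
        targets.append(np.moveaxis(future, 0, -1))  # -> (N, F, T)
    return np.stack(inputs), np.stack(targets)
```

For ten time steps with H = 3 and T = 2 this produces six samples, the first mapping steps 0 to 2 onto steps 3 and 4.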

To model spatiotemporal characteristics more effectively, the traffic prediction model of RL-TPVR consists of L spatiotemporal layers. Each layer consists of two types of building blocks: a graph convolution layer and a temporal convolution layer. The graph convolution layer uses the diffusion convolution of DCRNN, whereas the temporal convolution layer adopts a dilated causal convolutional neural network (DCCNN). RL-TPVR also uses the self-adaptive adjacency matrix A_adt, one of the major contributions of Graph WaveNet, which captures hidden spatial dependencies through learnable parameters without prior knowledge. The adaptive adjacency matrix A_adt is calculated as in Equation 2 below.

where E₁, E₂ ∈ ℝ^{N×B} are embedding matrices with learnable parameters and B is the feature dimension of the node embeddings. With this embedding method, the output of the graph convolution layer can be expressed by Equation 3 below.
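Equation 2 is not legible in this text. Graph WaveNet defines the self-adaptive adjacency matrix as SoftMax(ReLU(E₁E₂^T)) with a row-wise softmax; the NumPy sketch below follows that published form on the assumption that RL-TPVR uses it unchanged. The function name is hypothetical.

```python
import numpy as np


def adaptive_adjacency(E1, E2):
    """A_adt = SoftMax(ReLU(E1 @ E2.T)), softmax taken over each row.

    E1, E2: learnable node-embedding matrices of shape (N, B).
    Returns an (N, N) row-stochastic matrix.
    """
    scores = np.maximum(E1 @ E2.T, 0.0)                   # ReLU drops weak (negative) links
    scores = scores - scores.max(axis=1, keepdims=True)   # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)
```

Because each row is normalized, A_adt can be used directly as a learned transition matrix alongside the forward and backward ones.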

Here, P^l_forward and P^l_backward denote the forward and backward transition matrices of the diffusion process applied over l diffusion steps in the l-th term of the graph convolution layer, and the series of matrices W_l denotes the model parameters of the l-th term. The forward transition matrix is defined as P_forward = A/rowsum(A), and the backward transition matrix as P_backward = A^T/rowsum(A^T).
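A minimal NumPy sketch of the transition matrices and of a Graph WaveNet-style diffusion graph convolution follows. It assumes the published Graph WaveNet form, in which the output sums P_forward^l X W_l1 + P_backward^l X W_l2 + A_adt^l X W_l3 over diffusion steps l = 0, ..., K; whether Equation 3 matches this exactly cannot be confirmed from the text. It also assumes every row of A has at least one nonzero entry (for example, through self-loops) so that rowsum(A) is never zero. All function names are hypothetical.

```python
import numpy as np


def transition_matrices(A):
    """P_forward = A / rowsum(A), P_backward = A^T / rowsum(A^T)."""
    P_fwd = A / A.sum(axis=1, keepdims=True)
    AT = A.T
    P_bwd = AT / AT.sum(axis=1, keepdims=True)
    return P_fwd, P_bwd


def diffusion_graph_conv(X, A, A_adt, W_fwd, W_bwd, W_adt):
    """Sum of diffusion terms over steps l = 0..K (Graph WaveNet form).

    X:      (N, F_in) node features at one time step
    W_fwd, W_bwd, W_adt: equal-length lists of (F_in, F_out) matrices,
                         one per diffusion step l.
    """
    P_fwd, P_bwd = transition_matrices(A)
    N = X.shape[0]
    Z = np.zeros((N, W_fwd[0].shape[1]))
    Pf_l = Pb_l = Pa_l = np.eye(N)  # l = 0: identity, i.e. no diffusion yet
    for Wf, Wb, Wa in zip(W_fwd, W_bwd, W_adt):
        Z += Pf_l @ X @ Wf + Pb_l @ X @ Wb + Pa_l @ X @ Wa
        # advance every matrix power by one diffusion step
        Pf_l, Pb_l, Pa_l = Pf_l @ P_fwd, Pb_l @ P_bwd, Pa_l @ A_adt
    return Z
```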

In contrast, the DCCNN of the temporal convolution layer plays an important role in temporal feature extraction. By enlarging the receptive field layer by layer through dilated causal convolution, DCCNN can substantially reduce computation time and capture long-range sequence data without recursion. The causal convolution operation (★) of an input x and a filter f_θ at time step t can be expressed by Equation 4 below.

Here, d is the dilation factor, which determines how many input values are skipped and increases with the layer depth.
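The dilated causal convolution can be sketched as y(t) = Σ_s f_θ(s) x(t − d·s), the form used in Graph WaveNet. Treating samples before t = 0 as zero (left padding) is an assumption here, as is the helper `receptive_field`, which shows why stacking layers with a growing d covers long sequences cheaply.

```python
import numpy as np


def dilated_causal_conv(x, f, d):
    """y[t] = sum_s f[s] * x[t - d*s]; inputs before t = 0 count as zero.

    Only current and past inputs contribute to y[t] (causality).
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    y = np.zeros_like(x)
    for t in range(len(x)):
        for s in range(len(f)):
            idx = t - d * s
            if idx >= 0:
                y[t] += f[s] * x[idx]
    return y


def receptive_field(kernel_size, dilations):
    """Number of past steps seen by a stack of dilated causal layers."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With a kernel of size 2 and dilations 1, 2, 4, 8, four layers already see 16 time steps, whereas undilated layers would need fifteen.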

The traffic prediction model architecture of RL-TPVR, in which each building block consists of a graph convolution layer and two gating-mechanism-based temporal convolution layers, is identical to that of Graph WaveNet except for the loss function. Unlike the original model, RL-TPVR uses the root mean square error (RMSE) as the loss function for training its prediction model, defined as Equation 5 below.

Here, Θ_P is the parameter set of the RL-TPVR traffic prediction model that represents the mapping function ζ(·). Because RL-TPVR performs link-based traffic prediction, in which each link groups one or more lanes of a road section between intersections, each link carries a single feature and F equals 1.
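Since the exact normalization in Equation 5 is not recoverable here, the sketch below assumes the plain RMSE averaged over every predicted element (all N links, F features, and T horizon steps).

```python
import numpy as np


def rmse_loss(pred, target):
    """Root mean square error over all predicted elements."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

Compared with the mean absolute error used in the original Graph WaveNet, RMSE penalizes large errors more heavily.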

FIG. 5 shows the concept of traffic prediction routing in RL-TPVR. It involves four key elements: the agent, the action, the state, and the reward function. The agent is the ego vehicle to which the navigation service generated from the optimal policy is provided. The ego vehicle receives route guidance and takes actions to reach the destination in the shortest travel time.

An action represents the route choice corresponding to the routing decision the agent makes on each link from the origin to the destination. The action on a link at time step t is denoted a_t ∈ A, where A is the action space. The action space depends on the number of links connected to the current link on which the ego vehicle is located. For example, as shown, the action space is A = {right turn, straight, left turn}, because three downstream links are available when U-turns are excluded. RL-TPVR requires the ego vehicle to choose its route within a decision area. Setting a decision area gives the ego vehicle enough time to change lanes and follow the route provided by the navigation system. However, the main purpose of introducing the decision area in RL-TPVR is to treat the routing problem as a discrete-time stochastic control process and to update the agent with the latest traffic information in a timely manner. The decision area d_t at time step t is expressed by Equation 6 below.

Here, L_t is the length of the link on which the agent is located at time t, and m_t is the minimum distance required to stop safely. The minimum stopping distance m_t may be calculated by Equation 7 below.

where V_t,max is the free-flow speed of the link on which the agent is located at time step t, the second speed term in Equation 7 is the expected average speed of that link, a_dec is the maximum deceleration of the ego vehicle, and τ is the perception-reaction time.
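Equations 6 and 7 are not legible in this text. Purely as an assumption, the sketch below uses the textbook stopping-distance form m_t = v·τ + v²/(2·a_dec) and takes the decision area as the part of the link ahead of that distance, d_t = max(L_t − m_t, 0). Which speed (free-flow or expected average) enters Equation 7, and how, is not recoverable, so the speed is simply a parameter; all names are hypothetical.

```python
def stopping_distance(speed_kmh, a_dec, tau):
    """Hypothetical m_t: reaction distance plus braking distance, in metres.

    speed_kmh: assumed speed on the current link (km/h)
    a_dec:     maximum deceleration of the ego vehicle (m/s^2)
    tau:       perception-reaction time (s)
    """
    v = speed_kmh / 3.6  # km/h -> m/s
    return v * tau + v * v / (2.0 * a_dec)


def decision_area(link_length_m, speed_kmh, a_dec, tau):
    """Hypothetical d_t = L_t - m_t, floored at zero for very short links."""
    return max(link_length_m - stopping_distance(speed_kmh, a_dec, tau), 0.0)
```

For instance, at 36 km/h (10 m/s) with τ = 1.5 s and a_dec = 5 m/s², the stopping distance is 15 + 10 = 25 m, leaving a 175 m decision area on a 200 m link.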

The state describes the spatiotemporal traffic environment based on the observations and predictions of the control server 1500 or the C-ITS/ITS center module. The more traffic variables the state includes, the more accurately it can represent a given traffic situation. However, the state space grows exponentially with the number of traffic variables considered, often causing excessive computation time and poor convergence. Therefore, to handle the dimensionality problem and the complexity of dynamic traffic situations, the state of a link at time step t is expressed as Equation 8 below.

where l_t is the estimated position of the agent at time step t and P_t is the set of predicted average speeds on the downstream links connected to the link on which the agent is located at time step t. For example, in FIG. 5, three links are connected to the current link, so P_t = [p_t→R, p_t→S, p_t→L]. If the agent takes the 'straight' action, p_t→S is used to predict the corresponding element of s_t+1. At the same time, RL-TPVR recursively loads, from the traffic prediction module, the set of predicted average speeds used for s_t+1. Because every variable in the state definition can be determined with the traffic prediction function of RL-TPVR, this MDP state definition allows RL-TPVR to provide routes generated through global route planning as well as local route planning.
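The state of Equation 8 can be pictured as a short vector. The sketch below is hypothetical: it orders the predicted downstream speeds by the FIG. 5 action space {right, straight, left}; a real network would need a variable-length or padded layout wherever an intersection has fewer downstream links.

```python
import numpy as np

# Action space of the FIG. 5 example (U-turn excluded); hypothetical labels.
ACTIONS = ("right", "straight", "left")


def build_state(l_t, predicted_speeds):
    """Flatten s_t = [l_t, P_t] into one vector.

    l_t:              1-D estimated position (distance to destination)
    predicted_speeds: mapping action -> predicted average speed on the
                      downstream link reached by that action
    """
    return np.array([l_t] + [predicted_speeds[a] for a in ACTIONS], dtype=float)
```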

A vehicle position is typically treated as a two-dimensional vector of longitude and latitude. To reduce the dimension of the state space, RL-TPVR of the present invention instead specifies the estimated vehicle position l_t in a one-dimensional space as the Euclidean distance (ED) between the estimated vehicle position and the destination l_s, as in Equation 9 below.

Here, s₁ and s₂ are the longitude and latitude of the agent's destination, respectively, and l_t,e1 and l_t,e2 are the longitude and latitude of the estimated vehicle position at time step t. The vehicle position can be estimated accurately by linear integration based on the geometric information of the road, such as the start and end points of the road section, but it can also be calculated using similar triangles. For example, as shown in FIG. 5, l_t,e1 is taken to be d_t away from the longitude of the road centroid, whereas l_t,e2 corresponds to the latitude of the centroid.
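Equation 9 and the similar-triangles estimate can be sketched as follows. `one_d_position` computes the Euclidean distance in the same (longitude, latitude) units as its inputs, as the text describes; `point_on_segment` is a hypothetical helper that places a point a given distance along a straight road section by proportion (similar triangles), which is one possible reading of the estimation step.

```python
import math


def one_d_position(dest_lon, dest_lat, est_lon, est_lat):
    """l_t = sqrt((s1 - l_t,e1)^2 + (s2 - l_t,e2)^2), in coordinate units."""
    return math.hypot(dest_lon - est_lon, dest_lat - est_lat)


def point_on_segment(start, end, dist_from_start, seg_length):
    """Hypothetical position estimate by proportion along a straight segment.

    Similar triangles: the coordinate offsets scale with dist/seg_length.
    """
    ratio = dist_from_start / seg_length
    return (start[0] + ratio * (end[0] - start[0]),
            start[1] + ratio * (end[1] - start[1]))
```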

The most important element of traffic prediction routing in RL-TPVR is the reward function, because it is directly tied to the goal of the MDP optimization process: determining the optimal policy that maximizes the expected cumulative reward. To provide an origin-destination (OD) route that minimizes travel time under uncertain traffic conditions, the reward function r_t is formulated as Equation 10 below.

Here, each of the rewards r_t,distance, r_t,time, r_t,prediction, and r_t,terminal can be expressed by Equation 11 below.

Here, r_t,distance, r_t,time, r_t,prediction, and r_t,terminal are the distance, time, prediction, and terminal rewards at time step t, respectively; clip(·, min, max) is a clipping function that bounds its argument between min and max; ED_j→ld is the Euclidean distance (ED) between the destination and the vehicle location reached by action j; TT_t is the travel time from the origin to point s_t; and κ is the terminal reward value.

From the perspective of the Markov decision process (MDP), the routing problem of providing the shortest-travel-time route has sparse rewards, so RL-TPVR cannot effectively learn a decision policy without intrinsic rewards. Therefore, RL-TPVR includes both intrinsic and extrinsic rewards: the intrinsic rewards are the distance reward (r_t,distance), the time reward (r_t,time), and the prediction reward (r_t,prediction), and the extrinsic reward is the terminal reward (r_t,terminal).

As expressed in Equation 11, to keep the distance reward scalable, RL-TPVR expresses the distance reward r_t,distance as a ratio between 0 and 1, which guides the agent toward its destination. Similarly, the time reward r_t,time and the prediction reward r_t,prediction are expressed as ratios, but range from -1 to 1. The time reward is designed to minimize the origin-destination (OD) travel time, whereas the prediction reward is intended to account for travel time variability.

The key feature of RL-TPVR is that its reward function includes the prediction reward r_t,prediction. From a mobility-service perspective, whether the gap between the expected travel time and the actual travel time is acceptable is an important criterion of navigation service reliability. By reducing the variability of the OD travel time through this reward function, RL-TPVR is therefore expected to help the proposed system provide a robust navigation service. Finally, the reward function of RL-TPVR includes an extrinsic reward, the terminal reward r_t,terminal: a large positive reward is given when the terminal state is on the destination link, whereas a large negative reward is given when the terminal state is on any other boundary link of the road network.
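The exact forms of Equations 10 and 11 are not recoverable from this text. Every formula in the sketch below is therefore hypothetical: it only mirrors the stated structure, assuming that Equation 10 adds the four terms, that the distance reward is a ratio clipped to [0, 1], that the time and prediction rewards are ratios clipped to [−1, 1], and that the terminal reward is +κ on the destination link and −κ on any other boundary link. The reference travel time `tt_reference` and all argument names are illustrative assumptions.

```python
import numpy as np


def clip(x, lo, hi):
    """clip(., min, max) from the text: bound x to [lo, hi]."""
    return float(np.clip(x, lo, hi))


def step_reward(ed_now, ed_next, tt_elapsed, tt_reference,
                tt_predicted, tt_actual, status, kappa):
    """Hypothetical r_t = r_distance + r_time + r_prediction + r_terminal."""
    # progress toward the destination as a ratio in [0, 1]
    r_distance = clip((ed_now - ed_next) / ed_now, 0.0, 1.0)
    # elapsed travel time TT_t compared with an assumed reference time
    r_time = clip((tt_reference - tt_elapsed) / tt_reference, -1.0, 1.0)
    # gap between predicted and actual travel time (variability penalty)
    r_prediction = clip(1.0 - abs(tt_actual - tt_predicted) / tt_predicted, -1.0, 1.0)
    # extrinsic terminal reward
    r_terminal = {"destination": kappa, "boundary": -kappa}.get(status, 0.0)
    return r_distance + r_time + r_prediction + r_terminal
```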

The traffic prediction routing module of RL-TPVR formulates the routing problem as an MDP, but a reinforcement learning model is still needed to obtain the optimal policy. The traffic prediction routing function is implemented as a batch process. However, the exploration-exploitation problem creates a trade-off between training time and accuracy. In addition, because the present invention treats routing as sequential decision-making under uncertain traffic conditions, the observation space is likely to be much larger than would normally be expected. It is therefore more appropriate to use an off-policy reinforcement learning model than an on-policy one. In particular, when the reinforcement learning model uses a replay buffer to remove correlations between consecutive samples, it is advantageous to use prioritized experience replay (PER), which samples important transitions (s_i, a_i, r_i, s_i+1) from the replay buffer with higher probability. Accordingly, the present invention adopts PER in the reinforcement learning model of the traffic prediction routing function of RL-TPVR; the resulting model is called PDDQN (PER-based double deep Q-network). The learning method used for traffic prediction routing in RL-TPVR is detailed as follows.

PDDQN is an extension of DDQN (Van Hasselt et al., 2016), which addresses the overestimation problem of the deep Q-network (DQN). DDQN decomposes the max operation in the original DQN target into action selection and action evaluation. DDQN is updated based on the temporal-difference (TD) error δ_i, as in Equation 12 below.

Here, γ is the discount factor, and Q_θ^-(s, a) is the action-value function, with the weight set of the target neural network θ^-, that evaluates the quality of an action in a given state. Q_θ(s_i, a_i) is the action-value function with the weight set of the online neural network θ for the pair (s_i, a_i). As in DQN, the parameters of the target network θ^- are periodically updated with a copy of the weight parameters of the online network θ. To prioritize transitions in the replay buffer based on the TD error, PDDQN additionally uses non-uniform sampling with importance-sampling correction, as in Equations 13 and 14.
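The double-DQN TD error can be sketched in NumPy as below, following the standard DDQN target: the online network selects the next action and the target network evaluates it. Masking terminal transitions with `dones` is a common convention assumed here, not something the source states, and the function name is hypothetical.

```python
import numpy as np


def ddqn_td_errors(q_online_s, q_online_next, q_target_next,
                   actions, rewards, dones, gamma):
    """delta_i = r_i + gamma * Q_target(s', argmax_a Q_online(s', a)) - Q_online(s_i, a_i).

    q_* arrays: shape (batch, num_actions); actions: int array (batch,)
    dones: 1.0 for terminal transitions (no bootstrapping), else 0.0
    """
    rows = np.arange(len(actions))
    next_actions = np.argmax(q_online_next, axis=1)  # selection: online network
    next_values = q_target_next[rows, next_actions]  # evaluation: target network
    targets = rewards + gamma * (1.0 - dones) * next_values
    return targets - q_online_s[rows, actions]
```

In this case the online network prefers action 1 in the next state, so the target network's value 4 is used instead of its own maximum 10; plain DQN would have taken the larger, possibly overestimated value.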

where U(i) is the probability of sampling transition i, u_i is the priority of transition i (raised to the power α in Equation 13), α is the prioritization exponent, and b is the mini-batch size.

Here, w_i is the importance-sampling weight, B is the buffer size, and β is the importance-sampling exponent. Using Equations 12 to 14, the weight parameters of the online neural network θ are updated as in Equation 15 below.

Here, η is the step size.
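A compact sketch of Equations 13 to 15 under the standard PER formulation: U(i) = u_i^α / Σ_k u_k^α, w_i = (B·U(i))^(−β) normalized by the largest sampled weight, and a gradient step scaled by w_i·δ_i. Normalizing U(i) over all priorities in the buffer is an assumption here, since the text attaches the mini-batch size b to Equation 13. The linear Q-function in `weighted_update` is a hypothetical stand-in for the neural network so that the gradient is explicit; PDDQN itself uses deep networks.

```python
import numpy as np


def per_probabilities(priorities, alpha):
    """U(i) = u_i^alpha / sum_k u_k^alpha."""
    scaled = np.asarray(priorities, dtype=float) ** alpha
    return scaled / scaled.sum()


def importance_weights(probs, sampled_idx, beta):
    """w_i = (B * U(i))^(-beta), normalized so the largest weight is 1."""
    B = len(probs)  # buffer size
    w = (B * probs[sampled_idx]) ** (-beta)
    return w / w.max()


def sample_minibatch(priorities, b, alpha, beta, rng):
    """Draw b transition indices by priority and return their IS weights."""
    probs = per_probabilities(priorities, alpha)
    idx = rng.choice(len(probs), size=b, p=probs)
    return idx, importance_weights(probs, idx, beta)


def weighted_update(theta, features, actions, td_errors, weights, eta):
    """theta <- theta + eta * sum_i w_i * delta_i * grad Q(s_i, a_i).

    Hypothetical linear Q(s, a) = theta[a] @ phi(s), so grad = phi(s).
    """
    theta = theta.copy()
    for phi, a, delta, w in zip(features, actions, td_errors, weights):
        theta[a] += eta * w * delta * phi
    return theta
```

Setting α = 0 recovers uniform sampling, and β = 1 fully corrects the sampling bias; PER implementations commonly anneal β toward 1 during training.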

FIG. 6 shows the road network and traffic demand pattern used in the simulation for evaluating the performance of the traffic prediction routing algorithm of the present invention. Referring to FIG. 6, the test site is a 3×3 grid urban network in which every link has four lanes and all links are bidirectional. The free-flow speed is 50 km/h on all roads, and the link lengths L₁ and L₂ are 200 m and 300 m, respectively.
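As a quick arithmetic check on these network parameters, free-flow travel time is link length divided by speed: at 50 km/h (about 13.9 m/s), an L₁ link of 200 m takes 14.4 s and an L₂ link of 300 m takes 21.6 s. The helper name is hypothetical.

```python
def free_flow_time_s(length_m, speed_kmh=50.0):
    """Free-flow travel time in seconds for a link of the given length."""
    return length_m / (speed_kmh / 3.6)  # km/h -> m/s
```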

Because traffic demand on urban road networks is often asymmetric during rush hours, the site considers two major demand flows and two minor demand flows. The two minor flows travel west and south, while the two major flows travel east and north, with eastbound traffic slightly larger than northbound traffic. At each intersection, which operates a four-phase signal plan, the green time for each direction is set to 30 seconds, except for the eastbound direction, which is set to 40 seconds. In addition, the simulation experiments apply simple coordination among the signals along the eastbound flow, adjusting the signal offsets to prevent queue spillback caused by the heavy eastbound traffic. To reflect day-to-day variation in traffic demand, the daily demand in each direction is generated from a set of random variables following normal distributions with different means and standard deviations. Based on trip generation and distribution, this study uses SUMO's DUArouter tool for dynamic traffic assignment, although SUMO offers other traffic assignment tools. Because a traffic assignment tool controls the route choice method and the routing algorithm, using traffic loaded heterogeneously across the network may represent commuter traffic demand patterns more appropriately.
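The day-to-day demand generation can be sketched with NumPy's random generator. The mean and standard-deviation values below are placeholders chosen only to respect the stated ordering (eastbound slightly above northbound, both above the minor westbound and southbound flows); the actual values are not given in this text. Rounding and clipping at zero are added assumptions so that every volume is a non-negative integer, and all names are hypothetical.

```python
import numpy as np

# Hypothetical (mean, std) of daily demand per direction, in veh/h.
DEMAND_PARAMS = {
    "eastbound": (900.0, 90.0),   # major, slightly above northbound
    "northbound": (800.0, 80.0),  # major
    "westbound": (400.0, 40.0),   # minor
    "southbound": (400.0, 40.0),  # minor
}


def daily_demands(num_days, rng):
    """Draw one demand value per day and direction from a normal distribution."""
    out = {}
    for direction, (mean, std) in DEMAND_PARAMS.items():
        draws = rng.normal(mean, std, size=num_days)
        out[direction] = np.clip(np.rint(draws), 0, None).astype(int)  # no negative volumes
    return out
```

For example, `daily_demands(20, np.random.default_rng(0))` produces 20 days of demand per direction, which could then be turned into trips and assigned with a tool such as DUArouter.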

FIG. 7 shows, for each scenario, examples of recurrent and non-recurrent congestion in the road network and traffic demand pattern of FIG. 6. Referring to FIG. 7, because the performance of RL-TPVR is analyzed under uncertain traffic conditions, the simulation experiments consider several traffic scenarios, including recurrent and non-recurrent congestion cases.

Scenario 1, shown in (a), is the normal case of recurrent congestion caused by heavy traffic demand. Scenarios (b) to (e) are abnormal cases of non-recurrent congestion caused by vehicles stopped on different designated links. In these abnormal cases, the stopped vehicle disrupts the discharge flow and reduces the link capacity, so unexpected delays occur when traversing that link. The stopped vehicle is placed on the predetermined link just before the agent vehicle departs from the origin.

If the initial global route provided by the navigation system includes that link, the agent vehicle is expected to incur delays, because it must either decide on a detour or spend additional travel time passing through the congested road. If the initial route does not include the congested road, the designated route requires only a reasonable travel time. These scenarios can therefore be used to analyze the performance of the proposed algorithm under possible near-future changes in traffic conditions.

FIG. 8 is a table showing the hyperparameter values used in the traffic prediction model of RL-TPVR according to the present invention. To train and test RL-TPVR with the parameters in FIG. 8, the simulation generated 20 days of traffic data.

The first 16 days were used as the training set. The next two days were used as the validation set and the remaining two days as the test set. The simulation ran for 240 minutes per day, and the period from minute 60 to minute 180 was treated as the peak commuting period. Assuming that C-ITS/ITS detectors are widely deployed, average link speed data was collected every 5 minutes. Accordingly, the unit time interval of the traffic prediction module in the proposed system is set to 5 minutes.
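The day-level split and the 5-minute time indexing described above can be expressed as a short Python sketch. The constants come from the text. Two choices are assumptions made for illustration: the peak window is treated as the half-open interval [60, 180) minutes, and intervals are indexed from zero.

```python
# Constants from the text: 20 simulated days, 240 min/day, 5-min aggregation.
N_DAYS = 20
RUNTIME_MIN = 240
INTERVAL_MIN = 5
PEAK_START, PEAK_END = 60, 180  # peak commuting window (assumed half-open)

def split_days(n_days=N_DAYS):
    """First 16 days train, next 2 validate, last 2 test."""
    days = list(range(n_days))
    return days[:16], days[16:18], days[18:]

def peak_interval_indices(runtime=RUNTIME_MIN, step=INTERVAL_MIN,
                          start=PEAK_START, end=PEAK_END):
    """Zero-based indices of 5-minute intervals whose start lies in [start, end)."""
    return [k for k in range(runtime // step) if start <= k * step < end]

train, val, test = split_days()
peak = peak_interval_indices()
print(len(train), len(val), len(test))                            # 16 2 2
print(RUNTIME_MIN // INTERVAL_MIN, len(peak), peak[0], peak[-1])  # 48 24 12 35
```

Each simulated day therefore yields 48 speed observations per link, 24 of which fall inside the peak window.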

Using these hyperparameter settings, the deep learning model in the traffic prediction module 1520 of RL-TPVR is trained as a batch process. Then, when a request to use the proposed navigation is received, the traffic prediction module 1520 generates traffic prediction values from real-time traffic data in a real-time process. The traffic prediction values are shared with the historical traffic database and with the traffic prediction routing module 1540, where they serve as state information for reinforcement learning.
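Claim 3 describes the batch step as preprocessing the historical traffic data into a data set before training. For speed time series, this is commonly done with a sliding window, and the sketch below assumes that approach. The lookback of 12 intervals and the horizon of 3 intervals are hypothetical values, because the actual settings are in the FIG. 8 table, which is not reproduced here. Graph WaveNet itself is not implemented.

```python
def make_windows(series, lookback, horizon):
    """Turn a [time][link] speed table into (input, target) training pairs.

    series:   list of per-interval rows, one speed value per link
    lookback: number of past intervals given to the model (hypothetical)
    horizon:  number of future intervals to predict (hypothetical)
    """
    samples = []
    for t in range(lookback, len(series) - horizon + 1):
        x = series[t - lookback:t]   # the `lookback` intervals before t
        y = series[t:t + horizon]    # the `horizon` intervals starting at t
        samples.append((x, y))
    return samples

# Toy table: one day of 48 five-minute intervals x 3 links at 30 km/h.
toy = [[30.0, 30.0, 30.0] for _ in range(48)]
pairs = make_windows(toy, lookback=12, horizon=3)
print(len(pairs))  # 48 - 12 - 3 + 1 = 34
```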

FIG. 9 is a table showing the hyperparameter values used in the RL-TPVR traffic prediction routing model of the present invention. Referring to FIG. 9, hyperparameter settings for training the reinforcement learning model must be specified for the traffic prediction routing module 1540, just as for the deep learning model in the traffic prediction module 1520.

The traffic prediction routing module 1540 is trained with the specified hyperparameter settings, using traffic information obtained from the historical traffic database. Once its reinforcement learning model has been trained as a batch process, the traffic prediction routing module 1540 can provide the navigation service as a real-time process.

FIG. 10 is a graph showing the average reward during training of the RL-TPVR traffic prediction routing model, measured as a moving average over five consecutive episodes. Referring to FIG. 10, the average reward approaches an optimal value as the exploration rate gradually decreases. Because the exploration rate is high early in training, the average reward fluctuates considerably until about episode 600. Once the number of episodes exceeds 600, the average reward converges.
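The two quantities behind FIG. 10 can be illustrated with a short sketch: a trailing moving average over five consecutive episodes, and an exploration rate that decays during training. The exponential epsilon schedule and its constants are hypothetical. The actual values are given in the FIG. 9 table, which is not reproduced here.

```python
def moving_average(values, window=5):
    """Trailing mean over `window` consecutive episodes (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def epsilon(episode, eps_start=1.0, eps_min=0.01, decay=0.995):
    """Hypothetical exponentially decaying exploration rate with a floor."""
    return max(eps_min, eps_start * decay ** episode)

print(moving_average([1, 2, 3, 4, 5, 6]))  # [1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
```

With a schedule like this, early episodes are dominated by random actions, which is consistent with the large early fluctuations the text describes.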

No non-recurrent congestion cases are included when training the neural networks of the traffic prediction module 1520 and the traffic prediction routing module 1540. Non-recurrent congestion caused by abnormal situations appears only in the test data set. The proposed system must therefore be evaluated on previously unseen traffic cases to demonstrate the validity of RL-TPVR.

FIG. 11 is a graph comparing OD travel times of the RL-TPVR algorithm with and without the prediction reward across the scenarios. Referring to FIG. 11, the OD travel times obtained over 100 independent cases of each scenario are shown as blue box plots (with the prediction reward) and orange box plots (without it).

In Scenario 1, the recurrent congestion case, RL-TPVR with the prediction reward achieved a lower median OD travel time, with much smaller variation, than RL-TPVR without it. Scenarios 2 to 5 show results similar to Scenario 1: RL-TPVR with the prediction reward achieves better routing performance than RL-TPVR without it.

In Scenario 1, RL-TPVR without the prediction reward also shows a large spread in OD travel time, and the same tendency appears in the other scenarios. In some non-recurrent congestion cases, such as Scenarios 2, 3, and 5, the variance of its OD travel time increases slightly. In contrast, RL-TPVR with the prediction reward shows lower variance than the variant without it, even under non-recurrent congestion. RL-TPVR with the prediction reward therefore delivers stable routing performance regardless of the congestion type. These results suggest that the prediction reward contributes to robust route guidance by reducing the variability of OD travel time, and they can be regarded as empirical evidence of its effect on route guidance.

FIG. 12 is a graph showing the performance gap between RL-TPVR variants with different prediction errors in each scenario. Referring to FIG. 12, this comparison further explores how the prediction function affects RL-TPVR performance. For 100 independent cases in each scenario, it shows the gap in OD travel time between RL-TPVR with perfect prediction and RL-TPVR with various prediction errors.

In the graph, colored bars show the mean and black error bars show the standard deviation of the performance gap between RL-TPVR with and without traffic prediction error. Traffic prediction errors from 5% to 25% are considered. For example, if the actual average speed on a road section over a given period is 30 km/h, a 5% prediction error means the predicted average speed is taken to be either 28.5 km/h or 31.5 km/h.
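The error model in the example above can be written as a one-line perturbation. Choosing the sign of the error uniformly at random is an assumption. The source states only that a 5% error on 30 km/h yields either 28.5 km/h or 31.5 km/h.

```python
import random

def perturb_speed(true_speed, error_rate, rng):
    """Predicted speed with a relative error of +/- error_rate (sign drawn at random)."""
    sign = rng.choice((-1, 1))
    return true_speed * (1 + sign * error_rate)

rng = random.Random(0)
samples = [perturb_speed(30.0, 0.05, rng) for _ in range(5)]
print([round(s, 1) for s in samples])  # each value is 28.5 or 31.5
```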

In Scenario 1, both the mean and the standard deviation of the performance gap increase as the traffic prediction error grows. Similarly, in the other scenarios, the performance gap shrinks as the prediction error decreases, so RL-TPVR's routing performance improves slightly as prediction accuracy increases. In the non-recurrent congestion cases, however, the mean and standard deviation of the gap are sharply lower than in the recurrent congestion case. For example, the gap in Scenario 4 is less than half of that in Scenario 1. This trend suggests that, under non-recurrent congestion, the proposed routing algorithm is comparatively insensitive to poor prediction accuracy.

FIG. 13 shows examples of the routes produced by several algorithms in each scenario. Referring to FIG. 13, example solutions from each routing algorithm are shown for various scenarios with the same traffic demand pattern. The origin and destination are set to the lower-left and upper-right links of the network, respectively.

In Scenario 1, the recurrent congestion case, the Dijkstra algorithm yields a much longer OD travel time than the other algorithms, with a delay of more than 30% relative to the proposed algorithm. The agent vehicle follows one of the shortest paths provided by the Dijkstra algorithm, yet it takes longer to reach the destination. This is because that path passes through the upper-left corner of the network, which carries heavy traffic. The other routing algorithms provide different routes that share a common detour around the upper-left corner, and so reduce the OD travel time considerably compared with the Dijkstra algorithm.
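The effect described above can be reproduced with a plain Dijkstra search on a toy network: the shortest-distance path is not the fastest path under congestion. The network, its link lengths and travel times, and the assumption that the baseline weights links by length are all hypothetical. This is not the Dijkstra configuration used in the study.

```python
import heapq

def dijkstra(graph, source, target, weight):
    """Classic Dijkstra over graph[u] = {v: {"length": .., "time": ..}}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, attrs in graph.get(u, {}).items():
            nd = d + attrs[weight]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None  # unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def path_cost(graph, path, weight):
    """Sum one link attribute along a node path."""
    return sum(graph[u][v][weight] for u, v in zip(path, path[1:]))

# Hypothetical network: origin O (lower left) to destination D (upper right),
# via the congested upper-left corner UL or the lower-right corner LR.
G = {
    "O": {"UL": {"length": 1.0, "time": 2.0}, "LR": {"length": 1.1, "time": 2.2}},
    "UL": {"D": {"length": 1.0, "time": 6.0}},  # heavy traffic near UL
    "LR": {"D": {"length": 1.0, "time": 2.0}},
}
short = dijkstra(G, "O", "D", "length")
fast = dijkstra(G, "O", "D", "time")
print(short, path_cost(G, short, "time"))  # ['O', 'UL', 'D'] 8.0
print(fast, path_cost(G, fast, "time"))    # ['O', 'LR', 'D'] 4.2
```

Even a search weighted by current travel time only reflects present conditions. The prediction-based approach described in this document targets the gap between current and future link speeds.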

In Scenario 2, a non-recurrent congestion case, the OD travel time of the Dijkstra algorithm is about 12% longer than that of RL-TPVR because of abnormal traffic conditions along its route. Similarly, Scenario 3 shows the worst-case performance of the A* algorithm, whose OD travel time increases by about 44% compared with the normal case.

The existing algorithms also adapt poorly to non-recurrent congestion, because they stick to their routing decisions without considering possible changes in future traffic. In Scenario 4, an agent vehicle using RL-VR in its navigation system needs more than 28% additional travel time to reach its destination. RL-VR is designed to provide the shortest-travel-time route through its reward modeling, yet it suffers significant delays under non-recurrent congestion.

RL-TPVR achieves the best performance. Unlike previously developed algorithms, the proposed routing algorithm provides dynamic routes under abnormal traffic conditions. In Scenario 4, RL-TPVR changes its route to avoid likely congestion and reduces the OD travel time by 22.5% compared with RL-VR. This is possible because the prediction function of RL-TPVR can exclude routes affected by abnormal traffic conditions. The same tendency appears in Scenario 5. There, the OD travel time of RL-VR increases by almost 24.81% relative to Scenario 1, whereas that of the proposed algorithm stays almost the same as in the recurrent congestion case.

In short, the routes of previously developed algorithms do not change in response to congestion caused by abnormal traffic conditions. RL-TPVR, by contrast, provides flexible, dynamic routes that mitigate unexpected delays even under non-recurrent congestion. These results suggest that RL-TPVR is the most effective of the compared routing algorithms for mitigating unexpected delays caused by non-recurrent congestion in navigation systems.

FIG. 14 is a table showing the mean and standard deviation of OD travel times for each algorithm in various scenarios. Referring to FIG. 14, the routing performance of RL-TPVR is compared with that of previously developed routing algorithms for several scenarios associated with a specific traffic demand pattern, using a random sample of size s = 100. Each case has its own departure time, origin, destination, and traffic demand, which allows a more comprehensive analysis of the routing algorithms under various traffic conditions.

In Scenario 1, the A* algorithm performs worst, requiring much more travel time than the other routing algorithms. Its standard deviation is also much larger than those of the other algorithms, so arrival times estimated from it may be inaccurate.

RL-VR, in contrast, performs much better than the Dijkstra and A* algorithms, and the proposed algorithm outperforms all of the existing routing algorithms. In Scenario 1, RL-TPVR reduces the OD travel time by approximately 27%, 39%, and 9% compared with the Dijkstra, A*, and RL-VR algorithms, respectively. The same trend appears in the non-recurrent congestion cases. For example, in Scenario 4, RL-TPVR reduces the average travel time by 24.33%, 35.43%, and 27.12% compared with the Dijkstra, A*, and RL-VR algorithms, respectively.
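The percentage reductions above are presumably relative differences in mean OD travel time with respect to each baseline $b$. The source does not state the formula, so the following definition is an assumption, and the numbers in the example are hypothetical.

```latex
\text{reduction}_b = \frac{\bar{T}_b - \bar{T}_{\text{RL-TPVR}}}{\bar{T}_b} \times 100\%,
\qquad
\text{e.g. } \bar{T}_b = 20\,\text{min},\ \bar{T}_{\text{RL-TPVR}} = 14.6\,\text{min}
\;\Rightarrow\; \frac{20 - 14.6}{20} \times 100\% = 27\%.
```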

Importantly, the proposed algorithm substantially reduces OD travel time with small deviations under all traffic conditions. Its routing performance therefore remains stable even when non-recurrent congestion occurs. Consistent with the earlier performance-gap analysis, RL-TPVR reduces the variability of OD travel time, which allows the navigation system to provide reliable route guidance.

According to the statistical results, RL-TPVR achieves the best performance. Nevertheless, a few extreme cases could distort the overall results and cause the performance of the proposed algorithm to be overestimated relative to the existing algorithms. To confirm that RL-TPVR routes better than each existing algorithm under identical traffic conditions, a one-sided Wilcoxon signed-rank test is performed at a significance level of 0.05. The difference in OD travel time between a previously developed algorithm and RL-TPVR under the same traffic condition $i$ is denoted $\delta_i$. The median of $\delta_i$ over $n$ samples is denoted $m_{\delta,n}$, where $n = 100$.
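The test above can be sketched in pure Python. The sketch assumes the alternative hypothesis that the median of the paired differences (baseline minus RL-TPVR) is positive. It also uses the large-sample normal approximation, with average ranks for ties and no tie or continuity correction. The function name, these simplifications, and the example data are assumptions for illustration, not the study's implementation.

```python
import math

def wilcoxon_greater(deltas):
    """One-sided Wilcoxon signed-rank test (H1: median of deltas > 0).

    deltas: paired differences T_baseline - T_RL-TPVR for the same traffic case.
    Returns (W+, p-value) using the normal approximation.
    """
    d = [x for x in deltas if x != 0]  # zero differences carry no sign
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average 1-based ranks to ties in |d|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail P(Z >= z)
    return w_plus, p_value

# Hypothetical paired differences in minutes (baseline minus RL-TPVR).
deltas = [1.2, 0.8, 2.5, -0.3, 1.9, 0.0, 3.1, 0.6, -0.4, 1.4]
w, p = wilcoxon_greater(deltas)
print(w, p < 0.05)  # 42.0 True
```

In practice, `scipy.stats.wilcoxon(deltas, alternative="greater")` performs the same one-sided test and handles exact small-sample distributions and tie corrections.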

FIG. 15 is a table showing the p-values of the one-sided Wilcoxon signed-rank test on OD travel time for each scenario. Referring to FIG. 15, the p-value is well below the significance level of 0.05 in every scenario, so there is sufficient evidence to support the alternative hypothesis at that level. These results suggest that RL-TPVR yields the shortest OD travel times among the compared algorithms in all scenarios. In other words, navigation systems can benefit more by using RL-TPVR under both recurrent and non-recurrent congestion. Among the approaches examined, a navigation system equipped with RL-TPVR is therefore the most effective way to reduce OD travel time under uncertain traffic conditions.

The foregoing describes specific embodiments for carrying out the present invention. The present invention includes not only the embodiments described above but also embodiments obtained through simple design changes or easy modifications. It also includes techniques that can be readily modified and implemented using the embodiments. Therefore, the scope of the present invention should not be limited to the embodiments described above, but should be defined by the following claims and their equivalents.

Claims

1. A control server that receives, in real time, vehicle location data and travel data from a vehicle module and traffic data for each road section from a traffic sensor, the control server comprising:
a database storing the vehicle location data and the travel data, and storing the traffic data as real-time traffic data and historical traffic data;
a traffic prediction module that generates a first weight through deep learning on the historical traffic data and predicts a future speed for each road section by applying the first weight to the real-time traffic data; and
a traffic prediction routing module that generates a second weight by applying reinforcement learning to the historical traffic data, and generates route guidance data by applying the second weight to the vehicle location data, the travel data, and the future speed for each road section;
wherein the traffic prediction routing module performs the reinforcement learning by applying a reward function including a distance reward, a time reward, a prediction reward, and a final reward, and
wherein the distance reward is a reward value of 0 or more and 1 or less according to the Euclidean distance between the destination and the vehicle location; the time reward is a reward value of -1 or more and 1 or less depending on the travel time from the origin to the destination; the prediction reward is a reward value of -1 or more and 1 or less depending on whether the gap between the expected travel time and the actual travel time is acceptable; and the final reward is a positive or negative reward value given depending on whether the vehicle location is on the destination link.

2. The control server of claim 1,
wherein the traffic sensor is a hybrid traffic detector including a C-ITS detector and an ITS detector.

3. The control server of claim 1,
wherein the traffic prediction module includes:
a first batch process that constructs a data set by preprocessing the historical traffic data and then trains a deep learning model to generate the first weight; and a first real-time process that predicts the future speed for each road section over a prediction horizon based on the first weight;
wherein the first real-time process uses Graph WaveNet as its prediction model.

4. The control server of claim 3,
wherein the traffic prediction routing module includes:
a second batch process that composes the historical traffic data into a simulation data set and then trains a reinforcement learning model to generate the second weight; and a second real-time process that generates the route guidance data by applying the future speed for each road section, the vehicle location data, and the travel data to a reinforcement-learning-based traffic prediction vehicle routing algorithm based on the second weight.

5. The control server of claim 4,
wherein the traffic prediction vehicle routing algorithm that generates the route guidance data executes the reinforcement learning by applying an agent, an action, a state, and the reward function.
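Claim 1 fixes only the range of each reward term and the condition it depends on. The Python sketch below shows one way such terms could be computed. Several elements are hypothetical: the linear and clipped functional forms, the 10% tolerance, the bonus magnitude, the reference time, and all function names. The claims also do not specify how the four terms are combined.

```python
import math

def distance_reward(vehicle_xy, dest_xy, max_dist):
    """In [0, 1]: 1 at the destination, falling linearly to 0 at max_dist (hypothetical form)."""
    d = math.dist(vehicle_xy, dest_xy)  # Euclidean distance
    return max(0.0, 1.0 - d / max_dist)

def time_reward(travel_time, reference_time):
    """In [-1, 1]: positive when faster than a reference time (hypothetical form)."""
    return max(-1.0, min(1.0, (reference_time - travel_time) / reference_time))

def prediction_reward(expected_time, actual_time, tolerance=0.1):
    """+1 if the relative gap is acceptable (within a hypothetical tolerance), else -1."""
    gap = abs(expected_time - actual_time) / actual_time
    return 1.0 if gap <= tolerance else -1.0

def final_reward(on_destination_link, bonus=1.0):
    """Positive on the destination link, negative otherwise (hypothetical magnitude)."""
    return bonus if on_destination_link else -bonus

print(distance_reward((0.0, 0.0), (3.0, 4.0), max_dist=10.0))       # 0.5
print(prediction_reward(10.5, 10.0), prediction_reward(13.0, 10.0))  # 1.0 -1.0
```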