KR102400833B1 - Method and apparatus for controlling traffic signal based on AI reinforcement learning

Info

Publication number: KR102400833B1
Application number: KR1020200186650A
Authority: KR
Inventors: 김영찬; 김준원
Original assignee: 서울시립대학교 산학협력단
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2022-05-20

Abstract

Disclosed is a traffic signal control device comprising: an intersection learning unit that derives, as a first action based on a first state, a green time adjustment ratio between barriers and a green time adjustment ratio between conflicting phases, and outputs an intersection signal control variable; a network learning unit that derives, as a second action based on a second state, an offset adjustment ratio for each intersection and an offset for each intersection, and outputs a network signal control variable; a traffic learning model unit that simulates a traffic situation based on an input through-traffic volume for each movement and an initial queue for each movement, outputs the first state to the intersection learning unit and the second state to the network learning unit, and simulates the traffic situation again based on the first and second actions to update the first and second states; and a signal control unit that receives the intersection signal control variable and the network signal control variable and applies them to a traffic signal network. Urban traffic congestion can thereby be relieved.

Description

AI reinforcement learning-based traffic signal control device and method for performing the same

The present invention relates to an apparatus for controlling traffic signals based on artificial intelligence (AI) reinforcement learning and to a method performed by the apparatus.

With recent advances in AI, it is increasingly applied to urban engineering, but in traffic signal control AI has been used mainly for information collection. That is, AI has been applied to collect information, but technology that uses this information to control traffic signals is still in its early stages.

Meanwhile, previously studied AI reinforcement learning-based traffic signal control techniques use a phase-switching approach in which, at each learning step, it is decided whether the current phase is kept in the next step or which other phase comes next. Such techniques are difficult for the standard signal controllers currently installed in the field to accept, because those controllers require constraints on phase sequence, cycle, and barriers, and the techniques are therefore hard to apply in practice.

Accordingly, a field-applicable technology is needed in which the AI adjusts signal control variables directly and thereby responds actively to changes in traffic patterns.

Korean Registered Patent Publication No. 10-2171671 (2020.10.23)

An object of the present invention is to provide a traffic signal control device, and a method using the same, in which the AI adjusts signal control variables directly and can thereby respond actively to changes in traffic patterns.

Another object of the present invention is to provide a traffic signal control device, and a method using the same, that reflects a dual-ring phase scheme so that it can be operated in the field.

Another object of the present invention is to provide a traffic signal control device, and a method using the same, based on learning with a traffic flow model for optimizing not only individual intersections but also network coordination.

Another object of the present invention is to provide a traffic signal control device, and a method using the same, based on reinforcement learning of how the traffic situation in a traffic flow model changes when signal control variables such as phase, cycle, and offset change in a given traffic situation.

According to an embodiment of the present invention, a traffic signal control device is provided. The traffic signal control device includes: an intersection learning unit that derives, as a first action based on a first state, a green time adjustment ratio between barriers and a green time adjustment ratio between conflicting phases, and outputs an intersection signal control variable; a network learning unit that derives, as a second action based on a second state, an offset adjustment ratio for each intersection and an offset for each intersection, and outputs a network signal control variable; a traffic learning model unit that simulates a traffic situation based on an input through-traffic volume for each movement and an initial queue for each movement, outputs the first state to the intersection learning unit and the second state to the network learning unit, and simulates the traffic situation again based on the first action and the second action to update the first state and the second state; and a signal control unit that receives the intersection signal control variable and the network signal control variable and applies them to a traffic signal network.

The first state may include at least one of a space occupancy rate for each phase movement, a green time ratio for each phase movement, an average control delay of the intersection, and a degree of saturation for each phase.

The second state may include an average control delay for each section between controlled intersections and an offset for each intersection.

The intersection signal control variable may include a green time for each phase.

The network signal control variable may include a signal cycle and an intersection offset.

The intersection learning unit and the network learning unit may include a Deep Deterministic Policy Gradient (DDPG) algorithm that performs learning over a continuous action space.

The traffic learning model unit may reward the first action when the first action reduces the intersection delay, and may reward the second action when the second action reduces the network delay.

The traffic learning model unit may derive the delay of each spatiotemporal cell based on a cell propagation model that represents the propagation of traffic flow shock waves in units of spatiotemporal cells, and may derive the first state and the second state based on the delay.

According to another embodiment of the present invention, a traffic signal control method performed by a traffic signal control device including an intersection learning unit and a network learning unit is provided. The traffic signal control method includes: simulating a traffic situation based on an input through-traffic volume for each movement and an initial queue for each movement, and outputting a first state to the intersection learning unit; learning, as a first action based on the first state, a green time adjustment ratio between barriers and a green time adjustment ratio between conflicting phases; simulating the traffic situation again based on the first action to update the first state; rewarding the first action when the first action reduces the intersection delay; and deriving an optimal intersection signal control variable based on the first state, the first action, and the reward, and applying the derived intersection signal control variable to a traffic signal network.

The traffic signal control method may further include: simulating the traffic situation based on the input through-traffic volume for each movement and the initial queue for each movement, and outputting a second state to the network learning unit; learning, as a second action based on the second state, an offset adjustment ratio for each intersection and an offset for each intersection; simulating the traffic situation again based on the second action to update the second state; rewarding the second action when the second action reduces the network delay; and deriving an optimal network signal control variable based on the second state, the second action, and the reward, and applying the derived network signal control variable to the traffic signal network.

The first state may include at least one of a space occupancy rate for each phase movement, a green time ratio for each phase movement, an average control delay of the intersection, and a degree of saturation for each phase. The second state may include an average control delay for each section between controlled intersections and an offset for each intersection. The intersection signal control variable may include a green time for each phase. The network signal control variable may include a signal cycle and an intersection offset.

The simulating of the traffic situation and the simulating of the traffic situation again may include deriving the delay of each spatiotemporal cell based on a cell propagation model that represents the propagation of traffic flow shock waves in units of spatiotemporal cells, and deriving the first state and the second state based on the delay.

The intersection signal control variable may be optimized at every predetermined signal cycle.

The network signal control variable may be optimized at every predetermined time interval.

According to an embodiment of the present invention, a traffic signal control device, and a method using the same, are provided in which the AI adjusts signal control variables directly and can thereby respond actively to changes in traffic patterns.

According to an embodiment of the present invention, a traffic signal control device, and a method using the same, are provided that reflect a dual-ring phase scheme so that they can be operated in the field.

According to an embodiment of the present invention, a traffic signal control device, and a method using the same, are provided based on learning with a traffic flow model for optimizing not only individual intersections but also network coordination.

According to an embodiment of the present invention, a traffic signal control device, and a method using the same, are provided based on reinforcement learning of how the traffic situation changes when signal control variables such as phase, cycle, and offset change in a given traffic situation.

This makes it possible to address the traffic congestion currently faced and, by securing urban traffic flow optimization and management technology that uses rapidly developing AI technology, to alleviate urban traffic congestion.

FIG. 1 is a control block diagram of a traffic signal control device according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the concept of AI reinforcement learning.
FIG. 3 is a diagram for explaining a model applied to the traffic learning model unit according to an example of the present invention.
FIG. 4 is a diagram illustrating the signal control group of an individual intersection according to an embodiment of the present invention.
FIG. 5 is a control flowchart for explaining a traffic signal control method according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a computing device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

In this specification, redundant descriptions of the same components are omitted.

In this specification, when a component is said to be 'connected' or 'coupled' to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present. In contrast, when a component is said to be 'directly connected' or 'directly coupled' to another component, it should be understood that no intervening component is present.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention.

In this specification, a singular expression may include the plural unless the context clearly indicates otherwise.

In this specification, terms such as 'include' or 'have' are intended only to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, the term 'and/or' includes a combination of a plurality of listed items or any one of the plurality of listed items. In this specification, 'A or B' may include 'A', 'B', or 'both A and B'.

In this specification, detailed descriptions of well-known functions and configurations that could obscure the gist of the present invention are omitted.

In existing signal operation, SAs are delimited per corridor (axis) and signal coordination plans are made per corridor, while linkage between SAs relies on the operator's experience. Existing traffic flow models for coordination optimization also optimize per corridor, and there is no traffic flow model for network-wide coordination optimization.

Accordingly, the present invention applies an AI-based model predictive control (MPC) network signal optimization technique: through AI reinforcement learning, it learns how the traffic situation changes when the signal control variables (phase, cycle, offset) change in a given traffic situation, and it thereby aims to perform network coordination optimization close to the optimum.

In addition, by applying a kinematic wave-based MESO model that simulates traffic simply yet realistically, the present invention can learn in detail how the traffic situation changes with the signal control variables (phase, cycle, offset), and through network signal operation management it can form networks of intersections that show similar traffic flow characteristics.

Hereinafter, the present invention is described in detail with reference to the drawings.

FIG. 1 is a control block diagram of a traffic signal control device according to an embodiment of the present invention.

As shown, the traffic signal control device according to this embodiment includes a traffic learning model unit 100, an intersection learning unit 200, a network learning unit 300, and a signal control unit 400, and it outputs signal control variables to a traffic signal network 500.

As described above, the traffic signal control device lets the AI adjust traffic signals directly in response to changes in traffic conditions. By reflecting a dual-ring phase scheme, it complies with the constraints required for field application (cycle, barrier, phase sequence), so it can operate on the standard traffic signal controllers installed in the field, that is, on the traffic signal network 500. To this end, the initial signal control variables (cycle, green time for each phase, offset) and the collected and processed traffic information are input to the kinematic wave-based MESO model, that is, the traffic learning model unit 100, to simulate the current traffic state. The AI then learns how the traffic state changes under the shock wave-based mesoscopic model when the signal control variables (cycle, green time for each phase, offset) are adjusted, and derives the optimal signal control variables for the traffic state. The AI learning is performed in the intersection learning unit 200 and the network learning unit 300, and the signal control variables obtained by the learning are output to the signal control unit 400 and may be applied to the traffic signal network 500.

Before the components of FIG. 1 are described, the AI reinforcement learning applied to the present invention is explained. FIG. 2 is a diagram for explaining the concept of AI reinforcement learning.

AI reinforcement learning learns which action is optimal to take in the current state, and a reward is given each time an action is taken. By repeating this many times, learning proceeds so that actions are chosen in the direction that maximizes the reward. In each predetermined period the action with the maximum reward is determined, that action is reflected in the next state, and a reward is given again; an algorithm that derives the optimal action in this manner is applied.
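The state, action, and reward cycle described above can be sketched as a short loop. This is a minimal illustration, not the patented implementation: `run_episode`, `toy_env_step`, the scalar "delay" state, and the fixed policy are all hypothetical stand-ins. The reward follows the idea stated earlier in this document that an action is rewarded when it reduces delay.

```python
from typing import Callable, Tuple

State = float   # hypothetical: the state is a single delay value in seconds
Action = float  # hypothetical: the action is a scalar adjustment


def run_episode(
    env_step: Callable[[State, Action], Tuple[State, float]],
    policy: Callable[[State], Action],
    state: State,
    steps: int,
) -> Tuple[State, float]:
    """Repeat state -> action -> reward and return the final state and summed reward."""
    total_reward = 0.0
    for _ in range(steps):
        action = policy(state)                   # choose an action for the current state
        state, reward = env_step(state, action)  # environment returns next state and reward
        total_reward += reward
    return state, total_reward


def toy_env_step(delay: State, action: Action) -> Tuple[State, float]:
    # Hypothetical dynamics: the action removes up to `action` seconds of delay.
    new_delay = max(0.0, delay - action)
    # The reward is the reduction in delay achieved by this action.
    return new_delay, delay - new_delay


final_delay, total = run_episode(toy_env_step, lambda s: 1.0, state=5.0, steps=3)
```

In a real agent the fixed policy would be replaced by a learned one that is updated from the collected rewards.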

In the present invention, as an example, Deep Deterministic Policy Gradient (DDPG), one of various reinforcement learning algorithms, may be applied. The DDPG algorithm is based on the Deterministic Policy Gradient (DPG) algorithm, an off-policy, continuous actor-critic method.

In addition, to perform more complex learning, the algorithm according to this embodiment applies the soft update and batch learning proposed in connection with the deep learning technique DQN (Deep Q-Network), and it is a model well suited to AI reinforcement learning in environments with a continuous action space.
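The two mechanisms named above can be illustrated in a few lines. The sketch below assumes network weights are held as flat lists of floats and transitions as tuples in a list; `soft_update`, `sample_batch`, and the value of `tau` are illustrative choices, not details given in this document.

```python
import random
from typing import List, Sequence, Tuple

Transition = Tuple[float, float, float, float]  # (state, action, reward, next_state), simplified to scalars


def soft_update(target: Sequence[float], online: Sequence[float], tau: float = 0.01) -> List[float]:
    """Move each target weight a small step toward the online weight:
    target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]


def sample_batch(buffer: Sequence[Transition], batch_size: int, rng: random.Random) -> List[Transition]:
    """Draw a random mini-batch of stored transitions (batch learning from a replay buffer)."""
    return rng.sample(list(buffer), min(batch_size, len(buffer)))


new_target = soft_update(target=[0.0, 1.0], online=[1.0, 1.0], tau=0.1)
buffer = [(float(i), 0.0, 1.0, float(i + 1)) for i in range(5)]
batch = sample_batch(buffer, batch_size=2, rng=random.Random(0))
```

A small `tau` keeps the target network changing slowly, which tends to stabilize the critic's learning targets.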

In summary, because DDPG's continuous action space is suited to learning the adjustment values of signal control variables directly, the traffic signal control device according to the present invention adopts the DDPG algorithm as the reinforcement learning model for teaching the AI to adjust signal control variables.

The AI-based model predictive control (MPC) network signal optimization technique applied to the traffic signal control device according to the present invention may consist of an individual-intersection signal optimization algorithm and a network offset optimization algorithm, and the respective AI agents may perform phase optimization in parallel. The DDPG-based AI agents described above may be implemented as the intersection learning unit 200 and the network learning unit 300, and the model that simulates the traffic situation may be implemented as the traffic learning model unit 100.

The intersection learning unit 200 derives, as a first action based on a first state, a green time adjustment ratio between barriers and a green time adjustment ratio between conflicting phases, and outputs an intersection signal control variable. The network learning unit 300 derives, as a second action based on a second state, an offset adjustment ratio for each intersection and an offset for each intersection, and may output a network signal control variable.

The intersection learning unit 200, which optimizes the intersection signal control variable, and the network learning unit 300, which optimizes the network signal control variable, operate in parallel with each other, and each optimizes its signal control variable at a different period.

The traffic learning model unit 100 simulates the traffic situation based on the input through-traffic volume for each movement and the initial queue for each movement, outputs the first state to the intersection learning unit 200 and the second state to the network learning unit 300, and performs modeling that simulates the traffic situation again based on the first action and the second action to update the first state and the second state.

The first state according to this embodiment may include at least one of a space occupancy rate for each phase movement, a green time ratio for each phase movement, an average control delay of the intersection, and a degree of saturation for each phase, and the second state may include an average control delay for each section between controlled intersections and an offset for each intersection. In addition, the intersection signal control variable may include a green time for each phase, and the network signal control variable may include a signal cycle and an intersection offset.

That is, based on the input traffic information, the traffic learning model unit 100 may derive the space occupancy rate for each phase movement, the green time ratio for each phase movement, the average control delay of the intersection, the degree of saturation for each phase, and the like as the first state. The first state is input to the intersection learning unit 200, which, by learning from it, may output the green time adjustment ratio between barriers and the green time adjustment ratio between conflicting phases as the first action. Based on the first action, the green time for each phase, which is the intersection signal control variable, may be changed, and the traffic learning model unit 100 may update the first state again based on the changed green times. As this process is repeated, the optimal intersection signal control variable can be derived.
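One way to picture how the two ratios in the first action could translate into green times under a dual-ring structure is sketched below. The mapping is an assumption made for illustration: this section does not give the exact formula, so the step size `max_step`, the floor `min_ratio`, the two-ring, two-barrier-group layout, and the omission of yellow and all-red intervals are all hypothetical simplifications.

```python
from typing import Dict, Tuple

Key = Tuple[int, int]  # (ring index, barrier group index)


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def apply_intersection_action(
    cycle: float,
    barrier_ratio: float,            # share of the cycle given to barrier group 0
    phase_ratios: Dict[Key, float],  # share of a group's time given to its first phase
    d_barrier: float,                # action component in [-1, 1]
    d_phases: Dict[Key, float],      # action components in [-1, 1]
    max_step: float = 0.1,           # hypothetical maximum change per decision
    min_ratio: float = 0.2,          # hypothetical floor that protects a minimum green
):
    """Adjust the barrier split and the conflicting-phase splits, then compute green times."""
    new_barrier = clip(barrier_ratio + max_step * d_barrier, min_ratio, 1.0 - min_ratio)
    # Both rings share the same barrier timing, which keeps the barriers aligned.
    group_time = (cycle * new_barrier, cycle * (1.0 - new_barrier))
    new_ratios: Dict[Key, float] = {}
    greens: Dict[Tuple[int, int, int], float] = {}
    for (ring, group), ratio in phase_ratios.items():
        r = clip(ratio + max_step * d_phases.get((ring, group), 0.0), min_ratio, 1.0 - min_ratio)
        new_ratios[(ring, group)] = r
        greens[(ring, group, 0)] = group_time[group] * r
        greens[(ring, group, 1)] = group_time[group] * (1.0 - r)
    return new_barrier, new_ratios, greens


ratios = {(ring, group): 0.5 for ring in (0, 1) for group in (0, 1)}
barrier, ratios, greens = apply_intersection_action(
    cycle=100.0, barrier_ratio=0.5, phase_ratios=ratios,
    d_barrier=1.0, d_phases={(0, 0): -1.0},
)
```

Because every ring derives its phase times from the same barrier split, the cycle length and barrier positions are preserved whatever the action, which is the kind of constraint a standard dual-ring controller expects.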

Similarly, based on the input traffic information, the traffic learning model unit 100 may derive the average control delay for each section between controlled intersections, the offset for each intersection, and the like as the second state. The second state is input to the network learning unit 300, which, by learning from it, may output the offset adjustment ratio for each intersection and the offset for each intersection as the second action. Based on the second action, the signal cycle and the intersection offset, which are the network signal control variables, may be changed, and the traffic learning model unit 100 may update the second state again based on the changed signal cycle and intersection offset. As this process is repeated, the optimal network signal control variable can be derived.
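The offset part of the second action can be pictured in the same spirit. The sketch below assumes that each adjustment is a value in [-1, 1] scaled by a hypothetical `max_step` fraction of the cycle and that offsets wrap around the cycle length; the document does not specify these details in this section.

```python
from typing import List, Sequence


def apply_offset_action(
    offsets: Sequence[float],      # current offset of each intersection, in seconds
    adjustments: Sequence[float],  # action components in [-1, 1], one per intersection
    cycle: float,                  # common signal cycle, in seconds
    max_step: float = 0.1,         # hypothetical maximum shift as a fraction of the cycle
) -> List[float]:
    """Shift each intersection's offset and wrap the result into [0, cycle)."""
    return [(o + a * max_step * cycle) % cycle for o, a in zip(offsets, adjustments)]


new_offsets = apply_offset_action(offsets=[0.0, 95.0], adjustments=[-1.0, 1.0], cycle=100.0)
```

Wrapping with the modulo keeps every offset a valid position within the cycle, so a shift past the end of the cycle reappears at its start.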

한편, 본 실시예예 따른 교통 신호 최적화를 위한 AI 강화 학습에 요구되는 정보는 이동류 교통량 및 이동류별 대기 행렬에 대한 정보를 포함할 수 있고, 이동류 교통량 및 이동류별 대기 행렬에 대한 정보는 교통 학습 모형부(100)로 입력되어 교통 상황을 모사하는 데 활용될 수 있다. Meanwhile, information required for AI reinforcement learning for optimizing traffic signals according to the present embodiment may include information on a moving flow traffic volume and a waiting queue for each moving flow, and information about a moving traffic volume and a waiting queue for each moving flow is traffic learning. It may be input to the model unit 100 and used to simulate a traffic situation.

The traffic volume per movement and the queue per movement are initially set to predetermined initial values. Once signal control variables have been applied to the traffic signal network 500 as a result of the repeated updates of states and actions, they can be updated again based on the traffic information actually collected through the traffic signal network 500.

The traffic volume per movement can be defined as the volume passing the stop line. Its temporal aggregation unit can be set to the predetermined cycle at which the signal changes, and its spatial aggregation unit can be set to the intersection space per phase movement (per approach direction, through/left turn).

In addition, the initial queue per movement can be defined as the traffic remaining in the detection area at the moment the signal transitions from green to red, and its temporal and spatial aggregation units can be set to the same units as the traffic volume per movement.

Meanwhile, according to one example, the traffic learning model unit 100 may use a Kinematic Wave-based MESO model, a mesoscopic traffic flow simulation model, to let the AI agent learn how traffic flow changes as the signal control variables change. This model was developed to suit the simulation of urban traffic networks, based on the Kinematic Wave Model (Daganzo, 1994), which represents the propagation of traffic shock waves in units of spatiotemporal cells. Starting from the initial density of each cell, it determines the demand and supply flows from each cell's density, determines the transfer flow at each cell boundary, and updates the cell densities every second. Assigning each cell a density that varies over time and space in this way makes it possible to derive the propagation of shock waves. The flow across a boundary is the smaller of the flow the downstream cell can receive (supply) and the flow the upstream cell wants to send (demand), and a cell's density is determined by its density one second earlier and the flows at its boundaries. The degree of delay on the road can be derived from these density values.
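The demand/supply update described above can be sketched as a single cell-transmission step. This is only an illustrative sketch, not the patented model: the triangular flow-density relation, the parameter names (`v_free`, `w`, `k_jam`), their values, the 15 m cell length, and the free-exit boundary are all assumptions chosen so that a vehicle cannot cross more than one cell per one-second step.

```python
def ctm_step(density, inflow_demand, dt=1.0, cell_len=15.0,
             v_free=15.0, w=5.0, k_jam=0.15):
    """Advance cell densities (veh/m) by one time step of dt seconds.

    All parameter values are illustrative assumptions, not figures from the source.
    """
    # Capacity of the assumed triangular flow-density relation (veh/s).
    q_max = v_free * w * k_jam / (v_free + w)
    # Demand: flow each cell wants to send downstream.
    demand = [min(v_free * k, q_max) for k in density]
    # Supply: flow each cell is able to receive from upstream.
    supply = [min(w * (k_jam - k), q_max) for k in density]
    # Boundary flow = smaller of upstream demand and downstream supply.
    flows = [min(inflow_demand, supply[0])]
    for i in range(1, len(density)):
        flows.append(min(demand[i - 1], supply[i]))
    flows.append(demand[-1])  # assumed free exit at the downstream end
    # Conservation: new density = old density + (inflow - outflow) * dt / length.
    return [k + (flows[i] - flows[i + 1]) * dt / cell_len
            for i, k in enumerate(density)]
```

Calling `ctm_step` repeatedly, once per simulated second, yields the time-space density field from which queue growth and dissipation can be read; a red signal could be modeled by forcing the boundary flow at the stop line to zero.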

The Kinematic Wave-based MESO model according to this embodiment reflects the characteristics of urban intersection geometry and a variety of phasing schemes, and can thereby compute delay using the spatiotemporal cells.

In urban traffic, delay on one movement can affect the flow of an adjacent movement. For example, if the left-turn movement is delayed and its queue spills beyond the left-turn lane, the through movement is also affected and can be delayed. A conventional traffic flow model simulating this situation would simply conclude that the through delay has worsened, whereas the model according to this embodiment can attribute the deterioration to oversaturation of the left turn.

The traffic learning model unit 100 according to this embodiment distinguishes two situations according to the intersection geometry, as shown in FIG. 3.

FIG. 3 is a diagram for explaining the model applied to the traffic learning model unit according to an example of the present invention.

FIG. 3(a) shows the "First In First Out" structure: a geometry in which the through movement has a single lane, so that if that one through lane is blocked, neither movement can get out. To reflect this geometry, the traffic learning model unit 100 uses the Daganzo (1993) diverge model (Combining Downstream Supply).

FIG. 3(b) shows the "Non First In First Out" structure: a geometry in which the through movement has multiple lanes, so that if one through lane is blocked, the through capacity is reduced but vehicles can still pass. To reflect this geometry, the traffic learning model unit 100 can use the Lebacque (1996) diverge model (Splitting Upstream Demand).
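The difference between the two diverge treatments can be illustrated with their common textbook forms. The sketch below assumes those standard formulations (total outflow limited by the most restrictive branch for FIFO; each branch limited independently for non-FIFO); the source names the models but does not give formulas, so the function names, arguments, and numbers are hypothetical.

```python
def fifo_diverge(demand, supplies, turn_ratios):
    """FIFO diverge: one blocked branch holds back every movement.

    demand      -- flow the upstream cell wants to send (veh/s)
    supplies    -- flow each downstream branch can receive (veh/s)
    turn_ratios -- share of the demand bound for each branch (sums to 1)
    """
    limits = [s / b for s, b in zip(supplies, turn_ratios) if b > 0]
    total = min([demand] + limits)
    return [b * total for b in turn_ratios]


def non_fifo_diverge(demand, supplies, turn_ratios):
    """Non-FIFO diverge: each branch is limited only by its own supply."""
    return [min(b * demand, s) for s, b in zip(supplies, turn_ratios)]


# Through branch can take 0.8 veh/s; the left-turn branch is full.
through_left_fifo = fifo_diverge(1.0, [0.8, 0.0], [0.8, 0.2])
through_left_non_fifo = non_fifo_diverge(1.0, [0.8, 0.0], [0.8, 0.2])
```

With a full left-turn branch, the FIFO form lets nothing through on either movement, while the non-FIFO form still serves the through movement at its own rate, which mirrors the single-lane and multi-lane geometries described for FIG. 3.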

In addition, the traffic learning model unit 100 can reflect a variety of phasing schemes, such as leading through, leading left turn, simultaneous signals, and overlap phases, and can derive the total delay as the sum of the density values of the cells whose density exceeds the initial density. Various road geometries can be reflected when deriving the delay. For example, the total delay can be computed separately for through-only sections, left-turn-only sections, and shared through/left-turn sections, with the shared sections weighted by the ratio of through to left-turn traffic volume. The average delay is then obtained by dividing the total through delay and the total left-turn delay by the number of through vehicles and left-turn vehicles, respectively.
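A minimal sketch of this delay bookkeeping follows. It takes the description literally (sum the densities of cells above the initial density, split the shared section by volume share, divide by vehicle counts); the function names, argument layout, and sample values are assumptions, and units are left abstract because the source does not state them.

```python
def total_delay(densities, initial_density):
    """Sum the densities of the cells that exceed the initial density."""
    return sum(k for k in densities if k > initial_density)


def movement_delays(through_cells, left_cells, shared_cells,
                    initial_density, through_veh, left_veh):
    """Return (average through delay, average left-turn delay)."""
    # Shared through/left cells are apportioned by traffic-volume share.
    through_share = through_veh / (through_veh + left_veh)
    shared = total_delay(shared_cells, initial_density)
    through_total = (total_delay(through_cells, initial_density)
                     + through_share * shared)
    left_total = (total_delay(left_cells, initial_density)
                  + (1.0 - through_share) * shared)
    # Average by dividing each total by the vehicles of that movement.
    return through_total / through_veh, left_total / left_veh


avg_through, avg_left = movement_delays(
    through_cells=[4.0], left_cells=[2.0], shared_cells=[6.0, 1.0],
    initial_density=1.5, through_veh=30, left_veh=10)
```

In this example the shared section contributes 6.0 units of delay, of which three quarters are attributed to the through movement because it carries three quarters of the volume.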

The intersection learning unit 200, which is an AI agent, can be deployed one per intersection and adjusts the green time of each phase at its intersection. It does so by adjusting the barrier boundary and the boundaries between conflicting phases, and it can optimize the phases of the individual intersection at a predetermined interval, for example every three cycles by default.

FIG. 4 is a diagram illustrating the signal control variables of an individual intersection according to an embodiment of the present invention.

As shown in FIG. 4, the signals for two barrier groups can change during one cycle. The intersection learning unit 200 adjusts the time boundary (Ⅰ) between the two barriers and the time boundary (Ⅱ) between conflicting phases within one ring-barrier group.

The state, action, and reward with which the intersection learning unit 200 performs AI reinforcement learning are as follows.

The state is a set of indicators that represent the traffic condition resulting from adjustment of the controlled signal control variables; the space occupancy and the green time ratio of each phase can be used as state indicators. For example, a four-leg intersection can have 16 state indicators.

The action is the behavior that adjusts the controlled signal control variables. For dual-ring phase adjustment, it can be set as the green time adjustment ratio between barriers and the green time adjustment ratios between conflicting phases. For example, a four-leg intersection can have five actions: one between barriers and four between conflicting phases.

The reward is the value used to evaluate the result of an action, and can be determined by whether the delay of the individual intersection increases or decreases. For example, a reward of +1 can be given when the intersection delay decreases, and the reward can be set to 0 when the intersection delay increases.
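To make the five-action layout and the binary reward concrete, the sketch below maps one barrier ratio and four conflicting-phase ratios onto dual-ring green times. It rests on simplifying assumptions that are not in the source: each ratio is treated directly as a split fraction rather than an adjustment to an existing split, yellow and all-red intervals are ignored, and an unchanged delay is given a reward of 0. All names are hypothetical.

```python
def greens_from_action(cycle, barrier_ratio, conflict_ratios):
    """Return (ring1, ring2), each a list of four phase green times in seconds.

    barrier_ratio   -- share of the cycle given to the first barrier group
    conflict_ratios -- (ring1/group A, ring1/group B, ring2/group A,
                        ring2/group B): share of the group time given to the
                        first of the two conflicting phases
    """
    group_a = cycle * barrier_ratio   # time before the barrier boundary
    group_b = cycle - group_a         # time after the barrier boundary
    r1a, r1b, r2a, r2b = conflict_ratios
    ring1 = [group_a * r1a, group_a * (1 - r1a),
             group_b * r1b, group_b * (1 - r1b)]
    ring2 = [group_a * r2a, group_a * (1 - r2a),
             group_b * r2b, group_b * (1 - r2b)]
    return ring1, ring2


def reward(previous_delay, new_delay):
    """+1 when delay decreased, otherwise 0 (the tie case is an assumption)."""
    return 1 if new_delay < previous_delay else 0


ring1, ring2 = greens_from_action(120, 0.5, (0.25, 0.5, 0.5, 0.75))
```

Because both rings share the same barrier ratio, they cross the barrier at the same instant, and each ring's four green times always sum to the cycle length, so the cycle and barrier constraints hold for any ratios between 0 and 1.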

The network learning unit 300, another AI agent, can be deployed one per control network and adjusts the offsets between the intersections that make up the network. This means coordinating the signals so that a vehicle traveling through several intersections can pass them without stopping. The offsets of the individual network can be optimized at a predetermined interval, for example every hour by default.

The state, action, and reward with which the network learning unit 300 performs AI reinforcement learning are as follows.

The state is a set of indicators that represent the traffic condition resulting from adjustment of the controlled signal control variables, and can be set as the two-way average control delay of each section between controlled intersections and the relative offset between controlled intersections. For example, a 3×1 network can have four state indicators: two section average control delays (averaged over both directions) between controlled intersections and two relative offsets between controlled intersections.

The action is the behavior that adjusts the controlled signal control variables, and can be set as the offset adjustment ratio of each controlled intersection. For example, the offset of the central intersection of the control network is fixed at 0, so a 3×2 network can have five actions.

The reward is the value used to evaluate the result of an action, and can be determined by whether the network delay increases or decreases. For example, a reward of +1 can be given when the network delay decreases, and the reward can be set to 0 when the network delay increases.
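The network-level action can be sketched as one ratio per non-central intersection, with the central intersection held at offset 0, so that a network of n intersections has n - 1 actions (six intersections would give the five actions of the example above). Treating each ratio as a fraction of the cycle length is an assumption made for illustration, as are the function name and the intersection labels.

```python
def offsets_from_action(cycle, ratios, intersections, center):
    """Map offset ratios onto intersections; the center stays at offset 0.

    cycle         -- common cycle length in seconds
    ratios        -- one ratio per non-central intersection, in list order
    intersections -- intersection identifiers (hypothetical labels)
    center        -- identifier of the reference intersection
    """
    others = [name for name in intersections if name != center]
    if len(ratios) != len(others):
        raise ValueError("need exactly one ratio per non-central intersection")
    offsets = {center: 0.0}
    for name, ratio in zip(others, ratios):
        # Wrap into [0, cycle) so the offset stays within one cycle.
        offsets[name] = (ratio * cycle) % cycle
    return offsets


plan = offsets_from_action(120, [0.25, 0.5], ["A", "B", "C"], center="B")
```

Fixing one reference intersection removes a redundant degree of freedom, since only the offsets relative to one another affect progression along the network.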

The signal control unit 400 can receive the intersection signal control variable and the network signal control variable and actually apply them to the traffic signal network 500.

FIG. 5 is a control flowchart for explaining a traffic signal control method according to an embodiment of the present invention. The traffic signal control method according to this embodiment is summarized below with reference to FIG. 5.

As described above, the intersection learning unit 200 and the network learning unit 300, which are AI agents, operate in parallel with the traffic learning model unit 100. Their states and actions differ because they optimize different signal control variables, but the algorithm by which they operate is the same. FIG. 5 is described in terms of the first state and the first action of the intersection learning unit 200. Replacing the first state with the second state and the first action with the second action gives the AI reinforcement learning process of the network learning unit 300.

First, the traffic signal control device can collect and process traffic state information and convert it into information that the traffic learning model unit 100 can use (S510).

Based on the passing traffic volume per movement and the initial queue per movement, the traffic learning model unit 100 can derive the first state (i) and the total intersection delay (i) used to compute the reward (S520).

As described above, the first state can include the space occupancy per phase movement, the green time ratio per phase movement, the average control delay of the intersection, and the degree of saturation per phase.

Having received the first state (i), the intersection learning unit 200 can derive the first action (i) based on the first state (i) (S530).

The first action can be the green time adjustment ratio between barriers and the green time adjustment ratio between conflicting phases.

When the first action (i) is input back to the traffic learning model unit 100, the traffic learning model unit 100 can derive the first state (i+1) and the total intersection delay (i+1) based on the signal control variables adjusted according to the first action (i) (S540). That is, the intersection learning unit 200 can update the first state (i) to the first state (i+1) based on the first action (i). This means that the traffic situation has been simulated anew based on the first action (i). The traffic learning model unit 100 can derive the reward (i) for the first action (i) by comparing the total intersection delay (i) of the existing first state (i) with the total intersection delay (i+1) of the first state (i+1). That is, if the total intersection delay (i+1) updated by the first action (i) is determined to be lower than the total intersection delay (i), in other words if the intersection delay has decreased, a reward (i) of +1 is given (S550).

The AI agent, that is, the intersection learning unit 200, is updated by storing and learning from the first state (i), the first state (i+1), the first action (i), and the reward (i), and can then derive the first action (i+1) corresponding to the first state (i+1) (S560).

Steps S520 to S560 can be repeated a preset number of times or for a predetermined period, and each time one pass is completed it can be determined whether the process has been repeated for the predetermined period (S570).

If the process has not yet been repeated for the predetermined period, i is updated to i+1 and steps S520 to S560 are repeated (S580). If it has, the intersection learning unit 200 can output the optimal signal control variable based on the rewards.

This optimal signal control variable can be applied to the traffic signal network 500 (S590).

The signal control variables applied to the actual traffic signal network 500 remain in effect for a certain period, and new signal control variables resulting from that application can in turn be collected as new traffic state information.

The traffic signal control device can repeat the process of FIG. 5 based on the newly collected data.
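The control flow of steps S520 to S580 can be sketched as the loop below. Only the loop structure follows the description; `ToyModel` and `SweepAgent` are hypothetical placeholders with made-up numbers. In particular, the agent here is a scripted stand-in that steps the green time by a fixed amount, not a reinforcement learning algorithm such as the DDPG named in the claims, and the single scalar action stands in for the full action vector.

```python
class ToyModel:
    """Stand-in for the traffic learning model unit (delay curve is made up)."""

    def simulate(self, green):
        delay = (green - 40) ** 2 + 100   # lowest delay at 40 s of green
        return (green,), delay            # (state, total intersection delay)


class SweepAgent:
    """Scripted placeholder agent; it stores transitions but does not learn."""

    def __init__(self, step=5):
        self.step = step
        self.memory = []
        self.best_action = None

    def act(self, state):
        return state[0] + self.step

    def learn(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))
        if reward == 1:
            self.best_action = action


def run_episode(model, agent, initial_green, iterations):
    state, delay = model.simulate(initial_green)           # S520
    action = agent.act(state)                              # S530
    for _ in range(iterations):                            # S570
        next_state, next_delay = model.simulate(action)    # S540
        reward = 1 if next_delay < delay else 0            # S550
        agent.learn(state, action, reward, next_state)     # S560 (store)
        state, delay = next_state, next_delay              # S580: i -> i+1
        action = agent.act(state)                          # S560 (next action)
    return agent.best_action                               # candidate for S590


agent = SweepAgent()
best_green = run_episode(ToyModel(), agent, initial_green=20, iterations=6)
```

In this toy run the rewards are 1 while the swept green time approaches 40 s and 0 once it passes, so the last rewarded action is the one returned. A real agent would replace `act` and `learn` with policy evaluation and a training update over the stored transitions.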

In this way, applying the traffic signal control device and method according to this embodiment to domestic urban traffic signal networks, where traffic patterns change frequently, makes it possible to respond to those pattern changes. By moving away from the axis-based control scheme of existing signal control systems and managing traffic flow at the network level, it also enables real-time signal operation that matches the characteristics of the network.

In addition, because the AI can operate the traffic signals in real time by directly adjusting the signal control variables (cycle, green time per phase, offset) in response to changes in traffic patterns, AI technology can be used actively to respond to such changes. Unlike existing technologies, it also complies with the constraints of field deployment (phase sequence, cycle, barrier), so it can actually be applied in the field.

FIG. 6 is a diagram showing a computing device according to an embodiment of the present invention. The computing device TN100 of FIG. 6 can be any of the devices described in this specification (e.g., the traffic signal control device, the intersection learning unit, the network learning unit, or the traffic learning model unit).

In the embodiment of FIG. 6, the computing device TN100 can include at least one processor TN110, a transceiver TN120, and a memory TN130. The computing device TN100 can further include a storage device TN140, an input interface device TN150, an output interface device TN160, and the like. The components of the computing device TN100 are connected by a bus TN170 and can communicate with one another.

The processor TN110 can execute program commands stored in at least one of the memory TN130 and the storage device TN140. The processor TN110 can be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed. The processor TN110 can be configured to implement the procedures, functions, and methods described in connection with the embodiments of the present invention, and can control each component of the computing device TN100.

The memory TN130 and the storage device TN140 can each store various information related to the operation of the processor TN110, and can each consist of at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory TN130 can consist of at least one of read-only memory (ROM) and random access memory (RAM).

The transceiver TN120 can transmit or receive wired or wireless signals, and can be connected to a network to perform communication.

Meanwhile, the embodiments of the present invention are not implemented only through the device and/or method described so far. They can also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which that program is recorded, and a person of ordinary skill in the art to which the present invention belongs can readily produce such an implementation from the description of the embodiments above.

Although the embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

Claims

A traffic signal control device comprising:
an intersection learning unit for deriving, as a first action based on a first state, a green time adjustment ratio between barriers and a green time adjustment ratio between conflicting phases, and outputting an intersection signal control variable;
a network learning unit for deriving, as a second action based on a second state, an offset adjustment ratio for each intersection and an offset for each intersection, and outputting a network signal control variable;
a traffic learning model unit for simulating the traffic situation based on an input passing traffic volume per movement and an initial queue per movement, outputting the first state to the intersection learning unit and the second state to the network learning unit, and re-simulating the traffic situation based on the first action and the second action to update the first state and the second state; and
a signal control unit for receiving the intersection signal control variable and the network signal control variable and applying them to a traffic signal network.

The traffic signal control device of claim 1, wherein
the first state includes at least one of a space occupancy per phase movement, a green time ratio per phase movement, an average control delay of the intersection, and a degree of saturation per phase,
the second state includes a section average control delay between controlled intersections and an offset for each intersection,
the intersection signal control variable includes a green time per phase, and
the network signal control variable includes a signal change period and an intersection offset.

The traffic signal control device of claim 1, wherein
the intersection learning unit and the network learning unit include a DDPG (Deep Deterministic Policy Gradient) algorithm for learning in a continuous action space.

The traffic signal control device of claim 3, wherein
the traffic learning model unit
gives a reward for the first action when intersection delay is reduced by the first action, and
gives a reward for the second action when network delay is reduced by the second action.

The traffic signal control device of claim 2, wherein
the traffic learning model unit derives the delay of the spatiotemporal cells based on a cell transmission model that represents the propagation of traffic shock waves in units of spatiotemporal cells, and
the first state and the second state are derived based on the delay.

The traffic signal control device of claim 1, wherein
the intersection signal control variable is optimized every predetermined number of signal cycles, and
the network signal control variable is optimized at every predetermined time interval.

A traffic signal control method by a traffic signal control device including an intersection learning unit and a network learning unit, the method comprising:
outputting a first state to the intersection learning unit by simulating a traffic situation based on an input passing traffic volume per movement and an initial queue per movement;
learning, as a first action based on the first state, a green time adjustment ratio between barriers and a green time adjustment ratio between conflicting phases;
updating the first state by re-simulating the traffic situation based on the first action;
giving a reward for the first action when intersection delay is reduced by the first action; and
deriving an optimal intersection signal control variable based on the first state, the first action, and the reward, and applying the derived intersection signal control variable to a traffic signal network.

The traffic signal control method of claim 7, further comprising:
outputting a second state to the network learning unit by simulating a traffic situation based on the input passing traffic volume per movement and the initial queue per movement;
learning, as a second action based on the second state, an offset adjustment ratio for each intersection and an offset for each intersection;
updating the second state by re-simulating the traffic situation based on the second action;
giving a reward for the second action when network delay is reduced by the second action; and
deriving an optimal network signal control variable based on the second state, the second action, and the reward, and applying the derived network signal control variable to the traffic signal network.

The traffic signal control method of claim 8, wherein
the first state includes at least one of a space occupancy per phase movement, a green time ratio per phase movement, an average control delay of the intersection, and a degree of saturation per phase,
the second state includes a section average control delay between controlled intersections and an offset for each intersection,
the intersection signal control variable includes a green time per phase, and
the network signal control variable includes a signal change period and an intersection offset.

A controller comprising:
a memory; and
a processor for controlling the memory,
wherein the processor
outputs a first state by simulating a traffic situation based on an input passing traffic volume per movement and an initial queue per movement, learns, as a first action based on the first state, a green time adjustment ratio between barriers and a green time adjustment ratio between conflicting phases, re-simulates the traffic situation based on the first action, and updates the first state.