KR102479484B1

KR102479484B1 - System and Method for Improving Traffic for Autonomous Vehicles at Non Signalized Intersections

Info

Publication number: KR102479484B1
Application number: KR1020210004703A
Authority: KR
Inventors: 배상훈
Original assignee: 부경대학교 산학협력단; 에스에이엠(주)
Priority date: 2021-01-13
Filing date: 2021-01-13
Publication date: 2022-12-22
Also published as: KR20220102694A

Abstract

The present invention relates to an apparatus and method for improved passage of autonomous vehicles at unsignalized intersections, enabling efficient passage by using Responsibility-Sensitive Safety (RSS) theory and a partially observable Markov decision process (POMDP). The apparatus includes: a state initialization unit that initializes the learning state of the POMDP algorithm; an optimal action derivation unit that derives an optimal action by applying the RSS algorithm to a POMDP model for optimizing autonomous vehicle operation; an action execution unit that executes the optimal action derived by the optimal action derivation unit; a state observation unit that observes the driving state from data provided by the vision sensor and radar sensor of the autonomous vehicle; and a reward determination unit that, depending on whether the reward is recognized as safe, unsafe, failure, or goal, corrects the behavior of the autonomous vehicle based on the RSS algorithm for autonomous vehicles and an Adaptive Model Predictive Control System for human-driven vehicles.

Description

System and Method for Improving Traffic for Autonomous Vehicles at Non Signalized Intersections

The present invention relates to traffic control for autonomous vehicles, and more specifically to an apparatus and method for improved passage of autonomous vehicles at unsignalized intersections, which enable efficient passage by using Responsibility-Sensitive Safety (RSS) theory and a partially observable Markov decision process (POMDP).

An autonomous vehicle is a vehicle equipped with technology that recognizes lanes and steers automatically using a camera or a forward object detection sensor. Based on camera image processing or forward object sensing, the autonomous vehicle measures the lane width, the lateral position of the vehicle within the lane, the distance to the lane markings on both sides, the shape of the lane, and the radius of curvature of the road. It uses the resulting vehicle position and road information to estimate the vehicle's driving trajectory and changes lanes along the estimated trajectory.

An autonomous vehicle can also maintain an appropriate distance from the preceding vehicle by automatically controlling the throttle valve, brakes, and transmission to accelerate and decelerate appropriately, based on the position and distance of the preceding vehicle detected by a camera or forward object detection sensor mounted at the front of the vehicle.

However, when such an autonomous vehicle passes through an intersection, it stops according to the traffic signal and, when starting again, departs only after detecting the movement of the preceding vehicle. Departures are therefore delayed from vehicle to vehicle, which can cause congestion at the intersection.

In particular, when the driving environment is perceived from sensor input, as in an autonomous vehicle, driving through an unsignalized intersection is a much harder task than driving on an ordinary road.

Research is under way on enabling autonomous vehicles to perceive the driving environment and drive efficiently through unsignalized intersections. However, many problems remain unsolved for unsignalized-intersection passage by platooning autonomous vehicles in mixed traffic (where autonomous vehicles and human drivers coexist).

Accordingly, new technology is needed to improve passage through unsignalized intersections and ensure safety for platooning autonomous vehicles.

Republic of Korea Patent Publication No. 10-2020-0071406
Republic of Korea Patent Publication No. 10-2020-0058613
Republic of Korea Patent Publication No. 10-2018-0065196

The present invention is intended to solve the problems of prior-art traffic control technology for autonomous vehicles. Its object is to provide an apparatus and method for improved passage of autonomous vehicles at unsignalized intersections that enable efficient passage by using RSS theory and a POMDP.

Another object of the present invention is to provide such an apparatus and method that build a model for learning autonomous driving behavior, taking into account traffic safety and delay time for an autonomous vehicle among multiple human-driven vehicles at an unsignalized intersection, so that accidents involving autonomous vehicles can be prevented efficiently.

Another object is to provide such an apparatus and method that maximize the reinforcement learning reward for actions by using a Markov decision model under partial observability (POMDP), a reinforcement learning approach that learns, as in real situations, only from information within the range the autonomous vehicle can observe.

Another object is to provide such an apparatus and method that use radar and vision sensor data through Matlab's Automated Driving Toolbox and optimize with an RSS-based reinforcement learning framework for autonomous driving, so that the autonomous vehicle's system can operate while predicting the behavior of other vehicles.

Another object is to provide such an apparatus and method that determine actions through a POMDP, which lets the learning agent make decisions based on partial observation of the environment, and that maximize the reinforcement learning reward for those actions.

Another object is to provide such an apparatus and method that, to reproduce a real autonomous driving environment in simulation, base learning and action decisions on data obtained through the autonomous vehicle's sensors (partial observation) rather than on the entire simulation environment (full observation), and that maximize the reinforcement learning reward for the resulting actions.

Another object is to provide such an apparatus and method that use the RSS algorithm to maintain a safe distance between the autonomous vehicle and human drivers at an unsignalized intersection, so that the autonomous vehicle model can respond appropriately when the distance indicates that a dangerous situation may arise.

Another object is to provide such an apparatus and method that apply an Adaptive Model Predictive Control System for human-driven vehicles: in simulation, the relative distance and relative speed to the nearest human-driven vehicle ahead are obtained through the autonomous vehicle's sensors, and, as the controlled variable, the autonomous vehicle autonomously maintains a constant distance from the vehicle ahead, thereby operating in response to the behavior of the human driver.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the description below.

To achieve the above objects, an apparatus for improved passage of autonomous vehicles at unsignalized intersections according to the present invention includes: a state initialization unit that initializes the learning state of a partially observable Markov decision process (POMDP) algorithm; an optimal action derivation unit that derives an optimal action by applying the Responsibility-Sensitive Safety (RSS) algorithm to a POMDP model for optimizing autonomous vehicle (AV) operation; an action execution unit that executes the optimal action derived by the optimal action derivation unit; a state observation unit that observes the driving state from data provided by the vision sensor and radar sensor of the AV; and a reward determination unit that, depending on whether the reward is recognized as safe, unsafe, failure, or goal, corrects the behavior of the AV in response to human-driven vehicles, based on the RSS algorithm for autonomous vehicles and an Adaptive Model Predictive Control System.

A method for improved passage of autonomous vehicles at unsignalized intersections according to the present invention includes: a state initialization step of initializing the learning state of a partially observable Markov decision process (POMDP) algorithm; an optimal action derivation step of deriving an optimal action by applying the Responsibility-Sensitive Safety (RSS) algorithm to a POMDP model for optimizing autonomous vehicle (AV) operation; an action execution step of executing the optimal action derived in the optimal action derivation step; a state observation step of observing the driving state from data provided by the vision sensor and radar sensor of the AV; and a reward determination step of correcting, depending on whether the reward is recognized as safe, unsafe, failure, or goal, the behavior of the AV in response to human-driven vehicles, based on the RSS algorithm for autonomous vehicles and an Adaptive Model Predictive Control System.

The apparatus and method for improved passage of autonomous vehicles at unsignalized intersections according to the present invention, as described above, have the following effects.

First, they enable efficient passage at unsignalized intersections by using RSS theory and a POMDP.

Second, they build a model for learning autonomous driving behavior that takes into account traffic safety and delay time for an autonomous vehicle among multiple human-driven vehicles at an unsignalized intersection, so that accidents involving autonomous vehicles can be prevented efficiently.

Third, they maximize the reinforcement learning reward for actions by using a Markov decision model under partial observability (POMDP), a reinforcement learning approach that learns, as in real situations, only from information within the range the autonomous vehicle can observe.

Fourth, they use radar and vision sensor data through Matlab's Automated Driving Toolbox and optimize with an RSS-based reinforcement learning framework for autonomous driving, so that the autonomous vehicle's system can operate while predicting the behavior of other vehicles.

Fifth, they determine actions through a POMDP, which lets the learning agent make decisions based on partial observation of the environment, and maximize the reinforcement learning reward for those actions.

Sixth, they use the RSS algorithm to maintain a safe distance between the autonomous vehicle and human drivers at an unsignalized intersection, so that the autonomous vehicle model can respond appropriately when the distance indicates that a dangerous situation may arise.

Seventh, they apply an Adaptive Model Predictive Control System to the control of human-driven vehicles: in simulation, the relative distance and relative speed to the nearest human-driven vehicle ahead are obtained from the autonomous vehicle's sensors, and, as the controlled variable, the autonomous vehicle autonomously maintains a constant distance from the vehicle ahead, thereby operating in response to the behavior of the human driver.

FIG. 1a is a diagram showing the rules of the ACC system.
FIG. 1b is a block diagram of an apparatus for improved passage of autonomous vehicles at unsignalized intersections according to the present invention.
FIGS. 2a and 2b are diagrams for explaining the partially observable Markov decision process (POMDP).
FIG. 3 is a flowchart showing a method for improved passage of autonomous vehicles at unsignalized intersections according to the present invention.
FIG. 4 is a diagram, in pseudocode form, showing RSS algorithm optimization through the POMDP framework.
FIGS. 5a and 5b are diagrams of the simulation experiment setup for evaluating the performance of the RSS-based POMDP model.
FIGS. 6a and 6b are graphs of simulation results using the output profile of the first experiment.
FIGS. 7a and 7b are graphs of simulation results using the output profile of the second experiment.
FIG. 8 is a graph comparing the performance of the model according to the present invention with the earlier adaptive MPC model.
FIG. 9 is a graph of simulation results for autonomous vehicle speed in the model according to the present invention.
FIG. 10 is a graph of simulation results for autonomous vehicle acceleration in the model according to the present invention.

Hereinafter, preferred embodiments of the apparatus and method for improved passage of autonomous vehicles at unsignalized intersections according to the present invention are described in detail.

The features and advantages of the apparatus and method will become clear from the detailed description of each embodiment below.

FIG. 1a is a diagram showing the rules of the ACC system, and FIG. 1b is a block diagram of an apparatus for improved passage of autonomous vehicles at unsignalized intersections according to the present invention.

The apparatus and method according to the present invention enable efficient passage at unsignalized intersections by using RSS theory and a POMDP.

To this end, the present invention may include a configuration that builds a model for learning autonomous driving behavior, taking into account traffic safety and delay time for an autonomous vehicle among multiple human-driven vehicles at an unsignalized intersection.

The present invention may include a configuration that maximizes the reinforcement learning reward for actions by using a Markov decision model under partial observability (POMDP), a reinforcement learning approach that learns, as in real situations, only from information within the range the autonomous vehicle can observe.

The present invention may include a configuration that uses radar and vision sensor data through Matlab's Automated Driving Toolbox and optimizes with an RSS-based reinforcement learning framework for autonomous driving, so that the autonomous vehicle's system can operate while predicting the behavior of other vehicles.

The present invention may include a configuration that determines actions through a POMDP, which lets the learning agent make decisions based on partial observation of the environment, and assigns reinforcement learning rewards to those actions.

The present invention may include a configuration that uses the RSS algorithm to maintain a safe distance between the autonomous vehicle and human drivers at an unsignalized intersection, so that the autonomous vehicle model responds appropriately when the distance indicates that a dangerous situation may arise.

The present invention applies an Adaptive Model Predictive Control System to human-driven vehicles and may include a configuration in which, in simulation, the relative distance and relative speed to the nearest human-driven vehicle ahead are obtained from the autonomous vehicle's sensors and, as the controlled variable, the autonomous vehicle autonomously maintains a constant distance from the vehicle ahead, thereby operating in response to the behavior of the human driver.

The Adaptive Model Predictive Control System is described as follows.

Model predictive control (MPC) is designed to estimate the future behavior of a human-driven vehicle and to use real-time optimization to choose an appropriate control action based on that prediction. In response, it selects an appropriate acceleration for the autonomous vehicle.

In the present invention, the MPC's detected object is the nearest human-driven vehicle ahead: its relative distance and relative speed are obtained, in simulation, from the autonomous vehicle's sensors. As the controlled variable, the autonomous vehicle autonomously maintains a constant distance from the vehicle ahead, so that it operates in response to the behavior of the human driver.

The ACC (adaptive cruise control) system consists of a lower-level and an upper-level controller. The upper-level controller fuses relative speed and relative distance, and the lower-level controller adjusts the brake system to achieve the best acceleration.

To calculate the optimal acceleration, the relationship between the autonomous vehicle (ego vehicle) and the lead vehicle must be established. ACC controls the longitudinal acceleration of the autonomous vehicle by autonomously maintaining the relative position and relative speed with respect to the vehicle ahead. The controller estimates the relative speed and distance through vehicle-to-vehicle (V2V) communication, based on real-time measurements from on-board sensors (e.g., radar and vision sensors).

The safety distance is defined in terms of the following quantities: the safety distance of the ACC system, the actual speed of the autonomous vehicle (ego vehicle), the desired standstill distance, and the time gap between vehicles.
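The defining equation is not reproduced in this text. One form consistent with the quantities listed above is a constant time-gap rule, in which the safety distance equals the standstill distance plus the time gap multiplied by the ego speed. The sketch below assumes that form; the function name, parameter names, and default values are illustrative assumptions, not taken from the patent.

```python
def acc_safe_distance(ego_speed_mps: float,
                      standstill_distance_m: float = 10.0,
                      time_gap_s: float = 1.5) -> float:
    """Safety distance under an ASSUMED constant time-gap policy.

    safe distance = standstill distance + time gap * ego speed
    The default values are hypothetical placeholders.
    """
    if ego_speed_mps < 0:
        raise ValueError("ego speed must be non-negative")
    return standstill_distance_m + time_gap_s * ego_speed_mps


# At 20 m/s with the hypothetical defaults: 10 + 1.5 * 20 = 40 m
print(acc_safe_distance(20.0))
```

Under this assumed rule the required gap grows linearly with speed, so a stationary ego vehicle only needs the standstill distance.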

The driving decision function of the ACC system distinguishes two cases.

In the first case (spacing control), the human-driven vehicle is too close, and the autonomous vehicle decelerates until the safety distance is restored.

In the second case (speed control), the human-driven vehicle is far away, and the autonomous vehicle drives as usual until it reaches the set speed.
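The conditions that select each case are not reproduced in this text. A minimal sketch, assuming the switch is a simple comparison of the measured relative distance against the safety distance, might look as follows; `AccMode` and `acc_mode` are hypothetical names introduced for illustration.

```python
from enum import Enum


class AccMode(Enum):
    SPACING_CONTROL = "spacing"  # lead vehicle too close: decelerate
    SPEED_CONTROL = "speed"      # lead vehicle far away: track the set speed


def acc_mode(relative_distance_m: float, safe_distance_m: float) -> AccMode:
    """Select the ACC mode with an ASSUMED threshold rule.

    A gap shorter than the safety distance triggers spacing control;
    otherwise the vehicle stays in speed control.
    """
    if relative_distance_m < safe_distance_m:
        return AccMode.SPACING_CONTROL
    return AccMode.SPEED_CONTROL


print(acc_mode(25.0, 40.0))  # gap below the safety distance
print(acc_mode(60.0, 40.0))  # gap above the safety distance
```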

In the longitudinal direction, lead-vehicle tracking with sensor fusion detects objects ahead of the autonomous vehicle in the same lane and in other lanes within the sensors' detection range, and finds the relative distance and relative speed between the autonomous vehicle and the lead vehicle (the nearest human-driven vehicle ahead of the autonomous vehicle).

FIG. 1a shows the rules of the ACC system: the relationship between the safety distance and the relative distance that governs the ACC system's driving decision function (e.g., spacing control and speed control).

In the ACC system, the acceleration of the ACC-equipped vehicle is given in discrete time in terms of the following quantities: the acceleration of the ACC-equipped vehicle, the sampling period, the time delay corresponding to the finite bandwidth of the lower-level controller, and the control variable matrix for acceleration.
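The discrete-time equation is not reproduced in this text. A sampling period together with a time delay for the lower-level controller suggests a first-order lag between the commanded and the actual acceleration. The sketch below assumes a forward-Euler discretization of such a lag; that model, the names, and the default values are assumptions rather than the patent's formula.

```python
def next_acceleration(accel_mps2: float,
                      command_mps2: float,
                      sample_time_s: float = 0.1,
                      time_constant_s: float = 0.5) -> float:
    """One step of an ASSUMED first-order actuator lag.

    a[k+1] = (1 - Ts/tau) * a[k] + (Ts/tau) * u[k]
    Ts and tau defaults are hypothetical placeholders.
    """
    if not 0.0 < sample_time_s <= time_constant_s:
        raise ValueError("require 0 < sample_time_s <= time_constant_s")
    alpha = sample_time_s / time_constant_s
    return (1.0 - alpha) * accel_mps2 + alpha * command_mps2


# The actual acceleration approaches a constant command over repeated steps.
accel = 0.0
for _ in range(100):
    accel = next_acceleration(accel, command_mps2=2.0)
print(round(accel, 6))
```

The guard on the sampling period keeps the blending factor in (0, 1], which is what makes this simple discretization stable.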

The MPC algorithm is designed to estimate future behavior and to use online optimization to determine the appropriate control action over the prediction horizon.

MPC takes into account the interaction between outputs and inputs, much as a feedback control algorithm does. The model selects the most suitable acceleration for the AV.

MPC has been introduced in several autonomous control applications, such as traction control and lane keeping assist systems.

In the MPC algorithm, the predicted and current state parameters that can be measured at sampling time k are expressed in terms of the predicted state matrix, the current state matrix, and the state transition matrices A and B.

For the longitudinal vehicle dynamics, formulated in continuous time and then discretized, the input data comprise the relative distance, the relative speed, and the ego speed. The remaining quantities are the relative acceleration and the acceleration of the autonomous vehicle at time t.
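The dynamics equations are not reproduced in this text. Under the usual kinematic reading of the quantities named above (the relative distance changes at the relative speed, the relative speed at the relative acceleration, and the ego speed at the ego acceleration), one discrete step can be sketched with forward Euler. The state layout, the names, and the choice of Euler integration are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LongitudinalState:
    relative_distance_m: float  # gap to the lead vehicle
    relative_speed_mps: float   # lead speed minus ego speed
    ego_speed_mps: float


def euler_step(state: LongitudinalState,
               relative_accel_mps2: float,
               ego_accel_mps2: float,
               dt_s: float) -> LongitudinalState:
    """Advance the ASSUMED kinematic model by one forward-Euler step."""
    return LongitudinalState(
        relative_distance_m=state.relative_distance_m + dt_s * state.relative_speed_mps,
        relative_speed_mps=state.relative_speed_mps + dt_s * relative_accel_mps2,
        ego_speed_mps=state.ego_speed_mps + dt_s * ego_accel_mps2,
    )


# Ego vehicle closing on the lead vehicle at 2 m/s while braking at 1 m/s^2
start = LongitudinalState(30.0, -2.0, 15.0)
print(euler_step(start, relative_accel_mps2=1.0, ego_accel_mps2=-1.0, dt_s=0.5))
```

Because each update is linear in the state and the accelerations, this step can be written in the matrix form that MPC solvers expect, with the step size playing the role of the sampling period.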

The adaptive MPC system is made more robust by its ability to apply full throttle or full braking, so the host vehicle can respond immediately when the vehicle ahead suddenly changes lanes or brakes. Accordingly, the adaptive MPC system focuses on safety, vehicle-following control, smooth driving, and fuel economy.

Constraints on speed (v), acceleration (a), braking (u), and jerk (j) are incorporated as hard constraints, similar to those of the ACC system.

Based on the sensor fusion input parameters, the simulation time, the longitudinal speed of the automated vehicle, and road information, lead-vehicle tracking with sensor fusion first detects objects ahead of the autonomous vehicle and passes the detections to a multi-object tracker.

The states of the detected objects are estimated and fused by a Kalman filter algorithm.
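The text does not specify the filter's state model or noise parameters. As a minimal sketch of the idea only, the scalar Kalman filter below tracks a single quantity (for example, the relative distance) and fuses successive measurements of differing accuracy, such as a radar reading followed by a vision reading. The random-walk process model, the function names, and all variances are hypothetical.

```python
def kalman_predict(estimate: float, variance: float,
                   process_variance: float) -> tuple[float, float]:
    """Prediction step for an ASSUMED random-walk model: uncertainty grows."""
    return estimate, variance + process_variance


def kalman_update(estimate: float, variance: float,
                  measurement: float,
                  measurement_variance: float) -> tuple[float, float]:
    """Measurement update: blend estimate and measurement by their variances."""
    gain = variance / (variance + measurement_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance


# Fuse a (hypothetical) radar range and vision range into one distance estimate.
est, var = 30.0, 4.0
est, var = kalman_predict(est, var, process_variance=0.5)
for measured_m, meas_var in ((29.0, 1.0), (31.0, 4.0)):
    est, var = kalman_update(est, var, measured_m, meas_var)
print(est, var)
```

Each update shrinks the variance, and the more accurate sensor (smaller measurement variance) pulls the estimate harder.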

The Responsibility-Sensitive Safety (RSS) algorithm was introduced to formalize rules for safe automated vehicles. It is based on human notions that apply to every driving scenario and is defined by a tuple (safe distance, dangerous situation, and proper response).

Following a few simple rules, such as maintaining a safe following distance, the RSS algorithm offers a safety guarantee for the AV in response to its surroundings. For example, it is used to assess and determine responsibility when a collision occurs between an AV and a human-driven vehicle.

In the present invention, in addition to assessing responsibility in a collision between an autonomous vehicle and a human driver, the RSS algorithm is used to maintain a safe distance between the autonomous vehicle and human drivers at an unsignalized intersection, so that the autonomous vehicle model can respond appropriately when the distance indicates that a dangerous situation may arise.

Automated vehicles (AVs) must apply the RSS algorithm in mixed traffic and maintain safe distances (e.g., a safe longitudinal distance and a safe lateral distance) to avoid dangerous situations.

In addition, if an AV can avoid a traffic accident with another vehicle, it must do so by decelerating at the minimum acceleration or by changing lanes.

The safety distance accounts for the response time of the AV and its braking distance, and is expressed as follows.

Here, the terms of the expression are the response time of the AV, the actual speed of the AV, and the minimum acceleration of the AV.
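As a rough illustration of how these three quantities can combine, the sketch below assumes that the safety distance is the distance covered during the response time plus the stopping distance at constant deceleration. This form is an assumption made for illustration and is not taken from the patent's Equation 12. The function and parameter names are hypothetical.

```python
def safe_distance(speed, response_time, min_accel):
    """Reaction distance plus braking distance (assumed form, not the patent's equation).

    speed:          actual speed of the AV in m/s
    response_time:  response time of the AV in s
    min_accel:      minimum acceleration of the AV in m/s^2 (negative, i.e. braking)
    """
    if min_accel >= 0:
        raise ValueError("min_accel must be negative (a deceleration)")
    reaction = speed * response_time                 # distance covered before braking starts
    braking = speed ** 2 / (2.0 * abs(min_accel))    # distance needed to stop at constant deceleration
    return reaction + braking
```

With a speed of 10 m/s, a hypothetical response time of 1.0 s, and the minimum acceleration of -5.0 m/s² quoted in the simulation results, this gives 10 m of reaction distance plus 10 m of braking distance.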

In the present invention, the RSS algorithm is applied at unsignalized intersections to maintain a safe distance between the driverless vehicle and manually driven vehicles.

Based on the mathematical formulation, the AV should stay out of the paths of other vehicles whenever possible. In other words, the RSS algorithm ensures that the automated vehicle responds properly when another vehicle may create a dangerous situation.

First, the safety distance is calculated using its mathematical definition (Equation 12).

Second, a forward collision warning is determined from the relative distance and relative velocity between the autonomous vehicle and the MIO (Most Important Object) track, obtained from the confirmed tracks.

Finally, the AV takes appropriate action to restore a safe state (e.g., speed control or spacing control in the ACC system).

According to the RSS algorithm, the AV accelerates, up to its maximum acceleration, during the response time, and after the response time it decelerates at the minimum acceleration to keep a safe distance from the manually driven vehicle.

Therefore, the autonomous driving decision function is as follows.

In the first case, both vehicles can continue normal driving at the set speed (speed control of the ACC system).

In the second case, the autonomous vehicle decelerates at the minimum acceleration until the safe distance is restored (spacing control of the ACC system).
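The two cases can be sketched as a small decision function. The branch conditions did not survive in the text, so the sketch assumes that the test compares the relative distance with the safety distance, which matches the later statement that the AV decelerated when the relative distance fell below the braking distance. The names and return values are hypothetical.

```python
def rss_decision(relative_distance, safe_dist, min_accel=-5.0):
    """Pick the ACC mode from the gap to the lead vehicle (assumed threshold test).

    Returns (mode, commanded_acceleration). The acceleration is None in speed
    control because the ACC speed controller then chooses it.
    """
    if relative_distance >= safe_dist:
        return "speed_control", None        # normal driving at the set speed
    return "spacing_control", min_accel     # brake until the safe distance is restored
```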

The Markov decision process (MDP) is a powerful framework for choosing appropriate actions in a fully observable stochastic environment. However, an AV maneuvers in an uncertain environment, with uncertain intentions of other road users and sensor noise.

To address this problem, the POMDP is proposed as a partially observable MDP.

Here, the POMDP, specified by the tuple (S, A, T, R, O, Z), is used to determine the appropriate action for each possible belief state over the time steps.

S and A are the states and actions of the participants, respectively; T denotes the transition probability; R defines the reward for the selected action; O defines the observations; and Z is the observation function.

In the POMDP system, a belief state is maintained (e.g., over position, velocity, yaw, and yaw rate, because the system state is only partially known).

When the autonomous vehicle takes an action (such as accelerating, decelerating, or holding the desired speed) and receives observations through its radar and vision sensors, the new belief state is obtained by Bayes' rule.
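The Bayes update of the belief can be sketched for a small discrete state set. The patent describes a continuous belief space, so the discrete dictionary representation here is a simplifying assumption, and the `transition` and `obs_model` callables are hypothetical stand-ins for T and Z.

```python
def update_belief(belief, action, observation, transition, obs_model):
    """Discrete Bayes filter: b'(s') is proportional to Z(o | s', a) * sum_s T(s' | s, a) * b(s).

    belief:      dict mapping state -> probability
    transition:  callable (state, action) -> dict of next_state -> probability
    obs_model:   callable (observation, next_state, action) -> probability
    """
    # Prediction: push the current belief through the transition model.
    predicted = {}
    for state, prob in belief.items():
        for next_state, t_prob in transition(state, action).items():
            predicted[next_state] = predicted.get(next_state, 0.0) + prob * t_prob

    # Correction: weight each predicted state by the observation likelihood.
    weighted = {s: obs_model(observation, s, action) * p for s, p in predicted.items()}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: w / total for s, w in weighted.items()}
```

For example, with two hypothetical driver intentions that start equally likely, an observation that is four times more probable under one intention shifts its belief from 0.5 to 0.8.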

The POMDP framework aims to maximize the expected reward, which underlies the Monte Carlo method, defined as follows.

As shown in FIG. 1B, the apparatus for improved passage of autonomous vehicles at unsignalized intersections according to the present invention includes: a state initialization unit (10) that initializes the learning state of the POMDP algorithm; an optimal action derivation unit (20) that derives the optimal action by applying the RSS algorithm to the POMDP model for optimizing autonomous vehicle operation; an action execution unit (30) that executes the optimal action derived by the optimal action derivation unit (20); a state observation unit (40) that observes the driving state from data observed by the vision sensor and radar sensor of the simulated autonomous vehicle; a reward determination unit (50) that corrects the autonomous vehicle's behavior, based on the RSS algorithm and the adaptive MPC system, according to whether a safe, unsafe, failure, or goal reward is recognized; a reward level determination unit (60) that determines whether the reward level of the reward determination unit (50) is appropriate; and a state update unit (70) that updates the state when the reward level determination unit (60) determines that the reward level has not reached the target value.

FIGS. 2A and 2B are diagrams for explaining the partially observable Markov decision process (POMDP).

The partially observable Markov decision process (POMDP) is a process that lets a learning agent make decisions based on partial observations of its environment.

In the present invention, to mimic a real autonomous driving environment within the simulation, learning and action selection are based not on the full simulation environment (full observation) but on data obtained through the autonomous vehicle's sensors (partial observation). Actions are chosen on that basis with the goal of maximizing the reinforcement learning reward for those actions.

To choose actions from sensor data (partial observation) and reward them through reinforcement learning, the POMDP uses: the states of the autonomous vehicle and the human drivers (time, vehicle position, speed, and planned route); the autonomous vehicle's actions (accelerate, maintain, decelerate); state observations (simulation time, vehicle position, vehicle speed, and data obtained from the vehicle sensors); and, for the reinforcement learning reward, a negative reward when the safe distance is not secured, a normal reward when the safe distance is secured, a goal reward based on whether the vehicle has reached its destination, and a failure reward when an accident occurs.

The RSS method using the POMDP framework is described in detail below.

The main problem in the autonomous decision-making process is how to understand uncertainty and determine an appropriate driving strategy for the autonomous vehicle (ego vehicle). The present invention focuses on online optimization in a closed-loop setting.

The model according to the present invention fuses the POMDP algorithm with the RSS method built on the adaptive MPC system, which makes it possible to find a genuine safety guarantee for the autonomous vehicle in an uncertain environment (e.g., with unpredictable human drivers).

FIG. 3 is an operational flowchart illustrating the method for improved passage of autonomous vehicles at unsignalized intersections according to the present invention.

The method for improved passage of autonomous vehicles at unsignalized intersections according to the present invention includes: a state initialization step (S301) of initializing the learning state of the POMDP algorithm; an optimal action derivation step (S302) of deriving the optimal action by applying the RSS algorithm to the POMDP model for optimizing autonomous vehicle operation; an action execution step (S303) of executing the optimal action derived in the optimal action derivation step; a state observation step (S304) of observing the driving state from data observed by the vision sensor and radar sensor of the simulated autonomous vehicle; a reward determination step (S305) of correcting the autonomous vehicle's behavior, based on the RSS algorithm and the adaptive MPC system, according to whether a safe, unsafe, failure, or goal reward is recognized; a reward level determination step (S306) of determining whether the reward level of the reward determination step is appropriate; and a state update step (S307) of updating the state when the reward level determination step determines that the reward level has not reached the target value.

The belief state of the POMDP algorithm is a continuous state space, expressed as follows.

Here, the terms of the expression are the belief state of the autonomous vehicle, the state of each human-driven vehicle, and the number of human-driven vehicles in the model.

The state consists of the position (x, y), velocity (v), yaw (θ), and yaw rate (ω) of the autonomous vehicle and the other vehicles at each time step, obtained through the radar and vision sensors in the closed-loop setting, as follows.

The action space is as follows.

The ego actions according to the present invention comprise accelerating, decelerating, and maintaining the desired speed, and are governed by the spacing control and speed control of the ACC model.

The ego action is defined over these three options, whereas the actions of the manually driven vehicles are estimated in continuous time, time step by time step, by the MPC algorithm.

The observation space is described below.

An observation consists of the position, velocity, yaw, yaw rate, relative velocity, and relative distance between the driverless vehicle and the preceding vehicle.

The lead vehicle is detected as the MIO track, that is, the closest track in front of the ego vehicle. In addition, the vision and radar sensors measure the position and velocity of manually driven vehicles relative to the autonomous vehicle.

The observation function is defined as follows.

The reward function and the optimal action are described below.

An automated vehicle must follow simple rules such as keeping a safe following distance.

Accordingly, the reward is considered in four respects: safe, unsafe, failure, and goal rewards. The reward is determined from the observable parameters of the autonomous vehicle and the manually driven vehicles obtained through sensor fusion (using the vision and radar sensors).

To execute a proper response, the AV's action, based on the RSS algorithm and the adaptive MPC system, can be optimized for the next time step according to the reward value.

The reward function and the optimal action are therefore updated according to the following rules.

(1) A safe reward: the AV may maintain its speed or accelerate up to the set speed (e.g., speed control in the ACC system).

(2) An unsafe reward: the AV decelerates at the minimum acceleration until the safe distance is restored (e.g., inter-vehicle spacing control in the ACC system).

(3) A failure reward: the closed-loop run is stopped.

(4) A goal reward: one of the vehicles has safely reached the target position, so the closed-loop run is stopped.
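The four rules can be sketched as a classifier and a response table. The numeric reward values are not given in the text, so the sketch uses labels only. It further assumes that a collision takes priority over reaching the goal and that the safe and unsafe cases are separated by comparing the relative distance with the safety distance. All names are hypothetical.

```python
from enum import Enum


class Reward(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"
    FAILURE = "failure"
    GOAL = "goal"


def classify_reward(collided, reached_goal, relative_distance, safe_dist):
    """Map one observation to a reward category (assumed priority order)."""
    if collided:
        return Reward.FAILURE
    if reached_goal:
        return Reward.GOAL
    if relative_distance >= safe_dist:
        return Reward.SAFE
    return Reward.UNSAFE


# Response associated with each category, following rules (1) to (4).
RESPONSE = {
    Reward.SAFE: "maintain speed or accelerate up to the set speed",
    Reward.UNSAFE: "decelerate at minimum acceleration until the safe distance is restored",
    Reward.FAILURE: "stop the closed-loop run",
    Reward.GOAL: "stop the closed-loop run",
}
```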

The complete RSS algorithm within the POMDP framework is presented in detail as pseudocode in the following algorithm.

FIG. 4 shows, in pseudocode form, the optimization of the RSS algorithm through the POMDP framework.

The optimization process of FIG. 4 is described step by step below.

1 : Collect driving states S0, S1, S2, S3, ...

2 : Apply the RSS algorithm based on the POMDP framework in simulation

3 : Input the simulation parameters

4 : Repeat for each time step (0.1 s; respond every 0.1 s)

5 : Set the initial state (initial states of the vehicles in the simulation)

6 : S0 = (x0, y0, v0, θ0, ω0)^T

7 : Sk = (xk, yk, vk, θk, ωk)^T

8 : Compute the optimal driving strategy

9 : π(b) := argmax_a Q(b, a)

10 : Execute the action; actions = [accelerate, decelerate, maintain speed]

11 : Collect data from the vision and radar sensors of the autonomous vehicle in simulation

12 : Observed data = (x, y, v, θ, ω, rel_v, rel_d)^T

13 : Run the reinforcement learning reward function

14 : Reward R = [safe reward, unsafe reward, failure reward, goal reward]

15 : Update to the new belief state: b = τ(b, a, O)

16 : Decide whether to repeat the simulation: continue until R is a failure reward or the goal reward is reached.
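The loop structure of these steps can be sketched as a toy one-dimensional car-following simulation. This is not the patent's implementation: the POMDP policy (the argmax over Q(b, a)) is replaced by a simple rule, the observation is taken as exact, and the lead vehicle drives at constant speed. The maximum acceleration, the response time, and the safe-distance form are hypothetical. The 0.1 s time step and the -5.0 m/s² minimum acceleration are the values quoted in the text.

```python
DT = 0.1             # time step quoted in the listing (s)
A_MIN = -5.0         # minimum acceleration quoted in the results (m/s^2)
A_MAX = 2.0          # hypothetical maximum acceleration (m/s^2)
RESPONSE_TIME = 1.0  # hypothetical response time (s)


def safe_distance(speed):
    """Assumed form: reaction distance plus braking distance."""
    return speed * RESPONSE_TIME + speed * speed / (2.0 * abs(A_MIN))


def run_episode(ego_x, ego_v, lead_x, lead_v, set_speed, goal_x, max_steps=2000):
    """Closed-loop run; returns the list of reward labels, one per step."""
    log = []
    for _ in range(max_steps):
        gap = lead_x - ego_x                 # observation: relative distance
        if gap <= 0.0:                       # collision: failure reward, stop
            log.append("failure")
            break
        if ego_x >= goal_x:                  # target reached: goal reward, stop
            log.append("goal")
            break
        if gap >= safe_distance(ego_v):      # safe: hold or approach the set speed
            log.append("safe")
            accel = A_MAX if ego_v < set_speed else 0.0
        else:                                # unsafe: brake at minimum acceleration
            log.append("unsafe")
            accel = A_MIN
        # Execute the action and advance both vehicles by one time step.
        ego_v = min(set_speed, max(0.0, ego_v + accel * DT))
        ego_x += ego_v * DT
        lead_x += lead_v * DT
    return log
```

Starting 15 m behind a slower lead vehicle, the ego vehicle first brakes, then follows at roughly the lead vehicle's speed until it passes the goal position.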

FIGS. 5A and 5B show the simulation setups used to evaluate the RSS-based POMDP model.

The performance of the RSS-based POMDP model was evaluated by simulation with an increasing number of human-driven vehicles. That is, the effect of platooning vehicles on the proposed RSS-based POMDP model under the adaptive MPC system was considered.

The RSS-based POMDP model was implemented on the adaptive MPC system so that the classical adaptive MPC model could serve as the baseline for comparison with the proposed model.

Two experimental cases with specific settings were designed to evaluate how well the RSS algorithm strengthens the traffic safety guarantee, improves driving smoothness, and reduces delay time.

The proposed model and the classical adaptive MPC model use the same safety distance and initial speed. The autonomous vehicle uses sensor fusion and confirmed tracks to estimate the relative distance and relative velocity of the MIO track (i.e., the closest human-driven vehicle in front of the autonomous vehicle). The autonomous vehicle must understand the behavior of the lead vehicle and decide whether to keep the safe distance. The two experiments are as follows.

FIG. 5A shows the first experimental environment: the autonomous vehicle attempts a left turn and must account for two human-driven vehicles approaching from both directions, each traveling straight through the intersection.

FIG. 5B shows the second experimental environment: the autonomous vehicle attempts a left turn while multiple vehicles enter from both directions, with the spacing between entering vehicles set to 40 m.

Table 1 lists the settings used for the simulation at the unsignalized intersection to verify the model according to the present invention: the simulation time step, the response time of the autonomous vehicle, the initial speed of the autonomous vehicle, the initial speed of the human drivers, the minimum and maximum vehicle acceleration, the number of lanes on the approach roads, the number of approach roads, and the lane width. These items can also serve as parameters for autonomous driving in actual operation.

The performance index for delay time is described below.

In the simulation scenarios, delay-time performance was analyzed by computing the braking time, that is, the interval from the deceleration start time to the acceleration start time, based on the relative distance between the autonomous vehicle and the lead vehicle measured through the vision and radar sensors.

The braking time is defined as follows.

The time performance index is then given as follows.

Here, for the classical adaptive MPC model, the terms are its braking time, acceleration start time, and deceleration start time; for the model according to the present invention, they are likewise its braking time, acceleration start time, and deceleration start time.
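A small sketch may help make the two quantities concrete. The braking time follows the description above (acceleration start time minus deceleration start time). The index formula itself did not survive in the text, so the relative reduction in percent used below is an assumption chosen because the results are reported as percentages. The function names are hypothetical.

```python
def braking_time(decel_start, accel_start):
    """Interval from the start of deceleration to the start of re-acceleration (s)."""
    if accel_start < decel_start:
        raise ValueError("acceleration must start after deceleration")
    return accel_start - decel_start


def time_performance_index(baseline_braking, proposed_braking):
    """Relative reduction in braking time versus the baseline, in percent (assumed form)."""
    if baseline_braking <= 0:
        raise ValueError("baseline braking time must be positive")
    return 100.0 * (baseline_braking - proposed_braking) / baseline_braking
```

Under this assumed form, a baseline braking time of 4.0 s and a proposed-model braking time of 3.0 s give an index of 25%.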

The performance index for driving smoothness is described below.

The smooth-driving score considers the minimum speed and the set speed in order to analyze driving smoothness.

Here, the terms are the reference (set) speed of the vehicle under the adaptive MPC model and under the model according to the present invention, and the minimum speed of the vehicle under the adaptive MPC model and under the model according to the present invention.

FIGS. 6A and 6B are graphs of the simulation results showing the output profiles of the first experiment, and FIGS. 7A and 7B are the corresponding graphs for the second experiment.

FIG. 8 is a graph comparing the performance of the model according to the present invention with the earlier adaptive MPC model.

Driving smoothness tended to improve gradually from the first scenario to the second, in which potential conflicts clearly increase.

Table 2 and FIG. 8 show that no collision between the autonomous vehicle and a human-driven vehicle was detected in any experiment.

For example, the collision status value is 0 in FIGS. 6A, 6B, 7A, and 7B, indicating no collision between the autonomous vehicle and the human-driven vehicles.

The automated vehicle therefore traveled safely. The largest improvement in the smooth-driving score (53.26%) was observed in the second experiment.

The highest time performance index (31.60%) occurred in the first experiment.

Comparing the model according to the present invention with the classical adaptive MPC model at the unsignalized intersection, the improvement in driving smoothness showed an upward trend, while the reduction in delay time diminished as the number of manually driven vehicles increased.

The smooth-driving performance of the present invention with platooning vehicles is as follows.

FIG. 9 is a graph of the simulation results for the speed of the autonomous vehicle in the model according to the present invention, and FIG. 10 is a graph of the simulation results for the acceleration of the autonomous vehicle in the model according to the present invention.

Tables 3 and 4 give the corresponding minimum, maximum, and standard deviation for the two experiments.

Using the RSS algorithm, the AV automatically tracked the relative distance and relative velocity to the lead vehicle (a human-driven vehicle) while keeping a safe distance.

As shown in FIG. 9, the AV speed can always follow the speed set at the start of the simulation. When the relative distance between the two vehicles fell below the braking distance, the AV decelerated at the minimum acceleration (-5.0 m/s²) until the safe distance was restored.

The standard deviations of the speed and acceleration distributions were smaller in the first experiment than in the second. The range of variation in speed and acceleration was therefore smaller in the first experiment.

The first experiment was smoother than the second in terms of speed and acceleration. In other words, in the model according to the present invention, the improvement in driving smoothness was smaller in the first experiment than when the number of manually driven vehicles increased.

The apparatus and method for improved passage of autonomous vehicles at unsignalized intersections described above enable efficient passage at unsignalized intersections by using Responsibility-Sensitive Safety theory and the partially observable Markov decision process.

The present invention learns from information within the range the autonomous vehicle can observe, as in real conditions, using the partially observable Markov decision process (POMDP), a reinforcement learning model, so as to maximize the reinforcement learning reward for its actions.

As described above, it will be understood that the present invention may be implemented in modified forms without departing from its essential characteristics.

Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is set forth in the claims rather than in the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

10. State initialization unit 20. Optimal action derivation unit
30. Action execution unit 40. State observation unit
50. Reward determination unit 60. Reward level determination unit
70. State update unit

Claims

1. An apparatus for improved passage of autonomous vehicles at unsignalized intersections, comprising:
a state initialization unit that initializes a learning state of a partially observable Markov decision process (POMDP) algorithm;
an optimal action derivation unit that derives an optimal action by applying a Responsibility-Sensitive Safety (RSS) algorithm to a POMDP model for optimizing autonomous vehicle (AV) operation;
an action execution unit that executes the optimal action derived by the optimal action derivation unit;
a state observation unit that receives data observed by a vision sensor and a radar sensor of the autonomous vehicle and observes a driving state; and
a reward determination unit that corrects autonomous vehicle behavior, based on the RSS algorithm for the autonomous vehicle and an adaptive model predictive control system for human-driven vehicles, according to whether a safe, unsafe, failure, or goal reward is recognized,
wherein the adaptive model predictive control system is applied to the human-driven vehicle, the relative distance and relative velocity to the nearest human driver ahead are obtained from the sensors of the autonomous vehicle, and the autonomous vehicle is controlled through its control variables so as to keep a certain distance from the human-driven vehicle ahead on its own, whereby the autonomous vehicle operates in response to the behavior of the human driver.

2. The apparatus of claim 1, further comprising:
a reward level determination unit that determines whether the reward level of the reward determination unit is appropriate; and
a state update unit that updates the state when the reward level determination unit determines that the reward level has not reached the target value.

3. The apparatus of claim 1, wherein the POMDP model, in order to determine an action based on data obtained through the autonomous vehicle's sensors and to reward the action through reinforcement learning, uses:
states of the autonomous vehicle and human drivers, including time, vehicle position, speed, and planned route;
autonomous vehicle actions, including accelerate, maintain, and decelerate;
state observations, including simulation time, vehicle position, vehicle speed, and data obtained from vehicle sensors; and
for the reinforcement learning reward, an unsafe reward when the safe distance is not secured, a safe reward when the safe distance is secured, a goal reward based on whether the vehicle has reached its destination, and a failure reward when an accident occurs.

4. The apparatus of claim 3, wherein the belief state of the POMDP model is a continuous state space whose expression is formed from the belief state of the autonomous vehicle, the state of each human-driven vehicle, and the number of human-driven vehicles in the model.

5. The apparatus of claim 4, wherein the state consists of the position (x, y), velocity (v), yaw (θ), and yaw rate (ω).

The apparatus of claim 3, wherein the ego action in the action space includes acceleration (

), deceleration (

) and maintaining the desired speed (

), and is controlled by the spacing and speed control of the ACC model, and
an observation in the observation space consists of position, velocity, yaw, yaw rate, and relative velocity (

).

The apparatus of claim 3, wherein the reward function and the optimal action in the reward determination unit are:
(1)

denotes the safe reward, for which the AV maintains its speed up to the set speed or accelerates;
(2)

denotes the unsafe reward, for which the AV decelerates at the minimum acceleration until the safe distance is restored;
(3)

denotes the failure reward, for which the closed-loop setting stops; and
(4) when one of the vehicles safely reaches the target position, the target reward is marked and the update follows the closed-loop setting stop rule.

The apparatus of claim 1, wherein a safe distance is maintained to avoid dangerous situations by applying the RSS algorithm to mixed traffic to optimize autonomous vehicle operation, and
the safe distance (

) is defined as

where

is the reaction time of the AV,

is the braking distance,

is the response time of the AV,

is the actual speed of the AV, and

is the minimum acceleration of the AV.
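
The formula for the safe distance is not reproduced in this text. As a stand-in, the sketch below uses the longitudinal minimum-distance formula from the published RSS model (Shalev-Shwartz et al.), which may differ from the expression claimed here. The function name and all parameter names are assumptions; braking parameters are given as positive magnitudes.

```python
def rss_safe_distance(v_rear, v_front, response_time,
                      a_max_accel, a_min_brake, a_max_brake):
    """Longitudinal RSS minimum gap in metres (stand-in formula).

    v_rear, v_front : speeds of the following and lead vehicle (m/s)
    response_time   : response time of the following vehicle (s)
    a_max_accel     : max acceleration during the response time (m/s^2)
    a_min_brake     : minimum braking the follower applies afterwards (m/s^2)
    a_max_brake     : maximum braking the lead vehicle may apply (m/s^2)
    """
    # Speed of the follower after accelerating for the whole response time.
    v_after = v_rear + response_time * a_max_accel
    gap = (v_rear * response_time
           + 0.5 * a_max_accel * response_time ** 2
           + v_after ** 2 / (2.0 * a_min_brake)
           - v_front ** 2 / (2.0 * a_max_brake))
    # A negative value means no gap is required; clamp at zero.
    return max(gap, 0.0)
```

The worst case assumed here matches the behavior described in the next claim: the following vehicle accelerates during its response time and then brakes at its minimum braking rate.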

9. The apparatus of claim 8, wherein the AV accelerates until reaching the maximum acceleration within the response time (

) and decelerates at the minimum acceleration after the response time, so as to maintain a safe distance from the manually driven vehicle; and
the autonomous driving decision function is such that, if

, both vehicles follow normal driving at the set speed, and if

, the autonomous vehicle decelerates at the minimum acceleration until the safe distance is restored.
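
A minimal sketch of the decision rule described in this claim, assuming the two conditions compare the measured gap to the lead vehicle with the safe distance (the conditions themselves are not reproduced in this text). `av_acceleration_command` and its parameters are hypothetical names; the minimum acceleration is passed as a positive braking magnitude.

```python
def av_acceleration_command(gap_m, safe_distance_m, speed, set_speed,
                            a_max, a_min_brake):
    """Return a commanded acceleration in m/s^2 (negative means braking)."""
    if gap_m < safe_distance_m:
        # Unsafe: brake at the minimum acceleration until the gap recovers.
        return -a_min_brake
    if speed < set_speed:
        # Safe and below the set speed: accelerate toward it.
        return a_max
    # Safe and already at the set speed: hold speed.
    return 0.0
```

Calling this once per control step with a freshly measured gap reproduces the "decelerate until the safe distance is restored" behavior, since the braking branch is left as soon as the gap is no longer below the safe distance.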

(Deleted)

The apparatus of claim 1, wherein the set values for simulation at the non-signalized intersection comprise:
the simulation time unit, the response time of the autonomous vehicle, the initial speed of the autonomous vehicle, the initial speed of the human driver, the minimum acceleration of the vehicle, the maximum acceleration of the vehicle, the number of lanes of the roads entering the non-signalized intersection, the number of roads entering the non-signalized intersection, and the lane width.

The apparatus of claim 11, wherein, for performance analysis considering delay time,
based on the relative distance between the autonomous vehicle and the lead vehicle obtained via the vision and radar sensors in the simulation scenario, the interval from the deceleration start time (

) to the acceleration start time (

) is calculated as the braking time (

), and the performance is analyzed considering the delay time.

13. The apparatus of claim 12, wherein the braking time is defined as

and the time performance index (

) is defined as

,

where

is the braking time of the classical adaptive MPC model,

is the acceleration start time of the classical adaptive MPC model,

is the deceleration start time of the classical adaptive MPC model,

is the braking time of the proposed model,

is the acceleration start time of the proposed model, and

is the deceleration start time of the proposed model.

The apparatus of claim 12, wherein the smooth driving level score (

) is defined from the minimum speed (

) and the set speed (

) as

,

,

where

is the vehicle reference speed of the adaptive MPC model,

is the vehicle reference speed of the proposed model,

is the minimum vehicle speed of the adaptive MPC model, and

is the minimum vehicle speed of the proposed model.

A method for improved passage of autonomous vehicles at non-signalized intersections, comprising:
a state initialization step of initializing the learning state of a partially observable Markov decision process (POMDP) algorithm;
an optimal action derivation step of deriving an optimal action by applying the Responsibility-Sensitive Safety (RSS) algorithm to a POMDP model for optimizing autonomous vehicle (AV) operation;
an action execution step of executing the optimal action derived in the optimal action derivation step;
a state observation step of observing the driving state by receiving data observed by the vision sensor and radar sensor of the autonomous vehicle (AV); and
a reward determination step of modifying autonomous vehicle behavior, based on the RSS algorithm for autonomous vehicles and the Adaptive Model Predictive Control System for human-driven vehicles, according to whether a safe, unsafe, failure, or target reward is recognized,
wherein, by applying the Adaptive Model Predictive Control System for human-driven vehicles, the relative distance and relative speed to the nearest human-driven vehicle, obtained from the sensors of the autonomous vehicle, are identified, and the autonomous vehicle is controlled through the control variables so that it operates in response to the behavior of the human driver by autonomously maintaining a certain distance from the vehicle ahead.

[Claim 16] The method of claim 15, further comprising:
a reward level determination step of determining whether the reward level from the reward determination step is appropriate; and
a state update step of updating the state when, as a result of the determination in the reward level determination step, the reward level is not the target value.
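
The steps recited in the method claims above form a closed loop, which might be sketched as follows. Every callable passed to `run_episode` is a hypothetical placeholder for the corresponding step (action derivation, execution, observation, reward determination, state update); the string labels for rewards and the `max_steps` guard are assumptions of this sketch, not part of the claims.

```python
def run_episode(initial_state, choose_action, execute, observe,
                reward_of, update_state, max_steps=1000):
    """Run one closed-loop episode and return the (action, reward) history."""
    state = initial_state                      # state initialization
    history = []
    for _ in range(max_steps):
        action = choose_action(state)          # optimal action derivation
        execute(action)                        # action execution
        observation = observe()                # state observation
        reward = reward_of(observation)        # reward determination
        history.append((action, reward))
        if reward in ("target", "failure"):
            # Target reached or accident: the closed loop stops.
            break
        # Reward is not the target value: update the state and continue.
        state = update_state(state, action, observation)
    return history
```

Passing the steps in as functions keeps the loop independent of any particular POMDP solver, sensor model, or simulator.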

The method of claim 15, wherein the POMDP model determines an action based on data obtained through the autonomous vehicle's sensors and rewards the action through reinforcement learning, the model comprising:
states of the autonomous vehicle and the human driver, including time, vehicle position, speed, and determined route;
actions of the autonomous vehicle, including acceleration, speed maintenance, and deceleration;
observations, including simulation time, vehicle position, vehicle speed, and data obtained from vehicle sensors; and
reinforcement learning rewards, namely an unsafe reward when the safe distance is not secured, a safe reward when the safe distance is secured, a target reward based on whether the vehicle has reached the target, and a failure reward when an accident occurs.

The method of claim 15, wherein, in the reward determination step, the reward function and the optimal action are:
(1)

denotes the failure reward, for which the closed-loop setting stops; and
(4) when one of the vehicles safely reaches the target position, the target reward is marked and the update follows the closed-loop setting stop rule.

(Deleted)