KR20190115539A

KR20190115539A - Autonomous driving reinforcement learning apparatus and method of vehicle

Info

Publication number: KR20190115539A
Application number: KR1020180033789A
Authority: KR
Inventors: 전병환
Original assignee: 현대모비스 주식회사
Priority date: 2018-03-23
Filing date: 2018-03-23
Publication date: 2019-10-14

Abstract

The present invention relates to an autonomous driving reinforcement learning device of a vehicle and a method thereof. The device comprises: a driver intervention recognition unit recognizing operation intervention of a driver during autonomous driving of a vehicle; a vehicle information detection unit detecting state information of the vehicle and situation information around the vehicle using a plurality of sensors; a communication unit which receives traffic information for driving, stopping and lane change of the vehicle by communicating with traffic infrastructure or other vehicles, and communicates with a server performing an autonomous driving algorithm improvement operation through reinforcement learning, to transmit information necessary for the improvement operation, and receives an improved autonomous driving algorithm from the server; a control unit which determines, upon the operation intervention of the driver during the autonomous driving of the vehicle, whether the improvement of the autonomous driving algorithm is necessary by comparing the intervention operation information of the driver recognized by the driver intervention recognition unit with control information determined by the autonomous driving algorithm, and transmits information necessary for the improvement operation through the communication unit when the improvement is necessary; and the server performing an autonomous driving algorithm improvement operation through reinforcement learning using information transmitted in a wired or wireless manner through the communication unit.

Description

Autonomous driving reinforcement learning apparatus and method of vehicle {AUTONOMOUS DRIVING REINFORCEMENT LEARNING APPARATUS AND METHOD OF VEHICLE}

The present invention relates to an autonomous driving reinforcement learning apparatus and method for a vehicle, and more particularly to an apparatus and method that, when a driver's operation intervention occurs during autonomous driving of the vehicle, receive information on the vehicle and the driving environment for a designated period before and after the time of the intervention and improve the autonomous driving algorithm through reinforcement learning.

Recently, the development of autonomous vehicles has been accelerating.

An autonomous vehicle is a vehicle that grasps road conditions and drives automatically without the driver controlling the brake, steering wheel, accelerator pedal, and the like. Strictly speaking, it is a different concept from a driverless car, but the two terms are used interchangeably.

Such an autonomous vehicle requires a number of driving assistance systems. For example, the following should be implemented: a highway driving assist system (HDA, a technology that automatically maintains the distance between vehicles), a blind spot detection system (BSD, a technology that detects vehicles in the rear-side blind spot while reversing and sounds an alarm), an autonomous emergency braking system (AEB, a technology that activates the brakes when the vehicle ahead is not recognized), a lane departure warning system (LDWS), a lane keeping assist system (LKAS, a technology that compensates for leaving the lane without a turn signal), advanced smart cruise control (ASCC, a technology that drives at a set speed while maintaining the distance between vehicles), and a traffic jam assist system (TJA).

In addition, driving environment information, vehicle information, and the appropriate response information for each specific point in time during driving are input, and an autonomous driving algorithm is produced through supervised learning and applied to the vehicle. Global and local autonomous driving algorithms are generated from the data obtained in this way, but conventionally there has been no way to compensate dynamically when the judgment of an algorithm generated in this way is wrong.

For example, the environmental (driving environment) conditions under which a vehicle operates, such as weather, traffic conditions, accidents, and construction, change continuously, and the initial autonomous driving algorithm (e.g., the algorithm at the time it is installed in the vehicle) has difficulty coping with every unpredicted driving situation. Conventionally there has been no way to compensate for this dynamically.

The background art of the present invention is disclosed in Korean Patent Application Publication No. 10-2015-0115069 (published 2015.10.14., Method and apparatus for generating a driving route of an unmanned autonomous vehicle).

According to one aspect of the present invention, the invention was devised to solve the above problems, and its object is to provide an autonomous driving reinforcement learning apparatus and method for a vehicle that, when a driver's operation intervention occurs during autonomous driving of the vehicle, receive information on the vehicle and the driving environment for a designated period before and after the time of the intervention and improve the autonomous driving algorithm through reinforcement learning.

An autonomous driving reinforcement learning apparatus for a vehicle according to one aspect of the present invention includes: a driver intervention recognition unit that recognizes a driver's operation intervention during autonomous driving of the vehicle; a vehicle information detection unit that detects state information of the vehicle and situation information around the vehicle using a plurality of sensors; a communication unit that communicates with traffic infrastructure or other vehicles to receive traffic information for driving, stopping, and lane changes of the vehicle, communicates with a server that performs an autonomous driving algorithm improvement operation through reinforcement learning to transmit the information necessary for the improvement operation, and receives the improved autonomous driving algorithm from the server; a control unit that, upon a driver's operation intervention during autonomous driving of the vehicle, compares the driver's intervention operation information recognized by the driver intervention recognition unit with the control information determined by the autonomous driving algorithm to determine whether the autonomous driving algorithm needs improvement and, when improvement is needed, causes the information necessary for the improvement operation to be transmitted through the communication unit; and the server, which performs the autonomous driving algorithm improvement operation through reinforcement learning using the information delivered in a wired or wireless manner through the communication unit.

In the present invention, when the control unit determines that the autonomous driving algorithm needs improvement, it collects predesignated information for a designated period before and after the time of the driver's operation intervention, transmits this information to the server so that the autonomous driving algorithm is improved, and receives the improved autonomous driving algorithm from the server and applies it to the vehicle.

In the present invention, the control unit transmits to the server at least one of: information on the vehicle itself and on the vehicle's surroundings detected by the vehicle information detection unit; information received from an infrastructure system or nearby vehicles through the communication unit; autonomous driving algorithm determination information; and driver operation information.

In the present invention, the communication unit includes: a server communication unit for communicating with the server in a wired or wireless manner; a V-V communication unit for communicating in a wired or wireless manner with other vehicles around the vehicle; and a V-I communication unit for communicating in a wired or wireless manner with an infrastructure system around the vehicle.

In the present invention, even when the driver does not intervene during autonomous driving of the vehicle, if a situation arises in which the collected vehicle information and the autonomous driving algorithm determination information differ from each other, the control unit determines that the autonomous driving algorithm needs improvement, collects predesignated information for a designated period before and after the time at which the situation occurred, and transmits it to the server.

In the present invention, even when the driver intervenes during autonomous driving of the vehicle, if the driver's intervention operation information recognized by the driver intervention recognition unit does not differ from the control information determined by the autonomous driving algorithm, the control unit determines that the autonomous driving algorithm does not need improvement and causes autonomous driving to continue using the existing, unimproved autonomous driving algorithm.

In the present invention, the reinforcement learning, in order to improve the existing autonomous driving algorithm, collects information generated in predesignated specific situations during autonomous driving of the vehicle, including a driver's intervention operation or the occurrence of an accident, and learns the manner in which the driver intervened or a manner of operation that can prevent the accident.

An autonomous driving reinforcement learning method for a vehicle according to another aspect of the present invention includes: checking, by a control unit, whether there is a driver's operation intervention during autonomous driving of the vehicle; when there is a driver's operation intervention, comparing, by the control unit, the driver's intervention operation information with the control information determined by the autonomous driving algorithm; when the driver's intervention operation information and the control information determined by the autonomous driving algorithm differ, determining, by the control unit, that the autonomous driving algorithm needs improvement and delivering the information necessary for the improvement operation to a server; and performing, by the server, an autonomous driving algorithm improvement operation through reinforcement learning using the delivered information.

In the present invention, in the step of delivering the information necessary for the improvement operation to the server, the control unit collects predesignated information for a designated period before and after the time of the driver's operation intervention, transmits this information to the server so that the autonomous driving algorithm is improved, and receives the improved autonomous driving algorithm from the server and applies it to the vehicle.

In the present invention, the information delivered to the server includes at least one of: information on the vehicle itself and on the vehicle's surroundings detected by the vehicle information detection unit; information received from an infrastructure system or nearby vehicles through the communication unit; autonomous driving algorithm determination information; and driver operation information.

In the present invention, the method further includes: even when the driver does not intervene during autonomous driving of the vehicle, if a situation arises in which the collected vehicle information and the autonomous driving algorithm determination information differ from each other, determining, by the control unit, that the autonomous driving algorithm needs improvement, collecting predesignated information for a designated period before and after the time at which the situation occurred, and transmitting it to the server.

In the present invention, even when the driver intervenes during autonomous driving of the vehicle, if the driver's intervention operation information does not differ from the control information determined by the autonomous driving algorithm, the control unit determines that the autonomous driving algorithm does not need improvement and causes autonomous driving to continue using the existing, unimproved autonomous driving algorithm.

According to one aspect of the present invention, when a driver's operation intervention occurs during autonomous driving of a vehicle, information on the vehicle and the driving environment for a designated period before and after the time of the intervention is received, so that the autonomous driving algorithm can be improved through reinforcement learning.

FIG. 1 is an exemplary view showing a schematic configuration of an autonomous driving reinforcement learning apparatus for a vehicle according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining an autonomous driving reinforcement learning method for a vehicle according to an embodiment of the present invention.
FIG. 3 is an exemplary view showing a first example of improving the autonomous driving algorithm by applying the autonomous driving reinforcement learning method for a vehicle according to an embodiment of the present invention.
FIG. 4 is an exemplary view showing a second example of improving the autonomous driving algorithm by applying the autonomous driving reinforcement learning method for a vehicle according to an embodiment of the present invention.

Hereinafter, an embodiment of the autonomous driving reinforcement learning apparatus and method for a vehicle according to the present invention is described with reference to the accompanying drawings.

In this description, the thickness of lines and the size of components shown in the drawings may be exaggerated for clarity and convenience of explanation. In addition, the terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator. The definitions of these terms should therefore be based on the content of this specification as a whole.

FIG. 1 is an exemplary view showing a schematic configuration of an autonomous driving reinforcement learning apparatus for a vehicle according to an embodiment of the present invention.

As shown in FIG. 1, the autonomous driving reinforcement learning apparatus for a vehicle according to this embodiment is a vehicle apparatus 100 that includes a driver intervention recognition unit 110, a vehicle information detection unit 120, a control unit 130, and a communication unit 140.

The autonomous driving reinforcement learning apparatus according to this embodiment also includes a server 200 (or reinforcement learning server) connected to the vehicle apparatus 100 in a wireless (or wired) manner through the communication unit 140.

The communication unit 140 includes a server communication unit 141 for communicating with the server 200 in a wired or wireless manner (e.g., USB, LAN, WiFi, Bluetooth, LTE, etc.), a V-I communication unit 142 for communicating in a wired or wireless manner (e.g., USB, LAN, WiFi, Bluetooth, LTE, etc.) with an infrastructure system 300 around the vehicle (or host vehicle), and a V-V communication unit 143 for communicating in a wired or wireless manner (e.g., USB, LAN, WiFi, Bluetooth, LTE, etc.) with another vehicle 400 around the vehicle (or host vehicle).

The driver intervention recognition unit 110 recognizes driver intervention by detecting whether the steering, accelerator pedal, brake pedal, turn signal, and the like are manually operated during autonomous driving of the vehicle (host vehicle).
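
The detection described above can be sketched as follows. This is only a minimal illustration, not the disclosed implementation: the field names, units, and threshold values are assumptions, since the specification states only that manual operation of the steering, accelerator pedal, brake pedal, and turn signal is detected.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ManualInputs:
    """Manual-control readings at one instant (hypothetical fields and units)."""
    steering_torque_nm: float  # torque the driver applies to the steering wheel
    accel_pedal_pct: float     # accelerator pedal travel, 0 to 100
    brake_pedal_pct: float     # brake pedal travel, 0 to 100
    turn_signal_on: bool       # True if the driver switched a turn signal on


# Assumed thresholds; the source does not specify any values.
STEERING_TORQUE_THRESHOLD_NM = 1.5
PEDAL_THRESHOLD_PCT = 5.0


def detect_intervention(inputs: ManualInputs) -> List[str]:
    """Return the names of the controls the driver is operating manually."""
    operated = []
    if abs(inputs.steering_torque_nm) > STEERING_TORQUE_THRESHOLD_NM:
        operated.append("steering")
    if inputs.accel_pedal_pct > PEDAL_THRESHOLD_PCT:
        operated.append("accelerator")
    if inputs.brake_pedal_pct > PEDAL_THRESHOLD_PCT:
        operated.append("brake")
    if inputs.turn_signal_on:
        operated.append("turn_signal")
    return operated
```

An empty list means no intervention was recognized; any non-empty list would count as a driver's operation intervention in this sketch.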

For example, when an unpredicted situation occurs during autonomous driving of the vehicle, the driver inevitably intervenes to prevent an accident or for the convenience of the driver (or user). The driver intervention recognition unit 110 therefore recognizes the driver's operation intervention in a vehicle that is driving autonomously in this way.

The vehicle information detection unit 120 detects the current state of the vehicle itself and the state (or situation) around the vehicle using a plurality of sensors.

Accordingly, the plurality of sensors include sensors that detect the operating (or driving) state of the vehicle itself (e.g., an attitude detection sensor, speed sensor, illuminance sensor, temperature sensor, GPS, navigation, etc.) and sensors that detect the state around the traveling vehicle (e.g., a camera sensor, ultrasonic sensor, radar sensor, infrared sensor, etc.).

The communication unit 140 can communicate with traffic infrastructure (e.g., traffic lights, traffic volume, traffic accidents, etc.) to receive traffic information for driving, stopping, and lane changes of the vehicle (e.g., reception of traffic information by the V-I (Vehicle to Infrastructure) communication unit 142).

The communication unit 140 can also communicate with vehicles in front of, behind, and beside the vehicle (host vehicle) to receive traffic information for driving, stopping, and lane changes of the vehicle (e.g., reception of traffic information by the V-V (Vehicle to Vehicle) communication unit 143).

Alternatively, when a driver's operation intervention occurs during autonomous driving of the vehicle, the communication unit 140 transmits to the designated server 200 the vehicle information (that is, the information on the vehicle itself and on the vehicle's surroundings detected by the vehicle information detection unit) and the traffic information (that is, the information received from an infrastructure system or nearby vehicles through the communication unit) for a designated period before and after the time at which the intervention occurred.

When a driver's operation intervention occurs during autonomous driving of the vehicle, the server 200 improves (or corrects) the autonomous driving algorithm using the information (e.g., vehicle information and traffic information) transmitted from the vehicle apparatus 100 through the communication unit 140. The improved (or corrected) autonomous driving algorithm is then transmitted back to the vehicle apparatus 100 through the communication unit 140.
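
The specification does not state how the server 200 carries out the improvement beyond calling it reinforcement learning. Purely as an illustration, the toy sketch below uses a one-step, bandit-style value update in which the action the driver took is given a positive reward and the action the algorithm had chosen is given a negative one. The class, the situation and action labels, the reward values, and the learning rate are all assumptions.

```python
from collections import defaultdict


class ToyPolicy:
    """Tabular action-value store keyed by (situation, action); illustrative only."""

    def __init__(self, learning_rate=0.5):
        self.lr = learning_rate
        self.q = defaultdict(float)  # unseen (situation, action) pairs start at 0.0

    def update(self, situation, action, reward):
        """Move the stored value a fraction of the way toward the observed reward."""
        key = (situation, action)
        self.q[key] += self.lr * (reward - self.q[key])

    def best_action(self, situation, actions):
        """Return the candidate action with the highest stored value."""
        return max(actions, key=lambda a: self.q[(situation, a)])


def learn_from_intervention(policy, situation, algorithm_action, driver_action):
    """Penalize the algorithm's choice and reward the driver's choice (assumed rewards)."""
    policy.update(situation, algorithm_action, reward=-1.0)
    policy.update(situation, driver_action, reward=+1.0)
```

After a single call such as `learn_from_intervention(policy, "obstacle_ahead", "keep_lane", "brake")`, the toy policy prefers `"brake"` in that situation. A real system would operate on sensor-derived state rather than string labels.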

The control unit 130 applies the improved (or corrected) autonomous driving algorithm received from the server 200 to the vehicle (host vehicle), thereby replacing the existing (that is, pre-correction) autonomous driving algorithm.

The control unit 130 controls the driver intervention recognition unit 110, the vehicle information detection unit 120, and the communication unit 140, and, upon a driver's operation intervention during autonomous driving of the vehicle (host vehicle), determines whether the autonomous driving algorithm needs improvement (correction) according to whether the driver's intervention operation recognized by the driver intervention recognition unit 110 differs from the control information determined by the existing autonomous driving algorithm (that is, the operation the algorithm would perform autonomously).
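
One way to picture this comparison is the sketch below. It is hedged: the specification says only that the driver's operation and the algorithm's control information are compared for a difference, so the command fields and the tolerance values used to decide what counts as "different" are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlCommand:
    """A control command, whether from the driver or the algorithm (hypothetical fields)."""
    steering_deg: float  # steering wheel angle
    accel_pct: float     # accelerator demand, 0 to 100
    brake_pct: float     # brake demand, 0 to 100


def needs_improvement(driver: ControlCommand,
                      planned: ControlCommand,
                      steering_tol_deg: float = 2.0,
                      pedal_tol_pct: float = 5.0) -> bool:
    """Return True if the driver's operation differs from the algorithm's plan.

    The tolerances are assumed; they only keep small sensor noise from
    counting as a difference.
    """
    return (abs(driver.steering_deg - planned.steering_deg) > steering_tol_deg
            or abs(driver.accel_pct - planned.accel_pct) > pedal_tol_pct
            or abs(driver.brake_pct - planned.brake_pct) > pedal_tol_pct)
```

With these tolerances, a driver who brakes at 40 percent while the algorithm planned no braking yields `True`, whereas an intervention that matches the plan within tolerance yields `False` and leaves the existing algorithm in use.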

Based on this determination, when the autonomous driving algorithm needs improvement (correction), the control unit 130 transmits to the server 200 the vehicle information (that is, the information on the vehicle itself and on the vehicle's surroundings detected by the vehicle information detection unit) and the traffic information (that is, the information received from an infrastructure system or nearby vehicles through the communication unit) for a designated period before and after the relevant time (that is, the time of the driver's operation intervention), so that the server improves (or corrects) the autonomous driving algorithm, and the algorithm improved (or corrected) by the server 200 can then be applied to the vehicle (host vehicle).
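
Capturing data from before the intervention implies that recent records are buffered continuously. A minimal sketch of such a buffer follows, assuming timestamped records and window lengths chosen by the implementer; the class name, method names, and record format are hypothetical and not taken from the specification.

```python
from collections import deque


class EventWindowRecorder:
    """Keeps recent records and returns those around a marked event."""

    def __init__(self, before_s: float, after_s: float):
        self.before_s = before_s    # seconds to keep before the event
        self.after_s = after_s      # seconds to keep after the event
        self._history = deque()     # (timestamp, record) pairs, oldest first
        self._event_time = None     # time of the pending event, if any

    def mark_event(self, timestamp: float) -> None:
        """Register the time of the intervention."""
        self._event_time = timestamp

    def add(self, timestamp: float, record: dict):
        """Store a record; return the full window once the post-event period has elapsed."""
        self._history.append((timestamp, record))
        if self._event_time is None:
            # No pending event: discard records older than the pre-event period.
            while self._history and self._history[0][0] < timestamp - self.before_s:
                self._history.popleft()
            return None
        if timestamp >= self._event_time + self.after_s:
            start = self._event_time - self.before_s
            end = self._event_time + self.after_s
            window = [(t, r) for t, r in self._history if start <= t <= end]
            self._event_time = None
            return window
        return None
```

The caller would feed every sample to `add`, call `mark_event` when an intervention is recognized, and forward the returned window once it becomes available.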

As described above, the autonomous driving reinforcement learning apparatus for a vehicle according to this embodiment improves (corrects) the autonomous driving algorithm based on the driver's operation intervention information each time the driver intervenes during autonomous driving of the vehicle, and thereby has the effect of gradually reducing driver intervention and gradually improving the autonomous driving performance of the vehicle.

Note that the improved (corrected) autonomous driving algorithm can also be applied to other vehicles.

FIG. 2 is a flowchart for explaining an autonomous driving reinforcement learning method for a vehicle according to an embodiment of the present invention.

As shown in FIG. 2, when the autonomous driving mode of the vehicle (host vehicle) is ON (S101), the control unit 130 checks whether there is a driver's operation intervention (S102).

If the check (S102) shows no driver's operation intervention (No in S102), the vehicle (host vehicle) continues autonomous driving using the existing autonomous driving algorithm (S110).

However, if the check (S102) shows a driver's operation intervention (Yes in S102), the control unit 130 compares the driver's intervention operation recognized by the driver intervention recognition unit 110 with the control information determined by the existing autonomous driving algorithm (that is, the operation the algorithm would perform autonomously) (S103).

Here, the driver intervention recognition unit 110 recognizes driver intervention by detecting whether the steering, accelerator pedal, brake pedal, turn signal, and the like are manually operated during autonomous driving of the vehicle (host vehicle).

If the comparison (S103) shows that the driver's intervention operation is the same as the control information determined by the existing autonomous driving algorithm (that is, the operation the algorithm would perform autonomously) (No in S104), the vehicle (host vehicle) continues autonomous driving using the existing autonomous driving algorithm (S110).

However, if the comparison (S103) shows that the driver's intervention operation differs from the control information determined by the existing autonomous driving algorithm (that is, the operation the algorithm would perform autonomously) (Yes in S104), the control unit 130 determines that the autonomous driving algorithm needs improvement (correction) and starts (or performs) the work for improving (correcting) it (S105).

Once the work for improving (correcting) the autonomous driving algorithm has started, the control unit 130 collects a plurality of predesignated items of information for a designated period before and after the time at which the driver's operation intervention occurred, such as vehicle information (that is, the information on the vehicle itself and on the vehicle's surroundings detected by the vehicle information detection unit), traffic information (that is, the information received from an infrastructure system or nearby vehicles through the communication unit), autonomous driving algorithm determination information, and driver operation information (S106).

The control unit 130 then transmits (delivers) the collected information (e.g., vehicle information, traffic information, autonomous driving algorithm determination information, driver operation information, etc.) to the server 200 (S107).
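
The four kinds of information named in steps S106 and S107 could be grouped into one message before transmission. The sketch below is only one possible layout: the field names and the choice of JSON as the wire format are assumptions, as the specification does not define a message format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ImprovementPayload:
    """Data gathered around one intervention (hypothetical layout)."""
    event_time: float                                         # time of the intervention
    vehicle_info: list = field(default_factory=list)          # own-vehicle and surroundings data
    traffic_info: list = field(default_factory=list)          # data received via V-I and V-V
    algorithm_decisions: list = field(default_factory=list)   # what the algorithm decided
    driver_operations: list = field(default_factory=list)     # what the driver actually did


def serialize(payload: ImprovementPayload) -> str:
    """Encode the payload as a JSON string for transmission to the server."""
    return json.dumps(asdict(payload), sort_keys=True)
```

On the receiving side, `json.loads` would recover a plain dictionary with the same five keys.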

Accordingly, when a driver's intervention occurs during autonomous driving of the vehicle, the server 200 improves (or corrects) the autonomous driving algorithm using the information (e.g., vehicle information, traffic information, autonomous driving algorithm determination information, and driver operation information) that the control unit 130 transmits through the communication unit 140 (S108).

When the autonomous driving algorithm has been improved (or corrected) in this way, the server 200 transmits the improved (or corrected) autonomous driving algorithm to the vehicle device 100 (S109).

Accordingly, the control unit 130 applies the improved (or corrected) autonomous driving algorithm received from the server 200 to the vehicle (own vehicle).
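The sequence from the comparison through applying the improved algorithm (S104 to S110) can be summarized in a short sketch. Everything named here is hypothetical: `handle_intervention`, the `collect` callback, and a server object exposing an `improve` method are assumptions for illustration, and the description does not say how the server performs the improvement or in what form the algorithm is delivered.

```python
from typing import Any, Callable


def handle_intervention(
    driver_op: Any,
    planned_op: Any,
    collect: Callable[[], dict],
    server: Any,
    current_algorithm: Any,
) -> Any:
    """Return the algorithm the vehicle should use after an intervention.

    `server` is assumed to expose improve(report) and return the
    improved algorithm (S108 and S109 combined in one call).
    """
    if driver_op == planned_op:
        # NO in S104: keep driving with the existing algorithm (S110).
        return current_algorithm
    report = collect()                  # S105-S106: gather the event window
    improved = server.improve(report)   # S107-S109: send, improve, receive
    return improved                     # applied to the own vehicle
```

In a real system the transmission and the reception of the improved algorithm would be asynchronous; the single blocking call is a simplification.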

As described above, the autonomous driving reinforcement learning method of a vehicle according to this embodiment improves (corrects) the autonomous driving algorithm on the basis of the driver's intervention information each time the driver intervenes during autonomous driving, thereby gradually reducing driver intervention and gradually improving the vehicle's autonomous driving performance.

FIG. 3 is an exemplary view showing a first example in which the autonomous driving algorithm is improved by applying the autonomous driving reinforcement learning method of a vehicle according to an embodiment of the present invention.

Referring to FIG. 3(a), assume that a hazard that the sensors mounted on the vehicle cannot recognize (e.g., black ice) lies ahead in the lane in which the autonomous vehicle (own vehicle) is driving, and that the vehicle will overturn (or skid) if it keeps driving at its current speed (e.g., 80 km/h).

In this situation the driver has not intervened, but the collected vehicle information differs from the autonomous driving algorithm determination information (i.e., the operation to be performed autonomously). The control unit 130 therefore collects a plurality of predesignated items of information covering a designated period before and after the occurrence of the situation (e.g., overturning or skidding), namely vehicle information (i.e., information on the vehicle itself and on its surroundings detected by the vehicle information detection unit), traffic information (i.e., information received from the infrastructure system or from nearby vehicles through the communication unit), autonomous driving algorithm determination information, driver operation information, and the like, and transmits them to the server 200.
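A trigger that does not depend on driver intervention could compare what the algorithm expected with what the vehicle actually measured. The following is a minimal sketch under that assumption; the signal names (`speed_kmh`, `yaw_rate_dps`), the tolerances, and the function `state_mismatch` are all hypothetical, and the description does not state which signals are compared or by what criterion.

```python
def state_mismatch(expected: dict, measured: dict, tolerances: dict) -> list:
    """Return the names of signals whose measured value deviates from the
    value expected by the algorithm by more than the given tolerance.

    An empty list means the vehicle behaved as the algorithm determined.
    """
    return [
        name
        for name, tol in tolerances.items()
        if abs(measured[name] - expected[name]) > tol
    ]
```

For example, a skid on black ice might show up as a yaw rate far from the expected value while the speed is still close to the planned 80 km/h; a non-empty result would then start the same collection and transmission as a driver intervention.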

Accordingly, referring to FIG. 3(b), when the autonomous driving algorithm improved (or corrected) by the server 200 is applied to another autonomous vehicle (own vehicle) driving on that road, the vehicle reduces its speed when passing the point where the hazard (e.g., black ice) is located, so that an accident can be prevented.

FIG. 4 is an exemplary view showing a second example in which the autonomous driving algorithm is improved by applying the autonomous driving reinforcement learning method of a vehicle according to an embodiment of the present invention.

FIG. 4(a) shows a situation in which vehicles waiting to leave via the exit ramp on the right (waiting vehicles) are backed up ahead in the lane (lane 2) in which the autonomous vehicle (own vehicle) is driving, blocking straight-ahead travel in that lane, and the autonomous vehicle (own vehicle) waits until the congestion clears.

At this point the driver may intervene and change course to the adjacent lane (lane 1) to continue driving straight ahead.

The control unit 130 collects a plurality of predesignated items of information covering a designated period before and after the time of the driver's intervention, namely vehicle information (i.e., information on the vehicle itself and on its surroundings detected by the vehicle information detection unit), traffic information (i.e., information received from the infrastructure system or from nearby vehicles through the communication unit), autonomous driving algorithm determination information, driver operation information, and the like, and transmits them to the server 200.

Accordingly, referring to FIG. 4(b), when the autonomous driving algorithm improved (or corrected) by the server 200 is applied to another autonomous vehicle (own vehicle) driving on that road, the vehicle automatically changes to the adjacent lane (lane 1) when passing the point where the congested section is located, so that it can continue driving straight ahead.

For reference, reinforcement learning in this embodiment means collecting the information generated in predesignated specific situations, including a driver's intervention or the occurrence of an accident during autonomous driving of the vehicle, and learning the way the driver intervened or an operation that can prevent the accident, in order to improve the existing autonomous driving algorithm.

As described above, in this embodiment, when the driver intervenes or the actual vehicle information differs from the autonomous driving algorithm determination information (i.e., the operation to be performed autonomously), a plurality of items of information covering a designated period before and after that time are collected, and the autonomous driving algorithm is improved (corrected) on the basis of this information and applied to other vehicles, thereby improving convenience and safety for the driver (user) of the autonomous vehicle.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art to which the invention pertains will understand that various modifications and other equivalent embodiments are possible. Therefore, the technical scope of protection of the present invention should be determined by the claims below.

100: vehicle device 110: driver intervention recognition unit
120: vehicle information detection unit 130: control unit
140: communication unit 141: server communication unit
142: V-I communication unit 143: V-V communication unit
200: server 300: infrastructure system
400: vehicle

Claims

An autonomous driving reinforcement learning apparatus for a vehicle, comprising:
a driver intervention recognition unit that recognizes a driver's operation intervention during autonomous driving of the vehicle;
a vehicle information detection unit that detects state information of the vehicle and situation information around the vehicle using a plurality of sensors;
a communication unit that receives traffic information for driving, stopping, and lane changing of the vehicle by communicating with traffic infrastructure or other vehicles, communicates with a server that performs an autonomous driving algorithm improvement operation through reinforcement learning so as to transmit information necessary for the improvement operation, and receives an improved autonomous driving algorithm from the server;
a control unit that, when the driver intervenes during autonomous driving of the vehicle, determines whether the autonomous driving algorithm needs to be improved by comparing the driver's intervention operation information recognized by the driver intervention recognition unit with control information determined by the autonomous driving algorithm, and, when improvement is needed, transmits the information necessary for the improvement operation through the communication unit; and
the server, which performs the autonomous driving algorithm improvement operation through reinforcement learning using the information transmitted in a wired or wireless manner through the communication unit.

The apparatus of claim 1, wherein the control unit,
when it determines that the autonomous driving algorithm needs to be improved, collects predesignated information for a designated period before and after the time of the driver's operation intervention,
transmits the information to the server so that the autonomous driving algorithm is improved, and
receives the improved autonomous driving algorithm from the server and applies it to the vehicle.

The apparatus of claim 1 or 2, wherein the control unit transmits to the server at least one of:
information on the vehicle itself and information around the vehicle detected by the vehicle information detection unit;
information received from an infrastructure system or a nearby vehicle through the communication unit;
autonomous driving algorithm determination information; and
driver operation information.

The apparatus of claim 1, wherein the communication unit comprises:
a server communication unit that communicates with the server in a wired or wireless manner; a V-V communication unit that communicates with other vehicles around the vehicle in a wired or wireless manner; and a V-I communication unit that communicates with the infrastructure system around the vehicle in a wired or wireless manner.

The apparatus of claim 1, wherein the control unit,
when the collected vehicle information differs from the autonomous driving algorithm determination information even though the driver has not intervened during autonomous driving of the vehicle,
determines that the autonomous driving algorithm needs to be improved, and
collects predesignated information for a designated period before and after the time at which the situation occurred and transmits it to the server.

The apparatus of claim 1, wherein the control unit,
when the driver's intervention operation information recognized by the driver intervention recognition unit does not differ from the control information determined by the autonomous driving algorithm even though the driver has intervened during autonomous driving of the vehicle,
determines that the autonomous driving algorithm does not need to be improved, and
continues autonomous driving using the existing, unimproved autonomous driving algorithm.

The apparatus of claim 1, wherein the reinforcement learning
collects information generated in predesignated specific situations, including a driver's intervention or the occurrence of an accident during autonomous driving of the vehicle, and learns the way the driver intervened or an operation that can prevent the accident, in order to improve the existing autonomous driving algorithm.

An autonomous driving reinforcement learning method of a vehicle, comprising:
checking, by a control unit, whether a driver has intervened during autonomous driving of the vehicle;
comparing, when the driver has intervened, the driver's intervention operation information with control information determined by an autonomous driving algorithm;
determining, when the driver's intervention operation information differs from the control information determined by the autonomous driving algorithm, that the autonomous driving algorithm needs to be improved, and transmitting information necessary for the improvement operation to a server; and
performing, by the server, an autonomous driving algorithm improvement operation through reinforcement learning using the transmitted information.

The method of claim 8,
wherein, in the step of transmitting the information necessary for the improvement operation to the server,
the control unit
collects predesignated information for a designated period before and after the time of the driver's operation intervention,
transmits the information to the server so that the autonomous driving algorithm is improved, and
receives the improved autonomous driving algorithm from the server and applies it to the vehicle.

The method of claim 8 or 9, wherein the information transmitted to the server comprises at least one of:
information on the vehicle itself and information around the vehicle detected by a vehicle information detection unit;
information received from an infrastructure system or a nearby vehicle through a communication unit;
autonomous driving algorithm determination information; and
driver operation information.

The method of claim 8,
wherein, when the collected vehicle information differs from the autonomous driving algorithm determination information even though the driver has not intervened during autonomous driving of the vehicle,
the control unit determines that the autonomous driving algorithm needs to be improved, and
collects predesignated information for a designated period before and after the time at which the situation occurred and transmits it to the server.

The method of claim 8,
wherein, when the driver's intervention operation information does not differ from the control information determined by the autonomous driving algorithm even though the driver has intervened during autonomous driving of the vehicle,
the control unit determines that the autonomous driving algorithm does not need to be improved, and
continues autonomous driving using the existing, unimproved autonomous driving algorithm.