KR102588450B1

KR102588450B1 - Method and device for personalizing autonomous driving algorithm capable of enhancing autonomous driving experience of specific user

Info

Publication number: KR102588450B1
Application number: KR1020210149252A
Authority: KR
Inventors: 권순도; 이영석; 이선주; 신우찬
Original assignee: Obigo Inc. (주식회사 오비고)
Priority date: 2021-11-02
Filing date: 2021-11-02
Publication date: 2023-10-13
Also published as: KR20230063807A

Abstract

A method for personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience is disclosed. The method comprises: (a) when the driving mode of an autonomous vehicle is set to manual mode, a personalization device performing (i) a process of acquiring first to Nth manual control data of the autonomous vehicle under first to Nth situations, produced by the operation of the specific user driving the autonomous vehicle during a time range including first to Nth timings, and (ii) a process of acquiring first to Nth automatic control data, which are generated during the time range by the autonomous driving module of the autonomous vehicle from first to Nth situation data corresponding to the first to Nth situations and which do not affect the operation of the autonomous vehicle; and (b) the personalization device causing a fitting module to personalize the autonomous driving algorithm mounted on the autonomous driving module to the specific user by referring to the first to Nth manual control data and the first to Nth automatic control data.

Description

Method and device for personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience

The present invention relates to a method and device for personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience.

Passengers of an autonomous vehicle may be satisfied with the driving patterns that the manufacturer implemented and trained into the vehicle, but they may also be dissatisfied because those patterns differ from their own usual driving. Because an autonomous vehicle drives using sensors that take in information differently from human senses, its driving has a different emotional feel from human driving, and passengers may therefore be dissatisfied.

Therefore, reflecting a passenger's subjective sensibilities in the objective autonomous driving algorithm of an artificial intelligence would greatly help improve satisfaction with autonomous driving, but methods for doing so do not appear to have been meaningfully studied to date.

The present invention aims to solve the problems described above.

Another object of the present invention is to reflect a specific user's sensibilities in an artificial intelligence autonomous driving algorithm by providing a method of personalizing the autonomous driving algorithm to improve that user's satisfaction with the autonomous driving experience.

Another object of the present invention is to provide a method of personalizing the autonomous driving algorithm by performing inverse reinforcement learning that treats the specific user's driving as the optimal policy.

Another object of the present invention is to provide a method of personalizing the autonomous driving algorithm by selecting, from the specific user's driving data, the data that strongly influence satisfaction with autonomous driving.

The characteristic configurations of the present invention for achieving the objects described above and realizing the characteristic effects described below are as follows.

According to one aspect of the present invention, there is disclosed a method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience, comprising: (a) when the driving mode of an autonomous vehicle is set to manual mode, a personalization device performing (i) a process of acquiring first to Nth manual control data of the autonomous vehicle under first to Nth situations, produced by the operation of the specific user driving the autonomous vehicle during a time range including first to Nth timings, and (ii) a process of acquiring first to Nth automatic control data, which are generated during the time range by an autonomous driving module of the autonomous vehicle from first to Nth situation data corresponding to the first to Nth situations and which do not affect the operation of the autonomous vehicle; and (b) the personalization device causing a fitting module to personalize the autonomous driving algorithm mounted on the autonomous driving module to the specific user by referring to the first to Nth manual control data and the first to Nth automatic control data.
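Step (a) amounts to running the autonomous driving module in a "shadow" role: at each timing it sees the same situation data and produces a control command, but only the user's manual command reaches the vehicle. The following is a minimal sketch of that logging under stated assumptions; `ShadowLog`, `record_step`, `toy_planner`, and the (steering, acceleration) control layout are hypothetical names chosen for illustration, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Control = Tuple[float, float]  # hypothetical layout: (steering, acceleration)


@dataclass
class ShadowLog:
    """Per-timing records gathered while the user drives in manual mode."""
    situations: List[Tuple[float, ...]] = field(default_factory=list)
    manual: List[Control] = field(default_factory=list)
    automatic: List[Control] = field(default_factory=list)


def record_step(log: ShadowLog, situation: Sequence[float],
                manual_control: Control,
                planner: Callable[[Sequence[float]], Control]) -> Control:
    """Log timing t and return the command actually sent to the vehicle.

    The planner's output is stored for later fitting but is never returned
    for actuation, so it does not affect the vehicle's motion.
    """
    auto_control = planner(situation)            # computed in shadow only
    log.situations.append(tuple(situation))
    log.manual.append(tuple(manual_control))
    log.automatic.append(tuple(auto_control))
    return manual_control                        # only the human input is actuated


def toy_planner(situation: Sequence[float]) -> Control:
    """Hypothetical stand-in for the autonomous driving module."""
    gap_m, _speed_mps = situation
    return (0.0, 0.5 if gap_m > 30.0 else -1.0)
```

Keeping the three lists index-aligned means the tth manual and automatic records always refer to the same situation, which the later fitting steps rely on.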

As an example, a method is disclosed wherein, in step (a), the personalization device acquires, as the first to Nth manual control data, those items among the raw manual control data produced by the specific user's operation during the time range whose general driving relevance, which is related to the specific user's satisfaction with autonomous driving, is greater than or equal to a specific threshold, and acquires, as the first to Nth automatic control data, those items among the raw automatic control data, generated by the autonomous driving module from raw situation data, that correspond to the first to Nth manual control data.
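The selection in this example can be read as an index filter: the relevance score decides which timings are kept, and the automatic records for exactly those timings are kept alongside them. A minimal sketch, assuming the raw data and relevance scores are already aligned per timing; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_training_pairs(raw_manual: Sequence[T], raw_auto: Sequence[T],
                          relevance: Sequence[float],
                          threshold: float) -> Tuple[List[T], List[T]]:
    """Keep only the timings whose general driving relevance meets the threshold.

    raw_manual[t], raw_auto[t] and relevance[t] all describe the same timing t,
    so filtering by index keeps each manual record paired with the automatic
    record the module produced for the same situation.
    """
    if not (len(raw_manual) == len(raw_auto) == len(relevance)):
        raise ValueError("inputs must be aligned per timing")
    kept = [t for t, r in enumerate(relevance) if r >= threshold]
    return [raw_manual[t] for t in kept], [raw_auto[t] for t in kept]
```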

As an example, a method is disclosed wherein, in step (a), the personalization device calculates the general driving relevance by referring to at least one of (i) whether the instantaneous rate of change in the speed or direction of the autonomous vehicle, resulting from inputting the raw manual control data to the vehicle, was greater than or equal to a first threshold, and (ii) whether the operation rate of the vehicle's safety devices was greater than or equal to a second threshold.
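The claim names the two tests but not how they combine into a score. The sketch below is one possible reading, with several assumptions made explicit: the "first threshold" is split into separate speed and heading thresholds (their units differ), and an abrupt change or frequent safety-device activation is treated as a sign of non-routine driving that lowers the score by 0.5 each. All names and the scoring rule are hypothetical.

```python
from typing import Sequence


def max_rate(values: Sequence[float], dt: float) -> float:
    """Largest absolute change per second between consecutive samples."""
    return max((abs(b - a) / dt for a, b in zip(values, values[1:])), default=0.0)


def general_driving_relevance(speeds: Sequence[float], headings: Sequence[float],
                              safety_active: Sequence[bool], dt: float,
                              speed_rate_th: float, heading_rate_th: float,
                              safety_rate_th: float) -> float:
    """Score one window of raw manual data in [0, 1].

    Assumed rule (not from the patent): each triggered condition lowers the
    score by 0.5, so calm, routine driving scores 1.0.
    """
    abrupt = (max_rate(speeds, dt) >= speed_rate_th
              or max_rate(headings, dt) >= heading_rate_th)
    safety_rate = sum(safety_active) / len(safety_active) if safety_active else 0.0
    frequent_safety = safety_rate >= safety_rate_th
    return 1.0 - 0.5 * abrupt - 0.5 * frequent_safety
```

A score produced this way could then be compared against the specific threshold of the preceding example to decide which timings are kept.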

As an example, a method is disclosed wherein, in step (a), the personalization device obtains restriction data on the specific user's driving by referring to the first to Nth situation data, and generates first to Nth adjusted manual control data by adjusting at least some of the first to Nth manual control data with reference to the restriction data; and, in step (b), the personalization device personalizes the autonomous driving algorithm by referring to the first to Nth adjusted manual control data, at least some of which have been adjusted, so that autonomous driving according to the algorithm does not violate the restrictions.
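One concrete kind of restriction data is a speed limit read from the situation data. As a minimal sketch, assuming the manual control data include a target speed per timing and the restriction is an upper speed limit (both hypothetical choices), the adjustment can clamp only the violating values and leave compliant ones untouched, so the user's style is preserved wherever it is already lawful.

```python
from typing import List, Sequence


def adjust_manual_controls(target_speeds: Sequence[float],
                           speed_limits: Sequence[float]) -> List[float]:
    """Produce adjusted manual control data that respects per-timing limits.

    Hypothetical restriction: the speed limit read from the situation data at
    timing t. Speeds above the limit are clamped so the personalized algorithm
    never learns to exceed it; compliant values are left unchanged.
    """
    if len(target_speeds) != len(speed_limits):
        raise ValueError("one limit is needed per timing")
    return [min(v, lim) for v, lim in zip(target_speeds, speed_limits)]
```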

As an example, a method is disclosed wherein, in step (b), the personalization device causes the fitting module to personalize the autonomous driving algorithm by tuning it on the basis of an inverse reinforcement learning methodology, with reference to the first to Nth manual control data and the first to Nth automatic control data.

As an example, a method is disclosed wherein step (b) includes: (b1) the personalization device causing the fitting module to calculate a reward function for adjusting the parameters of the autonomous driving algorithm by referring to the first to Nth manual control data and the first to Nth automatic control data; (b2) the personalization device causing the fitting module to adjust at least some of the parameters of the autonomous driving algorithm by applying reinforcement learning to the algorithm with reference to the reward function; and (b3) the personalization device causing the fitting module to calculate the difference between an adjusted-policy expected value, corresponding to the autonomous driving algorithm whose parameters have been at least partly adjusted, and an optimal-policy expected value, corresponding to the specific user's operation, and then, if the difference is less than a threshold, to load the parameter-adjusted autonomous driving algorithm into the autonomous driving module as the algorithm personalized for the specific user, or, if the difference is greater than or equal to the threshold, to perform the processes of steps (b1) and (b2) again.
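The (b1) to (b3) loop has the shape of apprenticeship-style inverse reinforcement learning: derive reward weights from the gap between the user and the current algorithm, improve the algorithm under that reward, and stop once the gap is small. The sketch below shows only that control flow. It assumes the quantities compared in (b3) are discounted feature expectations, and it takes the feature-expectation estimator and the reinforcement-learning update as caller-supplied callables; the toy stand-ins in the usage below (identity features, a fixed step along `w`) are hypothetical and are not the patent's learner.

```python
from typing import Callable, Tuple

import numpy as np


def personalize(mu_user: np.ndarray, params: np.ndarray,
                feature_expectations: Callable[[np.ndarray], np.ndarray],
                rl_update: Callable[[np.ndarray, np.ndarray], np.ndarray],
                eps: float, max_iter: int = 100) -> Tuple[np.ndarray, float, int]:
    """Repeat (b1) and (b2) until the (b3) gap falls below eps.

    mu_user: the user's (optimal-policy) discounted feature expectations.
    feature_expectations(params): the same quantity for the current algorithm.
    rl_update(params, w): one reinforcement-learning pass under reward w . phi.
    Returns (params, final gap, number of RL passes performed).
    """
    it = 0
    while True:
        diff = mu_user - feature_expectations(params)
        gap = float(np.linalg.norm(diff))   # (b3) policy-expectation gap
        if gap < eps or it >= max_iter:
            return params, gap, it          # accept and load (or give up)
        w = diff / gap                      # (b1) unit-norm reward weights
        params = rl_update(params, w)       # (b2) adjust the parameters
        it += 1


# Toy usage: features equal the parameters, and each "RL pass" moves 0.1 along w.
params, gap, passes = personalize(np.array([0.3, 0.4]), np.zeros(2),
                                  lambda p: p, lambda p, w: p + 0.1 * w, eps=0.15)
```

With a fixed step the toy learner only gets within about one step of the user's target, which is why `eps` is set above the step size here; a real reinforcement-learning update would not have this limitation in the same form.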

As an example, a method is disclosed wherein, in step (b1), the personalization device causes the fitting module to calculate the reward function according to a formula (the formula image is not reproduced in this text) whose symbols denote, respectively: a situation vector whose components are the first to Nth situation data; a manual control vector whose components are the first to Nth manual control data under the first to Nth situation data; and a weight vector obtained with reference to the first to Nth manual control data and the first to Nth automatic control data.
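Because the formula itself is missing from this text, the sketch below does not reproduce it. It assumes instead the linear form common in apprenticeship learning, in which the reward is the weight vector's inner product with the control vector chosen in the given situations, R = w · a. Under that assumption a weight vector derived from the manual and automatic data should score the user's controls above the module's. The function name is hypothetical.

```python
import numpy as np


def linear_reward(control_vector, w) -> float:
    """Assumed reward R = w . a, where a holds the control data chosen per situation.

    This linear form is an assumption standing in for the unreproduced formula.
    """
    a = np.asarray(control_vector, dtype=float)
    w = np.asarray(w, dtype=float)
    if a.shape != w.shape:
        raise ValueError("w needs one weight per control component")
    return float(w @ a)


# With w = (0.6, 0.8), manual controls (1, 2) score 2.2 and automatic controls (0, 1) score 0.8.
r_manual = linear_reward([1.0, 2.0], [0.6, 0.8])
r_auto = linear_reward([0.0, 1.0], [0.6, 0.8])
```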

As an example, a method is disclosed wherein the personalization device multiplies (i) the difference vector between the manual control vector, whose components are the first to Nth manual control data, and an automatic control vector, whose components are the first to Nth automatic control data, by (ii) a candidate weight vector that has a preset magnitude and the same number of components as the difference vector, optimizes the resulting vector, and acquires as the weight vector the candidate weight vector that makes that result largest in magnitude.
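The optimization in this example has a closed form if the "product" is read as the inner product: among all vectors of a fixed norm, w · d is maximized by d itself rescaled to that norm (Cauchy-Schwarz), so no search over candidates is needed. A minimal sketch under that assumption; the function name and the `size` parameter are hypothetical.

```python
import numpy as np


def weight_vector(manual, automatic, size: float = 1.0) -> np.ndarray:
    """Weight vector of norm `size` maximizing w . (manual - automatic).

    Assumes the claim's "product" is the inner product; by the Cauchy-Schwarz
    inequality the maximizer is the difference vector rescaled to `size`.
    """
    d = np.asarray(manual, dtype=float) - np.asarray(automatic, dtype=float)
    norm = float(np.linalg.norm(d))
    if norm == 0.0:
        raise ValueError("manual and automatic controls coincide; w is undefined")
    return size * d / norm
```

If the product were instead read elementwise and its Euclidean norm maximized, the optimum would put all of the preset magnitude on the component where the manual and automatic data differ most, so the two readings can give quite different weights.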

As an example, a method is disclosed wherein, in step (b3), the personalization device calculates the difference between the adjusted-policy expected value and the optimal-policy expected value according to a formula (the formula image is not reproduced in this text) whose symbols denote, respectively: the optimal policy corresponding to the specific user; the adjusted policy corresponding to the autonomous driving algorithm whose parameters have been adjusted through reinforcement learning; the expected value under a policy X; the tth control data computed from the tth situation data according to policy X; a discount factor; and a weight vector over the components.
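The listed symbols (two policies, an expectation per policy, per-timing control data, a discount factor, and a weight vector) match the usual apprenticeship-learning gap between discounted feature expectations. The sketch below assumes that form, |w · (μ_user − μ_adjusted)| with μ_X = E_X[Σ_t γ^t φ_t], and estimates each expectation by averaging over recorded trajectories. Since the patent's own formula is not reproduced here, both the form and the names are assumptions.

```python
import numpy as np


def feature_expectation(trajectories, gamma: float) -> np.ndarray:
    """Monte Carlo estimate of E_X[sum_t gamma^t phi_t] for one policy X.

    Each trajectory is a (T, k) array whose row t is the control data the
    policy produced for the tth situation.
    """
    totals = []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        discounts = gamma ** np.arange(len(traj))  # 1, gamma, gamma^2, ...
        totals.append(discounts @ traj)            # discounted sum, shape (k,)
    return np.mean(totals, axis=0)


def policy_gap(user_trajs, adjusted_trajs, w, gamma: float) -> float:
    """Assumed (b3) difference: |w . (mu_user - mu_adjusted)|."""
    diff = (feature_expectation(user_trajs, gamma)
            - feature_expectation(adjusted_trajs, gamma))
    return abs(float(np.dot(w, diff)))
```

Comparing this value against the threshold of step (b3) decides whether the adjusted algorithm is loaded or steps (b1) and (b2) are repeated.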

As an example, a method is disclosed that further includes (c) a step in which, after the driving mode of the autonomous vehicle has been set to automatic mode, when the specific user selects the autonomous driving algorithm personalized for him or her from a list of autonomous driving algorithms that have been personalized for each user and mounted on the autonomous driving module, the personalization device causes the autonomous driving module to perform autonomous driving based on the selected personalized algorithm.
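Step (c) implies that the autonomous driving module keeps one personalized algorithm per user and lets the occupant choose among them after switching to automatic mode. A minimal sketch, modeling each algorithm as a hypothetical parameter dictionary (for example a `headway_s` following-gap setting); the fallback to a default algorithm when no personalized one is selected is an added assumption, not part of the claim.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DrivingModule:
    """Toy stand-in for an autonomous driving module holding per-user algorithms."""
    default_params: Dict[str, float]
    personalized: Dict[str, Dict[str, float]] = field(default_factory=dict)
    mode: str = "manual"
    active: Optional[Dict[str, float]] = None

    def install(self, user_id: str, params: Dict[str, float]) -> None:
        """Mount the algorithm personalized for one user."""
        self.personalized[user_id] = params

    def algorithm_list(self) -> List[str]:
        """The list offered to the occupant after switching to automatic mode."""
        return sorted(self.personalized)

    def drive_auto(self, selected_user: Optional[str]) -> Dict[str, float]:
        """Switch to automatic mode and drive with the selected algorithm.

        Falling back to the default algorithm when nothing valid is selected
        is an assumption, not part of the claim.
        """
        self.mode = "auto"
        self.active = self.personalized.get(selected_user, self.default_params)
        return self.active
```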

According to another aspect of the present invention, there is disclosed a personalization device that performs a method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience, comprising: one or more memories storing instructions; and one or more processors configured to execute the instructions, wherein the processor performs: (I) a process of, when the driving mode of an autonomous vehicle is set to manual mode, performing (i) a process of acquiring first to Nth manual control data of the autonomous vehicle under first to Nth situations, produced by the operation of the specific user driving the autonomous vehicle during a time range including first to Nth timings, and (ii) a process of acquiring first to Nth automatic control data, which are generated during the time range by an autonomous driving module of the autonomous vehicle from first to Nth situation data corresponding to the first to Nth situations and which do not affect the operation of the autonomous vehicle; and (II) a process of causing a fitting module to personalize the autonomous driving algorithm mounted on the autonomous driving module to the specific user by referring to the first to Nth manual control data and the first to Nth automatic control data.

As an example, a device is disclosed wherein, in process (I), the processor acquires, as the first to Nth manual control data, those items among the raw manual control data produced by the specific user's operation during the time range whose general driving relevance, which is related to the specific user's satisfaction with autonomous driving, is greater than or equal to a specific threshold, and acquires, as the first to Nth automatic control data, those items among the raw automatic control data, generated by the autonomous driving module from raw situation data, that correspond to the first to Nth manual control data.

As an example, a device is disclosed wherein, in process (I), the processor calculates the general driving relevance by referring to at least one of (i) whether the instantaneous rate of change in the speed or direction of the autonomous vehicle, resulting from inputting the raw manual control data to the vehicle, was greater than or equal to a first threshold, and (ii) whether the operation rate of the vehicle's safety devices was greater than or equal to a second threshold.

As an example, a device is disclosed wherein, in process (I), the processor obtains restriction data on the specific user's driving by referring to the first to Nth situation data and generates first to Nth adjusted manual control data by adjusting at least some of the first to Nth manual control data with reference to the restriction data; and, in process (II), the processor personalizes the autonomous driving algorithm by referring to the first to Nth adjusted manual control data, at least some of which have been adjusted, so that autonomous driving according to the algorithm does not violate the restrictions.

As an example, a device is disclosed wherein, in process (II), the processor causes the fitting module to personalize the autonomous driving algorithm by tuning it on the basis of an inverse reinforcement learning methodology, with reference to the first to Nth manual control data and the first to Nth automatic control data.

As an example, a device is disclosed wherein process (II) includes: (II1) a process in which the processor causes the fitting module to calculate a reward function for adjusting the parameters of the autonomous driving algorithm by referring to the first to Nth manual control data and the first to Nth automatic control data; (II2) a process in which the processor causes the fitting module to adjust at least some of the parameters of the autonomous driving algorithm by applying reinforcement learning to the algorithm with reference to the reward function; and (II3) a process in which the processor causes the fitting module to calculate the difference between an adjusted-policy expected value, corresponding to the autonomous driving algorithm whose parameters have been at least partly adjusted, and an optimal-policy expected value, corresponding to the specific user's operation, and then, if the difference is less than a threshold, to load the parameter-adjusted autonomous driving algorithm into the autonomous driving module as the algorithm personalized for the specific user, or, if the difference is greater than or equal to the threshold, to perform processes (II1) and (II2) again.

As an example, a device is disclosed wherein, in process (II1), the processor causes the fitting module to calculate the reward function according to a formula (the formula image is not reproduced in this text) whose symbols denote, respectively: a situation vector whose components are the first to Nth situation data; a manual control vector whose components are the first to Nth manual control data under the first to Nth situation data; and a weight vector obtained with reference to the first to Nth manual control data and the first to Nth automatic control data.

As an example, a device is disclosed wherein the processor multiplies (i) the difference vector between the manual control vector, whose components are the first to Nth manual control data, and an automatic control vector, whose components are the first to Nth automatic control data, by (ii) a candidate weight vector that has a preset magnitude and the same number of components as the difference vector, optimizes the resulting vector, and acquires as the weight vector the candidate weight vector that makes that result largest in magnitude.

As an example, a device is disclosed wherein, in process (II3), the processor calculates the difference between the adjusted-policy expected value and the optimal-policy expected value according to a formula (the formula image is not reproduced in this text) whose symbols denote, respectively: the optimal policy corresponding to the specific user; the adjusted policy corresponding to the autonomous driving algorithm whose parameters have been adjusted through reinforcement learning; the expected value under a policy X; the tth control data computed from the tth situation data according to policy X; a discount factor; and a weight vector over the components.

As an example, a device is disclosed that further performs (III) a process in which, after the driving mode of the autonomous vehicle has been set to automatic mode, when the specific user selects the autonomous driving algorithm personalized for him or her from a list of autonomous driving algorithms that have been personalized for each user and mounted on the autonomous driving module, the processor causes the autonomous driving module to perform autonomous driving based on the selected personalized algorithm.

The present invention has the effect of allowing a specific user's sensibilities to be reflected in an artificial intelligence autonomous driving algorithm by providing a method of personalizing the autonomous driving algorithm to improve that user's satisfaction with the autonomous driving experience.

The present invention also has the effect of providing a method of personalizing the autonomous driving algorithm by performing inverse reinforcement learning that treats the specific user's driving as the optimal policy.

The present invention also has the effect of providing a method of personalizing the autonomous driving algorithm by selecting, from the specific user's driving data, only those data that strongly influence satisfaction with autonomous driving.

Figure 1 is a diagram showing the configuration of a personalization device that performs the method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience, according to an embodiment of the present invention.
Figure 2 is a flowchart showing the method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience, according to an embodiment of the present invention.
Figure 3 is a flowchart showing the inverse reinforcement learning process performed as part of the method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience, according to an embodiment of the present invention.
Figure 4 is a diagram showing how the driving of an autonomous vehicle changes as the method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience is performed, according to an embodiment of the present invention.

The detailed description of the present invention below refers to the accompanying drawings, which show, by way of example, specific embodiments in which the present invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention differ from one another but need not be mutually exclusive. For example, a specific shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the detailed description below is not to be taken in a limiting sense, and the scope of the invention, if appropriately described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. Like reference numerals in the drawings refer to the same or similar functions throughout the several aspects.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice it.

Figure 1 is a diagram showing the configuration of a personalization device that performs the method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience, according to an embodiment of the present invention.

Referring to FIG. 1, the personalization device 100 may include a communication unit 110, a memory 115, and a processor 120. The input/output and computation of the personalization device 100 may be handled by the communication unit 110 and the processor 120, respectively; the specific connections between the communication unit 110 and the processor 120 are omitted from FIG. 1. The memory 115 may store the various instructions described below, and the processor 120 may carry out the present invention by executing the instructions stored in the memory to perform the processes described below. Depicting the personalization device 100 in this way does not exclude the case where the personalization device 100 includes an integrated processor in which a medium, a processor, and a memory for implementing the present invention are integrated.

The personalization device 100 is installed in a vehicle and may operate in conjunction with an autonomous driving module 200, which manages the autonomous driving of the vehicle, and a fitting module 300. The autonomous driving module 200 may be a module that executes a deep-learning-based autonomous driving algorithm using at least one of radar, lidar, a camera, and GPS, or a combination thereof. The fitting module 300 is a module that can adjust the parameters of the autonomous driving algorithm loaded on the autonomous driving module 200; it may acquire driving data of a specific user and adjust the parameters based on that data. For convenience, the drawing shows the personalization device 100, the autonomous driving module 200, and the fitting module 300 as separate entities, but they may all be included in a single computing device, and any form of implementation is acceptable.

Having described the configuration of the personalization device 100 and its interworking with the autonomous driving module 200 and the fitting module 300, the method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience according to an embodiment of the present invention will now be examined in detail with reference to FIG. 2.

FIG. 2 is a flowchart showing the method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience, according to an embodiment of the present invention.

Referring to FIG. 2, when the driving mode of the autonomous vehicle is set to a manual mode, the personalization device 100 may perform (i) a process of acquiring first to Nth manual control data of the autonomous vehicle under first to Nth situations, resulting from the operation of the specific user driving the autonomous vehicle during a time range including first to Nth timings, and (ii) a process of acquiring first to Nth automatic control data generated by the autonomous driving module 200 of the autonomous vehicle from first to Nth situation data corresponding to the first to Nth situations during the time range, the automatic control data not affecting the operation of the autonomous vehicle (S01). Thereafter, the personalization device 100 may cause the fitting module 300 to personalize the autonomous driving algorithm loaded on the autonomous driving module 200 for the specific user with reference to the first to Nth manual control data and the first to Nth automatic control data (S02). This is described in more detail below.
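A minimal sketch of step S01 follows, assuming a hypothetical `planner` callable stands in for the autonomous driving module 200 and that situation and control data are plain numeric tuples (the patent defines no data schema). It illustrates the shadow arrangement: the module's output is logged next to the driver's input, but only the driver's input is returned as the actuation command.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class Sample:
    """One timing: situation data plus the manual and (shadow) automatic control data."""
    situation: Tuple[float, ...]
    manual: Tuple[float, ...]
    automatic: Tuple[float, ...]


@dataclass
class ShadowLogger:
    # Hypothetical stand-in for the autonomous driving module 200.
    planner: Callable[[Sequence[float]], Sequence[float]]
    samples: List[Sample] = field(default_factory=list)

    def step(self, mode: str, situation: Sequence[float],
             manual_input: Sequence[float]) -> Tuple[float, ...]:
        """Record one timing and return the command that actually drives the vehicle."""
        if mode != "manual":
            raise ValueError("data collection only runs in manual mode")
        # The planner still computes a command from the same situation data...
        shadow = tuple(self.planner(situation))
        self.samples.append(Sample(tuple(situation), tuple(manual_input), shadow))
        # ...but only the driver's input is passed on, so the shadow output
        # never affects the vehicle.
        return tuple(manual_input)


if __name__ == "__main__":
    logger = ShadowLogger(planner=lambda s: (0.0, 0.0))
    cmd = logger.step("manual", situation=(12.0,), manual_input=(0.5, 0.1))
    print(cmd, logger.samples[0].automatic)
```

Keeping the planner call inside the same `step` ties each automatic control sample to exactly the situation data the driver faced, which is what later pairing of the first to Nth manual and automatic control data relies on.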

First, the personalization device 100 may perform the following process after the driving mode of the autonomous vehicle is set to the manual mode. The purpose is to acquire manual control data of the autonomous vehicle resulting from the specific user's operation; this condition is satisfied either when the driving mode is switched from the automatic mode to the manual mode or when the manual mode is set from startup. When the driving mode is set to the manual mode, the personalization device 100 may acquire, through the DCU, raw manual control data input to the autonomous vehicle by the specific user's operation during a predetermined time range, that is, the specific user's operation data for deceleration, acceleration, steering, and gear shifting. At the same time, the personalization device 100 may acquire raw automatic control data by inputting raw situation data into the autonomous driving module 200, the raw situation data being information about the surrounding situation during the time range obtained through at least some of the above-described radar, lidar, camera, and GPS, or a combination thereof. However, the raw automatic control data is configured so that it cannot affect the operation of the autonomous vehicle. In this situation, the personalization device 100 may acquire, as the first to Nth manual control data, those items of the raw manual control data whose general driving relevance, which relates to the specific user's autonomous driving satisfaction, is equal to or greater than a threshold. The situations corresponding to the first to Nth manual control data may be called the first to Nth situations, and the data describing them may be called the first to Nth situation data. Further, the personalization device 100 may acquire, as the first to Nth automatic control data, those items of the raw automatic control data that correspond to the first to Nth manual control data, that is, those generated from the first to Nth situation data.

Next, a method of calculating the general driving relevance of each item of raw manual control data is described. For example, the personalization device 100 may calculate the general driving relevance with reference to at least one of (i) whether the instantaneous rate of change in the speed or direction of the autonomous vehicle caused by inputting the raw manual control data to the autonomous vehicle was equal to or greater than a preset first threshold and (ii) whether the operation rate of a safety device of the autonomous vehicle was equal to or greater than a preset second threshold. This approach retains only data from ordinary driving in ordinary situations and excludes data from the specific user's driving in special situations. That is, if the instantaneous rate of change in speed or direction is equal to or greater than the first threshold, or if a safety device of the vehicle such as ABS or TCS operates at a rate equal to or greater than the second threshold, the specific user was likely driving in an unusual way because of an exceptional situation around the vehicle, and such cases are to be excluded. Accordingly, the general driving relevance may be calculated by assigning a relatively low value when neither the first threshold condition nor the second threshold condition is satisfied (that is, when the rate of change is at or above the first threshold and the safety device operation rate is at or above the second threshold), a higher value when only one of the two conditions is satisfied, and a still higher value when both are satisfied (that is, when the rate of change is below the first threshold and the operation rate is below the second threshold). Thereafter, the personalization device 100 may acquire, from the raw manual control data, the first to Nth manual control data whose general driving relevance is equal to or greater than a specific threshold, and may acquire the corresponding first to Nth automatic control data.
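The three-level relevance rule and the subsequent selection can be sketched as below. The numeric scores 0.0, 0.5 and 1.0 are hypothetical placeholders (the text only says "relatively low", "higher" and "still higher"), and the `RawSample` record layout is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RawSample:
    rate_of_change: float       # instantaneous rate of change of speed or direction
    safety_rate: float          # safety device (e.g. ABS/TCS) operation rate
    manual: Tuple[float, ...]
    automatic: Tuple[float, ...]


# Hypothetical scores for 0, 1 or 2 satisfied conditions.
SCORES = (0.0, 0.5, 1.0)


def general_driving_relevance(rate: float, safety: float,
                              th1: float, th2: float) -> float:
    """Count how many of the two 'ordinary driving' conditions hold."""
    satisfied = int(rate < th1) + int(safety < th2)
    return SCORES[satisfied]


def select_pairs(samples: Sequence[RawSample], th1: float, th2: float,
                 min_relevance: float
                 ) -> Tuple[List[Tuple[float, ...]], List[Tuple[float, ...]]]:
    """Return matching (manual, automatic) lists for samples at or above min_relevance."""
    kept = [s for s in samples
            if general_driving_relevance(s.rate_of_change, s.safety_rate, th1, th2)
            >= min_relevance]
    return [s.manual for s in kept], [s.automatic for s in kept]
```

Because each raw sample carries both its manual and its shadow automatic data, filtering once keeps the first to Nth manual and automatic control data aligned index by index.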

Next, an embodiment in which the first to Nth manual control data are adjusted is described. The personalization device 100 may acquire restriction data on the specific user's driving with reference to the first to Nth situation data. For example, when the first to Nth situation data include location information acquired through GPS, restriction information for the road on which the specific user is driving, such as speed limit information or designated lane information, may be acquired. The personalization device 100 may then select, from the first to Nth manual control data, the items that violate such restriction information and adjust them so that they no longer violate the restrictions. For instance, the personalization device 100 may select specific manual control data showing that the specific user drove at 85 km/h on a road with a speed limit of 70 km/h and adjust it to driving at 70 km/h. Once first to Nth adjusted manual control data have been obtained by adjusting at least some of the first to Nth manual control data in this way, the personalization device 100 may personalize the autonomous driving algorithm with reference to the first to Nth adjusted manual control data, so that autonomous driving according to the autonomous driving algorithm does not violate the restrictions.
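The 85 km/h to 70 km/h example can be sketched as a clipping pass over the manual control data. This is an illustration under assumptions: each sample is a hypothetical dict with `road_id` and `speed_kmh` keys, the speed limits have already been looked up from GPS position, and designated-lane restrictions are not modeled.

```python
from typing import Dict, List, Optional


def adjust_to_speed_limits(samples: List[dict],
                           speed_limits: Dict[str, float]) -> List[dict]:
    """Return copies of samples with speeds clipped to the road's limit.

    Each sample is a hypothetical dict with 'road_id' and 'speed_kmh' keys;
    roads without a known limit are left unchanged.
    """
    adjusted = []
    for sample in samples:
        limit: Optional[float] = speed_limits.get(sample["road_id"])
        fixed = dict(sample)  # never mutate the recorded data
        if limit is not None and sample["speed_kmh"] > limit:
            fixed["speed_kmh"] = float(limit)
        adjusted.append(fixed)
    return adjusted


if __name__ == "__main__":
    data = [{"road_id": "A", "speed_kmh": 85.0}]
    print(adjust_to_speed_limits(data, {"A": 70}))
```

Returning copies keeps the original manual control data intact, so the adjusted set can be used for personalization while the raw record remains available.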

When the above process has been performed, the personalization device 100 may cause the fitting module 300 to personalize the autonomous driving algorithm loaded on the autonomous driving module 200 by tuning it according to an inverse reinforcement learning methodology, with reference to the first to Nth manual control data and the first to Nth automatic control data. For example, apprenticeship learning, one of the inverse reinforcement learning methodologies, may be applied. This is described in more detail below.

That is, the personalization device 100 may cause the fitting module 300 to calculate a reward function for adjusting the parameters of the autonomous driving algorithm with reference to the first to Nth manual control data and the first to Nth automatic control data. For example, the personalization device 100 may cause the fitting module 300 to calculate the reward function according to the following formula.

In the above formula, the situation vector includes the first to Nth situation data as its components, the manual control vector includes the first to Nth manual control data corresponding to the first to Nth situation data as its components, and the weight vector holds a weight for each component. The first to Nth manual control data and the first to Nth automatic control data may be used to obtain the weight vector. Specifically, defining the automatic control vector as the vector that includes the first to Nth automatic control data as its components, a candidate weight vector having a preset magnitude is multiplied by the difference vector between the manual control vector and the automatic control vector, and the candidate weight vector that maximizes the magnitude of the result is obtained as the weight vector. This procedure can be understood in more detail from the paper "Apprenticeship Learning via Inverse Reinforcement Learning" by Abbeel et al.
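A sketch of the weight-vector step under one reading of the text: "multiplied" is taken to mean a dot product and the preset magnitude an L2 norm. Under that assumption, the Cauchy-Schwarz inequality gives the maximizer directly as the difference vector rescaled to the preset norm. The linear reward `linear_reward` with a feature vector is a hypothetical form in the spirit of Abbeel et al.; it is not the patent's own formula, which is not reproduced in this text.

```python
import math
from typing import List, Sequence


def weight_vector(manual: Sequence[float], automatic: Sequence[float],
                  norm: float = 1.0) -> List[float]:
    """Weight vector of fixed L2 norm maximizing w . (manual - automatic).

    Assumes "multiply" means a dot product; by Cauchy-Schwarz the maximizer
    is the difference vector rescaled to the preset norm.
    """
    diff = [m - a for m, a in zip(manual, automatic)]
    length = math.sqrt(sum(x * x for x in diff))
    if length == 0.0:
        # Manual and automatic data already agree: no preferred direction.
        return [0.0] * len(diff)
    return [norm * x / length for x in diff]


def linear_reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear reward w . phi(s, a)."""
    return sum(w * f for w, f in zip(weights, features))
```

If "multiplied" instead meant an element-wise product, maximizing the norm of the result would concentrate all weight on the component with the largest absolute difference. The closed form above therefore depends on the dot-product assumption.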

Once the reward function has been determined in this way, the personalization device 100 may cause the fitting module 300 to adjust at least some of the parameters of the autonomous driving algorithm by applying reinforcement learning to the autonomous driving algorithm with reference to the reward function. In conventional reinforcement learning, the programmer must specify the reward function directly, so learning often fails to proceed in the desired direction even when reinforcement learning is performed. When the reward function is determined as described above, however, the first to Nth manual control data are appropriately reflected in the reward function, which helps reinforcement learning proceed in the intended direction. Any well-known reinforcement learning algorithm may be used here.

Thereafter, the personalization device 100 may cause the fitting module 300 to calculate the difference between an adjusted policy expected value, corresponding to the autonomous driving algorithm in which at least some parameters have been adjusted, and an optimal policy expected value, corresponding to the specific user's operation. Here, as is common in reinforcement learning methodology, a policy expected value may mean the (discounted) sum of the reward values obtained when following the corresponding policy. Expressed as a formula, it is as follows.

In the above formula, the terms denote, respectively, the expected value for a policy X, the t-th control data computed from the t-th situation data according to the policy X, the discount factor, and the weight vector for each component. By calculating the difference between the adjusted policy expected value and the optimal policy expected value, it is possible to determine how far the policy of the currently adjusted autonomous driving algorithm differs from the specific user's driving policy.
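The policy expected value and the gap between the two policies can be estimated empirically as sketched below. The sketch assumes a single recorded trajectory per policy, with each step already mapped to a feature vector and a linear reward w . phi. The absolute difference is one natural choice for "difference"; averaging over several trajectories would reduce variance.

```python
from typing import Sequence


def policy_value(weights: Sequence[float],
                 trajectory: Sequence[Sequence[float]], gamma: float) -> float:
    """Discounted sum of linear rewards along one trajectory of feature vectors."""
    total = 0.0
    for t, features in enumerate(trajectory):
        reward = sum(w * f for w, f in zip(weights, features))
        total += (gamma ** t) * reward
    return total


def policy_gap(weights: Sequence[float],
               expert_traj: Sequence[Sequence[float]],
               adjusted_traj: Sequence[Sequence[float]],
               gamma: float) -> float:
    """Absolute difference between the optimal (driver) and adjusted policy values."""
    return abs(policy_value(weights, expert_traj, gamma)
               - policy_value(weights, adjusted_traj, gamma))
```

For example, with weight 1.0, discount 0.5 and rewards 1, 1 along the driver's trajectory, the value is 1 + 0.5 = 1.5.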

After the difference between the adjusted policy expected value and the optimal policy expected value is calculated, the fitting module 300 may check whether the difference is less than a threshold. If the difference is less than the threshold, the autonomous driving algorithm has been sufficiently adjusted, so the personalization device 100 may cause the fitting module 300 to load the autonomous driving algorithm with the adjusted parameters onto the autonomous driving module 200 as a personalized algorithm for the specific user. If the difference is equal to or greater than the threshold, the process is repeated starting from the reward function calculation described above. However, for the safety and stability of the autonomous driving algorithm, parameter changes may be restricted to a preset range. This process is explained with reference to FIG. 3.

FIG. 3 is a flowchart showing the inverse reinforcement learning process performed as part of the method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience, according to an embodiment of the present invention.

Referring to FIG. 3, the personalization device 100 causes the fitting module 300 to perform the above-described reward function calculation (S02-1) and reinforcement learning (S02-2). If the above-described difference (S02-3) is less than the threshold, the personalization process ends; if it is equal to or greater than the threshold, the process is repeated from the reward function calculation (S02-1).
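The loop of FIG. 3 can be sketched with the three steps passed in as callables, so that any reward-fitting method and any reinforcement learning algorithm can be plugged in. Clamping each parameter to a preset range reflects the safety restriction described above. The `max_iters` cap is an added safeguard that the text does not mention, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Params = List[float]


def personalize(params: Params,
                fit_reward: Callable[[Params], Sequence[float]],
                rl_update: Callable[[Params, Sequence[float]], Params],
                gap: Callable[[Params, Sequence[float]], float],
                bounds: Sequence[Tuple[float, float]],
                eps: float,
                max_iters: int = 50) -> Tuple[Params, bool]:
    """Repeat S02-1..S02-3 until the policy gap falls below eps.

    Returns (params, converged); max_iters is an added safeguard.
    """
    for _ in range(max_iters):
        weights = fit_reward(params)                 # S02-1: reward function
        proposed = rl_update(params, weights)        # S02-2: reinforcement learning
        # Keep every parameter inside its preset range for safety and stability.
        params = [min(max(p, lo), hi) for p, (lo, hi) in zip(proposed, bounds)]
        if gap(params, weights) < eps:               # S02-3: compare with threshold
            return params, True
    return params, False
```

In a toy run where `rl_update` moves a single parameter halfway toward a target of 1.0 each iteration, the loop converges after a few passes. If the preset range caps the parameter at 0.5, it never converges and returns `False`, leaving the algorithm unpersonalized rather than pushing it outside its safe range.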

FIG. 4 illustrates the differences that may arise when the autonomous driving algorithm is personalized through the above process.

FIG. 4 is a diagram illustrating how the driving of the autonomous vehicle changes as a result of performing the method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience, according to an embodiment of the present invention.

In FIG. 4, before personalization, the autonomous driving module 200 may decelerate at an acceleration (420) of -10 m/s² starting from a first position (410) 100 m before the corner, turn right at a speed (430) of 3 km/s, and change the steering angle (440) through 10°, 30°, 90°, 30°, and 0°. This behavior does not take the specific user's driving preferences into account. When the vehicle is driven manually according to the method of the present invention, manual control data may be acquired in which the specific user decelerates at an acceleration (420) of -15 m/s² starting from a 1-1 position (not shown) 80 m before the corner, turns right at a speed (430) of 5 km/s, and changes the steering angle (440) through 10°, 40°, 85°, 35°, and 0°. If the autonomous driving algorithm is personalized using such data, the autonomous driving module 200 may decelerate at an acceleration (420) of -14 m/s² starting from a 1-2 position (not shown) 83 m before the corner, turn right at a speed (430) of 4.8 km/s, and change the steering angle (440) through 10°, 38°, 86°, 33°, and 0°.

The personalization device 100 may keep autonomous driving algorithms personalized for each driver of the vehicle loaded on the autonomous driving module. When the driving mode of the autonomous vehicle is set to the automatic mode, it allows the specific user to select the algorithm personalized for that user from a list of autonomous driving algorithms, and allows the autonomous driving module to perform autonomous driving using the selected algorithm, thereby improving the specific user's satisfaction with autonomous driving.
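The per-driver storage and selection can be pictured as a small registry keyed by driver. In the sketch below, the class name, the driver-ID keys, and the fallback to a default (non-personalized) algorithm when a driver has none are all assumptions for illustration.

```python
from typing import Dict, Generic, List, Optional, TypeVar

A = TypeVar("A")


class AlgorithmRegistry(Generic[A]):
    """Per-driver personalized algorithms held by the autonomous driving module."""

    def __init__(self, default: A) -> None:
        self._default = default
        self._by_driver: Dict[str, A] = {}

    def register(self, driver_id: str, algorithm: A) -> None:
        """Store (or replace) the personalized algorithm for one driver."""
        self._by_driver[driver_id] = algorithm

    def selectable(self) -> List[str]:
        """Driver IDs shown in the algorithm list."""
        return sorted(self._by_driver)

    def select(self, mode: str, driver_id: str) -> Optional[A]:
        """Algorithm to run in automatic mode; None in manual mode."""
        if mode != "automatic":
            return None
        # Falling back to the default algorithm is an assumption of this sketch.
        return self._by_driver.get(driver_id, self._default)
```

Re-running the personalization for the same driver simply replaces that driver's entry via `register`.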

The embodiments of the present invention described above may be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the computer software field. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with specific details such as particular components, limited embodiments, and drawings, these are provided only to aid a more general understanding of the present invention, and the present invention is not limited to the above embodiments. Those skilled in the art to which the present invention pertains may make various modifications and changes based on this description.

Therefore, the spirit of the present invention should not be limited to the above-described embodiments. Not only the claims set forth below but also everything modified equally or equivalently to these claims shall be regarded as falling within the scope of the spirit of the present invention.

Claims

A method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience, the method comprising:
(a) when a driving mode of an autonomous vehicle is set to a manual mode, performing, by a personalization device, (i) a process of acquiring first to Nth manual control data of the autonomous vehicle under first to Nth situations, resulting from operation by the specific user driving the autonomous vehicle during a time range including first to Nth timings, and (ii) a process of acquiring first to Nth automatic control data, which do not affect the operation of the autonomous vehicle, generated by an autonomous driving module of the autonomous vehicle from first to Nth situation data corresponding to the first to Nth situations during the time range; and
(b) causing, by the personalization device, a fitting module to personalize the autonomous driving algorithm loaded on the autonomous driving module for the specific user with reference to the first to Nth manual control data and the first to Nth automatic control data,
wherein, in step (b),
the personalization device causes the fitting module to personalize the autonomous driving algorithm by tuning it based on an inverse reinforcement learning methodology with reference to the first to Nth manual control data and the first to Nth automatic control data,
wherein step (b) comprises:
(b1) causing, by the personalization device, the fitting module to calculate a reward function for adjusting parameters of the autonomous driving algorithm with reference to the first to Nth manual control data and the first to Nth automatic control data;
(b2) adjusting, by the personalization device, at least some of the parameters of the autonomous driving algorithm by applying reinforcement learning to the autonomous driving algorithm with reference to the reward function; and
(b3) causing, by the personalization device, the fitting module to calculate a difference between an adjusted policy expected value corresponding to the autonomous driving algorithm in which at least some parameters have been adjusted and an optimal policy expected value corresponding to the operation of the specific user, and then, if the difference is less than a threshold, to load the autonomous driving algorithm with the adjusted parameters onto the autonomous driving module as a personalized algorithm for the specific user, and, if the difference is equal to or greater than the threshold, to perform again the processes according to steps (b1) and (b2),
wherein, in step (b1),
the personalization device causes the fitting module to calculate the reward function according to the following formula,

wherein, in the above formula, the situation vector includes the first to Nth situation data as its components, the manual control vector includes the first to Nth manual control data corresponding to the first to Nth situation data as its components, and the weight vector is obtained with reference to the first to Nth manual control data and the first to Nth automatic control data.

The method of claim 1,
wherein, in step (a),
the personalization device acquires, as the first to Nth manual control data, those items of raw manual control data resulting from the specific user's operation during the time range whose general driving relevance, which relates to the specific user's autonomous driving satisfaction, is equal to or greater than a specific threshold, and acquires, as the first to Nth automatic control data, those items of raw automatic control data generated by the autonomous driving module from raw situation data that correspond to the first to Nth manual control data.

The method of claim 2,
wherein, in step (a),
the personalization device calculates the general driving relevance with reference to at least one of whether an instantaneous rate of change in the speed or direction of the autonomous vehicle caused by inputting the raw manual control data to the autonomous vehicle was equal to or greater than a first threshold and whether a safety device operation rate of the autonomous vehicle was equal to or greater than a second threshold.

The method of claim 2,
wherein, in step (a),
the personalization device acquires restriction data on the specific user's driving with reference to the first to Nth situation data, and generates first to Nth adjusted manual control data by adjusting at least some of the first to Nth manual control data with reference to the restriction data, and
wherein, in step (b),
the personalization device personalizes the autonomous driving algorithm with reference to the first to Nth adjusted manual control data, at least some of which have been adjusted, so that autonomous driving according to the autonomous driving algorithm does not violate the restrictions.

(Deleted)

The method of claim 1,
wherein the personalization device obtains, as the weight vector, the candidate weight vector that maximizes the magnitude of the vector resulting from multiplying (i) a difference vector between the manual control vector, which includes the first to Nth manual control data as its components, and an automatic control vector, which includes the first to Nth automatic control data as its components, by (ii) a candidate weight vector having a predetermined magnitude and the same number of components as the difference vector.

In a method of personalizing a self-driving algorithm to improve the satisfaction of a specific user's self-driving experience,
(a) When the driving mode of the autonomous vehicle is set to manual mode, the personalization device (i) follows the operation of the specific user driving the autonomous vehicle during a time range including the first to Nth timings, A process of acquiring first to Nth manual control data of the autonomous vehicle under first to Nth situations, and (ii) an autonomous driving module of the autonomous vehicle responding to the first to Nth situations during the time range. performing a process of obtaining first to Nth automatic control data generated by receiving first to Nth situation data - which does not affect the operation of the autonomous vehicle; and
(b) The personalization device causes the fitting module to apply the autonomous driving algorithm mounted on the autonomous driving module to the specific user with reference to the first to Nth manual control data and the first to Nth automatic control data. Steps to personalize
Characterized by including,
In step (b),
The personalization device causes the fitting module to personalize the autonomous driving algorithm by tuning it based on an inverse reinforcement learning methodology with reference to the first to Nth manual control data and the first to Nth automatic control data,
In step (b),
(b1) a step in which the personalization device causes the fitting module to calculate a compensation function for adjusting parameters of the autonomous driving algorithm with reference to the first to Nth manual control data and the first to Nth automatic control data;
(b2) a step in which the personalization device causes the fitting module to adjust at least some of the parameters of the autonomous driving algorithm by applying reinforcement learning to the autonomous driving algorithm with reference to the compensation function; and
(b3) a step in which the personalization device causes the fitting module to calculate a difference value between an adjusted policy expectation value, corresponding to the autonomous driving algorithm in which at least some parameters have been adjusted, and an optimal policy expectation value according to the operation of the specific user, and then, if the difference value is less than a threshold, loads the autonomous driving algorithm with the adjusted parameters into the autonomous driving module as an algorithm personalized for the specific user, or, if the difference value is equal to or greater than the threshold, re-performs the processes of steps (b1) and (b2)
Characterized by including,
In step (b3),
The personalization device calculates the difference value between the adjusted policy expectation value and the optimal policy expectation value according to the formula below,

A method characterized in that, in the above formula, the terms respectively denote the optimal policy corresponding to the specific user; the adjusted policy corresponding to the autonomous driving algorithm whose parameters have been adjusted through reinforcement learning; the expected value under a policy; the t-th control data calculated from the t-th situation data according to a policy X; the discount factor; and the weight vector for each component.
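Steps (b1) to (b3) form an iterate-until-close loop. The sketch below is a minimal illustration under stated assumptions: the policy expectation value is taken to be the discounted sum E_π[Σ_t γ^t w·φ(s_t, a_t)], estimated by averaging over sampled trajectories (the feature-expectation form familiar from apprenticeship-learning-style inverse reinforcement learning, which may differ from the claim's exact formula). The callables `rollout`, `fit_reward`, `rl_update`, and `features` are hypothetical placeholders supplied by the caller, not components named in the patent.

```python
from typing import Callable, List, Sequence, Tuple

# One trajectory is a sequence of (situation s_t, control a_t) pairs, t = 0, 1, 2, ...
Step = Tuple[Sequence[float], Sequence[float]]
Trajectory = Sequence[Step]
Features = Callable[[Sequence[float], Sequence[float]], Sequence[float]]


def discounted_value(trajectories: Sequence[Trajectory], w: Sequence[float],
                     features: Features, gamma: float) -> float:
    """Monte Carlo estimate of E_pi[sum_t gamma^t * w . phi(s_t, a_t)]."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    total = 0.0
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            phi = features(s, a)
            total += (gamma ** t) * sum(wi * fi for wi, fi in zip(w, phi))
    return total / len(trajectories)


def policy_gap(expert: Sequence[Trajectory], adjusted: Sequence[Trajectory],
               w: Sequence[float], features: Features, gamma: float) -> float:
    # Absolute difference between the user's (optimal) and the adjusted policy values.
    return abs(discounted_value(expert, w, features, gamma)
               - discounted_value(adjusted, w, features, gamma))


def personalize(params, expert: Sequence[Trajectory],
                rollout: Callable[..., List[Trajectory]],
                fit_reward: Callable[..., Sequence[float]],
                rl_update: Callable[..., object],
                features: Features, threshold: float,
                gamma: float = 0.9, max_iters: int = 100):
    for _ in range(max_iters):
        w = fit_reward(expert, rollout(params))  # (b1) compensation (reward) weights
        params = rl_update(params, w)            # (b2) RL adjustment of parameters
        gap = policy_gap(expert, rollout(params), w, features, gamma)  # (b3)
        if gap < threshold:
            return params, gap                   # load as the personalized algorithm
    raise RuntimeError("gap did not fall below the threshold within max_iters")
```

For a toy check, let `params` be a single target speed θ, let `rollout(θ)` emit θ at every step, and let `rl_update` move θ halfway toward the user's speed. The loop then stops once the discounted gap drops below the threshold.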

According to claim 1,
A method further comprising:
(c) a step in which, after the personalization device sets the driving mode of the autonomous vehicle to automatic mode, when the specific user selects the autonomous driving algorithm personalized for the specific user from a list of autonomous driving algorithms that are personalized for each user and mounted on the autonomous driving module, the personalization device causes the autonomous driving module to perform autonomous driving based on the personalized autonomous driving algorithm.
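Step (c) amounts to a per-user registry consulted only in automatic mode. A minimal sketch, assuming an algorithm can be modeled as a callable from situation data to a control command; `AutonomousDrivingModule`, `DrivingMode`, and their methods are hypothetical names for illustration.

```python
from enum import Enum
from typing import Any, Callable, Dict

# Hypothetical type: an algorithm maps situation data to a control command.
Algorithm = Callable[[Any], Any]


class DrivingMode(Enum):
    MANUAL = "manual"
    AUTOMATIC = "automatic"


class AutonomousDrivingModule:
    """Toy module holding a list of algorithms personalized per user."""

    def __init__(self, default_algorithm: Algorithm) -> None:
        self.algorithms: Dict[str, Algorithm] = {"default": default_algorithm}
        self.active: Algorithm = default_algorithm
        self.mode = DrivingMode.MANUAL

    def load_personalized(self, user_id: str, algorithm: Algorithm) -> None:
        # Store an algorithm that passed the personalization check for this user.
        self.algorithms[user_id] = algorithm

    def set_mode(self, mode: DrivingMode) -> None:
        self.mode = mode

    def select(self, user_id: str) -> Algorithm:
        # Selection is honored only in automatic mode; unknown users raise KeyError.
        if self.mode is not DrivingMode.AUTOMATIC:
            raise RuntimeError("switch to automatic mode before selecting an algorithm")
        self.active = self.algorithms[user_id]
        return self.active

    def drive(self, situation: Any) -> Any:
        # Autonomous driving step using whichever algorithm is currently active.
        return self.active(situation)
```

Keeping the default algorithm in the same list means a user without a personalized entry can still fall back to it explicitly by selecting `"default"`.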

In a personalization device that performs a method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience,
one or more memories storing instructions; and
one or more processors configured to execute the instructions, the processors performing (I) a process of, when the driving mode of an autonomous vehicle is set to manual mode, performing (i) a process of acquiring first to Nth manual control data of the autonomous vehicle under first to Nth situations, resulting from operation by the specific user driving the autonomous vehicle during a time range including first to Nth timings, and (ii) a process of acquiring first to Nth automatic control data generated by an autonomous driving module of the autonomous vehicle upon receiving first to Nth situation data corresponding to the first to Nth situations during the time range, the first to Nth automatic control data not affecting the operation of the autonomous vehicle; and (II) a process of causing a fitting module to personalize the autonomous driving algorithm mounted on the autonomous driving module to the specific user with reference to the first to Nth manual control data and the first to Nth automatic control data,
In the process (II),
the processor causes the fitting module to personalize the autonomous driving algorithm by tuning it based on an inverse reinforcement learning methodology with reference to the first to Nth manual control data and the first to Nth automatic control data,
In the process (II),
(II1) a process in which the processor causes the fitting module to calculate a compensation function for adjusting parameters of the autonomous driving algorithm with reference to the first to Nth manual control data and the first to Nth automatic control data;
(II2) a process in which the processor causes the fitting module to adjust at least some of the parameters of the autonomous driving algorithm by applying reinforcement learning to the autonomous driving algorithm with reference to the compensation function; and
(II3) a process in which the processor causes the fitting module to calculate a difference value between an adjusted policy expectation value, corresponding to the autonomous driving algorithm in which at least some parameters have been adjusted, and an optimal policy expectation value according to the operation of the specific user, and then, if the difference value is less than a threshold, loads the autonomous driving algorithm with the adjusted parameters into the autonomous driving module as an algorithm personalized for the specific user, or, if the difference value is equal to or greater than the threshold, re-performs the processes (II1) and (II2)
Characterized by including,
In the process (II1),
the processor causes the fitting module to calculate the compensation function according to the formula below,

In the above formula, the terms respectively denote a situation vector whose components are the first to Nth situation data; a manual control vector whose components are the first to Nth manual control data corresponding to the first to Nth situation data; and a weight vector obtained with reference to the first to Nth manual control data and the first to Nth automatic control data.
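The claim defines the compensation function (the reward function, in reinforcement learning terms) from a situation vector, a manual control vector, and a weight vector, each with one component per timing. A minimal sketch, assuming a linear form R(s, a) = Σ_t w_t · g(s_t, a_t); the exact formula, the per-timing feature `g`, and the example `deviation_penalty` are hypothetical.

```python
from typing import Callable, Sequence


def compensation(situations: Sequence[float],
                 manual_controls: Sequence[float],
                 weights: Sequence[float],
                 g: Callable[[float, float], float]) -> float:
    """Assumed linear form R(s, a) = sum_t w_t * g(s_t, a_t) over N timings."""
    n = len(weights)
    if len(situations) != n or len(manual_controls) != n:
        raise ValueError("s, a and w must all have N components")
    return sum(w_t * g(s_t, a_t)
               for w_t, s_t, a_t in zip(weights, situations, manual_controls))


# Hypothetical per-timing feature: penalize deviation of the control from a
# situation-derived target (here the situation datum itself is the target).
def deviation_penalty(s_t: float, a_t: float) -> float:
    return -abs(a_t - s_t)


print(compensation([50.0, 60.0], [48.0, 60.0], [1.0, 1.0], deviation_penalty))  # prints -2.0
```

Because the form is linear in the weights, the weight vector directly controls how much each timing's behavior contributes to the reward that step (II2) optimizes.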

According to claim 11,
In the process (I),
A personalization device characterized in that the processor obtains, as the first to Nth manual control data, those items of raw manual control data produced by the specific user's operation during the time range whose general driving relevance, which relates to the specific user's satisfaction with autonomous driving, is greater than a certain threshold, and obtains, as the first to Nth automatic control data, those items of raw automatic control data, generated upon receiving raw situation data, that correspond to the raw situation data of the first to Nth manual control data.
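The filtering in this claim keeps manual and automatic control data paired through their shared raw situation data. A minimal sketch, assuming the raw manual data, raw automatic data, and relevance scores are index-aligned by raw situation datum; `select_relevant` is a hypothetical name.

```python
from typing import List, Sequence, Tuple, TypeVar

M = TypeVar("M")
A = TypeVar("A")


def select_relevant(raw_manual: Sequence[M], raw_automatic: Sequence[A],
                    relevance: Sequence[float],
                    threshold: float) -> Tuple[List[M], List[A]]:
    """Keep samples whose general driving relevance exceeds `threshold`.

    Assumes raw_manual[i], raw_automatic[i] and relevance[i] all refer to the
    same raw situation datum i, so filtering by index keeps the pairs aligned.
    """
    if not (len(raw_manual) == len(raw_automatic) == len(relevance)):
        raise ValueError("inputs must be index-aligned")
    keep = [i for i, r in enumerate(relevance) if r > threshold]
    return [raw_manual[i] for i in keep], [raw_automatic[i] for i in keep]


manual, automatic = select_relevant([1, 2, 3, 4], [10, 20, 30, 40],
                                    [0.1, 0.9, 0.5, 0.95], 0.5)
print(manual, automatic)  # prints [2, 4] [20, 40]
```

The comparison is strict ("greater than"), so a sample whose relevance equals the threshold, like the third one here, is dropped.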

According to claim 12,
In the process (I),
A personalization device characterized in that the processor calculates the general driving relevance with reference to at least one of (i) whether the instantaneous rate of change in speed or direction of the autonomous vehicle caused by inputting the raw manual control data to the autonomous vehicle is greater than or equal to a first threshold and (ii) whether the safety device operation rate of the autonomous vehicle is greater than or equal to a second threshold.
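The two criteria in this claim can be sketched as below. This assumes sampled speed and (unwrapped) heading series with a fixed time step, a safety-device activation count over a window, and finite differences as the instantaneous rate. The separate speed and heading thresholds standing in for the single "first threshold", the equal 0.5/0.5 weighting, the choice that exceeding a threshold raises relevance, and all names are hypothetical.

```python
from typing import Sequence


def max_rate_of_change(values: Sequence[float], dt: float) -> float:
    """Largest absolute finite-difference rate between consecutive samples."""
    if len(values) < 2 or dt <= 0:
        raise ValueError("need at least two samples and a positive dt")
    return max(abs(b - a) / dt for a, b in zip(values, values[1:]))


def general_driving_relevance(speeds: Sequence[float], headings: Sequence[float],
                              dt: float, safety_activations: int, window_steps: int,
                              speed_rate_th: float, heading_rate_th: float,
                              safety_rate_th: float) -> float:
    if window_steps <= 0:
        raise ValueError("window_steps must be positive")
    # Criterion (i): abrupt change in speed or direction (headings assumed unwrapped).
    abrupt = (max_rate_of_change(speeds, dt) >= speed_rate_th
              or max_rate_of_change(headings, dt) >= heading_rate_th)
    # Criterion (ii): safety-device operation rate over the window.
    frequent_safety = safety_activations / window_steps >= safety_rate_th
    # Hypothetical equal weighting of the two criteria: 0.0, 0.5 or 1.0.
    return 0.5 * float(abrupt) + 0.5 * float(frequent_safety)
```

For example, a jump from 50 to 60 km/h in one 1 s step exceeds a hypothetical 8 km/h/s speed threshold, giving a relevance of 0.5 when the safety devices stayed idle.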

According to claim 12,
In the process (I),
the processor obtains restriction data on the driving of the specific user with reference to the first to Nth situation data, and adjusts at least some of the first to Nth manual control data with reference to the restriction data to generate first to Nth adjusted manual control data,
In the process (II),
A personalization device characterized in that the processor personalizes the autonomous driving algorithm with reference to the first to Nth adjusted manual control data, at least some of which have been adjusted, so that autonomous driving according to the autonomous driving algorithm does not violate the restrictions.

(Deleted)

According to claim 11,
A personalization device characterized in that the processor obtains, as the weight vector, the candidate weight vector for which the product of (i) the difference vector between a manual control vector, whose components are the first to Nth manual control data, and an automatic control vector, whose components are the first to Nth automatic control data, and (ii) a candidate weight vector having a preset magnitude and the same number of components as the difference vector, is largest in magnitude when optimized over the candidate weight vectors.

In a personalization device that performs a method of personalizing an autonomous driving algorithm to improve a specific user's satisfaction with the autonomous driving experience,
one or more memories storing instructions; and
one or more processors configured to execute the instructions, the processors performing (I) a process of, when the driving mode of an autonomous vehicle is set to manual mode, performing (i) a process of acquiring first to Nth manual control data of the autonomous vehicle under first to Nth situations, resulting from operation by the specific user driving the autonomous vehicle during a time range including first to Nth timings, and (ii) a process of acquiring first to Nth automatic control data generated by an autonomous driving module of the autonomous vehicle upon receiving first to Nth situation data corresponding to the first to Nth situations during the time range, the first to Nth automatic control data not affecting the operation of the autonomous vehicle; and (II) a process of causing a fitting module to personalize the autonomous driving algorithm mounted on the autonomous driving module to the specific user with reference to the first to Nth manual control data and the first to Nth automatic control data,
In the process (II),
the processor causes the fitting module to personalize the autonomous driving algorithm by tuning it based on an inverse reinforcement learning methodology with reference to the first to Nth manual control data and the first to Nth automatic control data,
In the process (II),
(II1) a process in which the processor causes the fitting module to calculate a compensation function for adjusting parameters of the autonomous driving algorithm with reference to the first to Nth manual control data and the first to Nth automatic control data;
(II2) a process in which the processor causes the fitting module to adjust at least some of the parameters of the autonomous driving algorithm by applying reinforcement learning to the autonomous driving algorithm with reference to the compensation function; and
(II3) a process in which the processor causes the fitting module to calculate a difference value between an adjusted policy expectation value, corresponding to the autonomous driving algorithm in which at least some parameters have been adjusted, and an optimal policy expectation value according to the operation of the specific user, and then, if the difference value is less than a threshold, loads the autonomous driving algorithm with the adjusted parameters into the autonomous driving module as an algorithm personalized for the specific user, or, if the difference value is equal to or greater than the threshold, re-performs the processes (II1) and (II2)
Characterized by including,
In the process (II3),
the processor calculates the difference value between the adjusted policy expectation value and the optimal policy expectation value according to the formula below,

A personalization device characterized in that, in the above formula, the terms respectively denote the optimal policy corresponding to the specific user; the adjusted policy corresponding to the autonomous driving algorithm whose parameters have been adjusted through reinforcement learning; the expected value under a policy; the t-th control data calculated from the t-th situation data according to a policy X; the discount factor; and the weight vector for each component.

According to claim 11,
A personalization device characterized in that the processor further performs (III) a process of, after setting the driving mode of the autonomous vehicle to automatic mode, when the specific user selects the autonomous driving algorithm personalized for the specific user from a list of autonomous driving algorithms that are personalized for each user and mounted on the autonomous driving module, causing the autonomous driving module to perform autonomous driving based on the personalized autonomous driving algorithm.