KR102559608B1

KR102559608B1 - Learning method for controlling unmanned aerial vehicle and electronic device for performing the same

Info

Publication number: KR102559608B1
Application number: KR1020220171515A
Authority: KR
Inventors: 배정호; 김석봉; 김성호; 김용덕; 황인수; 정호성
Original assignee: 국방과학연구소
Priority date: 2022-12-09
Filing date: 2022-12-09
Publication date: 2023-07-26

Abstract

The disclosure relates to a technique for training a control model that controls an unmanned aerial vehicle by inferring control information from maneuver-related state information from which the control information is missing. A learning method for controlling an unmanned aerial vehicle may include: identifying state information related to a maneuver of a manned aircraft; estimating, from that state information, the control information with which the manned aircraft was controlled; and training, based on the control information, a control model to be mounted on the unmanned aerial vehicle.

Description

Learning method for controlling unmanned aerial vehicle and electronic device performing the same

An embodiment of the present specification relates to a technique for training a control model to be mounted on an unmanned aerial vehicle by inferring control information from maneuver-related state information.

Research on artificial intelligence pilots that control unmanned aerial vehicles is under way. To train an artificial intelligence pilot efficiently, state information (e.g., position, speed, angular velocity) recorded while pilots of existing manned aircraft maneuvered their aircraft may be used. Training an artificial intelligence pilot on such state information, however, requires the control information corresponding to it, and existing state information lacks that control information. For example, ACMI (Air Combat Maneuvering Instrumentation), an air combat training system introduced by the Air Force, uses a missile-shaped AIS (Airborne Instrumentation Subsystem) pod to record fighter maneuvers, but the pod yields only state information such as the position, attitude, and speed of the fighter, not the control information entered by the pilot. Likewise, state information on civil aircraft maneuvers, data from online/offline games such as DCS WORLD, and log data from simulators developed for various purposes all lack control information. There is therefore a need for a technique that improves training efficiency by inferring the corresponding control information from existing maneuver-related state information and using it to train an artificial intelligence pilot.

An embodiment of the present specification is proposed to solve the problem described above and relates to a technique for training a control model to be mounted on an unmanned aerial vehicle by inferring control information from maneuver-related state information. The technical problems to be addressed by the present embodiment are not limited to those described above, and other technical problems may be inferred from the following embodiments.

To achieve the above object, a learning method for controlling an unmanned aerial vehicle according to an embodiment of the present specification may include: identifying state information related to a maneuver of a manned aircraft; estimating, from the state information related to the maneuver of the manned aircraft, control information with which the manned aircraft was controlled; and training, based on the control information, a control model to be mounted on an unmanned aerial vehicle.

According to an embodiment, the state information may include trajectory information, position information, attitude information, speed information, and angular velocity information of the manned aircraft, and the control information may include information that was input to the manned aircraft to produce the state information.

According to an embodiment, estimating the control information may include estimating the control information at time t by applying inverse dynamics to the state information from time t-n (n being an integer greater than or equal to 0) to time t+k (k being a natural number greater than or equal to 1), with time t as the reference.

According to an embodiment, k in time t+k may be determined differently depending on the type of the manned aircraft.

According to an embodiment, estimating the control information may include estimating the control information at time t using a model trained in advance based on the type of the manned aircraft and the trajectory information.

According to an embodiment, training the control model may include training the control model by supervised learning using the estimated control information.

According to an embodiment, the supervised learning may include, when there are a plurality of pieces of estimated control information, training the control model with an average value of the plurality of pieces of control information.

According to an embodiment, training the control model may include training the control model by reinforcement learning, based on the estimated control information, with a different reward provided for each mission.

To achieve the above object, a non-transitory computer-readable storage medium according to an embodiment of the present specification includes a medium configured to store computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, cause the processor to perform a learning method for controlling an unmanned aerial vehicle, the method including: identifying state information related to a maneuver of a manned aircraft; estimating, from the state information related to the maneuver of the manned aircraft, control information with which the manned aircraft was controlled; and training, based on the control information, a control model to be mounted on an unmanned aerial vehicle.

To achieve the above object, an electronic device according to an embodiment of the present specification may include: a memory storing at least one instruction; and a controller that identifies state information related to a maneuver of a manned aircraft, estimates, from that state information, control information with which the manned aircraft was controlled, and controls a control model to be mounted on an unmanned aerial vehicle so that it is trained based on the control information.

Details of other embodiments are included in the detailed description and the drawings.

Embodiments of the present specification provide one or more of the following effects.

According to an embodiment, an artificial intelligence pilot that controls an unmanned aerial vehicle can be trained using existing state information from which control information is missing. Training efficiency can be further improved by estimating the corresponding control information from the existing state information and training the artificial intelligence pilot through supervised learning and reinforcement learning. Once trained, the artificial intelligence pilot can control the unmanned aerial vehicle by generating control inputs itself in order to carry out a mission.

The effects of the embodiments are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the claims.

FIG. 1 is a diagram illustrating a method of training an artificial intelligence pilot that controls an unmanned aerial vehicle according to an embodiment.
FIG. 2 is a diagram illustrating a process of inferring control information according to an embodiment.
FIG. 3 is a diagram illustrating trajectory information of an aircraft over time according to an embodiment.
FIG. 4 is a diagram for explaining control information according to an embodiment.
FIG. 5 is a flowchart illustrating a learning method for controlling an unmanned aerial vehicle according to an embodiment.
FIG. 6 is a block diagram of an electronic device according to an embodiment.

The terms used in the embodiments were chosen, as far as possible, from general terms in wide current use while taking into account their functions in the present disclosure, but they may vary with the intent of those working in the field, precedent, the emergence of new technologies, and the like. In certain cases a term was selected arbitrarily by the applicant, and its meaning is then described in detail in the corresponding part of the description. Terms used in the present disclosure should therefore be defined by their meaning and by the overall content of the disclosure, not simply by their names.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "~unit" and "~module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of the two.

The expression "at least one of a, b, and c" used throughout the specification may cover 'a alone', 'b alone', 'c alone', 'a and b', 'a and c', 'b and c', or 'all of a, b, and c'.

A "terminal" referred to below may be implemented as a computer or a portable terminal capable of connecting to a learning device or to other terminals over a network. Here, the computer includes, for example, a notebook, desktop, or laptop equipped with a web browser, and the portable terminal is, for example, a wireless communication device that guarantees portability and mobility, and may include any kind of handheld wireless communication device, such as terminals based on IMT (International Mobile Telecommunication), CDMA (Code Division Multiple Access), W-CDMA (W-Code Division Multiple Access), or LTE (Long Term Evolution), smartphones, and tablet PCs.

Embodiments of the present disclosure are described below in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the disclosure belongs can readily carry them out. The present disclosure may, however, be implemented in many different forms and is not limited to the embodiments described here.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the drawings.

FIG. 1 is a diagram illustrating a method of training an artificial intelligence pilot that controls an unmanned aerial vehicle according to an embodiment.

Referring to FIG. 1, training an artificial intelligence pilot that controls an unmanned aerial vehicle requires not only state information 110 related to a maneuver of a manned aircraft but also the corresponding control information 120. Here, the control information 120 is information that the pilot input to control the manned aircraft, for example X-stick, Y-stick, throttle, and rudder inputs. That is, the manned aircraft is controlled according to the control information 120 input by the pilot, and the maneuver-related state information 110 is sensed as the result.

Previously acquired state information 110 may include only the results of a maneuver, such as the trajectory, position, attitude, speed, and angular velocity of the manned aircraft, and may not include the control information 120 that was the input. Because training an artificial intelligence pilot that controls an unmanned aerial vehicle also requires the control information 120, the control information 120 needs to be inferred from the state information 110.

Specifically, the control information 120 may be inferred by applying a preset estimation method to the state information 110. For example, the control information 120 may be extracted from the state information 110 by applying inverse dynamics. In this case, the inverse dynamics may be applied in consideration of the aircraft type (size, power performance, etc.) so that accurate control information 120 is extracted. Alternatively, the control information 120 for a trajectory may be extracted using a model that has been machine-learned, for each aircraft type, on the trajectories included in the state information 110.
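As a rough illustration of the inverse-dynamics idea, the sketch below recovers a throttle command from logged speed alone. It assumes a deliberately simplified one-dimensional point-mass model, m·dv/dt = max_thrust·throttle − drag_coeff·v², which is not taken from the specification; the function `estimate_throttle`, the parameter names, and all numeric values are hypothetical stand-ins for aircraft-type parameters.

```python
import numpy as np

def estimate_throttle(speed, dt, mass, max_thrust, drag_coeff):
    """Invert m*dv/dt = max_thrust*throttle - drag_coeff*v**2 for throttle."""
    speed = np.asarray(speed, dtype=float)
    accel = np.gradient(speed, dt)              # finite-difference acceleration
    throttle = (mass * accel + drag_coeff * speed**2) / max_thrust
    return np.clip(throttle, 0.0, 1.0)          # throttle is a 0..1 command

# Hypothetical aircraft-type parameters (illustrative values only).
params = dict(mass=9000.0, max_thrust=80000.0, drag_coeff=2.0)

# Forward-simulate a known throttle so the estimate can be checked.
dt, true_throttle = 0.01, 0.6
speed = [100.0]
for _ in range(500):
    v = speed[-1]
    a = (params["max_thrust"] * true_throttle - params["drag_coeff"] * v**2) / params["mass"]
    speed.append(v + a * dt)

# Only the speed log is given to the estimator; the throttle is "missing".
estimated = estimate_throttle(speed, dt, **params)
```

Changing `mass`, `max_thrust`, or `drag_coeff` changes the throttle recovered from the same speed log, which is why the aircraft type has to be taken into account.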

The control model 130 is a model trained using the maneuver-related state information 110 together with the inferred control information 120, and may correspond to an artificial intelligence pilot that is mounted on an unmanned aerial vehicle and controls it.

The control model 130 may be trained by supervised learning that takes into account the control information 120 as input information and the state information 110 as result information. When there are a plurality of pieces of control information 120 and state information 110, the control model may be trained by supervised learning based on the average value of the plurality of pieces of control information 120 and the average value of the plurality of pieces of state information 110. For example, given control information 1 corresponding to state information 1, control information 2 corresponding to state information 2, and so on through control information N corresponding to state information N, the control model may be trained by supervised learning on the average of state information 1 to N and the average of control information 1 to N.
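The following sketch shows one possible reading of the averaging step, under assumptions that the specification does not state: N demonstrations of the same maneuver are aligned in time, they are averaged per time step, and a linear policy is fitted to the averaged (state, control) pairs by least squares. The linear policy, the synthetic data, and the names `average_demonstrations`, `fit_linear_policy`, and `policy` are all illustrative.

```python
import numpy as np

def average_demonstrations(states, controls):
    """Average N aligned demonstrations: (N, T, ds), (N, T, dc) -> (T, ds), (T, dc)."""
    return np.mean(states, axis=0), np.mean(controls, axis=0)

def fit_linear_policy(states, controls):
    """Least-squares fit of controls ~= [states, 1] @ W (supervised learning)."""
    X = np.hstack([states, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, controls, rcond=None)
    return W

def policy(W, state):
    """Control output of the fitted linear policy for a single state vector."""
    return np.append(state, 1.0) @ W

rng = np.random.default_rng(0)
# Synthetic ground truth: 3 state dimensions plus a bias row -> 2 control channels.
true_W = np.array([[0.5, -0.2], [0.1, 0.3], [0.0, 0.4], [0.05, 0.5]])
states = rng.normal(size=(4, 50, 3))                        # 4 demos, 50 time steps
X = np.concatenate([states, np.ones((4, 50, 1))], axis=2)
controls = X @ true_W + 0.01 * rng.normal(size=(4, 50, 2))  # noisy "estimated" controls

mean_states, mean_controls = average_demonstrations(states, controls)
W = fit_linear_policy(mean_states, mean_controls)
```

Averaging before fitting reduces the noise in the estimated controls, at the cost of discarding the variation between demonstrations.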

More specifically, in addition to supervised learning, the control model 130 may be trained by reinforcement learning based on rewards that differ from mission to mission. Aircraft control involves various missions, such as takeoff and landing, and the factors that matter may differ for each. For a landing mission, for example, factors such as how much the aircraft shakes during the landing and the speed limit at touchdown may be important. Different rewards can therefore be assigned according to the factors that matter for each mission, and the control model 130, trained by reinforcement learning with those mission-specific rewards, can perform the mission better.
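A minimal sketch of mission-specific rewards might look like the following. The landing reward uses the two factors named above (shaking and a touchdown speed limit); the takeoff reward, the state field names, and all weights are hypothetical choices made only for illustration.

```python
def landing_reward(state):
    """Penalise shaking (angular rates) and exceeding the touchdown speed limit."""
    shake_penalty = abs(state["roll_rate"]) + abs(state["pitch_rate"])
    overspeed = max(0.0, state["speed"] - state["speed_limit"])
    return -1.0 * shake_penalty - 0.5 * overspeed

def takeoff_reward(state):
    """Hypothetical: reward climb rate, penalise deviation from runway heading."""
    return 0.1 * state["climb_rate"] - abs(state["heading_error"])

# One reward function per mission; the agent is trained with the one that applies.
MISSION_REWARDS = {"landing": landing_reward, "takeoff": takeoff_reward}

def reward(mission, state):
    return MISSION_REWARDS[mission](state)

smooth = dict(roll_rate=0.0, pitch_rate=0.0, speed=60.0, speed_limit=70.0)
rough = dict(roll_rate=0.2, pitch_rate=-0.1, speed=80.0, speed_limit=70.0)
smooth_score = reward("landing", smooth)
rough_score = reward("landing", rough)
```

Here a smooth landing below the speed limit scores higher than a shaky, too-fast one, so a policy that maximizes this reward is pushed toward the behavior that matters for that mission.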

According to an embodiment, training efficiency can be improved by inferring the control information 120 from the maneuver-related state information 110 and training the control model 130 by supervised learning and reinforcement learning with the state information 110 and the control information 120 considered together. With supervised learning alone, the control model 130 is trained to imitate the training data as it is; when reinforcement learning is used as well, the training efficiency of the control model 130 can be further improved according to the reward set for each mission.

FIG. 2 is a diagram illustrating a process of inferring control information according to an embodiment.

Referring to FIG. 2, state information 210 and state information 220 may correspond to information sensed from a single trajectory (e.g., trajectory, position, attitude, speed, angular velocity). That is, from one piece of trajectory information sensed over a period of time, state information 210 from time t-n (n being an integer greater than or equal to 0) to time t and state information 220 from time t+1 to time t+k (k being a natural number greater than or equal to 1) can be identified.

The control information at time t may be inferred from the state information 210 from time t-n to time t and the state information 220 from time t+1 to time t+k. In other words, the control information that must have been input at the current time t is inferred from the state information 210 spanning the past time t-n to the current time t and the state information 220 spanning the future times t+1 to t+k. This is because a fast, heavy object such as an aircraft follows a trajectory that changes continuously rather than discontinuously; given this property, the state information from the past time t-n to the future time t+k can be taken into account to estimate the control information at time t. For example, given trajectory information for a landing that lasts from 0 to 10 seconds, the control information at 3.7 seconds may be inferred from the state information from 3.0 seconds to 3.7 seconds and the state information from 3.8 seconds to 5 seconds.

Here, n and k in time t-n and time t+k may be determined differently depending on the aircraft type. For example, for a relatively heavy aircraft, whose motion is relatively hard to change because of inertia, n and k may be relatively small, and for a relatively light aircraft, whose motion is relatively easy to change, n and k may be relatively large. Besides the weight of the aircraft, the values of n and k may be determined in consideration of information such as maneuvering performance and the unit time at which control information is input.

FIG. 3 is a diagram illustrating trajectory information of an aircraft over time according to an embodiment.

Referring to FIG. 3, the trajectory of the aircraft can be identified at each time. The control information at time t may be inferred in consideration of the state information (e.g., trajectory, position, attitude, speed, angular velocity) from time t-n to time t and the state information from time t+1 to time t+k.

Specifically, the control information that must have been input at time t (e.g., X-stick, Y-stick, throttle, rudder) may be inferred from the state information from time t-n to time t and the state information from time t+1 to time t+k. Likewise, the control information that must have been input at time t+1 may be inferred from the state information from time t-n+1 to time t+1 and the state information from time t+2 to time t+k+1, and the control information at time t+2 from the state information from time t-n+2 to time t+2 and the state information from time t+3 to time t+k+2. In general, the control information that must have been input at time t+m may be inferred from the state information from time t-n+m to time t+m and the state information from time t+m+1 to time t+k+m. In this way, the control information input at each time along a single trajectory of the aircraft can be inferred.
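The sliding-window procedure can be sketched as follows, assuming the trajectory is stored as an array with one row per time step. The helper names (`windows`, `infer_controls`) are illustrative, and `mean_rate` is only a hypothetical placeholder for whatever estimator (inverse dynamics or a learned model) is actually used.

```python
import numpy as np

def windows(states, n, k):
    """Yield (t, past, future): past = states[t-n..t], future = states[t+1..t+k]."""
    T = len(states)
    for t in range(n, T - k):                  # only steps with a full window
        yield t, states[t - n : t + 1], states[t + 1 : t + k + 1]

def infer_controls(states, n, k, estimator):
    """Apply estimator(past, future) at every time step that has a full window."""
    return {t: estimator(past, future) for t, past, future in windows(states, n, k)}

def mean_rate(past, future):
    """Placeholder estimator: mean per-step change of the state over the window."""
    window = np.concatenate([past, future])
    return np.mean(np.diff(window, axis=0), axis=0)

states = np.arange(20, dtype=float).reshape(10, 2)   # 10 time steps, 2 state dims
controls = infer_controls(states, n=2, k=3, estimator=mean_rate)
```

With n=2 and k=3, a 10-step trajectory yields estimates for time steps 2 through 6; the first n and last k steps have no complete window and are skipped in this sketch.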

FIG. 4 is a diagram for explaining control information according to an embodiment.

Referring to FIG. 4, illustration 410 shows the movement of an aircraft and illustration 420 shows a device for piloting the aircraft. Illustration 420 is only an example, and the piloting device may take a different form. According to the control information that the control model controlling the unmanned aerial vehicle inputs to the piloting device, the aircraft can move freely in various directions along the x-axis, y-axis, and z-axis. The aircraft can also rotate in at least one of the roll (Φ), pitch (γ), and yaw (Ψ) directions according to the control information input to the piloting device. That is, according to the control information input by the control model, the unmanned aerial vehicle can maneuver in at least one of the X-axis, Y-axis, Z-axis, roll, pitch, and yaw directions.
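To make the six directions of motion concrete, the sketch below builds a rotation matrix from roll, pitch, and yaw angles and advances position and attitude by one small time step. The Z-Y-X Euler convention, the function names, and the direct addition of body rates to the Euler angles (a small-angle approximation) are assumptions of this sketch, not details given in the specification.

```python
import numpy as np

def rotation_matrix(roll, pitch, yaw):
    """Body-to-world rotation for Z-Y-X (yaw, pitch, roll) Euler angles in radians."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def move(position, attitude, body_velocity, body_rates, dt):
    """One small-step update of position (x, y, z) and attitude (roll, pitch, yaw)."""
    R = rotation_matrix(*attitude)
    new_position = position + (R @ body_velocity) * dt
    new_attitude = attitude + body_rates * dt    # small-angle approximation
    return new_position, new_attitude

# Aircraft yawed 90 degrees, flying "forward" along its own x-axis for one second.
position, attitude = move(
    position=np.zeros(3),
    attitude=np.array([0.0, 0.0, np.pi / 2]),
    body_velocity=np.array([10.0, 0.0, 0.0]),
    body_rates=np.zeros(3),
    dt=1.0,
)
```

In this example the forward velocity in the body frame becomes motion along the world y-axis, because the aircraft has been yawed by 90 degrees.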

FIG. 5 is a flowchart illustrating a learning method for controlling an unmanned aerial vehicle according to an embodiment.

Referring to FIG. 5, in step S510, state information related to a maneuver of a manned aircraft may be identified. Here, the maneuver-related state information may include trajectory information, position information, attitude information, speed information, and angular velocity information that result from the maneuver of the manned aircraft. Being the result of the maneuver, this state information may not include the control information that was input to the manned aircraft.

In step S520, the control information with which the manned aircraft was controlled may be estimated from the state information related to the maneuver of the manned aircraft. The maneuver-related state information is the result produced by the control information, which is the input, and may not itself include the control information. That is, the state information is output and sensed as a consequence of the control information being input.

Here, the control information at time t may be estimated by applying inverse dynamics to the state information from time t-n (n being an integer greater than or equal to 0) to time t+k (k being a natural number greater than or equal to 1), with time t as the reference. This is because a fast, heavy object such as an aircraft follows a trajectory that changes continuously rather than discontinuously; given this property, the state information from the past time t-n to the future time t+k can be taken into account to estimate the control information at time t.

Here, n and k in time t-n and time t+k may be determined differently depending on the type of the manned aircraft. For example, for a relatively heavy aircraft, whose motion is relatively hard to change because of inertia, n and k may be relatively small, and for a relatively light aircraft, whose motion is relatively easy to change, n and k may be relatively large. Besides the weight of the aircraft, the values of n and k may be determined in consideration of information such as maneuvering performance and the unit time at which control information is input.

Alternatively, the control information at time t may be estimated using a model that has been machine-learned in advance based on the type of the manned aircraft and the trajectory information. More specifically, the control information for a trajectory may be estimated using the pre-trained model. For example, the model may be machine-learned in advance, from trajectory information 1 to N of aircraft type 1, trajectory information 1 to M of aircraft type 2, and so on through trajectory information 1 to Q of aircraft type T, to map the trajectory information of each aircraft type to control information. The pre-trained model can then be used to estimate the control information at each time for trajectory information X of aircraft type X.
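One way such a pre-trained model could work is sketched below: for each aircraft type, a linear map from consecutive states (s_t, s_{t+1}) to the control at time t is fitted on trajectories whose controls are known (for example, from a simulator), and then applied to a trajectory whose controls are missing. The toy linear dynamics, the availability of labeled training trajectories, and every name here are assumptions made for illustration only.

```python
import numpy as np

def fit_inverse_model(states, next_states, controls):
    """Least-squares map from (s_t, s_{t+1}) to the control applied at time t."""
    X = np.hstack([states, next_states])
    W, *_ = np.linalg.lstsq(X, controls, rcond=None)
    return W

def estimate_controls(W, trajectory):
    """Estimate the control at every step of a trajectory that has no controls."""
    X = np.hstack([trajectory[:-1], trajectory[1:]])
    return X @ W

def simulate(A, B, controls, s0):
    """Toy linear dynamics s_{t+1} = A s_t + B a_t, used only to generate data."""
    traj = [s0]
    for a in controls:
        traj.append(A @ traj[-1] + B @ a)
    return np.array(traj)

rng = np.random.default_rng(0)
# One hypothetical (A, B) pair per aircraft type.
dynamics = {
    "type_1": (np.array([[1.0, 0.1], [0.0, 0.9]]), np.array([[0.0], [0.5]])),
    "type_2": (np.array([[1.0, 0.1], [0.0, 0.7]]), np.array([[0.0], [1.0]])),
}
models = {}
for name, (A, B) in dynamics.items():
    a = rng.uniform(-1, 1, size=(200, 1))            # known training controls
    traj = simulate(A, B, a, np.zeros(2))
    models[name] = fit_inverse_model(traj[:-1], traj[1:], a)

# A new type_2 trajectory whose controls are treated as missing.
hidden = rng.uniform(-1, 1, size=(30, 1))
new_traj = simulate(*dynamics["type_2"], hidden, np.zeros(2))
recovered = estimate_controls(models["type_2"], new_traj)
```

Because each aircraft type responds differently to the same input, applying the `type_1` model to a `type_2` trajectory would generally give wrong controls, which is why the model is selected by aircraft type.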

In step S530, a control model to be mounted on the unmanned aerial vehicle may be trained based on the steering information. That is, the control model, which is an artificial intelligence pilot to be mounted on the unmanned aerial vehicle, may be trained based on the steering information estimated from the state information related to the maneuvering of the manned aircraft.

Specifically, the control model may be trained by supervised learning using the estimated steering information. When there are multiple pieces of state information and estimated steering information, the control model may, depending on the supervised learning algorithm, be trained on the average value of the plural pieces of steering information and the average value of the plural pieces of state information.
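As a minimal sketch of supervised learning on (state, estimated steering) pairs, the example below fits a one-dimensional linear policy by ordinary least squares. The linear policy form, the scalar state, and the name `fit_linear_policy` are assumptions made for illustration; the specification does not name a particular supervised learning algorithm or model class.

```python
from typing import Sequence, Tuple


def fit_linear_policy(states: Sequence[float],
                      controls: Sequence[float]) -> Tuple[float, float]:
    """Fit u = w * s + b by least squares on (state, estimated control) pairs.

    The sample means of the states and controls are computed first, then the
    slope is the covariance divided by the state variance.
    """
    if len(states) != len(controls) or len(states) < 2:
        raise ValueError("need at least two matching (state, control) pairs")
    count = len(states)
    mean_s = sum(states) / count
    mean_u = sum(controls) / count
    variance = sum((s - mean_s) ** 2 for s in states)
    if variance == 0.0:
        raise ValueError("states must not all be identical")
    covariance = sum((s - mean_s) * (u - mean_u) for s, u in zip(states, controls))
    w = covariance / variance
    b = mean_u - w * mean_s
    return w, b


# Usage: controls generated by u = 2 * s + 1 are recovered by the fit.
w, b = fit_linear_policy([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

A practical control model would take a multi-dimensional state (position, attitude, speed, angular velocity) and would more likely be a neural network, but the training signal is the same: the estimated steering information serves as the label.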

More specifically, in addition to supervised learning, the control model may be trained by reinforcement learning based on rewards that differ for each mission. Different rewards may be assigned according to the factors that are important for each mission, and the control model, trained by reinforcement learning with the mission-specific reward, can perform the corresponding mission better.
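One simple way to express mission-specific rewards is a weighted sum of per-step features, with a separate weight table per mission. The sketch below assumes this form; the mission names, the feature names, and the weight values are all hypothetical and do not appear in the specification.

```python
from typing import Dict, Mapping

# Hypothetical weight tables: each mission emphasizes different factors.
MISSION_WEIGHTS: Dict[str, Dict[str, float]] = {
    "intercept": {"distance_closed": 1.0, "fuel_used": -0.1},
    "patrol": {"distance_closed": 0.0, "fuel_used": -1.0},
}


def mission_reward(mission: str, features: Mapping[str, float]) -> float:
    """Return the weighted sum of step features for the given mission.

    Features missing from the input are treated as zero.
    """
    weights = MISSION_WEIGHTS[mission]
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


# Usage: the same step is scored differently depending on the mission.
step_features = {"distance_closed": 10.0, "fuel_used": 2.0}
intercept_reward = mission_reward("intercept", step_features)
patrol_reward = mission_reward("patrol", step_features)
```

With these assumed weights the intercept mission rewards closing distance even at some fuel cost, while the patrol mission penalizes fuel use only.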

For reference, reinforcement learning is a type of machine learning in which, in a given environment, the unmanned aerial vehicle recognizes the current state and selects, from among the selectable actions, the action or sequence of actions that maximizes the reward. The environment dealt with in reinforcement learning may be given, for example, as a Markov decision process. At a time point t, the Markov decision process is in some state s. The aircraft can take an action a in state s, and at the next time point t+1 the Markov decision process stochastically transitions to a new state s', at which point a reward is obtained. The unmanned aerial vehicle may determine an action or a sequence of actions at each time step according to the machine-learned model, and a reward may be obtained as it transitions to a new state.
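The state, action, transition, and reward loop described above can be sketched with a tiny tabular Markov decision process. The two states, two actions, transition probabilities, and rewards below are invented purely for illustration, as are the names `TRANSITIONS`, `step`, and `rollout`.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical MDP: TRANSITIONS[state][action] lists (probability, next_state, reward).
TRANSITIONS: Dict[str, Dict[str, List[Tuple[float, str, float]]]] = {
    "level": {
        "hold": [(1.0, "level", 0.0)],
        "climb": [(0.8, "high", 1.0), (0.2, "level", 0.0)],
    },
    "high": {
        "hold": [(1.0, "high", 0.5)],
        "climb": [(1.0, "high", 0.0)],
    },
}


def step(state: str, action: str, rng: random.Random) -> Tuple[str, float]:
    """Sample the next state s' and reward for taking `action` in `state`."""
    outcomes = TRANSITIONS[state][action]
    draw = rng.random()
    cumulative = 0.0
    for probability, next_state, reward in outcomes:
        cumulative += probability
        if draw < cumulative:
            return next_state, reward
    # Fall back to the last outcome if rounding left a gap below 1.0.
    _, next_state, reward = outcomes[-1]
    return next_state, reward


def rollout(policy: Dict[str, str], start: str, steps: int, rng: random.Random) -> float:
    """Follow a fixed state-to-action policy and return the summed reward."""
    state, total = start, 0.0
    for _ in range(steps):
        state, reward = step(state, policy[state], rng)
        total += reward
    return total


# Usage: holding altitude in the "high" state earns 0.5 per step.
total = rollout({"level": "hold", "high": "hold"}, start="high", steps=4,
                rng=random.Random(0))
```

Reinforcement learning would search for the policy that maximizes the expected value of this summed reward; the sketch only evaluates a given policy.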

FIG. 6 is a block diagram of an electronic device according to an embodiment. FIG. 6 shows the components related to the present embodiment, but the device is not limited thereto and may further include other general-purpose components in addition to those shown in FIG. 6.

The electronic device 600 of FIG. 6 is a device that performs the above-described learning method for controlling an unmanned aerial vehicle, and may be mounted on the unmanned aerial vehicle to control its operation. Content that overlaps with the description above may be omitted.

Referring to FIG. 6, the electronic device 600 may include a memory 610, a controller 620, and a bus (not shown). The memory 610 and the controller 620 may communicate with each other through the bus. Each of the memory 610 and the controller 620 may refer to a unit that processes at least one function or operation, and may be implemented as hardware, software, or a combination of hardware and software.

In an embodiment, the electronic device 600 may include a memory 610 that stores various data. For example, at least one instruction for operating the electronic device 600 may be stored in the memory 610. In this case, the memory 610 and the controller 620 may perform various operations based on the instruction. The controller 620 may perform the aforementioned operations as the instruction stored in the memory 610 is executed by the controller 620. The memory 610 may be volatile memory or non-volatile memory. For example, the memory 610 may store data that has been processed and data that is to be processed by a processor (not shown). The memory 610 may include random access memory (RAM) such as dynamic random access memory (DRAM) or static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM, Blu-ray or other optical disk storage, a hard disk drive (HDD), a solid state drive (SSD), or flash memory.

In an embodiment, the instruction is executed by the controller 620, and the controller 620 may identify state information related to the maneuvering of a manned aircraft, estimate, from that state information, the steering information used to control the manned aircraft, and control the training of a control model to be mounted on an unmanned aerial vehicle based on the steering information.

An electronic device or terminal according to the above-described embodiments may include a processor, a memory that stores and executes program data, permanent storage such as a disk drive, a communication port for communicating with an external device, and a user interface device such as a touch panel, a key, or a button. Methods implemented as software modules or algorithms may be stored on a computer-readable recording medium as computer-readable code or program instructions executable on the processor. Here, computer-readable recording media include magnetic storage media (e.g., read-only memory (ROM), random-access memory (RAM), floppy disk, hard disk) and optical reading media (e.g., CD-ROM, Digital Versatile Disc (DVD)). The computer-readable recording medium may be distributed among computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner. The medium may be readable by a computer, stored in a memory, and executed by a processor.

The present embodiment may be represented as functional block configurations and various processing steps. These functional blocks may be implemented with any number of hardware and/or software configurations that perform specific functions. For example, an embodiment may employ integrated circuit configurations such as memory, processing, logic, and look-up tables, which can execute various functions under the control of one or more microprocessors or other control devices. Just as the components can be implemented as software programming or software elements, the present embodiment can be implemented in a programming or scripting language such as C, C++, C#, Python, Java, or assembler, including various algorithms implemented as combinations of data structures, processes, routines, or other programming constructs. Functional aspects may be implemented as algorithms running on one or more processors. In addition, the present embodiment may employ conventional techniques for electronic configuration, signal processing, and/or data processing. Terms such as "mechanism", "element", "means", and "configuration" are used broadly and are not limited to mechanical and physical components. These terms may include the meaning of a series of software routines in association with a processor or the like.

The foregoing embodiments are merely examples, and other embodiments may be implemented within the scope of the claims set out below.

Claims

Identifying time-specific state information corresponding to a trajectory according to the maneuvering of a manned aircraft;
estimating steering information used to control the manned aircraft from the time-specific state information corresponding to the trajectory of the manned aircraft; and
training a control model to be mounted on an unmanned aerial vehicle based on the steering information,
wherein the state information includes position information, attitude information, speed information, and angular velocity information for the manned aircraft,
the steering information includes information input to the manned aircraft to output the state information, and
the step of estimating the steering information includes
estimating steering information at time t by applying inverse dynamics to the state information from time t-n (n is an integer greater than or equal to 0) to time t+k (k is a natural number greater than or equal to 1) based on time t,
A learning method for controlling an unmanned aerial vehicle.

(Deleted)

According to claim 1,
wherein k in the time t+k is determined differently depending on the type of the manned aircraft,
A learning method for controlling an unmanned aerial vehicle.

According to claim 1,
wherein the step of estimating the steering information includes
estimating steering information at time t using a model trained in advance based on the type of the manned aircraft and the trajectory information,
A learning method for controlling an unmanned aerial vehicle.

According to claim 1,
wherein the step of training the control model includes
training the control model by supervised learning using the estimated steering information,
A learning method for controlling an unmanned aerial vehicle.

According to claim 6,
wherein the step of training the control model includes
training the control model by reinforcement learning, providing different rewards for each mission based on the estimated steering information,
A learning method for controlling an unmanned aerial vehicle.

As a non-transitory computer-readable storage medium,
a medium configured to store computer-readable instructions,
wherein the computer-readable instructions, when executed by a processor, cause the processor to perform a method including:
identifying time-specific state information corresponding to a trajectory according to the maneuvering of a manned aircraft;
estimating steering information used to control the manned aircraft from the time-specific state information corresponding to the trajectory of the manned aircraft; and
training a control model to be mounted on an unmanned aerial vehicle based on the steering information,
wherein the state information includes position information, attitude information, speed information, and angular velocity information for the manned aircraft,
the steering information includes information input to the manned aircraft to output the state information, and
the step of estimating the steering information includes
estimating steering information at time t by applying inverse dynamics to the state information from time t-n (n is an integer greater than or equal to 0) to time t+k (k is a natural number greater than or equal to 1) based on time t,
A non-transitory computer-readable storage medium that enables a learning method for controlling an unmanned aerial vehicle to be performed.

A memory for storing at least one instruction; and
a controller that identifies time-specific state information corresponding to a trajectory of a manned aircraft, estimates steering information used to control the manned aircraft from the time-specific state information corresponding to the trajectory of the manned aircraft, and controls the training of a control model to be mounted on an unmanned aerial vehicle based on the steering information,
wherein the state information includes position information, attitude information, speed information, and angular velocity information for the manned aircraft,
the steering information includes information input to the manned aircraft to output the state information, and
the controller applies inverse dynamics to the state information from time t-n (n is an integer greater than or equal to 0) to time t+k (k is a natural number greater than or equal to 1) based on time t to estimate steering information at time t,
An electronic device.