KR102473196B1

KR102473196B1 - System and method for training AI character type of NPC for virtual training

Info

Publication number: KR102473196B1
Application number: KR1020200095621A
Authority: KR
Inventors: 장민혁; 김동현
Original assignee: 한국전자기술연구원; 전주대학교 산학협력단
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2022-12-01
Also published as: KR20220015523A

Abstract

An NPC-type AI character learning method for virtual training is provided. The method includes: generating, for a scenario for virtual training of an NPC (Non Player Character) type AI character in virtual training content (hereinafter, a virtual training scenario), a value table model defined based on at least one action value function composed of pairs of state information and action information variables; setting the state information, the action information, reward information, and next state information as input data, and setting a reward corresponding to a rule process defined in the virtual training scenario as output data, to update the value table model and train it by reinforcement learning; and causing the NPC-type AI character to perform an optimal action determined based on the updated, reinforcement-learned value table model as the state information in the virtual training scenario changes.

Description

NPC type AI character learning system and method for virtual training {SYSTEM AND METHOD FOR TRAINING AI CHARACTER TYPE OF NPC FOR VIRTUAL TRAINING}

The present invention relates to an NPC-type AI character learning system and method for virtual training.

Virtual training using virtual reality content has the advantage that trainees can practice responding to lifelike situations. However, most conventional virtual training follows a fixed storyline, so users who repeat the training quickly become bored and the training effect gradually declines.

In addition, NPCs (Non Player Characters) used in existing games or virtual training either perform only designated actions or, following rules defined by the developer, perform only the action that corresponds to a specific input value. As a result, existing NPCs cannot cope with situations that were not specified in advance, which severely limits the variety of training situations that can be staged.

Embodiments of the present invention relate to an NPC-type AI character that assists or collaborates with the user's actions and the training situation in virtual training. They provide an NPC-type AI character learning system and method for virtual training in which the character helps users participating in the training, acts autonomously according to the situation to create varied training situations, and cooperates with the user to resolve them.

However, the technical problem to be solved by the present embodiment is not limited to the one described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, an NPC-type AI character learning system and method for virtual training according to a first aspect of the present invention includes: generating, for a scenario for virtual training of an NPC (Non Player Character) type AI character in virtual training content (hereinafter, a virtual training scenario), a value table model defined based on at least one action value function composed of pairs of state information and action information variables; setting the state information, the action information, reward information, and next state information as input data, and setting a reward corresponding to a rule process defined in the virtual training scenario as output data, to update the value table model and train it by reinforcement learning; and causing the NPC-type AI character to perform an optimal action determined based on the updated, reinforcement-learned value table model as the state information in the virtual training scenario changes.

In some embodiments of the present invention, generating the value table model includes: generating a reward table model that contains the reward information set for at least one state information-action information pair of the virtual training scenario; calculating the maximum value of the action value function whose variables are the next state information following the current state in the virtual training scenario and all action information available in that next state; and generating the value table model by applying a weight to the maximum value of the action value function and adding the result to the reward table model.

In some embodiments of the present invention, the reward information is provided upon moving from a first rule process to a second rule process defined in the virtual training scenario, and when the second rule process corresponds to a target point in the virtual training scenario, a second reward greater than a first reward may be granted.

In some embodiments of the present invention, the value table model may be updated and trained by reinforcement learning, each time a rule process defined in the virtual training scenario is reached, based on an algorithm combining a DNN (Deep Neural Network) and LSTM (Long Short Term Memory).

In some embodiments of the present invention, causing the character to perform the optimal action determined based on the updated, reinforcement-learned value table model may include: reading a motion data set corresponding to the determined optimal action; and driving an animation for the NPC-type AI character in the virtual training content based on the motion data set.

In some embodiments of the present invention, the NPC-type AI character includes joint information for a manipulable body structure, and the joints specified by the joint information are manipulated to correspond to the action information. The action information includes at least one piece of motion data, is defined according to the virtual training scenario, and may be defined for the actions that can occur in the virtual training scenario.

In some embodiments of the present invention, the state information may be data that defines position variables for the location, orientation, and movable paths of the NPC-type AI character in the virtual training map data for the virtual training scenario, together with the current environmental situation variables of the virtual training scenario, the state values of objects, the action values of the NPC-type AI character, and the actions of the user.

In addition, an NPC-type AI character learning system for virtual training according to a second aspect of the present invention includes a memory storing a program for virtual training of an NPC (Non Player Character) type AI character, based on a scenario for virtual training of the character in virtual training content (hereinafter, a virtual training scenario), and a processor that executes the program stored in the memory. By executing the program, the processor generates a value table model defined based on an action value function composed of at least one state information-action information pair for the virtual training scenario, updates the generated value table model and trains it by reinforcement learning, and causes the NPC-type AI character to perform an optimal action determined based on the updated, reinforcement-learned value table model as the state information in the virtual training scenario changes. The processor sets the state information, the action information, reward information, and next state information as input data, and sets a reward corresponding to a rule process defined in the virtual training scenario as output data, to update the value table model and train it by reinforcement learning.

In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium storing a computer program for executing the method, may be further provided.

Existing AI characters require direct interaction with the user and are largely limited to conversational services based on voice recognition. In contrast, according to the present invention described above, an NPC-type AI character specialized for virtual training can perform the optimal action for the virtual training situation and thereby take part in the training jointly with the user.

This goes beyond merely assisting the training situation. By staging the varied training situations that can occur in reality, such as generating a new training situation depending on the outcome of the NPC-type AI character's actions, it increases the complexity of the training scenario and thereby maximizes the lifelike effect of the training.

In addition, in existing virtual training, one or more users connect at the same time and repeatedly carry out given learning objectives based on a fixed scenario. Because the training process and outcome are always the same, such training does little to build the ability to cope with varied real-world situations.

In contrast, an embodiment of the present invention enables various kinds of virtual training through the NPC-type AI character, such as multi-person training, cooperative training, and trainee response training. Because various intermediate processes arise depending on the actions of the user or the NPC, it has the advantage of enabling response training that resembles a real situation.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a flowchart of an NPC-type AI character learning method for virtual training according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating the process of generating a value table model.
FIG. 3 is a diagram illustrating a value table model generated based on an action value function.
FIG. 4 is a diagram illustrating a reward table model.
FIG. 5 is a diagram showing an example of updating and reinforcement learning of a value table model based on an algorithm combining a DNN and LSTM.
FIG. 6 is a block diagram of an NPC-type AI character learning system for virtual training according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully inform those of ordinary skill in the art to which the present invention belongs of the scope of the invention, and the present invention is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless specifically stated otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more elements other than those mentioned. Like reference numerals refer to like elements throughout the specification, and "and/or" includes each of the mentioned elements and every combination of one or more of them. Although "first", "second", and so on are used to describe various elements, these elements are of course not limited by these terms. The terms are used only to distinguish one element from another. Accordingly, a first element mentioned below may also be a second element within the technical spirit of the present invention.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless explicitly and specifically defined.

The present invention relates to an NPC-type AI character learning system (100) and method for virtual training.

Most existing character technologies that use AI were developed for conversational and information-providing services. Their main purpose was to learn how to give an appropriate answer to a user's question or to provide information by carrying out a related command.

In contrast, the present invention makes it possible to implement an NPC-type AI character that, even without a command from the user, learns how best to act according to the given situation and the user's behavior. Such a character can act autonomously in a way suited to virtual training and respond actively as the situation requires.

Accordingly, rather than learning to generate optimal dialogue through voice recognition as in the prior art, an embodiment of the present invention focuses on reinforcement learning that recognizes the virtual training scenario situation, user state values, and the like, and determines action values according to the standard procedure of the training in question.

Hereinafter, a method performed by the NPC-type AI character learning system (100) for virtual training according to an embodiment of the present invention will be described with reference to FIGS. 1 to 5.

FIG. 1 is a flowchart of an NPC-type AI character learning method for virtual training according to an embodiment of the present invention.

First, for a scenario for virtual training of an NPC (Non Player Character) type AI character in virtual training content (hereinafter, a virtual training scenario), a value table model is generated that is defined based on at least one action value function composed of pairs of state information and action information variables (S110).

In one embodiment, the present invention may perform 3D modeling of the character required for virtual training for the NPC-type AI character.

For reference, an NPC (Non Player Character) is a character in a game that the user or trainee cannot control directly, and serves as a kind of helper character that provides the user with content such as quests.

The term NPC originates from TRPGs (tabletop role-playing games) and is the opposite of PC (Player Character). Most NPCs stay in one spot or one area and act as helpers that keep the game running smoothly. Like extras in a film or drama, they may serve as background, or they may provide content such as quests for the user to carry out or items awarded after a quest is completed.

Such NPCs are mainly used in online games as characters controlled directly by the service provider, and in PC games they often appear as targets for scoring points. An embodiment of the present invention is characterized in that the NPC is applied to virtual training, where it assists or cooperates with the user's training.

In one embodiment, the NPC-type AI character of the present invention includes joint (bone) information for a manipulable body structure, and the joints specified by the joint information are manipulated to correspond to the action information.

Here, the action information includes at least one piece of motion data. The action information is defined according to the virtual training scenario, and is defined for the actions that can occur in the virtual training scenario. For example, the action information refers to the behavior of the NPC-type AI character playing the virtual training content, such as the character's movements.

The motion data included in the action information is organized as a motion data set (rigging data set) so that it can be applied to the NPC-type AI character model in real time, and the motion data can be synchronized with the action decided for each situation.
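
As an illustration of how action information could drive joint manipulation, the following minimal Python sketch stores a motion data set as a list of keyframes and applies them to a skeleton. The names `Keyframe` and `apply_keyframe`, the joint names, and the Euler-angle representation are all assumptions made for this example; the specification does not describe a data layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rotation = Tuple[float, float, float]  # assumed: Euler angles in degrees


@dataclass
class Keyframe:
    time_s: float
    joint_rotations: Dict[str, Rotation]  # joint (bone) name -> rotation


def apply_keyframe(skeleton: Dict[str, Rotation], frame: Keyframe) -> None:
    """Pose the skeleton in place; reject joints the character does not have."""
    unknown = set(frame.joint_rotations) - set(skeleton)
    if unknown:
        raise KeyError(f"unknown joints: {sorted(unknown)}")
    skeleton.update(frame.joint_rotations)


# Hypothetical character with three joints, all at rest.
skeleton: Dict[str, Rotation] = {
    "shoulder_r": (0.0, 0.0, 0.0),
    "elbow_r": (0.0, 0.0, 0.0),
    "wrist_r": (0.0, 0.0, 0.0),
}

# Hypothetical motion data set for a "raise_arm" action.
raise_arm: List[Keyframe] = [
    Keyframe(0.0, {"shoulder_r": (0.0, 0.0, 0.0)}),
    Keyframe(0.5, {"shoulder_r": (0.0, 0.0, 90.0), "elbow_r": (0.0, 0.0, 30.0)}),
]

for frame in raise_arm:
    apply_keyframe(skeleton, frame)
```

Keying the motion data by joint name means a motion data set only needs to list the joints it moves, and joints it does not mention keep their previous pose.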

Meanwhile, the value table model (Quality Table Model) generated in step S110 is defined, for the virtual training scenario, based on at least one action value function composed of pairs of state information and action information variables.

FIG. 2 is a flowchart illustrating the process of generating a value table model. FIG. 3 is a diagram illustrating a value table model generated based on an action value function. FIG. 4 is a diagram illustrating a reward table model.

In one embodiment, the present invention generates a reward table model containing reward information set for at least one state information-action information pair of the virtual training scenario (S111).

That is, an embodiment of the present invention may store and manage the reward information of a designated virtual training scenario as a reward table.

Here, the reward information is a kind of score obtained while playing the virtual training content, and may be provided upon moving from a first rule process to a second rule process defined in the virtual training scenario.

When the second rule process corresponds to the target point in the virtual training scenario, a second reward greater than the first reward may be granted. Here, a rule process means a strategy by which the NPC-type AI character selects, from the current state information in the virtual training content, the action information that obtains the most reward.

Referring to the example of FIG. 4, according to the reward table model, a reward is granted on moving from one rule process to the next, and a move that reaches the target point earns 100 points as the second reward. As these rewards accumulate, the NPC-type AI character can determine the rule process flow with the largest total reward score to be the optimal behavior and carry it out.
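
The reward table described above can be sketched as a small matrix. The following Python example is only an illustration: FIG. 4 is not reproduced here, so the four rule processes, the permitted moves, and the first reward of 0 are assumptions. Only the 100-point second reward for reaching the target point comes from the text.

```python
NOT_ALLOWED = None   # assumed marker for a move the scenario forbids
FIRST_REWARD = 0     # assumed reward for an ordinary move
SECOND_REWARD = 100  # reward for a move that reaches the target point

TARGET = 3  # hypothetical index of the target rule process

# REWARD_TABLE[current][next]: rows are the current rule process,
# columns are the rule process moved to.
REWARD_TABLE = [
    [NOT_ALLOWED, FIRST_REWARD, NOT_ALLOWED, NOT_ALLOWED],
    [FIRST_REWARD, NOT_ALLOWED, FIRST_REWARD, SECOND_REWARD],
    [NOT_ALLOWED, FIRST_REWARD, NOT_ALLOWED, SECOND_REWARD],
    [NOT_ALLOWED, NOT_ALLOWED, NOT_ALLOWED, NOT_ALLOWED],
]


def allowed_moves(current):
    """Rule processes reachable from `current`."""
    row = REWARD_TABLE[current]
    return [nxt for nxt, r in enumerate(row) if r is not NOT_ALLOWED]


def flow_score(flow):
    """Sum the rewards collected along a sequence of rule processes."""
    total = 0
    for current, nxt in zip(flow, flow[1:]):
        r = REWARD_TABLE[current][nxt]
        if r is NOT_ALLOWED:
            raise ValueError(f"move {current} -> {nxt} is not allowed")
        total += r
    return total
```

For instance, `flow_score([0, 1, TARGET])` adds the first reward for the move 0 to 1 and the second reward for the move 1 to 3.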

Next, the maximum value of the action value function, Max[Q(next state, all actions)], is calculated, where the variables are the next state information following the current state in the virtual training scenario (next state) and all action information available in that next state (all actions) (S113).

Then, a weight is applied to the maximum value of the action value function and the result is added to the reward table model, so that the value table model is generated according to Equation 1 below (S115).

[Equation 1]

Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]

where Gamma is the weight.

The value table model generated in this way comprises at least one action value function, and each action value function may be defined with state information and action information as its variables.
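
To make Equation 1 concrete, the following Python sketch applies it repeatedly to a small hypothetical scenario until the table stops changing. The scenario layout, the weight of 0.8, and the convention that an action is "move to rule process a" (so that the next state equals the action) are all assumptions for illustration.

```python
GAMMA = 0.8  # assumed value of the weight in Equation 1

# Hypothetical reward table R[state][action]. An action means "move to rule
# process a", so the next state equals the action. State 3 is the target point.
R = {
    0: {1: 0},
    1: {0: 0, 2: 0, 3: 100},
    2: {1: 0, 3: 100},
    3: {},  # terminal: no further actions
}


def build_value_table(R, gamma, sweeps=100):
    """Apply Q(s, a) = R(s, a) + gamma * max Q(s', .) repeatedly from all zeros."""
    Q = {s: {a: 0.0 for a in actions} for s, actions in R.items()}
    for _ in range(sweeps):
        for s, actions in R.items():
            for a, r in actions.items():
                next_state = a
                # A terminal state has no actions, so its best value is 0.
                best_next = max(Q[next_state].values(), default=0.0)
                Q[s][a] = r + gamma * best_next
    return Q


Q = build_value_table(R, GAMMA)
```

With these assumed numbers, moves that reach the target settle at 100, and a move one step further away settles at 0.8 * 100 = 80, so values fall off with distance from the target point.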

Here, the state information means data that defines position variables for the location, orientation, and movable paths of the NPC-type AI character in the virtual training map data for the virtual training scenario, together with the current environmental situation variables of the virtual training scenario, the state values of objects, the action values of the NPC-type AI character, and the actions of the user. For example, the state information may include the position and speed of each object in the virtual training content and its distance from a wall.
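
One way to picture the state information listed above is as an immutable record. The following Python sketch is an assumption-laden illustration: the class name `NpcState`, every field name, and the sample values are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NpcState:
    # Position variables on the virtual training map
    position: Tuple[float, float, float]
    heading_deg: float
    reachable_paths: Tuple[str, ...]
    # Scenario variables
    environment: Tuple[Tuple[str, str], ...]    # e.g. (("smoke", "dense"),)
    object_states: Tuple[Tuple[str, str], ...]  # e.g. (("valve", "open"),)
    npc_action: str
    user_action: str


state = NpcState(
    position=(2.0, 0.0, 5.5),
    heading_deg=90.0,
    reachable_paths=("corridor_a", "stairwell"),
    environment=(("smoke", "dense"),),
    object_states=(("extinguisher", "on_wall"),),
    npc_action="idle",
    user_action="approach_fire",
)

# Because the dataclass is frozen and holds only hashable fields,
# a state can index a value table directly.
value_table = {(state, "fetch_extinguisher"): 0.0}
```

Using tuples rather than lists or dicts for the nested fields is what keeps the record hashable, so two states with the same contents refer to the same value table entry.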

Each action value function in the value table model takes state information and action information as variables and calculates a value (Q). The value table model may also separately obtain information about the expected value of how much reward a given action or state will bring in the future.

Accordingly, an embodiment of the present invention is characterized in that reinforcement learning is applied so that, while performing a designated behavior according to the action information, the character can interact in real time in response to environmental changes and the user's actions.

Referring again to FIG. 1, the value table model generated in step S110 is next updated and trained by reinforcement learning (S120).

In one embodiment, each time a rule process defined in the virtual training scenario is reached, the value table model may be updated and trained by reinforcement learning based on an algorithm combining a DNN (Deep Neural Network) and LSTM (Long Short Term Memory).

FIG. 5 is a diagram showing an example of updating and reinforcement learning of a value table model based on an algorithm combining a DNN and LSTM.

Specifically, for reinforcement learning, the state information, action information, reward information, and next state information are set as input data, and the reward corresponding to a rule process defined in the virtual training scenario is set as output data, so that the value table model can be updated and trained by reinforcement learning.

In this update process, the values are updated as learning progresses: the values start at 0 and accumulate with the weight applied, so that the model finds the optimal way to reach the goal.
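
The update from one (state, action, reward, next state) tuple can be sketched as follows. This is a plain tabular update, not the DNN and LSTM combination the specification describes, and it is swapped in here only because it shows the accumulation from zero in a few lines. The learning rate `alpha`, the weight `gamma` of 0.8, and the names `Transition`, `update_value_table`, and `actions_in` are assumptions.

```python
from collections import defaultdict
from typing import Hashable, NamedTuple


class Transition(NamedTuple):
    state: Hashable
    action: Hashable
    reward: float
    next_state: Hashable


def update_value_table(Q, t, actions_in, alpha=0.5, gamma=0.8):
    """Move Q(state, action) toward reward + gamma * max Q(next_state, .)."""
    best_next = max(
        (Q[(t.next_state, a)] for a in actions_in(t.next_state)),
        default=0.0,  # a terminal state has no actions
    )
    target = t.reward + gamma * best_next
    Q[(t.state, t.action)] += alpha * (target - Q[(t.state, t.action)])
    return Q[(t.state, t.action)]


def actions_in(state):
    """Hypothetical scenario: every state except the goal has one action."""
    return [] if state == "goal" else ["advance"]


Q = defaultdict(float)  # every value starts at 0
reach_goal = Transition("s1", "advance", 100.0, "goal")
first = update_value_table(Q, reach_goal, actions_in)   # 0 -> 50
second = update_value_table(Q, reach_goal, actions_in)  # 50 -> 75
```

Each repetition of the same transition closes half of the remaining gap to the target of 100, which is the gradual accumulation the paragraph above describes.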

Referring again to FIG. 1, the NPC-type AI character is next made to perform the optimal action determined based on the updated, reinforcement-learned value table model as the state information in the virtual training scenario changes (S130).

In one embodiment, whenever the situation changes in the virtual training scenario, the NPC-type AI character determines the optimal action through the learned value table model, and the motion data set corresponding to the determined optimal action is read so that the character can perform the matching behavior. An animation for the NPC-type AI character in the virtual training content can then be driven based on the motion data set that was read.
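
Step S130 can be sketched as a lookup followed by playback. In the following Python example the learned values, the action names, the clip names, and the `play_clip` callback are all hypothetical; a real system would hand the clips to its animation engine rather than to a list.

```python
def select_optimal_action(value_table, state):
    """Return the action with the highest learned value in `state`."""
    actions = value_table[state]
    return max(actions, key=actions.get)


def perform_optimal_action(value_table, state, motion_data_sets, play_clip):
    """Pick the best action, read its motion data set, and drive the animation."""
    action = select_optimal_action(value_table, state)
    for clip in motion_data_sets[action]:
        play_clip(clip)
    return action


# Hypothetical learned values and motion data sets for one scenario state.
value_table = {
    "fire_detected": {"wait": 12.0, "fetch_extinguisher": 80.0, "evacuate": 64.0},
}
motion_data_sets = {
    "wait": ["idle_loop"],
    "fetch_extinguisher": ["turn_to_wall", "walk", "grab_extinguisher"],
    "evacuate": ["turn_to_exit", "run"],
}

played = []  # stands in for the animation engine
chosen = perform_optimal_action(
    value_table, "fire_detected", motion_data_sets, played.append
)
```

Passing `play_clip` in as a callback keeps the decision logic independent of whichever engine actually renders the character.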

이때, 가상훈련 콘텐츠 내 NPC형 AI 캐릭터를 위한 상기 구동 결과에 대하여 해당 가상훈련 전문가에 의한 피드백이 제공될 수 있으며, 피드백을 반영하여 가치 테이블 모델이 갱신될 수 있다.At this time, feedback from a corresponding virtual training expert may be provided for the driving result for the NPC-type AI character in the virtual training content, and the value table model may be updated by reflecting the feedback.

그밖에 본 발명의 일 실시예는 가상훈련 시나리오에서의 상황별 최적 행동에 대한 학습을 위하여, 현재 상태 정보 및 캐릭터들의 행동에 대한 훈련 표준 절차 및 이에 대응하는 최대 기대값을 설정할 수 있다.In addition, in an embodiment of the present invention, in order to learn optimal behavior for each situation in a virtual training scenario, standard training procedures for current state information and behaviors of characters and corresponding maximum expected values may be set.

Also in relation to the optimal action, for the NPC-type AI character's autonomous-behavior learning in virtual training, per-action activity modules for autonomous behavior and interactive motions, based on DQN (Deep Q Network) neural network learning for each training situation, may be applied.

In the above description, steps S110 to S130 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. Some steps may be omitted as needed, and the order of the steps may be changed. In addition, even where not repeated below, the description given with reference to FIGS. 1 to 5 also applies to the NPC-type AI character learning system 100 for virtual training of FIG. 6.

Hereinafter, an NPC-type AI character learning system 100 for virtual training according to an embodiment of the present invention is described with reference to FIG. 6.

FIG. 6 is a block diagram of the NPC-type AI character learning system 100 for virtual training according to an embodiment of the present invention.

The NPC-type AI character learning system 100 for virtual training according to an embodiment of the present invention includes a memory 110 and a processor 120.

The memory 110 stores a program for virtual training of the NPC-type AI character, based on a scenario for virtual training of the NPC-type AI character in the virtual training content, and the processor 120 executes the program stored in the memory 110.

Here, the memory 110 collectively refers to non-volatile storage devices, which retain stored information even when power is not supplied, and volatile storage devices. The memory 110 may include NAND flash memory such as a compact flash (CF) card, a secure digital (SD) card, a memory stick, a solid-state drive (SSD), and a micro SD card; magnetic computer storage devices such as a hard disk drive (HDD); and optical disc drives such as CD-ROM and DVD-ROM.

By executing the program stored in the memory 110, the processor 120 generates, for the virtual training scenario, a value table model defined based on an action value function composed of at least one state information-action information pair, updates and reinforcement-learns the value table model, and causes the NPC-type AI character to perform the optimal action determined based on the updated and reinforcement-learned value table model as the state information in the virtual training scenario changes.

At this time, the processor 120 sets the state information, the action information, the reward information, and the next state information as input data, and sets a reward corresponding to a rule process defined in the virtual training scenario as output data, to update and reinforcement-learn the value table model.

The NPC-type AI character learning method for virtual training according to an embodiment of the present invention described above may be implemented as a program (or application) and stored in a medium so that it can be executed in combination with a server, which is hardware.

So that the computer can read the program and execute the methods implemented as the program, the program may include code written in a computer language such as C, C++, JAVA, or machine language that the processor (CPU) of the computer can read through the device interface of the computer. Such code may include functional code related to functions that define what is needed to execute the methods, and may include control code related to the execution procedure that the processor of the computer needs in order to execute those functions in a predetermined order. The code may further include memory-reference code indicating the location (address) in the internal or external memory of the computer from which additional information or media needed by the processor to execute the functions should be read. In addition, when the processor of the computer needs to communicate with another remote computer or server in order to execute the functions, the code may further include communication-related code specifying how to communicate with that remote computer or server using the communication module of the computer, and what information or media should be transmitted and received during communication.

The storage medium refers not to a medium that holds data only briefly, such as a register, cache, or memory, but to a medium that stores data semi-permanently and can be read by a device. Specific examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. That is, the program may be stored in various recording media on various servers that the computer can access, or in various recording media on the user's computer. The medium may also be distributed across computer systems connected by a network, so that computer-readable code is stored in a distributed manner.

The steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. A software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the technical field to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, a person of ordinary skill in the technical field to which the present invention pertains will understand that the invention may be carried out in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: NPC-type AI character learning system for virtual training
110: memory
120: processor

Claims

An NPC-type AI character learning method for virtual training, performed by a computer, the method comprising:
generating, for a scenario for virtual training of an NPC (Non Player Character)-type AI character in virtual training content (hereinafter, a virtual training scenario), a value table model defined based on at least one action value function composed of pairs of variables for state information and action information;
updating and reinforcement-learning the value table model by setting the state information, the action information, reward information, and next state information as input data and setting a reward corresponding to a rule process defined in the virtual training scenario as output data; and
causing the NPC-type AI character to perform an optimal action determined based on the updated and reinforcement-learned value table model as the state information in the virtual training scenario changes,
wherein generating the value table model comprises:
generating a reward table model including the reward information set to correspond to at least one state information-action information pair for the virtual training scenario;
calculating a maximum value of the action value function having, as variables, next state information following the current state in the virtual training scenario and all action information in that next state information; and
generating the value table model by applying a weight to the maximum value of the action value function and adding the result to the reward table model.

(Deleted)

The NPC-type AI character learning method for virtual training of claim 1,
wherein the reward information is granted upon moving from a first rule process defined in the virtual training scenario to a second rule process, and, when the second rule process corresponds to a target point in the virtual training scenario, a second reward greater than the first reward information is granted.

The NPC-type AI character learning method for virtual training of claim 3,
wherein updating and reinforcement-learning the value table model comprises:
updating and reinforcement-learning the value table model, each time a rule process defined in the virtual training scenario is reached, based on an algorithm combining a Deep Neural Network (DNN) and Long Short Term Memory (LSTM).

The NPC-type AI character learning method for virtual training of claim 1,
wherein causing the NPC-type AI character to perform the optimal action determined based on the updated and reinforcement-learned value table model comprises:
reading a motion data set corresponding to the determined optimal action; and
driving, based on the motion data set, an animation for the NPC-type AI character in the virtual training content.

The NPC-type AI character learning method for virtual training of claim 5,
wherein the NPC-type AI character includes joint information for a manipulable body structure, and its joints are manipulated according to the joint information so as to correspond to the action information, and
wherein the action information includes at least one piece of motion data, is defined according to the virtual training scenario, and is defined for actions that can occur in the virtual training scenario.

The NPC-type AI character learning method for virtual training of claim 1,
wherein the state information is data defining: position, direction, and location variables of the NPC-type AI character, and position variables of movable paths, in the virtual training map data for the virtual training scenario; a current environmental situation variable according to the virtual training scenario; a state value of an object; an action value of the NPC-type AI character; and an action of the user.

An NPC-type AI character learning system for virtual training, comprising:
a memory storing a program for virtual training of an NPC (Non Player Character)-type AI character, based on a scenario for virtual training of the NPC-type AI character in virtual training content (hereinafter, a virtual training scenario); and
a processor that executes the program stored in the memory,
wherein, by executing the program, the processor generates, for the virtual training scenario, a value table model defined based on an action value function composed of at least one state information-action information pair, updates and reinforcement-learns the generated value table model, and causes the NPC-type AI character to perform an optimal action determined based on the updated and reinforcement-learned value table model as the state information in the virtual training scenario changes,
wherein the processor sets the state information, the action information, reward information, and next state information as input data, and sets a reward corresponding to a rule process defined in the virtual training scenario as output data, to update and reinforcement-learn the value table model, and
wherein the processor generates a reward table model including the reward information set to correspond to at least one state information-action information pair for the virtual training scenario, calculates a maximum value of the action value function having, as variables, next state information following the current state in the virtual training scenario and all action information in that next state information, and generates the value table model by applying a weight to the maximum value of the action value function and adding the result to the reward table model.