KR102336067B1

KR102336067B1 - A computer program for generating audio data corresponding to a game play situation

Info

Publication number: KR102336067B1
Application number: KR1020190076455A
Authority: KR
Inventors: 황영태; 오인수; 석영민; 전선영
Original assignee: 넷마블 주식회사
Priority date: 2019-06-26
Filing date: 2019-06-26
Publication date: 2021-12-06
Also published as: KR20210001035A

Abstract

In one embodiment of the present disclosure for solving the above problem, a computer program stored in a computer-readable medium and executable by one or more processors is disclosed. When executed by the one or more processors, the computer program causes the one or more processors to perform operations for generating voice data corresponding to an in-game play situation. The operations may include: generating metadata for speech synthesis based on game data; inputting the metadata and dialogue text into a speech synthesis model; and generating synthesized voice data through the speech synthesis model.

Description

{A COMPUTER PROGRAM FOR GENERATING AUDIO DATA CORRESPONDING TO A GAME PLAY SITUATION}

The present disclosure relates to a game service, and more particularly, to a computer program for generating voice data corresponding to an in-game play situation.

Recently, the game industry has been growing along with the rapid development of IT technology. As the game industry grows, the number of game companies providing game services is also increasing explosively. This increase in turn is intensifying competition in the game service market. Accordingly, game companies are making continuous efforts to keep their games competitive. In particular, they keep searching for elements that can arouse the interest of game users.

For example, in a Massively Multiplayer Online Role Playing Game (MMORPG), in which many concurrent players progress by performing missions or quests in the game, it is important for smooth progress to understand the game's story and to perform the quests (missions) related to that story.

Here, game companies that provide game services offer the voices of in-game characters as recorded by various voice actors in order to keep their games competitive. Specifically, when guiding a game user through the game's story, or when the game user performs an in-game quest, game companies provide the voice of an in-game character in a voice actor's voice. For example, in a typical game service, when a non-player character (NPC) in the game presents the game story, a pre-recorded voice actor's voice is provided along with text. As another example, in a typical game service, when an NPC in the game provides information related to quest progress, a pre-recorded voice actor's voice is provided along with text.

However, if the vast amount of guidance voice in a game is all pre-recorded by voice actors, the cost of providing voice may rise with the cost of hiring voice actors and of recording, and modifying even part of a recorded file may be difficult.

Accordingly, there may be a need in the art for more diverse ways of generating a voice that guides the story of a game or a voice that guides in-game quests.

Korean Patent Laid-Open Publication No. 2007-0078682

The present disclosure has been made in view of the background art described above, and aims to provide a computer program for generating voice data corresponding to an in-game play situation.

In one embodiment of the present disclosure for solving the above problem, a computer program stored in a computer-readable medium and executable by one or more processors is disclosed. The computer program causes the one or more processors to perform operations for generating voice data corresponding to an in-game play situation. The operations may include: generating metadata for speech synthesis based on game data; inputting the metadata and dialogue text into a speech synthesis model; and generating synthesized voice data through the speech synthesis model.
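The three operations can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions: the names `Metadata`, `generate_metadata`, `generate_voice_data`, and `placeholder_model`, as well as the game data fields, are hypothetical and do not appear in the disclosure, and the model is a stand-in that returns silence rather than a trained speech synthesis model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Metadata:
    """Hypothetical container for synthesis metadata derived from game data."""
    attributes: Dict[str, str]


def generate_metadata(game_data: Dict[str, str]) -> Metadata:
    # Operation 1: keep only the fields assumed to influence how a line is spoken.
    relevant = ("speaker", "situation")
    return Metadata({k: game_data[k] for k in relevant if k in game_data})


def generate_voice_data(
    game_data: Dict[str, str],
    dialogue_text: str,
    model: Callable[[Metadata, str], List[float]],
) -> List[float]:
    metadata = generate_metadata(game_data)   # operation 1
    return model(metadata, dialogue_text)     # operations 2 and 3


def placeholder_model(metadata: Metadata, text: str) -> List[float]:
    # Stand-in for a trained model: one silent audio sample per character.
    return [0.0] * len(text)


game_data = {"speaker": "npc_guide", "situation": "quest_start", "level": "12"}
samples = generate_voice_data(game_data, "Welcome!", placeholder_model)
```

Passing the model in as a callable keeps the sketch independent of any particular synthesis architecture.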

Alternatively, the metadata may be attribute indication information for generating the voice data, and may include at least one of image vector information, style vector information, and voice identification information.

Alternatively, the attribute indication information may include at least one of: attribute information of each of one or more pieces of content included in the game, relationship information between a plurality of objects existing in the game, and attribute information of each of the plurality of objects.

Alternatively, the image vector information may include indication information, generated based on a game image, for generating voice data; the style vector information may include indication information predetermined for each of the one or more pieces of content included in the game; and the voice identification information may include information for identifying seed voice data for speech synthesis.
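As an illustration of how such metadata might be held in memory, the sketch below models the three kinds of information as optional fields. The class name `SynthesisMetadata`, the field types, and the example values are assumptions made for illustration; the disclosure does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SynthesisMetadata:
    # Indication derived from a game image (assumed here to be a float vector).
    image_vector: Optional[List[float]] = None
    # Indication predetermined per piece of game content (assumed float vector).
    style_vector: Optional[List[float]] = None
    # Identifier of the seed voice data to synthesize from (assumed string key).
    voice_id: Optional[str] = None

    def has_any_field(self) -> bool:
        # The text requires "at least one of" the three kinds of information.
        fields = (self.image_vector, self.style_vector, self.voice_id)
        return any(value is not None for value in fields)


quest_line = SynthesisMetadata(style_vector=[0.2, 0.8], voice_id="seed_voice_03")
```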

Alternatively, the dialogue text may include content information for generating the voice data, and may be predetermined based on each of the one or more pieces of content included in the game.

Alternatively, the operations may further include, when a voice output control signal is received from a client terminal, causing the client terminal to output the voice data corresponding to the in-game play situation.

Alternatively, the operations may further include constructing training data for training the speech synthesis model based on one or more seed voice data, text corresponding to each of the one or more seed voice data, and metadata associated with the one or more seed voice data.
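A minimal sketch of this construction step follows, assuming each seed recording arrives as a dictionary with `audio`, `text`, and `metadata` keys and that metadata is prepended to the transcript as `<key=value>` tags. Both the record layout and the tag format are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrainingExample:
    input_text: str            # metadata tags followed by the transcript
    target_audio: List[float]  # the seed voice samples to be reproduced


def build_training_data(seeds: List[Dict]) -> List[TrainingExample]:
    examples = []
    for seed in seeds:
        # Sort the keys so the same metadata always yields the same tag order.
        tags = " ".join(f"<{k}={v}>" for k, v in sorted(seed["metadata"].items()))
        examples.append(TrainingExample(f"{tags} {seed['text']}".strip(), seed["audio"]))
    return examples


seeds = [
    {
        "audio": [0.0, 0.1, -0.1],
        "text": "Follow me.",
        "metadata": {"voice": "seed_01", "style": "calm"},
    },
]
training_data = build_training_data(seeds)
```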

Alternatively, the speech synthesis model may include a dimension reduction sub-model, a dimension restoration sub-model, an attention module, and a voice output module, and may be trained with training data that includes the metadata, training dialogue text, and training target voice data.

Alternatively, the speech synthesis model may be trained so that, when training input text including the metadata and the training dialogue text is given as input to the dimension reduction sub-model, the dimension restoration sub-model outputs a training spectrogram corresponding to the target voice data.
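Training toward a target spectrogram implies some measure of distance between the predicted and target spectrograms. The disclosure does not name a loss function; the sketch below assumes a mean absolute (L1) error purely for illustration, with spectrograms represented as lists of frames. The function name `spectrogram_l1_loss` is hypothetical.

```python
from typing import List

Spectrogram = List[List[float]]  # frames x frequency bins


def spectrogram_l1_loss(predicted: Spectrogram, target: Spectrogram) -> float:
    """Mean absolute error over all frames and bins (an assumed loss choice)."""
    if len(predicted) != len(target):
        raise ValueError("spectrograms must have the same number of frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames must have the same number of bins")
        for p, t in zip(p_frame, t_frame):
            total += abs(p - t)
            count += 1
    return total / count


loss = spectrogram_l1_loss([[1.0, 2.0]], [[0.0, 4.0]])
```

A training loop would adjust the model parameters to reduce this value over the training data.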

Alternatively, the dimension reduction sub-model may take the metadata and the dialogue text as input and output linguistic speech features from the dialogue text.

Alternatively, the dimension restoration sub-model may include one or more recurrent neural networks (RNNs), and the one or more RNNs may take a spectrogram of a first timestamp as input and output a spectrogram of a second timestamp.
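Feeding the frame from one timestamp back in to produce the frame for the next is an autoregressive loop. The sketch below shows only that loop structure: `rnn_step` is a toy recurrence with fixed weights standing in for a trained RNN cell, and the all-zero starting frame and the fixed `context` vector are assumptions for illustration.

```python
import math
from typing import List, Tuple

Frame = List[float]


def rnn_step(prev_frame: Frame, hidden: Frame, context: Frame) -> Tuple[Frame, Frame]:
    # Toy recurrence with fixed weights; a trained cell would use learned ones.
    new_hidden = [
        math.tanh(0.5 * h + 0.5 * p + c)
        for h, p, c in zip(hidden, prev_frame, context)
    ]
    # In this toy cell the emitted frame is a copy of the hidden state.
    return list(new_hidden), new_hidden


def decode(context: Frame, n_frames: int) -> List[Frame]:
    frame = [0.0] * len(context)   # assumed all-zero starting frame
    hidden = [0.0] * len(context)
    frames = []
    for _ in range(n_frames):
        # The frame from the previous timestamp is the input for the next one.
        frame, hidden = rnn_step(frame, hidden, context)
        frames.append(frame)
    return frames


frames = decode([0.1, -0.2], n_frames=3)
```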

Alternatively, the attention module may generate association information between the phonemes of the dialogue text and the time steps of the dimension restoration sub-model.
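One common way to express such an association is a set of weights, one per phoneme, computed for each decoder time step. The sketch below assumes dot-product scoring followed by a softmax; the disclosure does not specify the attention variant, and the names `softmax` and `attend` are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state: Vector, phoneme_encodings: List[Vector]) -> Tuple[List[float], Vector]:
    # Score each phoneme encoding against the current decoder time step.
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in phoneme_encodings]
    weights = softmax(scores)
    # The context vector is the weighted sum of the phoneme encodings.
    dim = len(phoneme_encodings[0])
    context = [sum(w * enc[i] for w, enc in zip(weights, phoneme_encodings)) for i in range(dim)]
    return weights, context


weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Here the decoder state aligns with the first phoneme encoding, so the first weight is the larger of the two.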

Alternatively, the voice output module may generate the synthesized voice data by applying a speech reconstruction algorithm to the spectrogram output by the dimension restoration sub-model.

In another embodiment of the present disclosure, a method of generating voice data corresponding to an in-game play situation is disclosed. The method may include: generating metadata for speech synthesis based on game data; inputting the metadata and dialogue text into a speech synthesis model; and generating synthesized voice data through the speech synthesis model.

In yet another embodiment of the present disclosure, a server for generating voice data corresponding to an in-game play situation is disclosed. The server includes a processor including one or more cores, a memory storing program code executable by the processor, and a network unit that transmits and receives data to and from a game server and a client terminal. The processor may generate metadata for speech synthesis based on game data, input the metadata and dialogue text into a speech synthesis model, and generate synthesized voice data through the speech synthesis model.

The present disclosure may provide a computer program for generating voice data corresponding to an in-game play situation.

Various aspects are now described with reference to the drawings, in which like reference numerals are used to refer to like elements throughout. In the following embodiments, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more aspects. It will be evident, however, that such aspect(s) may be practiced without these specific details.
FIG. 1 shows a schematic diagram of a system for generating voice data corresponding to an in-game play situation according to an embodiment of the present disclosure.
FIG. 2 shows a block diagram of a server for generating voice data corresponding to an in-game play situation related to an embodiment of the present disclosure.
FIG. 3 shows an exemplary flowchart for generating voice data corresponding to an in-game play situation related to an embodiment of the present disclosure.
FIG. 4 shows an exemplary diagram for explaining voice data and text information that are determined differently for each in-game play situation related to an embodiment of the present disclosure.
FIG. 5 shows an exemplary diagram for explaining voice data and text information that are determined differently for each in-game play situation related to an embodiment of the present disclosure.
FIG. 6 shows a block diagram of a speech synthesis model related to an embodiment of the present disclosure.
FIG. 7 shows a detailed configuration diagram of a speech synthesis model related to an embodiment of the present disclosure.
FIG. 8 is a schematic diagram illustrating a network function related to an embodiment of the present disclosure.
FIG. 9 shows modules for implementing a method for generating voice data corresponding to an in-game play situation related to an embodiment of the present disclosure.
FIG. 10 shows a brief, general schematic diagram of an exemplary computing environment in which embodiments of the present disclosure may be implemented.

Various embodiments are now described with reference to the drawings. In this specification, various descriptions are presented to provide an understanding of the present disclosure. It is apparent, however, that these embodiments may be practiced without these specific descriptions.

The terms "component," "module," "system," and the like, as used herein, refer to a computer-related entity, hardware, firmware, software, a combination of software and hardware, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, a thread of execution, a program, and/or a computer. For example, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a processor and/or a thread of execution. A component may be localized on one computer, or may be distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures stored therein. Components may communicate via local and/or remote processes, for example in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system or a distributed system, and/or data transmitted by way of the signal to other systems across a network such as the Internet).

In addition, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or." That is, unless otherwise specified or clear from context, "X employs A or B" is intended to mean any of the natural inclusive permutations. That is, if X employs A, X employs B, or X employs both A and B, then "X employs A or B" is satisfied in any of these cases. It should also be understood that the term "and/or" as used herein refers to and includes all possible combinations of one or more of the listed related items.

Also, the terms "comprises" and/or "comprising" should be understood to mean that the stated features and/or elements are present. However, these terms should be understood not to exclude the presence or addition of one or more other features, elements, and/or groups thereof. Also, unless otherwise specified or clear from context to be directed to a singular form, the singular in this specification and the claims should generally be construed to mean "one or more."

Those skilled in the art will further appreciate that the various illustrative logical blocks, configurations, modules, circuits, means, logics, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, configurations, means, logics, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or as software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in various ways for each particular application. However, such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The scope of rights for the operations in the claims of the present disclosure arises from the functions and features described in each operation, and is not affected by the order in which the operations are recited in the claims unless the claims specify an order among them. For example, in a claim reciting operations that include operation A and operation B, even if operation A is recited before operation B, the scope of rights is not limited to operation A preceding operation B.

FIG. 1 shows a schematic diagram of a system for generating voice data corresponding to an in-game play situation according to an embodiment of the present disclosure.

As shown in FIG. 1, a system for generating voice data corresponding to an in-game play situation may include a voice data generation server 100, a client terminal 10, a game server 200, and a communication network. The components shown in FIG. 1 are exemplary; additional components may be present, or some of the components shown in FIG. 1 may be omitted.

The client terminal 10 in FIG. 1 may be associated with a user who wants to access at least one of the game server 200 and the voice data generation server 100.

The client terminal 10 in the present disclosure may be at least one terminal that allows a user to play a game. The client terminal 10 may also display a game screen or output game-related sound based on a signal transmitted from the game server 200.

The client terminal 10 may refer to any type of entity or entities in the system that have a mechanism for communicating with the voice data generation server 100 and the game server 200. For example, the client terminal 10 may include a personal computer (PC), a notebook computer, a mobile terminal, a smartphone, a tablet PC, a wearable device, and the like, and may include any type of terminal capable of accessing a wired or wireless network. The client terminal 10 may also include any server implemented by at least one of an agent, an application programming interface (API), and a plug-in. In addition, the client terminal 10 may include an application source and/or a client application.

The client terminal 10 may connect to the game server 200 to play a game provided by the game server 200, and may output visual and auditory effects corresponding to the user's game play. A game in this specification may include any type of game, such as a mobile game, a web game, a VR game, a P2P game, or an online/offline game.

The client terminal 10 includes a display, and can thus receive user input related to game play and provide any form of output to the user. The client terminal 10 also includes a sound output unit, and can thus provide any form of output related to game play.

According to an embodiment of the present disclosure, the game server 200 and the voice data generation server 100 may include any type of computer system or computer device, such as a microprocessor, a mainframe computer, a digital processor, a portable device, or a device controller. Although not shown in FIG. 1, the game server 200 and the voice data generation server 100 may each include a memory and a processor.

The game server 200 may allow the client terminal 10 to play a game. When the client terminal 10 connects to the game server 200 and plays a game, the voice data generation server 100 may allow the client terminal 10 to output any form of voice data corresponding to the in-game play situation.

Although the game server 200 and the voice data generation server 100 are shown as separate entities in FIG. 1, in some embodiments of the present disclosure the voice data generation server 100 may be included in the game server 200, so that an integrated server performs both the game play function and the function of generating voice data corresponding to the game. In this example, where the function of the voice data generation server 100 is integrated into the game server 200, the client terminal 10 may perform in-game voice data generation.

In addition, when the voice data generation server 100 and the game server 200 are separate, the client terminal 10 may output voice data generated by the voice data generation server 100 located outside the game server 200. In this case, the voice data generation server 100 may communicate with the game server 200 to implement the generation of voice data related to game play.

According to an embodiment of the present disclosure, the voice data generation server 100 may process the model in a distributed manner using at least one of a CPU, a GPGPU, and a TPU. In an embodiment of the present disclosure, the voice data generation server 100 may also process the model in a distributed manner together with other computing devices.

In this specification, the term network function may be used interchangeably with sub-model, artificial neural network, and neural network. A network function may include one or more neural networks, in which case the output of the network function may be an ensemble of the outputs of the one or more neural networks.

In this specification, a model may include a network function. A model may include one or more network functions, in which case the output of the model may be an ensemble of the outputs of the one or more network functions.
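The text does not say how the outputs are combined into an ensemble. As one possible reading, the sketch below averages the outputs of several network functions element by element; the averaging rule and the names `NetworkFunction` and `ensemble_output` are assumptions for illustration.

```python
from typing import Callable, List, Sequence

# A network function is modeled here as any callable from a vector to a vector.
NetworkFunction = Callable[[Sequence[float]], List[float]]


def ensemble_output(functions: Sequence[NetworkFunction], x: Sequence[float]) -> List[float]:
    outputs = [f(x) for f in functions]
    # Element-wise mean across the network functions (an assumed combination rule).
    return [sum(values) / len(outputs) for values in zip(*outputs)]


def double(x: Sequence[float]) -> List[float]:
    return [2.0 * v for v in x]


def negate(x: Sequence[float]) -> List[float]:
    return [-v for v in x]


combined = ensemble_output([double, negate], [1.0, 4.0])
```

Other combination rules, such as weighted averaging or voting, would fit the same interface.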

According to an embodiment of the present disclosure, the communication network may use various wired communication systems, such as the Public Switched Telephone Network (PSTN), x Digital Subscriber Line (xDSL), Rate Adaptive DSL (RADSL), Multi Rate DSL (MDSL), Very High Speed DSL (VDSL), Universal Asymmetric DSL (UADSL), High Bit Rate DSL (HDSL), and a local area network (LAN).

In addition, the communication network presented in the present disclosure may use various wireless communication systems, such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single Carrier-FDMA (SC-FDMA), and other systems. The techniques described in the present disclosure may be used not only in the networks mentioned above but also in any other form of communication network.

Hereinafter, a method by which the voice data generation server 100 provides voice data corresponding to the in-game play situations of a plurality of users of a game service is described in detail with reference to FIG. 2.

FIG. 2 shows a block diagram of a server for generating voice data corresponding to an in-game play situation related to an embodiment of the present disclosure.

As shown in FIG. 2, the voice data generation server 100 may include a network unit 110, a memory 120, and a processor 130. These components are exemplary, and the scope of the present disclosure is not limited to them. That is, depending on how the embodiments of the present disclosure are implemented, additional components may be included or some of these components may be omitted.

According to an embodiment of the present disclosure, the voice data generation server 100 may include a network unit 110 that exchanges, with the client terminal 10 and the game server 200, the data used to generate voice data corresponding to in-game play.

The network unit 110 according to an embodiment of the present disclosure may use various wired communication systems, such as the Public Switched Telephone Network (PSTN), x Digital Subscriber Line (xDSL), Rate Adaptive DSL (RADSL), Multi Rate DSL (MDSL), Very High Speed DSL (VDSL), Universal Asymmetric DSL (UADSL), High Bit Rate DSL (HDSL), and a local area network (LAN).

In addition, the network unit 110 presented herein may use various wireless communication systems, such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single Carrier FDMA (SC-FDMA), and other systems.

In the present disclosure, the network unit 110 may be configured regardless of its mode of communication, whether wired or wireless, and may be composed of various communication networks such as a personal area network (PAN) and a wide area network (WAN). In addition, the network may be the well-known World Wide Web (WWW), and may also use a wireless transmission technology used for short-range communication, such as Infrared Data Association (IrDA) or Bluetooth.

The techniques described herein may be used not only in the networks mentioned above but also in other networks.

According to an embodiment of the present disclosure, the memory 120 may store any form of information generated or determined by the processor 130 and any form of information received by the network unit 110.

According to an embodiment of the present disclosure, the memory 120 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, and an optical disk. The voice data generation server 100 may also operate in connection with web storage that performs the storage function of the memory 120 over the Internet. The above description of the memory is only an example, and the present disclosure is not limited thereto.

According to an embodiment of the present disclosure, the processor 130 may consist of one or more cores and may include a processor for data analysis and deep learning, such as a central processing unit (CPU), a general-purpose graphics processing unit (GPGPU), or a tensor processing unit (TPU) of the voice data generation server 100. The processor 130 may read a computer program stored in the memory 120 and generate voice data corresponding to an in-game play situation according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the processor 130 may perform computations for training a neural network. In deep learning (DL), these computations include processing the input data for training, extracting features from the input data, calculating errors, and updating the weights of the neural network using back-propagation.

In addition, at least one of the CPU, GPGPU, and TPU of the processor 130 may handle training of the model. For example, the CPU and the GPGPU may together handle training of the model and the computation, using the model, of voice data corresponding to an in-game play situation. In an embodiment of the present disclosure, the processors 130 of a plurality of computing devices may also be used together to handle training of the model and the computation of voice data corresponding to an in-game play situation. Further, a computer program executed on a computing device according to an embodiment of the present disclosure may be a CPU-, GPGPU-, or TPU-executable program.

According to an embodiment of the present disclosure, the processor 130 may generally handle the overall operation of the voice data generation server 100. The processor 130 may provide or process information or functions appropriate for the user by processing signals, data, and information input or output through the components described above, or by running an application program stored in the memory 120.

According to an embodiment of the present disclosure, the processor 130 may generate metadata for voice synthesis based on game data. Specifically, the processor 130 may receive, from the game server 200, game data about the game build or image data about a plurality of objects included in the game, and may generate the metadata for voice synthesis based on that game data and image data.

The metadata is attribute indication information for generating voice data, and may include image vector information, style vector information, and voice identification information. Here, the attribute indication information is information about the game provided by the game server 200, and may include at least one of the following: attribute information of each of one or more contents included in the game, relationship information between a plurality of objects existing in the game, and attribute information of each of the plurality of objects.
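As a rough illustration of the metadata described above, the following sketch models it as a small record with three optional parts. The class and field names (`SynthesisMetadata`, `image_vector`, `style_vector`, `voice_id`) are hypothetical and do not appear in the disclosure, and the vectors are plain lists of floats purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SynthesisMetadata:
    """Hypothetical container for the metadata fed to the speech synthesis model."""
    image_vector: Optional[List[float]] = None  # derived from a game image
    style_vector: Optional[List[float]] = None  # predetermined per content item
    voice_id: Optional[str] = None              # identifies a seed voice, e.g. "voice_1"

    def present_fields(self) -> List[str]:
        """Return the names of the fields that are actually populated."""
        return [name for name, value in vars(self).items() if value is not None]


# Metadata that carries an image vector and a voice ID but no style vector.
meta = SynthesisMetadata(image_vector=[0.2, 0.9], voice_id="voice_1")
print(meta.present_fields())
```

Keeping every field optional reflects that the metadata "may include" each kind of information rather than requiring all three.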

The one or more contents included in the game refer to the various contents that a plurality of users can enjoy within the game. For example, they may be the various kinds of play available in the game, classified by at least one of the purpose, effect, method, process, and result of performing them. For example, the one or more contents may be a plurality of quests or a plurality of stages included in the gameplay that the game server 200 provides to a client terminal 10 connected to the game server 200. The above description of the one or more contents included in the game is only an example, and the present disclosure is not limited thereto.

The image vector information included in the metadata may include instruction information for generating voice data, and is produced from a game image. Specifically, the image vector information may be generated based on game images that contain a plurality of objects or one or more contents existing in the game provided by the game server 200.

For example, when a game image received from the game server 200 includes a first NPC whose appearance is that of the elf race, the processor 130 may generate, based on the first NPC in the game image being female, image vector information including instruction information for generating calm, high-toned, feminine voice data.

As another example, when a game image received from the game server 200 includes a second NPC whose appearance is that of the orc race, the processor 130 may generate, based on the second NPC in the game image being male, image vector information including instruction information for generating rougher, lower-toned, masculine voice data. The above description of the instruction information included in the image vector information is only an example, and the present disclosure is not limited thereto.

That is, the processor 130 may generate, based on a game image, image vector information including instruction information for generating specific voice data.

The style vector information included in the metadata may include instruction information predetermined for each of the one or more contents included in the game. Specifically, the style vector information may include instruction information for generating voice data that corresponds to the game build a game developer has predetermined for each of the one or more contents included in the game provided by the game server 200.

For example, when the game build of the game provided by the game server 200 predetermines that a first quest has a gloomy, dark atmosphere overall, the processor 130 may generate, in response to the information about the first quest included in the game build, style vector information including instruction information for generating voice data with a slower tempo and a lower tone.

As another example, when the game build of the game provided by the game server 200 predetermines that a second quest has a lively, cheerful atmosphere overall, the processor 130 may generate, in response to the information about the second quest included in the game build, style vector information including instruction information for generating voice data with a faster tempo and a higher tone.

As yet another example, when the game build of the game provided by the game server 200 predetermines that the voice data for a first NPC included in a third quest has a calm, composed tone, the processor 130 may generate, in response to the information about the third quest included in the game build, style vector information including instruction information for generating voice data with a calm, composed tone. The above description of the instruction information included in the style vector information is only an example, and the present disclosure is not limited thereto.

That is, the processor 130 may generate style vector information including predetermined instruction information for generating specific voice data.
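The quest examples above can be read as a lookup from a developer-chosen mood to a predetermined style vector. The sketch below shows one way this might look. The two-element layout `[tempo_scale, pitch_scale]`, the numeric values, and every identifier (`STYLE_BY_MOOD`, `GAME_BUILD_MOODS`, `style_vector_for`) are assumptions made for illustration; the disclosure does not specify how a style vector is encoded.

```python
from typing import Dict, List

# Hypothetical layout of a style vector: [tempo_scale, pitch_scale].
# Values below 1.0 mean slower / lower than neutral; above 1.0 mean faster / higher.
STYLE_BY_MOOD: Dict[str, List[float]] = {
    "gloomy": [0.8, 0.85],  # slower tempo, lower tone
    "lively": [1.2, 1.15],  # faster tempo, higher tone
    "calm": [0.95, 1.0],    # calm, even delivery
}

# Hypothetical game-build table: content item -> mood chosen by the developer.
GAME_BUILD_MOODS: Dict[str, str] = {
    "quest_1": "gloomy",
    "quest_2": "lively",
    "quest_3": "calm",
}


def style_vector_for(content_id: str) -> List[float]:
    """Look up the predetermined style vector for a content item."""
    mood = GAME_BUILD_MOODS[content_id]
    return list(STYLE_BY_MOOD[mood])  # copy so callers cannot mutate the table


print(style_vector_for("quest_1"))
```

A static table fits the text's emphasis that these instructions are predetermined in the game build rather than inferred at run time.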

The voice identification information included in the metadata may include information for identifying the seed voice data used for voice synthesis. More specifically, the processor 130 may obtain one or more pieces of seed voice data that serve as the basis for voice synthesis, and may match voice identification information to each piece. That is, the voice identification information may be information for identifying a specific piece of seed voice data among the one or more pieces of seed voice data. For example, first seed voice data may be obtained from a first voice actor, and the voice identification information identifying the first seed voice data may be 'voice_1'. Likewise, second seed voice data may be obtained from a second voice actor, and the voice identification information identifying the second seed voice data may be 'voice_2'. That is, the processor 130 may use the voice identification information to identify the voice actor's voice on which the generated voice data is based. The above description of the seed voice data, voice actors, and voice identification information is only an example, and the present disclosure is not limited thereto.
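The matching of voice identification information to seed voice data might be sketched as a simple registry. The class `SeedVoiceRegistry`, its methods, and the file names are hypothetical; only the `voice_N` naming pattern comes from the examples in the text.

```python
from typing import Dict, List


class SeedVoiceRegistry:
    """Hypothetical registry that assigns a voice ID to each set of seed recordings."""

    def __init__(self) -> None:
        self._voices: Dict[str, List[str]] = {}

    def register(self, recordings: List[str]) -> str:
        """Store the recordings of one voice actor and return the new voice ID."""
        voice_id = f"voice_{len(self._voices) + 1}"
        self._voices[voice_id] = list(recordings)
        return voice_id

    def recordings_for(self, voice_id: str) -> List[str]:
        """Return the seed recordings identified by voice_id."""
        return self._voices[voice_id]


registry = SeedVoiceRegistry()
first = registry.register(["actor1_line1.wav"])   # first voice actor
second = registry.register(["actor2_line1.wav"])  # second voice actor
print(first, second)
```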

According to an embodiment of the present disclosure, the processor 130 may construct training data for training the speech synthesis model. More specifically, the processor 130 may construct the training data based on one or more pieces of seed voice data, the text corresponding to each piece of seed voice data, and the metadata associated with the seed voice data.

In detail, the processor 130 may construct training data for training the speech synthesis model 600 based on one or more pieces of seed voice data, the text corresponding to each piece of seed voice data, and the image vector information associated with the seed voice data. The processor 130 may also construct training data for training the speech synthesis model 600 based on one or more pieces of seed voice data, the text corresponding to the seed voice data, and the style vector information associated with the seed voice data. That is, the processor 130 may construct the training data based on at least one of the image vector information and the style vector information included in the metadata, together with the one or more pieces of seed voice data and the text corresponding to them.

As a specific example, suppose the first dialogue text predetermined for a first game image including a first NPC is "I will keep your items for you.", and one or more pieces of seed voice data for the first dialogue text (first, second, and third seed voice data) have been obtained in advance from the voices of different voice actors. The processor 130 may then construct training data from the first image vector information corresponding to the first game image, the first dialogue text, and the seed voice data (the first, second, and third seed voice data). In this case, the processor 130 may set the voice identification information of the first to third seed voice data to 'voice_1', 'voice_2', and 'voice_3', respectively. The processor 130 may then determine training input data that includes the first image vector information corresponding to the first game image, the voice identification information, and the first dialogue text, and may construct the training data by setting the first to third seed voice data as the training target voice data.

As another example, suppose the second dialogue text predetermined for a second stage is "There is no time.", and one or more pieces of seed voice data for the second dialogue text (first and second seed voice data) have been obtained in advance from different voice actors. The processor 130 may then construct training data from the second style vector information predetermined for the second stage, the second dialogue text, and the seed voice data (the first and second seed voice data). In this case, the processor 130 may set the voice identification information of the first and second seed voice data to 'voice_1' and 'voice_2', respectively. The processor 130 may then determine training input data that includes the second style vector information predetermined for the second stage, the voice identification information, and the second dialogue text, and may construct the training data by setting the first and second seed voice data as the training target voice data. The above description of the dialogue text, the seed voice data, the image vector information, and the style vector information is only an example, and the present disclosure is not limited thereto.

That is, the training input data may include training metadata (training image vector information with training voice identification information, or training style vector information with training voice identification information) and training dialogue text. The training target voice data may be the one or more pieces of seed voice data obtained in advance.
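The pairing of training inputs with training targets described above can be sketched as follows: one example is produced per seed recording of the same dialogue text, with the condition vector (image or style) shared across voices. All names (`TrainingExample`, `build_examples`, the field names, the `.wav` paths) are hypothetical, and audio is represented by a file path only to keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TrainingExample:
    """One hypothetical (input, target) pair for the speech synthesis model."""
    condition_vector: Tuple[float, ...]  # image vector or style vector
    voice_id: str                        # e.g. "voice_1"
    text: str                            # dialogue text
    target_audio: str                    # seed recording (a path in this sketch)


def build_examples(condition_vector: List[float], text: str,
                   seed_audio_by_voice: Dict[str, str]) -> List[TrainingExample]:
    """Create one example per seed recording of the same dialogue text."""
    return [
        TrainingExample(tuple(condition_vector), voice_id, text, audio)
        for voice_id, audio in seed_audio_by_voice.items()
    ]


# Three voice actors recorded the same line for the first game image.
examples = build_examples(
    [0.2, 0.9],
    "I will keep your items for you.",
    {"voice_1": "a1.wav", "voice_2": "a2.wav", "voice_3": "a3.wav"},
)
print(len(examples))
```

The first three fields of each example form the training input; `target_audio` is the training target.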

According to an embodiment of the present disclosure, the processor 130 may train the speech synthesis model using the training data. Specifically, the processor 130 may train the speech synthesis model 600 to output the training target voice data when given the training input data as input.

As shown in FIG. 6, the speech synthesis model 600 may include a dimension reduction sub-model 610, an attention module 620, a dimension restoration sub-model 630, and a voice output module 640. How the processor 130 trains the speech synthesis model 600 using the training data is described in detail below with reference to FIGS. 6 and 7.

The processor 130 may feed the training input data to the dimension reduction sub-model 610 and train the dimension restoration sub-model 630 to output a training spectrogram corresponding to the target voice data. A spectrogram is a visualization of voice data: a graph in which differences in amplitude over time and frequency are shown as differences in print density and/or display color. The dimension reduction sub-model 610 may be a model that takes metadata and dialogue text as input and extracts linguistic and speech features (that is, embeddings). In other words, the dimension reduction sub-model 610 receives from the processor 130 the training input data, which includes the training metadata and training dialogue text, takes a sequence of spectral feature vectors for that training input data as its designated output, and thereby learns the intermediate process by which input data is converted into a spectrum.

In addition, the processor 130 may pass the embeddings of the metadata and dialogue text from the dimension reduction sub-model 610 to the attention module 620. The attention module 620 may generate association information between the phonemes of the dialogue text and the time steps of the dimension restoration sub-model. In other words, using the training input data and the training spectrogram, the processor 130 may cause the attention module 620, which includes a neural network layer, to learn the mapping between the input (that is, the training metadata and training dialogue text) and the output (that is, the training spectrogram).
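To make the "association information" between phonemes and time steps concrete, the sketch below computes attention weights for a single time step. The disclosure does not state how the attention module 620 scores the association; dot-product scoring followed by a softmax is assumed here, and the function names and toy vectors are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float],
           phoneme_embeddings: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (weights, context) for one time step.

    weights[i] says how strongly this time step is associated with phoneme i;
    context is the weighted sum of the phoneme embeddings.
    """
    scores = [sum(q * k for q, k in zip(query, emb)) for emb in phoneme_embeddings]
    weights = softmax(scores)
    dim = len(phoneme_embeddings[0])
    context = [
        sum(w * emb[d] for w, emb in zip(weights, phoneme_embeddings))
        for d in range(dim)
    ]
    return weights, context


# Three toy phoneme embeddings; the query is most similar to the first one.
weights, context = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]])
print(weights)
```

In a trained model the query would come from the state of the dimension restoration sub-model at that time step, and the scoring function would have learned parameters.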

The dimension restoration sub-model 630 may include one or more recurrent neural networks (RNNs) and may be a model that, through the one or more RNNs, takes the spectrogram of a first time step as input and outputs the spectrogram of a second time step. Here, the first time step precedes the second time step. Specifically, the spectrogram predicted by a first RNN of the dimension restoration sub-model 630 is passed to a second RNN at the next time step, and the spectrogram predicted by the second RNN is passed to a third RNN at the time step after that. In this process, the dimension restoration sub-model 630 may use the association information from the attention module 620, which links the phonemes of the dialogue text to the time steps of the dimension restoration sub-model, to determine which part of the text to focus on. That is, by determining the text to focus on through the attention module 620 while repeatedly predicting the spectrogram through the one or more RNNs, the dimension restoration sub-model 630 can be trained to output spectrograms even for text that was not seen during training.
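The frame-by-frame prediction described above is an autoregressive loop: each predicted frame is fed back as input to the next step, together with that step's attention context. The sketch below shows only this loop structure. `toy_cell` is a hypothetical stand-in with fixed toy weights, not a trained RNN, and the all-zero start frame is likewise an assumption.

```python
import math
from typing import Callable, List, Tuple

Frame = List[float]


def toy_cell(prev_frame: Frame, hidden: Frame, context: Frame) -> Tuple[Frame, Frame]:
    """Hypothetical stand-in for a trained RNN cell, using fixed toy weights."""
    new_hidden = [
        math.tanh(0.5 * p + 0.3 * h + 0.2 * c)
        for p, h, c in zip(prev_frame, hidden, context)
    ]
    return new_hidden, new_hidden  # (predicted frame, next hidden state)


def decode(contexts: List[Frame], dim: int,
           cell: Callable[[Frame, Frame, Frame], Tuple[Frame, Frame]] = toy_cell
           ) -> List[Frame]:
    """Predict one spectrogram frame per time step, feeding each frame back in."""
    frame = [0.0] * dim   # all-zero start frame (assumption)
    hidden = [0.0] * dim
    frames = []
    for context in contexts:  # one attention context per time step
        frame, hidden = cell(frame, hidden, context)
        frames.append(frame)
    return frames


spectrogram = decode([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dim=2)
print(len(spectrogram))
```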

The processor 130 may pass the spectrogram output by the dimension restoration sub-model 630 to the voice output module 640. The voice output module 640 may synthesize the spectrogram into voice data and output the synthesized voice data. In this case, the voice output module 640 may be implemented using the Griffin-Lim algorithm.
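For readers unfamiliar with Griffin-Lim, the following is a minimal pure-Python sketch of the algorithm: it starts from random phases and alternates between an inverse STFT and a forward STFT, keeping the given magnitudes and updating only the phases. The disclosure names the algorithm but gives no parameters, so the frame length, hop size, Hann window, iteration count, and all function names here are assumptions, and the naive DFT is suitable only for toy-sized signals.

```python
import cmath
import math
import random

N_FFT = 16  # frame length (toy value, an assumption)
HOP = 4     # hop between frames (an assumption)
# Periodic Hann window.
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / N_FFT) for n in range(N_FFT)]


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, fine for toy sizes."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def stft(signal):
    """Windowed DFT of each frame."""
    return [dft([signal[s + n] * WINDOW[n] for n in range(N_FFT)])
            for s in range(0, len(signal) - N_FFT + 1, HOP)]


def istft(frames, length):
    """Least-squares overlap-add inverse of stft()."""
    out = [0.0] * length
    norm = [0.0] * length
    for i, spec in enumerate(frames):
        chunk = dft(spec, inverse=True)
        for n in range(N_FFT):
            out[i * HOP + n] += chunk[n].real * WINDOW[n]
            norm[i * HOP + n] += WINDOW[n] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def griffin_lim(magnitudes, length, iterations=30, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitudes`."""
    rng = random.Random(seed)
    phases = [[rng.uniform(-math.pi, math.pi) for _ in frame] for frame in magnitudes]
    signal = [0.0] * length
    for _ in range(iterations):
        # Combine the fixed magnitudes with the current phase estimate.
        spec = [[m * cmath.exp(1j * p) for m, p in zip(mf, pf)]
                for mf, pf in zip(magnitudes, phases)]
        signal = istft(spec, length)
        # Keep only the phases of the re-analysed signal.
        phases = [[cmath.phase(v) for v in frame] for frame in stft(signal)]
    return signal


# Toy usage: discard the phase of a short sine tone, then re-estimate a waveform.
tone = [math.sin(2 * math.pi * 2 * n / N_FFT) for n in range(48)]
mags = [[abs(v) for v in frame] for frame in stft(tone)]
audio = griffin_lim(mags, len(tone))
print(len(audio))
```

A production system would more likely rely on an FFT-based implementation from a signal processing library; the point here is only the alternating projection between magnitude and phase.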

Through the training process described above, the speech synthesis model can output more natural synthesized voice data for given metadata and dialogue text, and that voice data can be used in the game. As a result, game users do not find the in-game guidance voice awkward, which can increase their immersion in the game and their interest in it. Moreover, because synthesized voice data can be generated for dialogue text that was not seen during training, voice data that conveys intonation, tone, intention, and emotion can be generated from a relatively small amount of training data.

According to an embodiment of the present disclosure, the processor 130 may input metadata and dialogue text into the speech synthesis model. The dialogue text includes the content information from which voice data is generated, and may be predetermined for each of the one or more contents included in the game. Specifically, the processor 130 may input into the speech synthesis model metadata that includes at least one of image vector information, style vector information, and voice identification information, together with the dialogue text corresponding to that metadata.

As a more specific example, when first image vector information includes instruction information for generating calm, high-toned, feminine voice data corresponding to the appearance of a first NPC, and first voice identification information includes 'voice_1', which corresponds to the voice of a first voice actor, the processor 130 may input into the speech synthesis model the first dialogue text predetermined for the first NPC (for example, "Hello, nice to meet you!"), the first image vector information, and the first voice identification information.

As another example, when second style vector information includes instruction information for generating voice data with a slower tempo and a lower tone, corresponding to a game build in which a second stage is predetermined to have a gloomy, dark atmosphere overall, and second voice identification information includes 'voice_2', which corresponds to the voice of a second voice actor, the processor 130 may input into the speech synthesis model the second dialogue text predetermined for the second stage (for example, "Heh heh, welcome."), the second style vector information, and the second voice identification information. The above description of the instruction information included in the image vector information and style vector information, the voice identification information, and the dialogue text is only an example, and the present disclosure is not limited thereto.

According to an embodiment of the present disclosure, the processor 130 may generate synthesized voice data through the speech synthesis model 600. Specifically, the processor 130 may generate the synthesized voice data by using the metadata and the dialogue text as inputs to the speech synthesis model 600. In addition, the processor 130 may predetermine that the synthesized voice data is to be output in correspondence with each of the one or more contents and the plurality of objects included in the game.
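For illustration only, the inputs described above (dialogue text, voice identification information, and image or style vector information) may be bundled as in the following minimal sketch. The names `SynthesisInput` and `build_synthesis_input`, and the representation of the vector information as a small dictionary of numeric values, are assumptions introduced here and do not appear in the present disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SynthesisInput:
    """Hypothetical container for what is fed to the speech synthesis model."""
    dialogue_text: str
    voice_id: str  # e.g. 'Voice_1', identifying a voice actor
    # Stand-in for image/style vector information (assumed format).
    style: Dict[str, float] = field(default_factory=dict)


def build_synthesis_input(dialogue_text: str, voice_id: str,
                          style: Dict[str, float]) -> SynthesisInput:
    """Validate and bundle the three inputs into one object."""
    if not dialogue_text:
        raise ValueError("dialogue text must not be empty")
    # Copy the dict so later changes by the caller do not leak in.
    return SynthesisInput(dialogue_text, voice_id, dict(style))


# Example for the first NPC: a calm, high-toned voice (values are illustrative).
npc_input = build_synthesis_input(
    "Hello, nice to meet you!", "Voice_1", {"pitch": 0.8, "tempo": 0.4}
)
```

Keeping the three inputs in a single immutable object is one possible design choice; it makes it harder to pair a dialogue text with the wrong voice or style by accident.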

As a specific example, as shown in FIG. 4, when the client terminal 10 connects to the game server 200 and plays the game, the objects that provide guidance voices about the game to the character of the client terminal 10 may include a first character object 410 corresponding to the human race and a second character object 420 corresponding to the orc race.

The processor 130 may generate, based on image information on the first character object 410, first image vector information including instruction information for generating solemn, soft, and masculine voice data. The processor 130 may then predetermine first voice data, generated based on the first image vector information, a first dialogue text 411, and first voice identification information (e.g., the voice of a first voice actor), as the voice data to be output for the first character object 410. That is, when the character of the client terminal 10 performs play associated with the first character object 410 while playing the game, the first voice data (i.e., solemn, soft, and masculine voice data based on the voice of the first voice actor) may be provided.

In addition, the processor 130 may generate, based on image information on the second character object 420, second image vector information including instruction information for generating voice data with a lower tone and an angry manner of speaking. The processor 130 may then predetermine second voice data, generated based on the second image vector information, a second dialogue text 421, and second voice identification information (e.g., the voice of a second voice actor), as the voice data to be output for the second character object 420. That is, when the character of the client terminal 10 performs play associated with the second character object 420 while playing the game, the second voice data (i.e., lower-toned, angry voice data based on the voice of the second voice actor) may be provided. The specific description of the character objects, dialogue texts, voice identification information, and voice data given above is merely an example, and the present disclosure is not limited thereto.

That is, when the character of the client terminal 10 performs game play associated with each of the first character object 410 and the second character object 420, the processor 130 may provide different voice data for each play. In other words, the processor 130 may provide voice data corresponding to the play situation of the character of the client terminal 10.
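The predetermined correspondence between character objects and voice data described above may be pictured, purely as a sketch, as a lookup table keyed by object. The table `VOICE_BY_OBJECT`, the function `voice_for_play`, and the file names are hypothetical; the reference numerals 410 and 420 are reused as keys only for readability.

```python
from typing import Dict, Optional

# Hypothetical table: character object -> voice data predetermined for it.
VOICE_BY_OBJECT: Dict[int, str] = {
    410: "voice_data_1.ogg",  # solemn, soft, masculine (first voice actor)
    420: "voice_data_2.ogg",  # lower-toned, angry (second voice actor)
}


def voice_for_play(object_id: int) -> Optional[str]:
    """Return the voice data predetermined for the object involved in the
    current play, or None when no voice data was predetermined for it."""
    return VOICE_BY_OBJECT.get(object_id)
```

With this table, play associated with object 410 and play associated with object 420 resolve to different voice data, which is the behavior the paragraph above describes.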

As another example, as shown in FIG. 5, when the client terminal 10 connects to the game server 200 and plays the game, the object that provides a guidance voice about the game to the user character object 510 of the client terminal 10 may be a third character object 520, which is a female NPC.

The processor 130 may generate third image vector information, including instruction information for generating voice data with a shy tone and a bashful emotion, based on image information on the third character object 520, which is a female NPC (i.e., female), and image information on the user character object 510 of the client terminal 10 (i.e., male). The processor 130 may then predetermine third voice data, generated based on the third image vector information, a third dialogue text 530, and third voice identification information (e.g., the voice of a third voice actor), as the voice data to be output for the third character object 520. That is, the processor 130 may provide voice data that takes the relationship between objects into account as the voice data corresponding to the play situation of the character of the client terminal 10. Although the above example describes a relationship between objects based on gender, the present disclosure may also provide voice data corresponding to a wider variety of relationships that can be formed between objects (e.g., a relationship between a player character and an NPC according to their races in the game story, or a superior-subordinate relationship according to position or occupation).

Accordingly, the vast amount of guidance voice present in the game does not all have to be pre-recorded by voice actors. This reduces the cost of providing in-game guidance voice, which otherwise rises with the cost of hiring voice actors and of recording, and can thereby improve the profitability of the game service. In addition, when a specific text is recorded by a voice actor, later modification may be difficult. With the configurations of the present disclosure described above, however, the voice data provided for a game play situation can be modified through various kinds of speech synthesis, such as adjusting parameters to synthesize a voice that carries emotion or converting the voice of the speaker.

According to an embodiment of the present disclosure, the processor 130 may cause the client terminal 10 to output voice data corresponding to an in-game play situation. Specifically, when the processor 130 receives a voice output control signal from the client terminal 10, it may cause the client terminal 10 to output the voice data corresponding to the in-game play situation. The voice output control signal is generated in response to the play of the character of the client terminal 10. For example, it may be generated when the character of the client terminal 10 is about to perform, or completes, one or more contents present in the game. As another example, the voice output control signal may be generated in response to play by the character of the client terminal 10 that is associated with an object, among the plurality of objects present in the game, that outputs voice data for providing game information to the user of the client terminal 10. The specific description of the voice output control signal given above is merely an example, and the present disclosure is not limited thereto.

In more detail, when the game is updated through the game server 200, the processor 130 may store in advance, in a CDN (Content Delivery Network), the plurality of synthesized voice data generated through the speech synthesis model 600. A CDN may refer to an arrangement in which a plurality of cache servers are configured for the game server 200 and, when a use request and/or a download request is received from the client terminal 10, the information corresponding to the request is delivered from the cache server closest to the origin of the request. In addition, the processor 130 may cause the plurality of voice data to be delivered, through the CDN, to each client terminal 10 that connects to the game server 200 and plays the game. That is, when the game is updated, the client terminal 10 may download, through the CDN, the plurality of synthesized voice data generated through the speech synthesis model 600. In other words, when the game is updated, each client terminal 10 may download through the CDN the plurality of voice data corresponding to the respective in-game play situations, and may output the voice data corresponding to the play performed by the user while playing the game on the game server 200.

Accordingly, because the client terminal 10 can download the generated plurality of synthesized voice data through the CDN, a bottleneck that may occur while downloading the voice data can be resolved, and the voice data can be provided quickly and reliably. In addition, even if a specific cache server fails, the voice data can be downloaded from another cache server, so the download need not be interrupted.
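The cache server selection described above (serve from the nearest cache server, and fall back to another one when a server fails) may be sketched as follows. This is a simplified illustration under assumed names: `CacheServer`, `pick_cache_server`, the `distance_km` measure of proximity, and the `healthy` flag are all hypothetical, and a real CDN typically makes this decision through DNS or routing rather than in application code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheServer:
    """Hypothetical description of one CDN cache server."""
    name: str
    distance_km: float  # assumed proxy for "closest to the request origin"
    healthy: bool = True


def pick_cache_server(servers: List[CacheServer]) -> Optional[CacheServer]:
    """Pick the nearest healthy cache server, or None if all have failed."""
    candidates = [s for s in servers if s.healthy]
    return min(candidates, key=lambda s: s.distance_km, default=None)
```

If the nearest server is marked unhealthy, the next nearest healthy server is chosen, so the download can continue from another cache server.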

In addition, when the client terminal 10 has not downloaded the plurality of synthesized voice data generated through the speech synthesis model 600, the processor 130 may transmit the synthesized voice data in real time. Specifically, the processor 130 may identify whether the client terminal 10 has downloaded the synthesized voice data corresponding to the play it performs. When the client terminal 10 has not downloaded the synthesized voice data corresponding to that play, the processor 130 may cause the synthesized voice data to be transmitted in real time through the CDN.

For example, when the client terminal 10 connects to the game server 200 and is about to play a first stage, and first synthesized voice data corresponding to the first stage has not been downloaded to the client terminal 10, the processor 130 may identify that the first synthesized voice data has not been downloaded to the client terminal 10 and may transmit the first synthesized voice data to the client terminal 10 through the CDN. The specific description of the stage and the synthesized voice data given above is merely an example, and the present disclosure is not limited thereto.
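The check described above (identify whether the voice data for the play has been downloaded, and transmit it through the CDN only if it has not) may be sketched as follows. The function `ensure_voice_data`, the set of downloaded identifiers, and the `fetch_via_cdn` callback are assumptions made for this sketch.

```python
from typing import Callable, Set


def ensure_voice_data(stage_id: str,
                      downloaded: Set[str],
                      fetch_via_cdn: Callable[[str], bytes]) -> bool:
    """Trigger a CDN transfer only when the client lacks the voice data.

    Returns True if a transfer was triggered, False if the data was
    already present on the client terminal.
    """
    if stage_id in downloaded:
        return False
    fetch_via_cdn(stage_id)    # hypothetical real-time transfer
    downloaded.add(stage_id)   # remember it so it is not sent twice
    return True
```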

According to another embodiment of the present disclosure, the processor 130 may cause the client terminal 10 to output synthesized voice data by providing the client terminal 10 with a URL corresponding to the synthesized voice data. More specifically, when the client terminal 10 is about to perform one or more contents included in the game, the processor 130 may provide the client terminal 10 with the URL of the synthesized voice data corresponding to each of the one or more contents, thereby causing the client terminal 10 to output the synthesized voice data corresponding to each of the one or more contents.

For example, when the client terminal 10 is about to interact with a first NPC, the processor 130 may provide the client terminal 10 with the URL of first voice data stored in advance for the first NPC. In this case, the client terminal 10 may output the first voice data received from the processor 130 as the voice for the interaction with the first NPC. The specific description of the NPC and the voice data given above is merely an example, and the present disclosure is not limited thereto.

That is, the client terminal 10 does not need to download and store in advance the plurality of synthesized voice data reflected in the game. Instead, it may receive from the processor 130 the URL of the synthesized voice data corresponding to the play the user is about to perform, and may output the voice data corresponding to the play of the user.
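The URL-based embodiment above may be illustrated by the following sketch, in which the server derives a URL for the voice data of a given content. The base address `https://cdn.example.com/voice`, the `.ogg` extension, and the function name `voice_url` are placeholders assumed for this example; the present disclosure does not specify a URL scheme.

```python
from urllib.parse import quote

# Hypothetical base address under which synthesized voice data is stored.
BASE_URL = "https://cdn.example.com/voice"


def voice_url(content_id: str) -> str:
    """Build the URL of the synthesized voice data for one content.

    The identifier is percent-encoded so that characters such as '/'
    cannot change the path structure of the URL.
    """
    return f"{BASE_URL}/{quote(content_id, safe='')}.ogg"
```

The client terminal would then fetch and play the resource at the returned URL when the user begins the corresponding play, rather than storing all voice data beforehand.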

FIG. 3 shows an exemplary flowchart for providing voice data corresponding to an in-game play situation according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the voice data generation server 100 may generate metadata for speech synthesis based on game data (310).

According to an embodiment of the present disclosure, the voice data generation server 100 may input the metadata and the dialogue text into the speech synthesis model (320).

According to an embodiment of the present disclosure, the voice data generation server 100 may generate synthesized voice data through the speech synthesis model (330).

The order of the steps shown in FIG. 3 may be changed as needed, and one or more steps may be omitted or added. That is, the steps described above are merely an embodiment of the present disclosure, and the scope of rights of the present disclosure is not limited thereto.
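The three steps of FIG. 3 (310, 320, 330) may be sketched as a small pipeline. The rule that maps a "gloomy" stage mood to a low tone and slow tempo, the key name `stage_mood`, and the `stub_model` that returns bytes instead of real audio are all assumptions made so that the sketch is runnable; an actual speech synthesis model would replace the stub.

```python
from typing import Callable, Dict

Metadata = Dict[str, str]


def generate_metadata(game_data: Dict[str, str]) -> Metadata:
    """Step 310: derive metadata for speech synthesis from game data.
    The mood-to-style rule below is purely illustrative."""
    gloomy = game_data.get("stage_mood", "neutral") == "gloomy"
    return {
        "tone": "low" if gloomy else "neutral",
        "tempo": "slow" if gloomy else "normal",
    }


def synthesize(game_data: Dict[str, str], dialogue_text: str,
               model: Callable[[Metadata, str], bytes]) -> bytes:
    """Steps 320 and 330: feed metadata and dialogue text to the model
    and return the synthesized voice data it produces."""
    metadata = generate_metadata(game_data)
    return model(metadata, dialogue_text)


def stub_model(metadata: Metadata, text: str) -> bytes:
    """Stand-in for a real speech synthesis model (returns tagged bytes)."""
    return f"{metadata['tone']}|{metadata['tempo']}|{text}".encode("utf-8")
```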

FIG. 8 is a schematic diagram showing a network function according to an embodiment of the present disclosure.

Throughout this specification, the terms sub-model, computational model, neural network, and network function may be used interchangeably. A neural network may generally be composed of a set of interconnected computational units that may be referred to as "nodes". These "nodes" may also be referred to as "neurons". A neural network includes at least one node. The nodes (or neurons) constituting a neural network may be interconnected by one or more "links".

Within a neural network, one or more nodes connected through a link may form a relative relationship of input node and output node. The concepts of input node and output node are relative: any node that is an output node with respect to one node may be an input node in its relationship with another node, and vice versa. As described above, the input node to output node relationship may be formed around a link. One or more output nodes may be connected to one input node through links, and vice versa.

In the relationship between an input node and an output node connected through one link, the value of the output node may be determined based on the data input to the input node. Here, the link interconnecting the input node and the output node may have a weight. The weight may be variable, and may be changed by a user or an algorithm so that the neural network performs a desired function. For example, when one or more input nodes are each connected to one output node by a respective link, the output node may determine its value based on the values input to the input nodes connected to it and the weights set on the links corresponding to the respective input nodes.
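As a worked illustration of the last sentence, one common way to determine the value of an output node is a weighted sum of the input node values, using the weight of each link. The sketch below assumes this weighted-sum form and omits any activation function or bias, neither of which is specified in the text above; the function name `output_node_value` is hypothetical.

```python
from typing import Sequence


def output_node_value(inputs: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted sum of input node values, one weight per link."""
    if len(inputs) != len(weights):
        raise ValueError("each input node needs exactly one link weight")
    return sum(x * w for x, w in zip(inputs, weights))


# Two input nodes with values 1.0 and 2.0, link weights 0.5 and 0.25:
# 1.0 * 0.5 + 2.0 * 0.25 = 1.0
value = output_node_value([1.0, 2.0], [0.5, 0.25])
```

Changing either weight changes the output value for the same inputs, which is why two networks with identical structure but different weights behave differently.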

As described above, in a neural network, one or more nodes are interconnected through one or more links to form input node and output node relationships within the neural network. The characteristics of the neural network may be determined by the number of nodes and links in the neural network, the connections between the nodes and the links, and the value of the weight assigned to each link. For example, when two neural networks have the same number of nodes and links but different weight values on the links, the two neural networks may be regarded as different from each other.

A neural network may include one or more nodes. Some of the nodes constituting the neural network may form one layer based on their distances from the initial input node. For example, the set of nodes at a distance of n from the initial input node may constitute layer n. The distance from the initial input node may be defined as the minimum number of links that must be traversed to reach the node from the initial input node. However, this definition of a layer is arbitrary and given for explanation, and the order of a layer within a neural network may be defined differently. For example, a layer of nodes may instead be defined by the distance from the final output node.
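The layer definition above (layer n is the set of nodes whose minimum link count from an initial input node is n) corresponds to a breadth-first search over the links. The following sketch computes that layer index for every reachable node; the adjacency-dictionary representation and the name `layer_of_nodes` are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List


def layer_of_nodes(links: Dict[str, List[str]],
                   initial_inputs: List[str]) -> Dict[str, int]:
    """Map each reachable node to its layer index, i.e. the minimum
    number of links traversed from any initial input node."""
    layer = {node: 0 for node in initial_inputs}
    queue = deque(initial_inputs)
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in layer:  # first visit is the shortest path in BFS
                layer[nxt] = layer[node] + 1
                queue.append(nxt)
    return layer


# Two input nodes, two hidden nodes, one output node.
example_links = {"a": ["h1", "h2"], "b": ["h2"], "h1": ["out"], "h2": ["out"]}
example_layers = layer_of_nodes(example_links, ["a", "b"])
```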

The initial input node may refer to one or more nodes, among the nodes in the neural network, to which data is input directly without passing through a link in the relationship with other nodes. Alternatively, in the link-based relationships between nodes within the neural network, it may refer to nodes that have no other input node connected by a link. Similarly, the final output node may refer to one or more nodes, among the nodes in the neural network, that have no output node in the relationship with other nodes. A hidden node may refer to a node constituting the neural network that is neither the initial input node nor the final output node. A neural network according to an embodiment of the present disclosure may have the same number of nodes in the input layer as in the output layer, and may be a neural network in which the number of nodes decreases and then increases again as the layers progress from the input layer to the hidden layers. A neural network according to another embodiment of the present disclosure may have fewer nodes in the input layer than in the output layer, and may be a neural network in which the number of nodes decreases as the layers progress from the input layer to the hidden layers. A neural network according to yet another embodiment of the present disclosure may have more nodes in the input layer than in the output layer, and may be a neural network in which the number of nodes increases as the layers progress from the input layer to the hidden layers. A neural network according to still another embodiment of the present disclosure may be a combination of the neural networks described above.

A deep neural network (DNN) may refer to a neural network that includes a plurality of hidden layers in addition to an input layer and an output layer. A deep neural network can be used to identify the latent structures of data. That is, it can identify the latent structure of photographs, text, video, voice, and music (e.g., which objects are in a photograph, what the content and emotion of a text are, what the content and emotion of a voice are, and so on). Deep neural networks may include a convolutional neural network (CNN), a recurrent neural network (RNN), an auto encoder, a generative adversarial network (GAN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a Q network, a U network, a Siamese network, and the like. The description of deep neural networks given above is merely an example, and the present disclosure is not limited thereto.

A neural network may be trained by at least one of supervised learning, unsupervised learning, and semi-supervised learning. The purpose of training a neural network is to minimize the error of its output. Training is a process of repeatedly inputting training data into the neural network, calculating the error between the output of the neural network for the training data and the target, and backpropagating that error from the output layer of the neural network toward the input layer, in the direction that reduces the error, to update the weight of each node of the neural network. In supervised learning, training data in which each item is labeled with the correct answer is used (i.e., labeled training data), whereas in unsupervised learning the training data may not be labeled with the correct answer. For example, the training data for supervised learning of data classification may be data in which each item is labeled with a category. The labeled training data is input to the neural network, and the error may be calculated by comparing the output (category) of the neural network with the label of the training data. As another example, in unsupervised learning of data classification, the error may be calculated by comparing the input training data with the output of the neural network. The calculated error is backpropagated through the neural network in the reverse direction (i.e., from the output layer toward the input layer), and the connection weights of the nodes of each layer may be updated according to the backpropagation. The amount by which each connection weight changes may be determined by the learning rate. One computation of the neural network on the input data together with the backpropagation of the error may constitute a learning cycle (epoch). The learning rate may be applied differently depending on the number of repetitions of the learning cycle. For example, a high learning rate may be used early in training so that the neural network quickly reaches a certain level of performance, improving efficiency, and a low learning rate may be used late in training to improve accuracy.
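The idea of using a high learning rate early in training and a lower one later may be illustrated with a step-decay schedule applied to gradient descent on a single weight. This is a toy sketch under stated assumptions: the squared-error loss, the halving of the learning rate every 10 epochs, and the names `learning_rate` and `train_single_weight` are choices made for this example and are not taken from the present disclosure.

```python
def learning_rate(epoch: int, initial: float = 0.1,
                  decay: float = 0.5, step: int = 10) -> float:
    """Step decay: multiply the learning rate by `decay` every `step` epochs."""
    return initial * (decay ** (epoch // step))


def train_single_weight(x: float, target: float, epochs: int = 30) -> float:
    """Fit w so that w * x approximates target, using gradient descent
    on the squared error (w * x - target) ** 2."""
    w = 0.0
    for epoch in range(epochs):
        error = w * x - target          # output minus target
        grad = 2.0 * error * x          # derivative of the squared error w.r.t. w
        w -= learning_rate(epoch) * grad  # update scaled by the learning rate
    return w


fitted = train_single_weight(x=1.0, target=2.0)
```

In this sketch the weight moves quickly toward the target during the first epochs and takes smaller, more careful steps afterwards, mirroring the early-high, late-low learning rate described above.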

In training a neural network, the training data may generally be a subset of the real data (i.e., the data to be processed using the trained neural network). Therefore, there may be learning cycles in which the error on the training data decreases while the error on the real data increases. Overfitting is the phenomenon in which the network learns the training data excessively in this way, so that the error on real data increases. For example, a neural network that has learned what a cat is from yellow cats only and then fails to recognize a cat of another color as a cat exhibits a kind of overfitting. Overfitting can increase the error of a machine learning algorithm. Various optimization methods may be used to prevent overfitting, such as increasing the amount of training data, regularization, or dropout, in which some of the nodes of the network are omitted during training.
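Dropout, mentioned above as omitting some nodes during training, may be sketched as randomly zeroing node activations. The version below also rescales the surviving activations by 1 / (1 - drop_prob), a common convention (often called inverted dropout) that is an assumption of this sketch and is not described in the present disclosure; the function name `dropout` is likewise illustrative.

```python
import random
from typing import List


def dropout(activations: List[float], drop_prob: float,
            rng: random.Random) -> List[float]:
    """Zero each activation with probability drop_prob during training.

    Surviving activations are divided by the keep probability so that the
    expected sum of the layer is unchanged (inverted dropout convention).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# A seeded generator keeps the example reproducible.
dropped = dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, rng=random.Random(0))
```

At inference time dropout would be disabled, which corresponds to calling the function with `drop_prob=0.0`.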

FIG. 9 shows modules for implementing a method for generating voice data corresponding to an in-game play situation according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the computer program may be implemented by the following modules.

According to an embodiment of the present disclosure, the computer program may include a module 710 for generating metadata for speech synthesis based on game data, a module 720 for inputting the metadata and dialogue text into a speech synthesis model, and a module 730 for generating synthesized voice data through the speech synthesis model.

Alternatively, the computer program may further include a module for causing the client terminal to output voice data corresponding to the in-game play situation when a voice output control signal is received from the client terminal.

Alternatively, the computer program may further include a module for constructing training data for training the speech synthesis model based on one or more seed voice data, text corresponding to each of the one or more seed voice data, and metadata associated with the one or more seed voice data.
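One way to picture this construction step is to join the three sources by a shared clip identifier. The sketch below is an assumption about data layout only: the `TrainingExample` type, the dictionary-based inputs, and the decision to skip incomplete clips are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    seed_audio: list   # samples of one seed voice clip
    text: str          # transcript corresponding to that clip
    metadata: dict     # metadata associated with that clip


def build_training_data(seed_clips, transcripts, metadata_by_clip):
    """Combine seed voice data, matching text, and associated metadata
    into training examples. All three arguments are dicts keyed by a
    hypothetical clip id; clips missing text or metadata are skipped."""
    examples = []
    for clip_id, samples in seed_clips.items():
        if clip_id not in transcripts or clip_id not in metadata_by_clip:
            continue  # incomplete record, cannot be used for training
        examples.append(
            TrainingExample(
                seed_audio=list(samples),
                text=transcripts[clip_id],
                metadata=dict(metadata_by_clip[clip_id]),
            )
        )
    return examples
```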

Alternatively, the voice output module may generate the synthesized voice data by applying a voice reconstruction algorithm to the spectrogram output by the dimension restoration sub-model.
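This passage does not name the voice reconstruction algorithm. Purely to show the shape of the step (magnitude spectrogram in, waveform out), the sketch below uses a deliberately naive stand-in: each frame is inverted with a zero-phase inverse DFT and the frames are overlap-added. Practical systems usually estimate the missing phase instead, for example iteratively with the Griffin-Lim algorithm; the function names here are hypothetical.

```python
import math


def frame_from_magnitudes(mags):
    """Invert one one-sided magnitude spectrum (bins 0..N/2), assuming
    zero phase, into a real frame of N = 2 * (len(mags) - 1) samples."""
    if len(mags) < 2:
        raise ValueError("need at least two frequency bins")
    n_fft = 2 * (len(mags) - 1)
    frame = []
    for n in range(n_fft):
        # DC term plus Nyquist term (which alternates in sign).
        total = mags[0] + mags[-1] * math.cos(math.pi * n)
        # Interior bins count twice because of conjugate symmetry.
        for k in range(1, len(mags) - 1):
            total += 2.0 * mags[k] * math.cos(2.0 * math.pi * k * n / n_fft)
        frame.append(total / n_fft)
    return frame


def overlap_add(frames, hop):
    """Sum frames into one signal, advancing hop samples per frame."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for j, value in enumerate(frame):
            out[i * hop + j] += value
    return out


def reconstruct(spectrogram, hop):
    """Naive spectrogram-to-waveform conversion (zero phase, no window)."""
    return overlap_add([frame_from_magnitudes(m) for m in spectrogram], hop)
```

Because the phase is assumed rather than estimated, audio produced this way would sound poor; the sketch only fixes the input and output types of the voice output module.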

According to an embodiment of the present disclosure, the modules for implementing the method of generating voice data corresponding to an in-game play situation may also be implemented by means, circuits, or logic for implementing a computer program.

Those skilled in the art will further appreciate that the various illustrative logical blocks, configurations, modules, circuits, means, logic, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, configurations, means, logic, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

FIG. 10 shows a brief, general schematic diagram of an exemplary computing environment in which embodiments of the present disclosure may be implemented.

Although the present disclosure has been described above generally in the context of computer-executable instructions that may run on one or more computers, those skilled in the art will appreciate that the present disclosure may also be implemented in combination with other program modules and/or as a combination of hardware and software.

Generally, program modules include routines, procedures, programs, components, data structures, and the like that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the methods of the present disclosure can be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, and mainframe computers, as well as personal computers, handheld computing devices, microprocessor-based or programmable consumer electronics, and the like (each of which may operate in connection with one or more associated devices).

The described embodiments of the present disclosure may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

A computer typically includes a variety of computer-readable media. Any medium accessible by a computer can be a computer-readable medium, and computer-readable media may include computer-readable storage media and computer-readable transmission media. Computer-readable storage media include volatile and nonvolatile media and removable and non-removable media. Computer-readable storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital video disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be accessed by a computer and used to store the desired information.

Computer-readable transmission media typically include information delivery media that embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed so as to encode information in the signal. By way of example, and not limitation, computer-readable transmission media include wired media such as a wired network or a direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above media are also included within the scope of computer-readable transmission media.

An exemplary environment 1100 for implementing various aspects of the present disclosure is shown, including a computer 1102. The computer 1102 includes a processing unit 1104, a system memory 1106, and a system bus 1108. The system bus 1108 couples system components including, but not limited to, the system memory 1106 to the processing unit 1104. The processing unit 1104 may be any of a variety of commercially available processors. Dual-processor and other multiprocessor architectures may also be employed as the processing unit 1104.

The system bus 1108 may be any of several types of bus structure that may further interconnect to a memory bus, a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. The system memory 1106 includes read-only memory (ROM) 1110 and random access memory (RAM) 1112. A basic input/output system (BIOS) is stored in the non-volatile memory 1110, such as ROM, EPROM, or EEPROM; the BIOS contains the basic routines that help transfer information between components within the computer 1102, such as during start-up. The RAM 1112 may also include high-speed RAM, such as static RAM, for caching data.

The computer 1102 also includes an internal hard disk drive (HDD) 1114 (e.g., EIDE, SATA), which may also be configured for external use in a suitable chassis (not shown); a magnetic floppy disk drive (FDD) 1116 (e.g., to read from or write to a removable diskette 1118); and an optical disk drive 1120 (e.g., to read a CD-ROM disk 1122, or to read from or write to other high-capacity optical media such as a DVD). The hard disk drive 1114, the magnetic disk drive 1116, and the optical disk drive 1120 may be connected to the system bus 1108 by a hard disk drive interface 1124, a magnetic disk drive interface 1126, and an optical drive interface 1128, respectively. The interface 1124 for external drive implementations includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.

These drives and their associated computer-readable media provide non-volatile storage of data, data structures, computer-executable instructions, and so forth. For the computer 1102, the drives and media accommodate the storage of any data in a suitable digital format. Although the description of computer-readable media above refers to an HDD, a removable magnetic disk, and removable optical media such as a CD or DVD, those skilled in the art will appreciate that other types of computer-readable media, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the exemplary operating environment, and that any such media may contain computer-executable instructions for performing the methods of the present disclosure.

A number of program modules may be stored in the drives and the RAM 1112, including an operating system 1130, one or more application programs 1132, other program modules 1134, and program data 1136. All or portions of the operating system, applications, modules, and/or data may also be cached in the RAM 1112. It will be appreciated that the present disclosure can be implemented with various commercially available operating systems or combinations of operating systems.

A user may enter commands and information into the computer 1102 through one or more wired/wireless input devices, for example, a keyboard 1138 and a pointing device such as a mouse 1140. Other input devices (not shown) may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, a touch screen, and the like. These and other input devices are often connected to the processing unit 1104 through an input device interface 1142 that is coupled to the system bus 1108, but may be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, and so forth.

A monitor 1144 or other type of display device is also connected to the system bus 1108 via an interface, such as a video adapter 1146. In addition to the monitor 1144, a computer typically includes other peripheral output devices (not shown), such as speakers, printers, and so forth.

The computer 1102 may operate in a networked environment using logical connections, via wired and/or wireless communications, to one or more remote computers, such as remote computer(s) 1148. The remote computer(s) 1148 may be a workstation, a server computer, a router, a personal computer, a portable computer, a microprocessor-based entertainment appliance, a peer device, or another common network node, and typically includes many or all of the components described for the computer 1102, although, for brevity, only a memory storage device 1150 is illustrated. The logical connections depicted include wired/wireless connectivity to a local area network (LAN) 1152 and/or larger networks, for example, a wide area network (WAN) 1154. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global computer network, for example, the Internet.

When used in a LAN networking environment, the computer 1102 is connected to the local network 1152 through a wired and/or wireless communication network interface or adapter 1156. The adapter 1156 may facilitate wired or wireless communication to the LAN 1152, which may also include a wireless access point installed on it for communicating with the wireless adapter 1156. When used in a WAN networking environment, the computer 1102 may include a modem 1158, may be connected to a communications server on the WAN 1154, or may have other means for establishing communications over the WAN 1154, such as by way of the Internet. The modem 1158, which may be an internal or external and a wired or wireless device, is connected to the system bus 1108 via the serial port interface 1142. In a networked environment, program modules described for the computer 1102, or portions thereof, may be stored in the remote memory/storage device 1150. It will be appreciated that the network connections shown are exemplary and that other means of establishing a communications link between the computers may be used.

The computer 1102 is operable to communicate with any wireless devices or entities operatively disposed in wireless communication, for example, a printer, a scanner, a desktop and/or portable computer, a portable data assistant (PDA), a communications satellite, any piece of equipment or location associated with a wirelessly detectable tag, and a telephone. This includes at least Wi-Fi and Bluetooth wireless technologies. Thus, the communication may have a predefined structure, as with a conventional network, or may simply be ad hoc communication between at least two devices.

Wi-Fi (Wireless Fidelity) allows connection to the Internet and the like without wires. Wi-Fi is a wireless technology, similar to that used in a cell phone, that enables devices such as computers to send and receive data indoors and outdoors, that is, anywhere within range of a base station. Wi-Fi networks use radio technologies called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, high-speed wireless connectivity. Wi-Fi can be used to connect computers to each other, to the Internet, and to wired networks (which use IEEE 802.3 or Ethernet). Wi-Fi networks may operate in the unlicensed 2.4 and 5 GHz radio bands, for example, at an 11 Mbps (802.11b) or 54 Mbps (802.11a) data rate, or in products that contain both bands (dual band).

Those of ordinary skill in the art of the present disclosure will understand that the various illustrative logical blocks, modules, processors, means, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, as various forms of program or design code (referred to herein, for convenience, as "software"), or as a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Those of ordinary skill in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The various embodiments presented herein may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" includes a computer program, carrier, or media accessible from any computer-readable device. For example, computer-readable media include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical disks (e.g., CDs, DVDs), smart cards, and flash memory devices (e.g., EEPROMs, cards, sticks, key drives). In addition, the various storage media presented herein include one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" includes, but is not limited to, wireless channels and various other media capable of storing, holding, and/or conveying instruction(s) and/or data.

It is to be understood that the specific order or hierarchy of steps in the processes presented is an example of exemplary approaches. Based on design priorities, the specific order or hierarchy of steps in the processes may be rearranged within the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.

The description of the presented embodiments is provided to enable any person of ordinary skill in the art of the present disclosure to make or use the present disclosure. Various modifications to these embodiments will be apparent to those of ordinary skill in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments presented herein, but is to be accorded the widest scope consistent with the principles and novel features presented herein.

Claims

A computer program stored in a computer-readable storage medium, wherein the computer program, when executed by one or more processors, causes the one or more processors to perform the following operations for outputting voice data corresponding to an in-game play situation, the operations comprising:
generating metadata for speech synthesis based on game data related to a game build and image data related to a plurality of objects included in the game;
inputting the metadata and dialogue text into a speech synthesis model; and
generating voice data synthesized through the speech synthesis model,
wherein the metadata is attribute indication information for generating the voice data, and the attribute indication information includes at least one of attribute information of each of one or more contents included in the game, relationship information between the plurality of objects existing in the game, and attribute information of each of the plurality of objects.

The computer program stored in a computer-readable storage medium of claim 1,
wherein the metadata comprises at least one of image vector information, style vector information, and voice identification information.

(Deleted)

The computer program stored in a computer-readable storage medium of claim 2,
wherein the image vector information includes indication information for generating the voice data, generated based on a game image,
the style vector information includes indication information predetermined for each of the one or more contents included in the game, and
the voice identification information includes information for identifying seed voice data for speech synthesis.

The computer program stored in a computer-readable storage medium of claim 1,
wherein the dialogue text includes content information for generating the voice data and is predetermined for each of the one or more contents included in the game.

The computer program stored in a computer-readable storage medium of claim 1,
wherein the operations further comprise: when a voice output control signal is received from a client terminal, causing the client terminal to output the voice data corresponding to the in-game play situation.

The computer program stored in a computer-readable storage medium of claim 1,
wherein the operations further comprise: constructing training data for training the speech synthesis model based on one or more seed voice data, text corresponding to each of the one or more seed voice data, and metadata associated with the one or more seed voice data.

The computer program stored in a computer-readable storage medium of claim 1,
wherein the speech synthesis model includes a dimension reduction sub-model, a dimension restoration sub-model, an attention module, and a voice output module, and is trained on training data including the metadata, training dialogue text, and training target voice data.

The computer program stored in a computer-readable storage medium of claim 8,
wherein the speech synthesis model is trained such that, with training input text including the metadata and the training dialogue text as the input of the dimension reduction sub-model, the dimension restoration sub-model outputs a training spectrogram corresponding to the target voice data.

The computer program stored in a computer-readable storage medium of claim 8,
wherein the dimension reduction sub-model receives the metadata and the dialogue text as inputs and outputs linguistic speech features from the dialogue text.

The computer program stored in a computer-readable storage medium of claim 8,
wherein the dimension restoration sub-model includes one or more recurrent neural networks (RNNs), and
the one or more RNNs receive the spectrogram of a first time step as an input and output the spectrogram of a second time step.
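The frame-by-frame behavior recited above can be pictured as an autoregressive loop in which each output frame becomes the next input. The sketch below assumes a plain Elman-style recurrent cell with caller-supplied weight matrices; the claim does not specify the cell type, and the names `rnn_step` and `decode` are hypothetical.

```python
import math


def rnn_step(prev_frame, hidden, w_in, w_hid, w_out):
    """One recurrent step: consume the spectrogram frame of the previous
    time step and the hidden state, return the next frame and new state."""
    new_hidden = [
        math.tanh(
            sum(w * x for w, x in zip(w_in[j], prev_frame))
            + sum(w * h for w, h in zip(w_hid[j], hidden))
        )
        for j in range(len(hidden))
    ]
    next_frame = [
        sum(w * h for w, h in zip(w_out[i], new_hidden))
        for i in range(len(prev_frame))
    ]
    return next_frame, new_hidden


def decode(initial_frame, n_steps, w_in, w_hid, w_out):
    """Generate n_steps spectrogram frames, feeding each output frame
    back in as the input of the following time step."""
    hidden = [0.0] * len(w_hid)
    frame = list(initial_frame)
    frames = []
    for _ in range(n_steps):
        frame, hidden = rnn_step(frame, hidden, w_in, w_hid, w_out)
        frames.append(frame)
    return frames
```

Here `w_in` has one row per hidden unit and one column per frequency bin, `w_hid` is square over the hidden units, and `w_out` has one row per frequency bin.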

The computer program stored in a computer-readable storage medium of claim 8,
wherein the attention module generates association information between phonemes of the dialogue text and time steps of the dimension restoration sub-model.
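One common way to express such association information is a matrix of attention weights with one row per decoder time step and one column per phoneme. The sketch below assumes dot-product scoring followed by a softmax; the claim does not state the scoring function, and `attention_weights` is a hypothetical name.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(decoder_states, phoneme_encodings):
    """Return a matrix whose row t holds the weights linking decoder time
    step t to each phoneme. Scores are dot products (an assumption)."""
    return [
        softmax([
            sum(d * p for d, p in zip(state, encoding))
            for encoding in phoneme_encodings
        ])
        for state in decoder_states
    ]
```

Each row sums to 1, so it can be read as how strongly a given time step attends to each phoneme of the dialogue text.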

The computer program stored in a computer-readable storage medium of claim 8,
wherein the voice output module generates the synthesized voice data by applying a voice reconstruction algorithm to the spectrogram output by the dimension restoration sub-model.

A method of generating voice data corresponding to an in-game play situation, the method comprising:
generating metadata for speech synthesis based on game data related to a game build and image data related to a plurality of objects included in the game;
inputting the metadata and dialogue text into a speech synthesis model; and
generating voice data synthesized through the speech synthesis model,
wherein the metadata is attribute indication information for generating the voice data, and the attribute indication information includes at least one of attribute information of each of one or more contents included in the game, relationship information between the plurality of objects existing in the game, and attribute information of each of the plurality of objects.

A server for generating voice data corresponding to an in-game play situation, the server comprising:
a processor including one or more cores;
a memory storing program codes executable by the processor; and
a network unit for transmitting and receiving data to and from a game server and a client terminal,
wherein the processor:
generates metadata for speech synthesis based on game data related to a game build and image data related to a plurality of objects included in the game;
inputs the metadata and dialogue text into a speech synthesis model; and
generates voice data synthesized through the speech synthesis model,
wherein the metadata is attribute indication information for generating the voice data, and the attribute indication information includes at least one of attribute information of each of one or more contents included in the game, relationship information between the plurality of objects existing in the game, and attribute information of each of the plurality of objects.