KR20220159571A

KR20220159571A - Deep-learning based jang-gi game service method and apparatus thereof

Info

Publication number: KR20220159571A
Application number: KR1020210067321A
Authority: KR
Inventors: 박정훈
Original assignee: NHN Cloud Co., Ltd. (엔에이치엔클라우드 주식회사)
Priority date: 2021-05-26
Filing date: 2021-05-26
Publication date: 2022-12-05
Also published as: KR102595656B1

Abstract

A deep-learning-based Janggi game service method according to an embodiment of the present invention is a method by which a game play server provides a deep-learning-based Janggi game service to a user terminal. The method comprises the steps of: obtaining a training data set that includes Janggi board states for a plurality of actions; extracting input features from the obtained training data set; training a deep-learning model on the extracted input features; receiving a first board state; and obtaining, based on the deep-learning model, at least one of an action to perform (the action to be performed for the received first board state) and win/loss prediction information. The present invention aims to provide a deep-learning-based Janggi game service method and apparatus that use a trained deep-learning model to predict, from the game situation in a Janggi match, the action to perform and the match outcome.

Description

Deep-learning-based Janggi game service method and apparatus therefor

The present invention relates to a deep-learning-based Janggi game service method and apparatus. More specifically, it relates to a method and apparatus that train a deep-learning model for a Janggi game service and use the trained model to predict the action to perform (action) and the match outcome according to the match situation in the Janggi game service.

Background Art

In recent years, as the spread of the wired and wireless Internet has increased rapidly with advances in communication and network technology, many kinds of services have come to be provided over the common medium of the Internet.

In particular, game services are among the most widely used services on the Internet, and a wide variety of games are offered.

Among these, many are head-to-head (對戰) games in which game points are exchanged as the stake for winning. Games such as Janggi, Go (baduk), Go-Stop, and chess are popular games with large user bases.

Along with this trend, it has recently become possible to play such games against a programmed artificial-intelligence (AI) computer rather than a human.

In addition, research is being actively conducted to raise the playing strength of such AI computers so that they can play at a human level.

However, conventional AI computers have been specialized for more widely played games such as Go and chess. The technology for building high-performing AI computers for Janggi game services therefore remains underdeveloped.

In Janggi, each piece (棋物) moves according to its own distinct movement rule (行馬法), so the number of possible actions is much larger than in games such as Go. AI computers have not yet been adequately developed that take such Janggi-specific game properties into account in detail and perform deep learning optimized for them.

Prior-art patent document: KR 2007371 B1

An object of the present invention is to provide a deep-learning-based Janggi game service method and apparatus that:
- train a deep-learning model for a Janggi game service; and
- use the trained model to predict the action to perform and the match outcome according to the match situation in the Janggi game service.

Specifically, the present invention seeks to provide a deep-learning-based Janggi game service method and apparatus that train a deep-learning model specialized for the Janggi game service.

The present invention also seeks to provide a deep-learning-based Janggi game service method and apparatus that use the trained deep-learning model to predict:
- the win rate of each action given the match situation in the Janggi game service;
- the action to perform in that situation; and
- the resulting match outcome.

The present invention also seeks to provide a deep-learning-based Janggi game service method and apparatus that play a Janggi match on the basis of the predicted action to perform.

However, the technical problems addressed by the present invention and its embodiments are not limited to those described above, and other technical problems may exist.

A deep-learning-based Janggi game service method according to an embodiment of the present invention is a method by which a game play server provides a deep-learning-based Janggi game service to a user terminal. The method comprises the steps of:
- obtaining a training data set that includes a Janggi board state for each of a plurality of actions;
- extracting input features from the obtained training data set;
- training a deep-learning model on the extracted input features;
- receiving a first board state; and
- obtaining, based on the deep-learning model, at least one of an action to perform (the action to be performed for the received first board state) and win/loss prediction information.

Here, the input features include at least some of the following:
- the types of the Janggi pieces (棋物);
- history information for the first player's last 8 actions;
- piece-position information for the first player's last 8 actions;
- history information for the second player's last 8 actions;
- piece-position information for the second player's last 8 actions; and
- turn information indicating whether the current player is the first player or the second player.
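The feature list above resembles the stacked board-plane encodings used by AlphaZero-style game engines. The following Python sketch shows one possible encoding under several assumptions:
- The 10 x 9 board is stored as a dict from (row, col) to (player, piece_type).
- The seven piece-type names are labels invented for this sketch.
- Each of the last 8 positions contributes one binary plane per player and piece type. Both the "history" and the "piece-position" information are approximated this way.
- The turn is a constant plane.

The patent describes its own layout with reference to FIG. 5, and that layout is not reproduced here. Treat this as an illustration only.

```python
# Hypothetical input-feature encoder for a 10 x 9 Janggi board (illustrative only).
ROWS, COLS = 10, 9
# Piece-type labels chosen for this sketch; not taken from the patent.
PIECE_TYPES = ["general", "chariot", "cannon", "horse", "elephant", "guard", "soldier"]
HISTORY = 8  # last 8 positions, mirroring the "last 8 actions" in the text


def encode_features(history, current_player):
    """Return a list of ROWS x COLS binary planes.

    history: board snapshots, oldest first; each maps (row, col) -> (player, piece_type),
             with player 0 = first player and 1 = second player.
    current_player: 0 or 1, the player to move.
    """
    recent = history[-HISTORY:]
    # Pad with empty boards when fewer than HISTORY positions exist (start of game).
    recent = [{}] * (HISTORY - len(recent)) + recent
    planes = []
    for player in (0, 1):
        for board in recent:
            for piece in PIECE_TYPES:
                plane = [[0] * COLS for _ in range(ROWS)]
                for (r, c), (owner, kind) in board.items():
                    if owner == player and kind == piece:
                        plane[r][c] = 1
                planes.append(plane)
    # Constant turn plane: all ones if the second player is to move, else all zeros.
    planes.append([[current_player] * COLS for _ in range(ROWS)])
    return planes  # 2 players * 8 positions * 7 piece types + 1 = 113 planes
```

For example, encode_features([{(9, 0): (0, "chariot")}], 1) yields 113 planes. Exactly one cell is set to 1 outside the turn plane, and the turn plane is all ones.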

Training the deep-learning model includes training it so that, with a given board state as input data, the output data include at least one of the following:
- a win-rate estimate for each candidate action, where a candidate action is an action that can be performed in the input board state;
- the action to perform for that board state; and
- win/loss prediction information for that board state.
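A common way to train a network that has both a per-action output and a win/loss output is the AlphaZero-style combined loss. It adds two terms:
- cross-entropy between a target move distribution and the predicted move probabilities; and
- squared error between the predicted value and the actual game result.

The patent does not specify its loss function, so the sketch below is an assumption. Treating the per-candidate outputs as a probability distribution is also an assumption. The sketch uses plain Python, converts raw scores to probabilities with a softmax, and encodes the result as +1 for a win and -1 for a loss.

```python
import math


def softmax(logits):
    """Convert raw per-action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_head_loss(policy_probs, target_policy, value_pred, target_value):
    """AlphaZero-style loss for one training sample (assumed, not from the patent).

    policy_probs:  predicted probability for each candidate action
    target_policy: target distribution over the same actions (e.g. one-hot on the played move)
    value_pred:    predicted game result in [-1, 1]
    target_value:  actual result, +1 for a win and -1 for a loss
    """
    eps = 1e-12  # avoid log(0)
    policy_loss = -sum(t * math.log(p + eps) for t, p in zip(target_policy, policy_probs))
    value_loss = (target_value - value_pred) ** 2
    return policy_loss + value_loss
```

With a one-hot target on the move actually played, the policy term reduces to the negative log-probability of that move.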

An action is the act of moving a particular piece on the Janggi board to a new point according to that piece's movement rule (行馬法).

The actions include out-of-palace move actions, horse (馬) move actions, elephant (象) move actions, in-palace move actions, and a pass action.
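To make the five action categories concrete, the sketch below enumerates every (from-square, to-square) pair that each category could ever produce on an empty 10 x 9 board, plus the single pass action. The mapping from the patent's categories is an assumption:
- "Out-of-palace" moves are modeled as straight-line moves along ranks and files.
- "In-palace" moves are modeled as moves along the palace diagonals: center to corner, corner to center, and corner to opposite corner.
- The horse moves one step orthogonally and one diagonally.
- The elephant moves one step orthogonally and two diagonally.

Blocking pieces and per-piece restrictions are ignored, so the totals are upper bounds. They are not claimed to match the count described with reference to FIG. 6.

```python
# Hypothetical enumeration of a Janggi action space (empty-board upper bound).
ROWS, COLS = 10, 9
# The two 3x3 palaces as (row range, column range).
PALACES = [(range(0, 3), range(3, 6)), (range(7, 10), range(3, 6))]
HORSE = [(dr, dc) for dr in (-2, -1, 1, 2) for dc in (-2, -1, 1, 2) if abs(dr) != abs(dc)]
ELEPHANT = [(dr, dc) for dr in (-3, -2, 2, 3) for dc in (-3, -2, 2, 3) if abs(dr) != abs(dc)]


def on_board(r, c):
    return 0 <= r < ROWS and 0 <= c < COLS


def offset_moves(offsets):
    """All from/to pairs reachable by a fixed displacement (horse, elephant)."""
    moves = set()
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in offsets:
                if on_board(r + dr, c + dc):
                    moves.add(((r, c), (r + dr, c + dc)))
    return moves


def line_moves():
    """Straight-line moves to any other point on the same rank or file."""
    moves = set()
    for r in range(ROWS):
        for c in range(COLS):
            moves.update(((r, c), (r2, c)) for r2 in range(ROWS) if r2 != r)
            moves.update(((r, c), (r, c2)) for c2 in range(COLS) if c2 != c)
    return moves


def palace_diagonal_moves():
    """Moves along palace diagonals: corner<->center and corner->opposite corner."""
    moves = set()
    for rows, cols in PALACES:
        center = (rows[1], cols[1])
        for corner in [(r, c) for r in (rows[0], rows[2]) for c in (cols[0], cols[2])]:
            opposite = (2 * center[0] - corner[0], 2 * center[1] - corner[1])
            moves.update({(corner, center), (center, corner), (corner, opposite)})
    return moves


def action_space():
    categories = {
        "line": line_moves(),
        "horse": offset_moves(HORSE),
        "elephant": offset_moves(ELEPHANT),
        "palace_diagonal": palace_diagonal_moves(),
    }
    actions = sorted(set().union(*categories.values()))
    actions.append("pass")  # the pass action is a single extra index
    return categories, actions
```

Under these assumptions, the categories contribute the following, for 2451 actions including pass:

| Category | Moves |
| --- | --- |
| Line moves | 1530 |
| Horse moves | 508 |
| Elephant moves | 388 |
| Palace-diagonal moves | 24 |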

Obtaining at least one of the action to perform and the win/loss prediction information for the first board state includes two sub-steps. The first board state is input into the deep-learning model, and a win-rate estimate is obtained for each candidate action, that is, each action that can be performed in the first board state.

This obtaining step further includes determining the candidate action with the highest win-rate estimate as the action to perform for the first board state.

This obtaining step further includes obtaining the win/loss prediction information for the first board state, based on whether the game is won or lost as a result of performing the determined action.
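The selection rule in these steps can be sketched as follows: take the candidate with the highest win-rate estimate, then derive a win/loss prediction. Two parts of the sketch are assumptions for illustration:
- The input is a dict of win rates.
- A best win rate of 0.5 or above counts as a predicted win.

The patent derives the win/loss prediction from the result of performing the chosen action and does not state a threshold.

```python
def choose_action(win_rates, threshold=0.5):
    """Pick the action to perform and a win/loss prediction (illustrative sketch).

    win_rates: dict mapping each candidate action to its estimated win rate in [0, 1].
    threshold: assumed cut-off above which the position is predicted as a win.
    """
    if not win_rates:
        raise ValueError("no candidate actions")
    best_action = max(win_rates, key=win_rates.get)  # highest win-rate estimate
    best_rate = win_rates[best_action]
    prediction = "win" if best_rate >= threshold else "loss"
    return best_action, best_rate, prediction
```

Ties are broken by dict insertion order, because max() returns the first maximal key it encounters.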

The deep-learning-based Janggi game service method according to an embodiment of the present invention further includes performing the determined action for the first board state.

A deep-learning-based Janggi game service apparatus according to an embodiment of the present invention comprises:
- a communication unit that receives a Janggi board state;
- a memory that stores a deep-learning model; and
- a processor that trains the deep-learning model on the received board states and, based on the model, obtains at least one of:
  - the action to perform, which is the action with the highest win-rate estimate among the candidate actions possible in the board state; and
  - win/loss prediction information for the board state resulting from performing that action.

The deep-learning-based Janggi game service method and apparatus according to an embodiment of the present invention train a deep-learning model specialized for the Janggi game service. They can thereby implement an AI computer that takes the characteristic properties of Janggi into account in detail and performs deep learning optimized for them.

In addition, the method and apparatus use the trained deep-learning model to predict the win rate of each action given the match situation in the Janggi game service. On that basis, they predict the action to perform in that situation and the resulting match outcome, which further improves the AI computer's performance in the Janggi game service.

In addition, the method and apparatus play Janggi matches on the basis of the predicted action to perform, thereby raising the AI computer's level of Janggi play.

However, the effects obtainable from the present invention are not limited to those mentioned above; other effects not mentioned here will be clearly understood from the description below.

FIG. 1 is a conceptual diagram of a deep-learning-based Janggi game service system according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the action model structure of the game play server, which produces the AI computer's actions in the deep-learning-based Janggi game service according to an embodiment of the present invention.
FIG. 3 is a diagram explaining how the action model according to an embodiment of the present invention performs an action according to the pipeline of a search unit.
FIG. 4 is a flowchart of a method of providing a deep-learning-based Janggi game service according to an embodiment of the present invention.
FIG. 5 is an example diagram explaining the input features according to an embodiment of the present invention.
FIG. 6 is a diagram explaining the number of possible actions, based on the Janggi pieces, according to an embodiment of the present invention.
FIG. 7 is an example diagram showing the action model according to an embodiment of the present invention performing deep learning on a first board state to provide output data.
FIG. 8 is an example diagram showing the action model according to an embodiment of the present invention performing the action determined for a first board state.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are therefore illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention, and the methods for achieving them, will become clear with reference to the embodiments described in detail below together with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms.

The following conventions apply:
- In the following embodiments, terms such as "first" and "second" are used not in a limiting sense but to distinguish one component from another.
- Singular expressions include plural expressions unless the context clearly indicates otherwise.
- Terms such as "include" or "have" mean that the features or components described in the specification are present. They do not preclude the possibility that one or more other features or components may be added.
- In the drawings, components may be exaggerated or reduced in size for convenience of explanation. For example, the size and thickness of each component shown in the drawings are chosen arbitrarily for convenience of explanation, so the present invention is not necessarily limited to what is illustrated.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. Identical or corresponding components are given the same reference numerals, and duplicate descriptions of them are omitted.

FIG. 1 is a conceptual diagram of a deep-learning-based Janggi game service system according to an embodiment of the present invention.

Referring to FIG. 1, a deep-learning-based Janggi (將棋) game service system 10 according to an embodiment of the present invention may include a terminal 100, a Janggi server 200, a game play server 300, and a network 400.

The components of FIG. 1 may be connected through the network 400. The network 400 is a connection structure that allows nodes such as the terminal 100, the Janggi server 200, and/or the game play server 300 to exchange information with one another. Examples of such a network include, but are not limited to:
- a 3GPP (3rd Generation Partnership Project) network;
- an LTE (Long Term Evolution) network;
- a WiMAX (Worldwide Interoperability for Microwave Access) network;
- the Internet;
- a LAN (Local Area Network) or wireless LAN;
- a WAN (Wide Area Network);
- a PAN (Personal Area Network);
- a Bluetooth network;
- a satellite broadcast network;
- an analog broadcast network; and
- a DMB (Digital Multimedia Broadcasting) network.

- Terminal (100)

First, the terminal 100 is the terminal of a user who wishes to receive the Janggi game service. The terminal 100 is one or more computers or other electronic devices that the user uses to run applications performing various tasks.

Examples include a computer, a laptop computer, a smartphone, a mobile phone, a PDA, a tablet PC, or any other device operable to communicate with the Janggi server 200 and/or the game play server 300.

The terminal 100 is not limited to these examples. It may run on various machines and may include processing logic that interprets and executes instructions stored in one or more memories. It may also include various other elements, such as processes that display graphical information for a graphical user interface (GUI) on an external input/output device.

The terminal 100 may also be connected to input devices (e.g., a mouse, keyboard, or touch-sensitive surface) and output devices (e.g., a display device, monitor, or screen).

Applications executed by the terminal 100 may include game applications, web browsers, web applications running in a web browser, word processors, media players, spreadsheets, image processors, security software, and the like.

The terminal 100 may also include at least one memory 101 storing instructions, at least one processor 102, and a communication unit 103.

The memory 101 of the terminal 100 may store a plurality of application programs (applications) run on the terminal 100, as well as data and instructions for operating the terminal 100.

The instructions are executable by the processor 102 to cause the processor 102 to perform operations that send and receive various kinds of information. These operations may include:
- transmitting a Janggi game execution request signal;
- transmitting and receiving game data, action information, win-rate information, and win/loss prediction information; and
- requesting and receiving game time information.

In terms of hardware, the memory 101 may be any of various storage devices such as ROM, RAM, EPROM, a flash drive, or a hard drive. Alternatively, it may be web storage that performs the storage function of the memory 101 on the Internet.

The processor 102 of the terminal 100 may control the overall operation of the terminal and perform the data processing needed to receive the Janggi game service.

When a Janggi game application runs on the terminal 100, a Janggi game environment is configured on the terminal 100. The Janggi game application then exchanges Janggi game data with the Janggi server 200 over the network 400 so that the Janggi game service runs on the terminal 100.

The processor 102 may be any of the following, or any other type of processor for performing these functions:
- an ASIC (application-specific integrated circuit);
- a DSP (digital signal processor) or DSPD (digital signal processing device);
- a PLD (programmable logic device) or FPGA (field-programmable gate array);
- a controller or microcontroller; or
- a microprocessor.

The communication unit 103 of the terminal 100 may transmit and receive wireless signals to and from at least one of a base station, an external terminal, and a server. It does so over a network built according to communication standards such as:
- GSM (Global System for Mobile communication) and CDMA (Code Division Multiple Access);
- HSDPA (High Speed Downlink Packet Access) and HSUPA (High Speed Uplink Packet Access);
- LTE (Long Term Evolution) and LTE-A (Long Term Evolution-Advanced);
- WLAN (Wireless LAN), Wi-Fi (Wireless Fidelity), and Wi-Fi Direct;
- DLNA (Digital Living Network Alliance);
- WiBro (Wireless Broadband); and
- WiMAX (Worldwide Interoperability for Microwave Access).

The terminal 100 may also perform at least part of the functions of the Janggi server 200 and/or the game play server 300, which are described below.

- Janggi server (200)

The Janggi game service provided by the Janggi server 200 may be configured so that a virtual computer user provided by the Janggi server 200 and a real user take part in the game together. In this configuration, one real user and one computer user play against each other in the Janggi game environment implemented on the user's terminal 100.

Alternatively, the Janggi game service provided by the Janggi server 200 may be configured so that a plurality of user devices take part in playing the Janggi game.

The Janggi server 200 may include at least one memory 201 storing instructions, at least one processor 202, and a communication unit 203.

The memory 201 of the Janggi server 200 may store a plurality of application programs (applications) run on the Janggi server 200, as well as data and instructions for operating the Janggi server 200.

The instructions are executable by the processor 202 to cause the processor 202 to perform various transmission operations. These operations may include:
- receiving a game execution request signal;
- transmitting and receiving game data, action information, win-rate information, and win/loss prediction information; and
- requesting and receiving game time information.

The memory 201 may also store a plurality of Janggi game records (hereinafter, "game records"). These may be records of matches played on the Janggi server 200 or previously published game records.

Each game record may include all information from the first action, which starts the match, to the final action, which ends it; that is, the game records include history information about the actions. Each game record may also include the board state (S) corresponding to each action, in the order in which the actions were played. Here, a board state is the arrangement of Janggi pieces (棋物) on the Janggi board.
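A game record as described above can be turned into supervised training samples. Each board state S is paired with the action played from it and with the final result, seen from the player to move. The sketch below rests on several assumptions:
- Players strictly alternate, starting with player 0, and a pass counts as a move.
- Each record stores the board state before each action.
- A draw is labeled 0.

GameRecord and training_samples are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Square = Tuple[int, int]
Board = Dict[Square, Tuple[int, str]]  # (row, col) -> (player, piece_type)


@dataclass
class GameRecord:
    """Hypothetical container for one stored game record."""
    states: List[Board]    # board state S before each action, in move order
    actions: List[Any]     # action played from each state
    winner: Optional[int]  # 0 or 1; None for a draw

    def training_samples(self):
        """Return (state, player_to_move, action, outcome) tuples.

        outcome is +1 if the player to move eventually won, -1 if they lost,
        and 0 for a draw (assumed labeling).
        """
        samples = []
        for ply, (state, action) in enumerate(zip(self.states, self.actions)):
            to_move = ply % 2  # assumes strict alternation starting with player 0
            if self.winner is None:
                outcome = 0
            else:
                outcome = 1 if self.winner == to_move else -1
            samples.append((state, to_move, action, outcome))
        return samples
```

Labeling every position with the final result is the usual way to derive value targets from complete game records. The policy target for each sample is simply the action actually played.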

The Janggi server 200 may also provide the stored game records to the game play server 300 for training the game play server 300.

In terms of hardware, the memory 201 may be any of various storage devices such as ROM, RAM, EPROM, a flash drive, or a hard drive. Alternatively, it may be web storage that performs the storage function of the memory 201 on the Internet.

The processor 202 of the Janggi server 200 may control the overall operation of the server and perform the data processing needed to provide the Janggi game service. The processor 202 may be any of the following, or any other type of processor for performing these functions:
- an ASIC (application-specific integrated circuit);
- a DSP (digital signal processor) or DSPD (digital signal processing device);
- a PLD (programmable logic device) or FPGA (field-programmable gate array);
- a controller or microcontroller; or
- a microprocessor.

The Janggi server 200 may communicate with the terminal 100 and the game play server 300 over the network 400 through the communication unit 203.

- Game play server (300)

The game play server 300 may include a separate cloud server or computing device. It may also be a neural network system installed in the processor of the terminal 100 or in the data processing unit of the Janggi server 200. Hereinafter, however, the game play server 300 is described as a device separate from the terminal 100 and the Janggi server 200.

The game play server 300 may include at least one memory 301 storing instructions, at least one processor 302, and a communication unit 303.

The game play server 300 builds an action model. The action model is a deep learning model that learns by itself within the rules of Janggi. The server acts as an artificial intelligence (AI) opponent that can play against the user of the terminal 100. On its own turn, it performs the Janggi action it judges most likely to win the game. Training of the action model by the game play server 300 is described in detail with reference to FIGS. 2 to 6.

Here, an action according to an embodiment means moving a specific Janggi piece (hereinafter, a piece) to a new position on the Janggi board. The move follows that piece's movement rules (haengmabeop, 行馬法), the distinct rules by which each type of piece moves.

The memory 301 of the game play server 300 may store the application programs that run on the game play server 300, along with data and instructions for its operation. The instructions are executable by the processor 302. They cause the processor 302 to perform operations that include:
- training the action model
- transmitting and receiving action information
- receiving the action preparation time
- receiving game time information
- various other transmissions

The memory 301 may also store the action model, which is a deep learning model. In hardware terms, the memory 301 may be any of various storage devices, such as ROM, RAM, EPROM, a flash drive, or a hard drive. It may also be web storage that performs the storage function of the memory 301 on the Internet.

The processor 302 of the game play server 300 reads the action model stored in the memory 301. Using the resulting neural network system, it trains the action model and performs actions in Janggi games, as described below.

In some embodiments, the processor 302 includes a main processor that controls all units. It may also include a plurality of graphics processing units (GPUs) that handle the heavy computation required to run the neural network of the action model.

The game play server 300 may communicate with the Janggi server 200 over the network 400 through the communication unit 303.

<Action model>

The action model according to an embodiment of the present invention takes a given Janggi board state as input. It outputs at least one of the following:
- a win-rate estimate (V) for each candidate action, that is, each action that can be performed in the input board state;
- the action to perform (A), the candidate action judged most likely to win the game;
- win/loss prediction information (O), which predicts the result of the game.

FIG. 2 illustrates the structure of the action model of the game play server 300. This model selects the AI player's actions in the deep learning-based Janggi game service according to an embodiment of the present invention.

Referring to FIG. 2, the action model according to an embodiment of the present invention is the deep learning model of the game play server 300. It may include a search unit 310, a self-play unit 320, and a deep learning neural network 330.

Using the search unit 310, the self-play unit 320, and the deep learning neural network 330, the action model learns to determine the actions that win the game.

More specifically, the search unit 310 performs a Monte Carlo Tree Search (MCTS) guided by the deep learning neural network 330. MCTS is a heuristic search algorithm used for decision making. The search unit 310 bases its MCTS on the move probability (p) and/or the win-rate estimate (V) provided by the network 330. For example, guided by the network 330, the search unit 310 runs MCTS and outputs the search probability (π), which is a probability distribution over actions.

The self-play unit 320 plays Janggi games against itself according to the search probability (π). It continues each self-play game until a winner is decided. When the game ends, it provides the board states (S), the search probabilities (π), and the self-play value (z) to the deep learning neural network 330. A board state (S) is the arrangement of pieces on the Janggi board. The self-play value (z) is the win rate obtained when self-play is carried out from board state S.
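The self-play output can be viewed as a list of (S, π, z) training examples. The sketch below assumes, as in AlphaGo Zero-style training, that z is the final result seen from the side to move at each state: +1 for a win, -1 for a loss, and 0 otherwise. The specification only describes z as a win-rate value, so this encoding is an assumption, and the names `TrainingExample` and `label_self_play_game` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TrainingExample:
    state: object    # board state S, in whatever representation the caller uses
    pi: List[float]  # search probabilities (pi) produced by MCTS at S
    z: float         # assumed encoding: final outcome seen from the player to move at S


def label_self_play_game(states: Sequence[object],
                         pis: Sequence[Sequence[float]],
                         movers: Sequence[int],
                         winner: Optional[int]) -> List[TrainingExample]:
    """Attach the final result of a finished self-play game to every position.

    movers[t] is the player (0 or 1) to move at states[t]; winner is 0, 1,
    or None if the game ended without a winner.
    """
    if not (len(states) == len(pis) == len(movers)):
        raise ValueError("states, pis and movers must have the same length")
    examples = []
    for state, pi, mover in zip(states, pis, movers):
        if winner is None:
            z = 0.0
        else:
            z = 1.0 if mover == winner else -1.0
        examples.append(TrainingExample(state, list(pi), z))
    return examples
```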

The deep learning neural network 330 outputs a move probability (p) and a win-rate estimate (V).
- The move probability (p) is a probability distribution that scores how good each action in board state S is for winning the game. An action with a higher move probability is a better action.
- The win-rate estimate (V) is the win rate when the corresponding action is performed.

The network 330 is trained so that its move probability (p) matches the search probability (π) and its win-rate estimate (V) matches the self-play value (z). The trained network 330 then guides the search unit 310. The search unit 310 runs MCTS for the duration of the action preparation time to find actions better than the previous search probability (π), and outputs a new search probability (π). Depending on the MCTS progress time, the action preparation time may be set to one of three values: an average action preparation time, a first action preparation time, or a second action preparation time. By default, it may be set to the average action preparation time.
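The action preparation time limits how long MCTS runs before a move must be chosen. Below is a minimal sketch. It assumes each simulation is a callable and that the budget is passed in seconds, for example whichever of the average, first, or second preparation time is in force. The function name is hypothetical, and no specific time values are implied.

```python
import time
from typing import Callable


def search_with_time_budget(run_simulation: Callable[[], None],
                            budget_s: float,
                            clock: Callable[[], float] = time.monotonic) -> int:
    """Run MCTS simulations until the action preparation time has elapsed.

    run_simulation performs one selection/expansion/backup pass; clock is
    injectable so the loop can be tested without real waiting.
    Returns the number of simulations completed.
    """
    deadline = clock() + budget_s
    simulations = 0
    while clock() < deadline:
        run_simulation()
        simulations += 1
    return simulations
```

An injectable clock keeps the loop deterministic under test. In production, the default `time.monotonic` is unaffected by wall-clock adjustments.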

Based on the new search probability (π), the self-play unit 320 outputs a new self-play value (z) for board state S. It then provides the board state (S), the new search probability (π), and the new self-play value (z) to the deep learning neural network 330. The network 330 is retrained so that its move probability (p) and win-rate estimate (V) match the new search probability (π) and the new self-play value (z).

By repeating this process, the action model trains the deep learning neural network 330 to find progressively better actions for winning. For example, the action model may use the action loss (l) of Equation 1.

[Equation 1]

$$l = (z - v)^2 - \pi^{\top} \log \mathbf{p} + c\,\lVert \theta \rVert^2$$

Here, v is the win-rate estimate (V) output by the network, θ denotes the parameters of the neural network, and c is a very small constant. Equation 1 has three terms:
- The term that drives v toward z is a mean-squared loss term.
- The term that drives p toward π is a cross-entropy loss term.
- The term that multiplies ‖θ‖² by c is a regularization term that prevents overfitting.
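Equation 1 can be checked numerically with a short NumPy function. The sketch covers one position. In training, the loss would typically be averaged over a mini-batch. The default values of `c` and `eps` are placeholders, not values from the specification.

```python
import numpy as np


def action_loss(z, v, pi, p, theta, c=1e-4, eps=1e-12):
    """Equation 1 for a single position: (z - v)^2 - pi^T log p + c * ||theta||^2.

    z: self-play value, v: predicted win-rate estimate, pi: search probabilities,
    p: predicted move probabilities, theta: iterable of parameter arrays.
    eps guards against log(0) where p has zero entries.
    """
    pi = np.asarray(pi, dtype=float)
    p = np.asarray(p, dtype=float)
    value_term = (z - v) ** 2                          # mean-squared term: v -> z
    policy_term = -float(np.dot(pi, np.log(p + eps)))  # cross-entropy term: p -> pi
    l2_term = c * sum(float(np.sum(np.square(w))) for w in theta)  # regularization
    return float(value_term) + policy_term + l2_term
```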

The deep learning neural network 330 may be structured as follows. For example, it may consist of one convolution block followed by 19 residual blocks.
- The convolution block may consist of several stacked 3×3 convolution layers.
- Each residual block may consist of several stacked 3×3 convolution layers plus a skip connection.

In a skip connection, the input of a layer is added to that layer's output, and the sum is passed on as input to the next layer.
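The NumPy sketch below makes the described structure concrete. It builds one convolution block followed by residual blocks, each with a skip connection, on 10×9 board planes. It is an illustrative forward pass only. The following are assumptions, not details from the specification:
- the channel width;
- the use of two convolutions per block;
- the initialization;
- the omission of batch normalization and of the policy and value heads.

```python
import numpy as np


def conv3x3(x, w, b):
    """Same-padded 3x3 convolution (cross-correlation, as in common DL frameworks).

    x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,) -> (C_out, H, W)
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps the 10x9 board size
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = xp[:, dy:dy + h, dx:dx + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out + b[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, params):
    """Two 3x3 convolutions plus a skip connection: relu(F(x) + x)."""
    (w1, b1), (w2, b2) = params
    h = relu(conv3x3(x, w1, b1))
    h = conv3x3(h, w2, b2)
    return relu(h + x)  # skip connection: the block input is added to its output


def tower(x, stem, blocks):
    """One convolution block followed by a stack of residual blocks."""
    w, b = stem
    h = relu(conv3x3(x, w, b))
    for params in blocks:
        h = residual_block(h, params)
    return h


def init_params(c_in, channels, n_blocks, seed=0):
    """Random weights for a stem convolution and n_blocks residual blocks."""
    rng = np.random.default_rng(seed)

    def conv(ci, co):
        return rng.normal(0.0, 0.05, (co, ci, 3, 3)), np.zeros(co)

    stem = conv(c_in, channels)
    blocks = [(conv(channels, channels), conv(channels, channels)) for _ in range(n_blocks)]
    return stem, blocks
```

With `init_params(114, channels, 19)`, the tower has the one-convolution-block-plus-19-residual-blocks shape described above. The channel count is left as a free parameter.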

The input to the deep learning neural network 330 may be a 114×10×9 RGB image that encodes:
- the piece types;
- the history of the last eight actions of the first player (e.g., the blue player) and the piece positions for those actions;
- the history of the last eight actions of the second player (e.g., the red player) and the piece positions for those actions;
- turn information indicating whether the current player is the first or the second player.

This is described in more detail in the description of the action model with reference to FIGS. 2 to 8.

FIG. 3 illustrates how the action model according to an embodiment of the present invention performs an action through the pipeline of the search unit.

Referring to FIG. 3, on its own turn, the action model according to an embodiment of the present invention performs an action using the deep learning neural network 330 and the search unit 310. Specifically, the action model performs Monte Carlo Tree Search (MCTS) using the search unit 310, the self-play unit 320, and the deep learning neural network 330.

In the selection step (a), the action model starts from the current first board state (S1). Among the branches not yet explored by MCTS, it selects the second board state (S1-2), reached by an action with a high action value (Q) and a high confidence value (U).
- The action value (Q) is the mean of the win-rate estimates (V) computed each time the branch is traversed.
- The confidence value (U) is inversely proportional to the branch's visit count (N) and proportional to its move probability (p).
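The selection rule can be sketched with the PUCT formula used in AlphaGo Zero: U = c_puct · p · √ΣN / (1 + N). This formula matches the stated proportionalities, since U grows with p and shrinks as N grows. However, the exact formula, the constant `c_puct`, and the names `Edge` and `select_action` are assumptions; the specification does not give them.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float            # move probability p(s, a) from the neural network
    visits: int = 0         # visit count N(s, a)
    value_sum: float = 0.0  # sum of win-rate estimates V backed up through the edge

    @property
    def q(self) -> float:
        """Action value Q(s, a): mean of the V values seen through this edge."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: Dict[int, Edge], c_puct: float = 1.5) -> int:
    """Pick the child action maximising Q + U (PUCT rule, assumed)."""
    # max(..., 1) lets the priors rank children before any visit has happened
    sqrt_total = math.sqrt(max(sum(e.visits for e in edges.values()), 1))

    def score(edge: Edge) -> float:
        u = c_puct * edge.prior * sqrt_total / (1 + edge.visits)  # grows with p, shrinks with N
        return edge.q + u

    return max(edges, key=lambda a: score(edges[a]))
```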

In the expansion and evaluation step (b), the action model expands the tree from the selected action to the third board state (S1-2-1) and computes its move probabilities (p).

The action model then computes the win-rate estimate (V) of the expanded third board state (S1-2-1). In the backup step (c), it stores the action value (Q), visit count (N), and move probability (p) of each branch along the traversed path.
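Below is a minimal sketch of the backup step. It assumes each edge keeps a visit count and a running sum of V, so that Q is their ratio. The sign flip at every ply is also an assumption: it treats Janggi as a two-player zero-sum game in which V is scored from the side to move at the leaf. The specification only says that Q, N, and p are stored for the traversed branches. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Edge:
    prior: float            # move probability p, stored unchanged
    visits: int = 0         # visit count N
    value_sum: float = 0.0  # running sum of backed-up values

    @property
    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def backup(path: List[Edge], leaf_value: float) -> None:
    """Propagate the leaf's win-rate estimate V back along the selected path.

    path[0] leaves the root and path[-1] enters the newly expanded leaf.
    leaf_value is V from the viewpoint of the player to move at the leaf;
    the player who chose path[-1] is that player's opponent, so the sign
    is flipped before the first update and again at every ply above it.
    """
    value = -leaf_value
    for edge in reversed(path):
        edge.visits += 1
        edge.value_sum += value
        value = -value
```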

The action model repeats selection (a), expansion and evaluation (b), and backup (c) for the duration of the action preparation time. It then builds a probability distribution from the visit count (N) of each action and outputs it as the search probability (π). Next, it identifies the action with the highest search probability (π), takes it as the action to perform (A), and performs it. The action model may also provide win/loss prediction information (O), which predicts whether the game will be won or lost when the determined series of actions (A) is performed.
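The final step turns visit counts into π and picks A. Normalizing the visit counts is what the text describes. The optional temperature parameter is an assumption borrowed from AlphaGo Zero-style training, and the function names are hypothetical.

```python
import numpy as np


def search_probabilities(visit_counts, temperature=1.0):
    """Turn root visit counts N into the search probability pi.

    temperature=1 gives pi proportional to N; temperature=0 puts all mass
    on the most-visited action.
    """
    n = np.asarray(visit_counts, dtype=float)
    if n.sum() <= 0:
        raise ValueError("at least one action must have been visited")
    if temperature == 0:
        pi = np.zeros_like(n)
        pi[np.argmax(n)] = 1.0
        return pi
    scaled = n ** (1.0 / temperature)
    return scaled / scaled.sum()


def choose_action(visit_counts):
    """Return (A, pi): the action with the highest search probability, and pi."""
    pi = search_probabilities(visit_counts)
    return int(np.argmax(pi)), pi
```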

- Deep learning-based Janggi game service method

FIG. 4 is a flowchart illustrating a method of providing a deep learning-based Janggi game service according to an embodiment of the present invention.

Referring to FIG. 4, in the deep learning-based Janggi game service according to an embodiment of the present invention, the game play server 300 trains the action model, which is the deep learning model of the Janggi game service. It then uses the trained action model to predict, for the situation of a game in progress, the action to perform and the result of the game.

Specifically, the game play server 300 may train the action model to specialize in the Janggi game service. For a given game situation, it may use the trained action model to predict:
- the win rate of each action;
- the action to perform;
- the resulting outcome of the game.

The game play server 300 may then play Janggi based on the predicted action.

As defined above, an action means moving a specific piece on the Janggi board according to that piece's movement rules (haengmabeop, 行馬法).

Referring again to FIG. 4, the deep learning-based Janggi game service method according to an embodiment of the present invention may include a step (S101) in which the game play server 300 receives a training data set.

In more detail, the game play server 300 may receive, in conjunction with the Janggi server 200, a training data set for training the action model. The training data set may include game records (gibo) of games played on the Janggi server 200, or previously published game records.
- Each game record may contain all information from the first action that opens the game to the final action that ends it. In other words, each record contains the history of actions.
- Each game record may also include the board state (S) after each action, in order.

A board state (S) is the arrangement of pieces on the Janggi board.

The method may further include a step (S103) in which the game play server 300 extracts input features from the received training data set.

The input features may form a 114×10×9 RGB image that encodes:
- the piece types;
- the history and piece positions of the first player's (e.g., the blue player's) last eight actions;
- the history and piece positions of the second player's (e.g., the red player's) last eight actions;
- turn information indicating whether the current player is the first or the second player.

FIG. 5 shows an example of input features according to an embodiment of the present invention.

Referring to FIG. 5, the game play server 300 may perform an input feature extraction process. This process extracts the input features described above from the board states (S) of the game records.

To do so, the game play server 300 may further include an input feature extraction unit that performs this process. For example, the input feature extraction unit may have a neural network structure and may include a type of encoder.

The game play server 300 may use the input feature extraction unit to extract input features for each board state (S).

The game play server 300 may then convert the extracted input features into images. Specifically, it may convert one or more extracted input features into an image format that matches the input format of the deep learning model (in this embodiment, the action model).

For example, the game play server 300 may convert the turn information (whether the current player is the first or the second player) into an image as follows:
- On the first player's (e.g., the blue player's) turn, it generates an image of a board state (S) in which the whole board is filled with pieces of the first player's color (e.g., blue).
- On the second player's (e.g., the red player's) turn, it generates an image of a board state (S) in which the whole board is filled with pieces of the second player's color (e.g., red).
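The specification gives the total input shape (114×10×9) but not the plane-by-plane layout. The sketch below shows one hypothetical layout that adds up to 114 planes:
- 14 one-hot planes per position (7 piece types × 2 colors), for each of the 8 most recent positions, giving 112 planes;
- 2 turn planes, where the plane of the side to move is filled entirely with ones. This mirrors the "board filled with the player's color" idea.

The integer board encoding, the piece order, and all names are assumptions.

```python
import numpy as np

ROWS, COLS = 10, 9
PIECE_TYPES = 7   # hypothetical order: general, guard, elephant, horse, chariot, cannon, soldier
HISTORY = 8       # number of recent positions kept
PLANES_PER_STEP = 2 * PIECE_TYPES           # one plane per (color, piece type)
NUM_PLANES = HISTORY * PLANES_PER_STEP + 2  # 112 history planes + 2 turn planes = 114
BLUE, RED = 0, 1


def encode_board(board):
    """board: (10, 9) int array, +k = blue piece of type k (1..7), -k = red, 0 = empty."""
    planes = np.zeros((PLANES_PER_STEP, ROWS, COLS), dtype=np.float32)
    for k in range(1, PIECE_TYPES + 1):
        planes[k - 1] = board == k                 # blue pieces of type k
        planes[PIECE_TYPES + k - 1] = board == -k  # red pieces of type k
    return planes


def encode_input(history, to_move):
    """history: boards in play order (most recent last); to_move: BLUE or RED."""
    x = np.zeros((NUM_PLANES, ROWS, COLS), dtype=np.float32)
    recent = history[-HISTORY:][::-1]  # most recent position first; missing slots stay zero
    for t, board in enumerate(recent):
        x[t * PLANES_PER_STEP:(t + 1) * PLANES_PER_STEP] = encode_board(board)
    # Turn planes: the plane belonging to the side to move is filled with ones
    x[HISTORY * PLANES_PER_STEP + to_move] = 1.0
    return x
```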

In this way, the game play server 300 extracts input features as images in a format suited to the action model's input data.

The method may further include a step (S105) in which the game play server 300 trains the deep learning model based on the extracted input features.

In detail, the game play server 300 may train the deep learning-based action model on the extracted input features. The image-form input features serve as the training data set.

The game play server 300 may train the action model to take a given board state (S) as input data. The model then provides at least one of the following as output data (output features):
- a win-rate estimate (V) for each of at least one candidate action in that board state;
- the action to perform (A), the candidate action judged to win the game;
- win/loss prediction information (O), which predicts whether the game will be won or lost.

In an embodiment, the game play server 300 may have the action model obtain, from the extracted input features, the actions played in the game records and the resulting game outcomes. The action model then generates the output data described above from those actions and outcomes.
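When training directly on game records, one common way to build targets is assumed here; the specification does not state it. The policy target is a one-hot vector for the move actually played. The value target is the final result from the mover's point of view. The sketch uses the 2,451-action space described with reference to FIG. 6, and `record_to_targets` and its arguments are hypothetical.

```python
import numpy as np
from typing import Optional, Sequence

NUM_ACTIONS = 2451


def record_to_targets(moves: Sequence[int], first_mover: int, winner: Optional[int]):
    """Turn one game record (gibo) into per-move training targets.

    moves: action indices in play order; players alternate starting with first_mover.
    Returns (policy_targets, value_targets): one-hot rows over NUM_ACTIONS and the
    final result seen from the side that made each move (+1 win, -1 loss, 0 no winner).
    """
    policy = np.zeros((len(moves), NUM_ACTIONS), dtype=np.float32)
    value = np.zeros(len(moves), dtype=np.float32)
    for t, a in enumerate(moves):
        policy[t, a] = 1.0               # the move actually played in the record
        mover = (first_mover + t) % 2
        if winner is not None:
            value[t] = 1.0 if mover == winner else -1.0
    return policy, value
```

Each row would be paired with the encoded board state that preceded the move.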

For example, based on the input features derived from the game records, the game play server 300 may train the action model to do the following:
- Judge whether performing a given first action leads to a win or a loss, and thereby estimate the win-rate estimate (V) of that first action.
- Estimate V for each of several candidate actions and select the candidate with the highest V as the action to perform (A).
- Estimate whether the game will be won or lost when, in each board state (S), the action with the highest V (in this embodiment, the action to perform (A)) is played in sequence, and thereby generate win/loss prediction information.
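Choosing the candidate with the highest V reduces to a masked argmax over the action space. The sketch below assumes V is available as one number per action index and that illegal actions are excluded with a boolean mask. `pick_action` is a hypothetical name.

```python
import numpy as np


def pick_action(values, legal_mask):
    """Return (A, V[A]): the legal candidate action with the highest win-rate estimate."""
    values = np.asarray(values, dtype=float)
    legal = np.asarray(legal_mask, dtype=bool)
    if values.shape != legal.shape:
        raise ValueError("values and legal_mask must have the same shape")
    if not legal.any():
        raise ValueError("no legal candidate action")
    masked = np.where(legal, values, -np.inf)  # illegal actions can never win the argmax
    best = int(np.argmax(masked))
    return best, float(values[best])
```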

To restate, an action according to an embodiment of the present invention means moving a specific piece on the Janggi board according to that piece's movement rules (haengmabeop).

FIG. 6 illustrates the number of possible actions derived from the Janggi pieces according to an embodiment of the present invention.

More specifically, referring to FIG. 6, actions may be divided into five groups according to each piece's movement rules:
- moves outside the palace;
- horse (馬) moves;
- elephant (象) moves;
- moves inside the palace;
- the pass action.

- Moves outside the palace are moves of the chariot (車), cannon (包), or soldier (卒 or 兵) in the ground area, that is, the board excluding the palace area.
- Horse moves are moves of the horse according to its movement rule.
- Elephant moves are moves of the elephant according to its movement rule.
- Moves inside the palace are moves of a piece within the palace area according to its movement rule.
- The pass action means making no move.

Under these movement rules, the total number of actions may be 2,451, broken down as follows:
- 1,530 moves outside the palace, all moves possible for the chariot, cannon, and soldier in the ground area;
- 508 horse moves;
- 388 elephant moves;
- 24 moves inside the palace;
- 1 pass action.

The total is 1,530 + 508 + 388 + 24 + 1 = 2,451.
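Because the 2,451 actions split into five disjoint groups, a flat policy output can be indexed by giving each group a contiguous range. The specification gives only the group sizes. The group order below simply follows the listing above and is an assumption, as are the names.

```python
ACTION_GROUPS = [
    ("outside_palace", 1530),  # chariot / cannon / soldier moves in the ground area
    ("horse", 508),
    ("elephant", 388),
    ("inside_palace", 24),
    ("pass", 1),
]


def group_offsets(groups=ACTION_GROUPS):
    """Map each group name to a half-open index range [start, end); also return the total."""
    offsets, start = {}, 0
    for name, size in groups:
        offsets[name] = (start, start + size)
        start += size
    return offsets, start


def group_of(index, offsets):
    """Return the name of the group that a flat action index belongs to."""
    for name, (lo, hi) in offsets.items():
        if lo <= index < hi:
            return name
    raise IndexError(index)
```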

In summary, the game play server 300 trains the action model on two inputs: the actions played in the game records, drawn from this Janggi-specific action set, and the resulting game outcomes. The trained model takes a given board state (S) as input. It outputs the win-rate estimate (V) for each of at least one candidate action, the action to perform (A), and/or the win/loss prediction information (O).

In this way, the deep learning-based Janggi game service apparatus according to an embodiment can extract Janggi-specific input features from a given board state (S) and use them to train the deep learning model.

The method may further include a step (S107) in which the game play server 300 receives a first board state.

In detail, the game play server 300 may receive a first board state in conjunction with the Janggi server 200.

For example, the game play server 300 may receive, in conjunction with the Janggi server 200, the current board state of a game in progress as the first board state.

The method may further include a step (S109) in which the game play server 300 performs deep learning based on the received first board state.

FIG. 7 is an example diagram showing how the action model according to an embodiment of the present invention performs deep learning based on the first Janggi board state and provides output data.

In detail, referring to FIG. 7, the game play server 300 may use the action model to perform deep learning based on the received first Janggi board state (1). Based on that deep learning, the game play server 300 may obtain a win-rate estimate (V) for each candidate action (an action that can be performed in the first Janggi board state (1)), an action to be performed (A), which is the candidate action judged to be the way to win the game, and/or win/loss prediction information (O), which predicts the result of the game, that is, whether it will be won or lost.

More specifically, the game play server 300 may input the received first Janggi board state (1) into the action model as input data.

Here, the action model that receives the first Janggi board state (1) may be a deep learning model trained on a training data set that includes a plurality of game records. The action model then performs deep learning on the input first Janggi board state (1) and may provide, as output data, a win-rate estimate (V) for each of at least one candidate action in the first Janggi board state (1), an action to be performed (A), and/or win/loss prediction information (O).

Returning to the method, the game play server 300 may obtain the win-rate estimate (V) for each of at least one candidate action, the action to be performed (A), and/or the win/loss prediction information (O) for the first Janggi board state (1), as output by the action model through the deep learning performed on that board state.
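Claims 6 to 8 spell out how these outputs relate: the model gives a win-rate estimate (V) for every candidate action, the candidate with the highest estimate becomes the action to be performed (A), and the win/loss prediction (O) follows from whether that action is expected to win. A minimal sketch, assuming a `win_rate(state, action)` callable stands in for the trained action model and that an estimate of at least 0.5 counts as a predicted win; both the callable and the threshold are assumptions, not details from the patent.

```python
from typing import Any, Callable, Dict, List, Tuple


def predict(state: Any,
            candidates: List[Any],
            win_rate: Callable[[Any, Any], float],
            threshold: float = 0.5) -> Tuple[Dict[Any, float], Any, str]:
    """Return (V, A, O) for one board state.

    V: win-rate estimate for every candidate action,
    A: the candidate with the highest estimate,
    O: "win" if A's estimate reaches the (assumed) threshold, else "loss".
    """
    if not candidates:
        raise ValueError("no candidate actions in this board state")
    v = {a: win_rate(state, a) for a in candidates}
    best = max(candidates, key=lambda a: v[a])
    outcome = "win" if v[best] >= threshold else "loss"
    return v, best, outcome
```

Ties are broken in favor of the candidate listed first, because `max` returns the first maximal element.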

In addition, the deep learning-based Janggi game service method according to an embodiment of the present invention may include a step (S111) in which the game play server 300 provides the result of the deep learning performed as above.

In detail, the game play server 300 may transmit the win-rate estimate (V) for each candidate action, the action to be performed (A), and/or the win/loss prediction information (O) for the first Janggi board state (1), obtained through the deep learning described above, to the user terminal 100. The user terminal 100 may then output the received information in a predetermined manner (e.g., by displaying it on a board image representing the Janggi board state (S)).
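Step S111 only states that V, A and O are sent to the user terminal 100 and displayed, for example on the board image. One possible message format is sketched below as JSON, with candidates sorted by estimated win rate so the terminal can list them in order; every field name and the rounding to three decimals are hypothetical choices, not part of the patent.

```python
import json
from typing import Dict


def build_result_message(game_id: str, v: Dict[str, float], best: str, outcome: str) -> str:
    """Serialize (V, A, O) for the user terminal; all field names are hypothetical."""
    # Highest estimated win rate first.
    ranked = sorted(v.items(), key=lambda kv: kv[1], reverse=True)
    payload = {
        "game_id": game_id,
        "candidates": [{"action": a, "win_rate": round(p, 3)} for a, p in ranked],
        "recommended_action": best,   # action to be performed (A)
        "prediction": outcome,        # win/loss prediction (O)
    }
    return json.dumps(payload, ensure_ascii=False)
```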

In this way, the deep learning-based Janggi game service device according to the embodiment can provide the user with win-rate information for each possible action in a specific Janggi board state (S), such as the current board state, together with a prediction of whether the game will be won or lost from that point.

In addition, the deep learning-based Janggi game service method according to an embodiment of the present invention may include a step (S113) in which the game play server 300 performs an action on the first Janggi board state (1) based on the deep learning performed as above.

FIG. 8 is an example diagram showing how the action model according to an embodiment of the present invention performs the action determined for the first Janggi board state (1).

In detail, referring to FIG. 8, based on the deep learning performed on the received first Janggi board state (1), the game play server 300 may determine the action to be performed (A) as the action for the first Janggi board state (1). The action to be performed (A) is the best action for winning the game from the first Janggi board state (1), that is, the action judged to have the highest win rate among the plurality of candidate actions possible in that state. The game play server 300 may then perform the determined action. In this way, the deep learning-based Janggi game service device according to the embodiment can play a Janggi game based on the action to be performed (A) judged to be the best action for winning that game.
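Performing the chosen action (A) amounts to updating the board state. A minimal sketch, assuming a dictionary board representation (square to (player, piece type)), that `None` encodes the pass action listed in claim 5, and that legality under each piece's movement rules and the palace constraints has already been checked elsewhere; `perform_action` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

Square = Tuple[int, int]
Board = Dict[Square, Tuple[int, str]]   # (row, col) -> (player, piece type)


def perform_action(board: Board, action: Optional[Tuple[Square, Square]], player: int) -> Board:
    """Return a new board after `player` performs `action`.

    None stands for the pass action; otherwise the piece on the source square
    moves to the destination, capturing any opposing piece there. Movement-rule
    legality is assumed to have been checked before this call.
    """
    new_board = dict(board)   # leave the caller's board untouched
    if action is None:        # pass: position unchanged, turn goes to the opponent
        return new_board
    src, dst = action
    piece = new_board.get(src)
    if piece is None or piece[0] != player:
        raise ValueError(f"no piece of player {player} on {src}")
    target = new_board.get(dst)
    if target is not None and target[0] == player:
        raise ValueError(f"cannot capture own piece on {dst}")
    del new_board[src]
    new_board[dst] = piece    # overwrites (captures) any opposing piece on dst
    return new_board
```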

As described above, by training a deep learning model specialized for the Janggi game service, the deep learning-based Janggi game service method and apparatus according to an embodiment of the present invention can implement an artificial intelligence computer that takes the characteristic game properties of the Janggi game service into account in greater detail and performs deep learning optimized for them.

In addition, the deep learning-based Janggi game service method and apparatus according to an embodiment of the present invention use the trained deep learning model to predict the win rate of each action based on the game situation in the Janggi game service and, on that basis, predict the action to perform in that situation and the resulting game outcome, thereby further improving the performance of the artificial intelligence computer for the Janggi game service.

In addition, the embodiments of the present invention described above may be implemented in the form of program instructions executable through various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

The specific implementations described in the present invention are examples and do not limit the scope of the present invention in any way. For brevity of the specification, descriptions of conventional electronic components, control systems, software, and other functional aspects of the systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections; in an actual device, they may be replaced or supplemented by various other functional, physical, or circuit connections. Furthermore, unless a component is specifically described with terms such as “essential” or “important”, it may not be a component necessarily required for the application of the present invention.

Although the detailed description of the present invention has been given with reference to preferred embodiments, those skilled in the art or those having ordinary knowledge in the art will understand that the present invention may be variously modified and changed without departing from the spirit and technical scope of the present invention described in the claims below. Therefore, the technical scope of the present invention should not be limited to the contents of the detailed description but should be defined by the claims.

Claims

A method in which a game play server provides a deep learning-based Janggi game service to a user terminal, the method comprising:
acquiring a training data set including a Janggi board state for each of a plurality of actions;
extracting input features based on the acquired training data set;
training a deep learning model based on the extracted input features;
receiving a first Janggi board state; and
acquiring, for the received first Janggi board state, at least one of an action to be performed and win/loss prediction information based on the deep learning model.

The deep learning-based Janggi game service method according to claim 1,
wherein the input features include
at least some of: the type of each Janggi piece; history information on the last 8 actions of a first player; piece position information for the last 8 actions of the first player; history information on the last 8 actions of a second player; piece position information for the last 8 actions of the second player; and turn information indicating whether it is the first player's or the second player's turn.

The deep learning-based Janggi game service method according to claim 1,
wherein training the deep learning model includes
using a predetermined Janggi board state as input data, and
training the deep learning model with, as output data, at least one of a win-rate estimate for each candidate action (an action that can be performed in the input predetermined Janggi board state), an action to be performed for the predetermined Janggi board state, and win/loss prediction information for the predetermined Janggi board state.

The deep learning-based Janggi game service method according to claim 1,
wherein the action
is an action of making a move by moving a specific piece on the Janggi board according to the movement rules of each Janggi piece.

The deep learning-based Janggi game service method according to claim 4,
wherein the action
includes an out-of-palace movement action, a horse movement action, an elephant movement action, an in-palace movement action, and a pass action.

The deep learning-based Janggi game service method according to claim 1,
wherein acquiring at least one of the action to be performed for the first Janggi board state and the win/loss prediction information
includes inputting the first Janggi board state into the deep learning model and obtaining a win-rate estimate for each candidate action, which is an action that can be performed in the first Janggi board state.

The deep learning-based Janggi game service method according to claim 6,
wherein acquiring at least one of the action to be performed for the first Janggi board state and the win/loss prediction information
further includes determining the candidate action having the highest win-rate estimate as the action to be performed for the first Janggi board state.

The deep learning-based Janggi game service method according to claim 7,
wherein acquiring at least one of the action to be performed for the first Janggi board state and the win/loss prediction information
further includes obtaining the win/loss prediction information for the first Janggi board state based on whether the determined action to be performed leads to a win or a loss.

The deep learning-based Janggi game service method according to claim 1,
further comprising performing the action for the first Janggi board state.

A deep learning-based Janggi game service device, comprising:
a communication unit that receives a Janggi board state;
a memory that stores a deep learning model; and
a processor that trains the deep learning model based on the received Janggi board state and,
based on the deep learning model, obtains at least one of an action to be performed, which is the action having the highest win-rate estimate among a plurality of candidate actions possible in the Janggi board state, and win/loss prediction information for the Janggi board state when the action to be performed is performed.