KR20220164323A

KR20220164323A - Method and device for providing a deep learning-based imperfect information game service

Info

Publication number: KR20220164323A
Application number: KR1020210072949A
Authority: KR
Inventors: 이용구; 박철웅; 박정훈; 이창율; 박근한
Original assignee: 엔에이치엔클라우드 주식회사
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2022-12-13
Also published as: KR102628188B1

Abstract

The present invention provides a method and a device for providing a deep learning-based imperfect information game service that can determine the best action for the current play state. According to an embodiment of the present invention, the method is performed by a game play server that provides the deep learning-based imperfect information game service through a user terminal, and comprises: a step of acquiring a training dataset including Go-Stop board states for each of a plurality of actions; and a step of training a deep learning model of the imperfect information game service based on the acquired training dataset. The step of training the deep learning model includes at least one of: a step of performing state determination training based on the training dataset; a step of performing win-rate estimation training based on the training dataset; a step of performing score estimation training based on the training dataset; and a step of performing jokbo (scoring combination) achievement probability training based on the training dataset.

Description

Method and device for providing a deep learning-based imperfect information game service

The present invention relates to a method and device for providing a deep learning-based imperfect information game service. More specifically, it relates to a method and device for providing a deep learning-based imperfect information game service that performs the best action for the current play state in an imperfect information game.

In recent years, with the development of communication and network technology, wired and wireless Internet access has spread rapidly, and many kinds of services are now provided through the common medium of the Internet.

In particular, game services are among the most widely used services provided over the Internet, and a wide variety of games are offered.

Among them, many competitive games in which game points are exchanged as the reward for winning are offered, and games such as 'GOSTOP', 'janggi', and 'poker' are popular games with large user bases.

Alongside this trend, it has recently become possible to play such games against a programmed artificial intelligence computer rather than a human.

In detail, research is actively under way to raise the playing strength of artificial intelligence computers so that they can play at a human level.

Recently, developers have applied the Monte Carlo Tree Search (MCTS) algorithm and deep learning technology to artificial intelligence computers, raising their playing strength to a considerable level.

However, conventional artificial intelligence computers are specialized for perfect information games (e.g., Go and janggi), in which all information related to the progress of the game is fully disclosed to all players. Technology for artificial intelligence computers that can achieve high performance in imperfect information games (e.g., Go-Stop), in which some information needed to play (e.g., the opponent player's cards, or the cards contained in the face-down deck) is hidden, remains underdeveloped.

US 2014-0315625 A1

An object of the present invention is to provide a method and device for providing a deep learning-based imperfect information game service that performs the best action for the current play state in an imperfect information game in which some information needed to play is hidden.

In detail, the present invention seeks to provide a method and device for providing a deep learning-based imperfect information game service that determine the best action that can be performed in the current play state of an imperfect information game.

The present invention also seeks to provide a method and device for providing a deep learning-based imperfect information game service that estimate a win rate and a score by performing roll-outs based on the current play state in an imperfect information game.

The present invention also seeks to provide a method and device for providing a deep learning-based imperfect information game service that predict whether a jokbo (scoring combination) will be achieved in an imperfect information game.

The present invention also seeks to provide a method and device for providing a deep learning-based imperfect information game service that estimate the opponent player's action in an imperfect information game based on deep learning.

The present invention also seeks to provide a method and device for providing a deep learning-based imperfect information game service that use an algorithm optimized for imperfect information games, in which not only winning or losing but also the score must be considered, by adding a score-related parameter term to the existing UCT (Upper Confidence Bound 1 applied to Trees) algorithm.
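The idea of adding a score-related term to UCT can be illustrated with a minimal sketch. The exact form of the term is not given in this passage, so the additive, sigmoid-squashed score term below, the function name `uct_with_score`, and the parameters `c_explore`, `w_score`, and `score_scale` are all assumptions made for illustration; only the exploitation and exploration terms follow the standard UCT (UCB1) rule.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def uct_with_score(
    win_sum: float,
    score_sum: float,
    visits: int,
    parent_visits: int,
    c_explore: float = 1.4,     # assumed exploration constant
    w_score: float = 0.5,       # assumed weight of the score term
    score_scale: float = 10.0,  # assumed scale for normalizing raw scores
) -> float:
    """Standard UCT value plus a hypothetical score-related term."""
    if visits == 0:
        # Unvisited children are explored first.
        return math.inf
    exploit = win_sum / visits
    explore = c_explore * math.sqrt(math.log(parent_visits) / visits)
    # Hypothetical term: mean roll-out score squashed into (0, 1).
    score_term = w_score * sigmoid((score_sum / visits) / score_scale)
    return exploit + explore + score_term
```

With equal win statistics, a child whose roll-outs ended with higher scores receives a higher selection value under this sketch, which is the behavior the passage asks for when the score matters as well as the win.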

However, the technical problems to be solved by the present invention and its embodiments are not limited to those described above, and other technical problems may exist.

A method for providing a deep learning-based imperfect information game service according to an embodiment of the present invention is a method in which a game play server provides the deep learning-based imperfect information game service through a user terminal, the method comprising: acquiring a training data set including a Go-Stop board state for each of a plurality of actions; and training a deep learning model of the imperfect information game service based on the acquired training data set, wherein training the deep learning model includes at least one of: performing state determination training based on the training data set; performing win-rate estimation training based on the training data set; performing score estimation training based on the training data set; and performing jokbo (scoring combination) achievement probability training based on the training data set.

Here, the state determination training takes a first Go-Stop board state as input and trains the deep learning model to output state determination information, which is information identifying at least one action that can be performed in the input first Go-Stop board state.

The state determination information is information identifying the activation state, which is the reference element for detecting at least one action that can be performed in the first Go-Stop board state.

The activation state includes at least one of a hand state (HAND State), a shake state (SHAKE State), a capture-card selection state (SELECT State), a sake-cup position selection state (SWITCH State), a go/stop state (GOSTOP State), a game end state (END State), a hand dealing state (CHANCE_HAND State), and a deck flip state (CHANCE_FLIP State).
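The eight activation states listed above can be written down as an enumeration. The following is a minimal sketch: the class name `ActivationState`, the integer values, and the `one_hot` helper (a possible classification target for state determination training) are assumptions for illustration, while the member names are taken from the text.

```python
from enum import Enum
from typing import List


class ActivationState(Enum):
    """The eight activation states named in the text."""
    HAND = 0         # hand state
    SHAKE = 1        # shake state
    SELECT = 2       # capture-card selection state
    SWITCH = 3       # sake-cup position selection state
    GOSTOP = 4       # go/stop state
    END = 5          # game end state
    CHANCE_HAND = 6  # hand dealing state
    CHANCE_FLIP = 7  # deck flip state


def one_hot(state: ActivationState) -> List[int]:
    """Encode a state as a one-hot vector (assumed training target)."""
    vec = [0] * len(ActivationState)
    vec[state.value] = 1
    return vec
```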

The win-rate estimation training, based on roll-outs using the training data set, takes a first Go-Stop board state as input and trains the deep learning model to output a win-rate estimate, which is an estimate of the win rate resulting from the actions possible in the input first Go-Stop board state.

The score estimation training, based on roll-outs using the training data set, takes a first Go-Stop board state as input and trains the deep learning model to output a score estimate, which is an estimate of the score resulting from the actions possible in the input first Go-Stop board state.
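How roll-outs can yield win-rate and score targets may be easier to see in code. The sketch below simply averages the outcomes of repeated simulated play-outs from one state; the function name `estimate_by_rollout`, the `simulate` callback signature, and the use of plain averaging are assumptions for illustration and are not taken from the patent text.

```python
import random
from typing import Any, Callable, Tuple

# A simulator plays one game to the end from `state` and reports
# (did the player win?, final score). Its signature is hypothetical.
Simulator = Callable[[Any, random.Random], Tuple[bool, float]]


def estimate_by_rollout(
    state: Any,
    simulate: Simulator,
    n_rollouts: int = 100,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (win-rate estimate, mean score) over n_rollouts play-outs."""
    if n_rollouts <= 0:
        raise ValueError("n_rollouts must be positive")
    rng = random.Random(seed)
    wins = 0
    total_score = 0.0
    for _ in range(n_rollouts):
        won, score = simulate(state, rng)
        wins += int(won)
        total_score += score
    return wins / n_rollouts, total_score / n_rollouts
```

The two returned values could then serve as regression targets for the win-rate and score outputs of the model.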

The jokbo achievement probability training takes a first Go-Stop board state as input and trains the deep learning model to output a jokbo achievement probability value, which predicts the probability of achieving a given jokbo at the end of a roll-out based on the input first Go-Stop board state.

The Go-Stop board state includes at least some of: the player's hand, the player's captured cards, the opponent player's captured cards, the cards laid on the floor, the score the player must newly reach to win, the score the opponent player must newly reach to win, the card selected by the player, the card selected by the opponent player, the bonus-card hostage type, the number of times "go" has been declared, first-player information, ppeok-causing information, and information on use of the 9 yeolkkeut card as pi.
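The fields of the Go-Stop board state listed above map naturally onto a record type. The sketch below is one possible layout: the class name `GoStopBoardState`, every field name, the field types, and the use of integer card identifiers are assumptions for illustration. Note that the opponent's hand and the deck order are absent, which is what makes the game one of imperfect information.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class GoStopBoardState:
    """Board state as seen by one player (card IDs are hypothetical ints)."""
    player_hand: List[int] = field(default_factory=list)
    player_captured: List[int] = field(default_factory=list)
    opponent_captured: List[int] = field(default_factory=list)
    floor_cards: List[int] = field(default_factory=list)
    player_score_to_win: int = 0            # score the player must reach
    opponent_score_to_win: int = 0          # score the opponent must reach
    player_selected_card: Optional[int] = None
    opponent_selected_card: Optional[int] = None
    bonus_card_hostage_type: Optional[str] = None
    go_count: int = 0                       # times "go" has been declared
    first_player: Optional[int] = None      # which player moved first
    ppeok_info: Optional[str] = None        # ppeok-causing information
    nine_yeolkkeut_as_pi: Optional[bool] = None


def visible_cards(state: GoStopBoardState) -> Set[int]:
    """Collect every card this player can currently see."""
    return (
        set(state.player_hand)
        | set(state.player_captured)
        | set(state.opponent_captured)
        | set(state.floor_cards)
    )
```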

Meanwhile, a device for providing an imperfect information game service according to an embodiment of the present invention comprises: a communication unit that receives a training data set; a memory that stores a deep learning model; and a processor that, based on the received training data set, performs at least one of state determination training, win-rate estimation training, score estimation training, and jokbo achievement probability training on the deep learning model.

Here, by performing the training, the processor trains the deep learning model to output at least one of state determination information, a win-rate estimate, a score estimate, and a jokbo achievement probability value for a first Go-Stop board state.
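Because the training may cover any subset of the four tasks, one way to picture it is as a sum of per-task losses over whichever outputs are enabled. The sketch below is an assumption-heavy illustration in plain Python: the head names (`state`, `win_rate`, `score`, `jokbo`), the choice of cross-entropy and squared-error losses, and the equal weighting are all hypothetical and are not stated in the patent text.

```python
import math
from typing import Any, Callable, Dict

EPS = 1e-12  # guards against log(0)


def cross_entropy(probs, target_index: int) -> float:
    """Loss for a predicted distribution over activation states."""
    return -math.log(max(probs[target_index], EPS))


def binary_cross_entropy(p: float, y: float) -> float:
    """Loss for a predicted probability p against a 0/1 label y."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def squared_error(pred: float, target: float) -> float:
    """Loss for a predicted score against the observed score."""
    return (pred - target) ** 2


# Hypothetical mapping from output head to its loss function.
LOSS_BY_HEAD: Dict[str, Callable[[Any, Any], float]] = {
    "state": cross_entropy,
    "win_rate": binary_cross_entropy,
    "score": squared_error,
    "jokbo": binary_cross_entropy,
}


def combined_loss(outputs: Dict[str, Any], targets: Dict[str, Any]) -> float:
    """Sum the losses of every head present in both outputs and targets."""
    active = [h for h in LOSS_BY_HEAD if h in outputs and h in targets]
    if not active:
        raise ValueError("at least one training task must be selected")
    return sum(LOSS_BY_HEAD[h](outputs[h], targets[h]) for h in active)
```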

The method and device for providing a deep learning-based imperfect information game service according to an embodiment of the present invention can perform the best action in an imperfect information game using at least one of state determination information, a win-rate estimate, a score estimate, and a jokbo achievement probability value, each based on the current play state.

In addition, the method and device according to an embodiment of the present invention can estimate the opponent player's action in an imperfect information game, in which some information such as the opponent player's cards and the cards contained in the deck is hidden, based on deep learning following a predetermined algorithm rather than by random selection.
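The contrast between random selection and model-based estimation of the opponent's action can be sketched as follows. Here a model is assumed to have already produced a probability for each candidate action, and the opponent's move is drawn from that distribution instead of uniformly; the function name `sample_opponent_action` and the dictionary format are assumptions for illustration.

```python
import random
from typing import Dict


def sample_opponent_action(
    action_probs: Dict[str, float],
    rng: random.Random,
) -> str:
    """Draw one opponent action in proportion to its predicted probability."""
    if not action_probs:
        raise ValueError("no candidate actions")
    actions = list(action_probs)
    weights = [action_probs[a] for a in actions]
    # random.choices samples with replacement using relative weights.
    return rng.choices(actions, weights=weights, k=1)[0]
```

A uniform baseline would pass equal weights for every action; the model-based variant concentrates simulated opponent moves on the actions the network considers likely.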

In addition, the method and device according to an embodiment of the present invention can determine the best action for the current play state using a UCT algorithm improved into a form better suited to imperfect information games.

However, the effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below.

FIG. 1 is an exemplary diagram of a deep learning-based imperfect information game service system according to an embodiment of the present invention.
FIG. 2 is an example of a diagram for explaining the structure of the operation model of a game play server according to an embodiment of the present invention.
FIG. 3 is an example of a diagram for explaining a process in which the operation model according to an embodiment of the present invention performs Monte Carlo Tree Search (MCTS) according to the pipeline of a search unit.
FIG. 4 is a flowchart for explaining a method of training the operation model of the game play server according to an embodiment of the present invention.
FIG. 5 is a conceptual diagram for explaining a method of training the operation model of the game play server according to an embodiment of the present invention.
FIG. 6 is a conceptual diagram for explaining a method of performing state determination training according to an embodiment of the present invention.
FIG. 7 is an example of a diagram showing the transition relationships between states according to an embodiment of the present invention.
FIG. 8 is a conceptual diagram for explaining roll-out based training according to an embodiment of the present invention.
FIG. 9 is an example of a diagram for explaining a method of estimating a win rate and a score according to an embodiment of the present invention.
FIG. 10 is a flowchart for explaining a method by which the operation model of the game play server provides a deep learning-based imperfect information game service according to an embodiment of the present invention.
FIG. 11 is a conceptual diagram for explaining a method by which the operation model of the game play server provides a deep learning-based imperfect information game service according to an embodiment of the present invention.
FIG. 12 is an example of a diagram for explaining a method of estimating the opponent player's action in the MCTS of the imperfect information game service according to an embodiment of the present invention.
FIG. 13 is an example of a diagram showing the sigmoid function used in the G-UCT algorithm according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms. In the following embodiments, terms such as first and second are not used in a limiting sense but to distinguish one component from another. Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" mean that the features or components described in the specification are present, and do not preclude the possibility that one or more other features or components may be added. In the drawings, the size of components may be exaggerated or reduced for convenience of description. For example, the size and thickness of each component shown in the drawings are chosen arbitrarily for convenience of description, so the present invention is not necessarily limited to what is illustrated.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is an exemplary diagram of a deep learning-based imperfect information game service system according to an embodiment of the present invention.

Referring to FIG. 1, the deep learning-based imperfect information game service system according to an embodiment may include a terminal 100, a game service providing server 200, a game play server 300, and a network 400.

Each component of FIG. 1 may be connected through the network 400. The network 400 refers to a connection structure that allows information to be exchanged among nodes such as the terminal 100, the game service providing server 200, and/or the game play server 300. Examples of such a network include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a Wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

Meanwhile, the imperfect information game service according to an embodiment of the present invention is applicable to any game service in which some information needed to play is kept hidden. The description below is based on a card game service, specifically a Go-Stop (GOSTOP) game service.

- Terminal (100)

First, the terminal 100 is the terminal of a user who wishes to receive the Go-Stop game service. The terminal 100 is one or more computers or other electronic devices that the user uses to run applications that perform various tasks.

Examples include a computer, a laptop computer, a smartphone, a mobile phone, a PDA, a tablet PC, or any other device operable to communicate with the game service providing server 200 and/or the game play server 300.

However, the terminal 100 is not limited to these. It may run on various machines, may include processing logic that interprets and executes instructions stored in a plurality of memories, and may include various other elements, such as processes that display graphical information for a graphical user interface (GUI) on an external input/output device.

The terminal 100 may also be connected to an input device (e.g., a mouse, keyboard, or touch-sensitive surface) and an output device (e.g., a display device, monitor, or screen).

Applications executed by the terminal 100 may include a game application, a web browser, a web application running in a web browser, word processors, media players, spreadsheets, image processors, security software, and others.

The terminal 100 may also include at least one memory 101 that stores instructions, at least one processor 102, and a communication unit 103.

The memory 101 of the terminal 100 may store a plurality of application programs (applications) that run on the terminal 100, as well as data and instructions for the operation of the terminal 100.

The instructions are executable by the processor 102 to cause the processor 102 to perform operations. The operations may include transmitting a Go-Stop game execution request signal, transmitting and receiving game data, transmitting and receiving action information, transmitting a state determination request signal, receiving a state determination result, requesting game time information, receiving game time information, and receiving various other information.

In terms of hardware, the memory 101 may be any of various storage devices such as ROM, RAM, EPROM, a flash drive, or a hard drive, and the memory 101 may also be web storage that performs the storage function of the memory 101 over the Internet.

The processor 102 of the terminal 100 may control overall operation and perform the data processing needed to receive the Go-Stop game service.

When the Go-Stop game application is executed on the terminal 100, a Go-Stop game environment is set up on the terminal 100. The Go-Stop game application then exchanges Go-Stop game data with the game service providing server 200 through the network 400 so that the Go-Stop game service runs on the terminal 100.

The processor 102 may be an ASIC (application specific integrated circuit), a DSP (digital signal processor), a DSPD (digital signal processing device), a PLD (programmable logic device), an FPGA (field programmable gate array), a controller, a micro-controller, a microprocessor, or any other form of processor for performing functions.

The communication unit 103 of the terminal 100 may transmit and receive wireless signals to and from at least one of a base station, an external terminal, and a server over a network built according to communication schemes such as GSM (Global System for Mobile communication), CDMA (Code Division Multiple Access), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), WLAN (Wireless LAN), Wi-Fi (Wireless Fidelity), Wi-Fi Direct, DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), and WiMAX (Worldwide Interoperability for Microwave Access).

The terminal 100 may also perform at least part of the functional operations performed by at least one of the game service providing server 200 and the game play server 300 described below.

- Game service providing server (200)

The Go-Stop game service provided by the game service providing server 200 may be configured so that a virtual computer user provided by the game service providing server 200 and a real user take part in the game together. In this case, one real user and one computer user play the game together in the Go-Stop game environment implemented on the user-side terminal 100.

In another aspect, the Go-Stop game service provided by the game service providing server 200 may be configured so that a plurality of user-side devices take part and the Go-Stop game is played among them.

The game service providing server 200 may include at least one memory 201 that stores instructions, at least one processor 202, and a communication unit 203.

The memory 201 of the game service providing server 200 may store a plurality of application programs (applications) that run on the game service providing server 200, as well as data and instructions for the operation of the game service providing server 200. The instructions are executable by the processor 202 to cause the processor 202 to perform operations. The operations may include receiving a game execution request signal, transmitting and receiving game data, transmitting and receiving action information, transmitting and receiving a state determination request signal, transmitting and receiving a state determination result, transmitting and receiving an action preparation time, transmitting and receiving a game information time, and various other transmission operations.

The memory 201 may also store a plurality of play data from matches played on the game service providing server 200, or a plurality of previously published play data.

Each of the plurality of play data may include all information from the first action, which is the action that starts the match, through the final action, at which the match ends. That is, the play data may include history information about the actions.

The play data may also include each play state following the order of the actions performed in the Go-Stop game service, that is, in the embodiment, the Go-Stop board states.

Here, the play state (hereinafter, the Go-Stop board state) is a set of data that follows the progress of the game, including the state of the Go-Stop cards placed on the Go-Stop board. In the embodiment, it may include the player's hand cards, the player's captured cards, the opponent player's captured cards, the cards laid on the floor (that is, the face-up cards), the score the player must reach to win, the score the opponent player must reach to win, the card the player selected in the hand state, the card the opponent player selected in the hand state, ppeok trigger information, the bonus card hostage type, the number of go declarations, first-player information, and/or information on whether the September yeolkkeut card is used as a pi card.
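
As a non-limiting illustration, the fields of the Go-Stop board state listed above could be grouped into a single record. The class name `GoStopBoardState`, the field names, and the integer card encoding in the sketch below are assumptions introduced for illustration only and are not defined in this specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GoStopBoardState:
    """Hypothetical container for one Go-Stop board state (S)."""

    hand: List[int] = field(default_factory=list)               # player's hand cards
    captured: List[int] = field(default_factory=list)           # player's captured cards
    opponent_captured: List[int] = field(default_factory=list)  # opponent's captured cards
    floor: List[int] = field(default_factory=list)              # face-up cards on the floor
    points_to_win: int = 0             # score the player must reach to win
    opponent_points_to_win: int = 0    # score the opponent must reach to win
    selected_card: Optional[int] = None           # card the player chose in the hand state
    opponent_selected_card: Optional[int] = None  # card the opponent chose in the hand state
    ppeok_info: int = 0                # ppeok trigger information (encoding assumed)
    bonus_hostage_type: int = 0        # bonus card hostage type (encoding assumed)
    go_count: int = 0                  # number of go declarations
    is_first_player: bool = True       # first-player information
    sake_cup_as_pi: bool = False       # September yeolkkeut card used as a pi card


# Example: a state with three hand cards and two floor cards.
state = GoStopBoardState(hand=[3, 17, 42], floor=[5, 9], go_count=1)
```

Using `default_factory` gives every state its own lists, so one recorded state is not modified when another is updated.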

The game service providing server 200 may provide the stored plurality of play data to the game play server 300 for training of the game play server 300.

In hardware terms, the memory 201 may be any of various storage devices such as a ROM, RAM, EPROM, flash drive, or hard drive, and may also be web storage that performs the storage function of the memory 201 over the Internet.

The processor 202 of the game service providing server 200 may control overall operation and perform the data processing needed to provide the Go-Stop game service. The processor 202 may be an application specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field programmable gate array (FPGA), a controller, a micro-controller, a microprocessor, or any other type of processor for performing functions.

The game service providing server 200 may communicate with the terminal 100 and the game play server 300 over the network 400 through the communication unit 203.

- Game play server (300: Game playing server)

The game play server 300 may include a separate cloud server or computing device. The game play server 300 may also be a neural network system installed in the processor of the terminal 100 or in the data processing unit of the game service providing server 200; hereinafter, however, the game play server 300 is described as a device separate from the terminal 100 and the game service providing server 200.

The game play server 300 may include at least one memory 301 that stores instructions, at least one processor 302, and a communication unit 303.

The game play server 300 is an artificial intelligence computer that builds an operation model, which is a deep learning model that learns by itself according to the rules of Go-Stop, and that can play against the user of the terminal 100; on its own turn, it can perform actions in the Go-Stop game so as to win the match. How the game play server 300 trains the operation model is described in detail with reference to FIGS. 2 to 9.

The memory 301 of the game play server 300 may store a plurality of application programs (applications) that run on the game play server 300, as well as data and instructions for the operation of the game play server 300. The instructions are executable by the processor 302 to cause the processor 302 to perform operations, and the operations may include an operation model learning (training) operation, transmitting and receiving action information, receiving an action preparation time, receiving game time information, and various other transmission operations.

In addition, the memory 301 may store the operation model, which is a deep learning model. In hardware terms, the memory 301 may be any of various storage devices such as a ROM, RAM, EPROM, flash drive, or hard drive, and may also be web storage that performs the storage function of the memory 301 over the Internet.

The processor 302 of the game play server 300 reads the operation model stored in the memory 301 and, in accordance with the constructed neural network system, performs the operation model learning and the actions in the Go-Stop game described below.

Depending on the embodiment, the processor 302 may be configured to include a main processor that controls all units and a plurality of graphics processing units (GPUs) that handle the large volume of computation required to run the neural network according to the operation model.

The game play server 300 may communicate with the game service providing server 200 over the network 400 through the communication unit 303.

<Operation model>

FIG. 2 is an example of a diagram for explaining the structure of the operation model of the game play server 300 according to an embodiment of the present invention.

Referring to FIG. 2, the operation model according to an embodiment of the present invention is a deep learning model of the game play server 300 and may include a state determination unit 310, a simulation unit 320, an action determination unit 330, a deep learning neural network 340, and a search unit 350.

In detail, the operation model may be trained as a model that determines an action for winning a match by using an MCTS module 390, which includes the simulation unit 320, the deep learning neural network 340, and the search unit 350, together with the state determination unit 310 and the action determination unit 330.

Specifically, the search unit 350 may perform a Monte Carlo Tree Search (MCTS) operation under the guidance of the deep learning neural network 340. MCTS is a heuristic search algorithm for decision-making. That is, the search unit 350 may perform MCTS based on the move probability value (p) and/or the value (V) provided by the deep learning neural network 340.

For example, the search unit 350, guided by the deep learning neural network 340, may perform MCTS and output a search probability value (π), which is a probability distribution over actions based on visit counts. The simulation unit 320 may play Go-Stop against itself according to the search probability value (π). The simulation unit 320 continues the self-play match until the win or loss of the game is decided, and when the self-play ends, it may provide the Go-Stop board state (S), the search probability value (π), and the self-play value (z) as ground-truth data for training the deep learning neural network 340. Here, the Go-Stop board state (S) may be a set of data that follows the progress of the game, including the state of the Go-Stop cards placed on the Go-Stop board. The self-play value (z) is the win rate obtained when self-play is carried out from the Go-Stop board state (S). In addition, the simulation unit 320 may provide a win rate estimate (W), a score estimate (G), and/or a jokbo (scoring-combination) achievement probability value (L) to the action determination unit 330 as a result of the self-play.
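
As a non-limiting illustration, the ground-truth data produced at the end of one self-play match could be assembled as follows. The function name `label_self_play`, the tuple layout, the use of a string as a stand-in for a board state, and the encoding of the self-play value (1.0 for a win, 0.0 for a loss, 0.5 for a match with no winner, seen from the player to move) are assumptions made for this sketch; the search probability value is written `pi` here.

```python
from typing import List, Optional, Sequence, Tuple

# One recorded step: (board state S, player to move, search probabilities pi).
Step = Tuple[str, int, Sequence[float]]
# One training sample: (board state S, search probabilities pi, self-play value z).
Sample = Tuple[str, Sequence[float], float]


def label_self_play(steps: List[Step], winner: Optional[int]) -> List[Sample]:
    """Attach the final outcome of a finished self-play match to every step."""
    samples: List[Sample] = []
    for state, player, pi in steps:
        if winner is None:
            z = 0.5          # no winner (assumed encoding)
        elif player == winner:
            z = 1.0          # the player to move went on to win
        else:
            z = 0.0          # the player to move went on to lose
        samples.append((state, pi, z))
    return samples


samples = label_self_play(
    [("S1", 0, [0.7, 0.3]), ("S2", 1, [0.5, 0.5])],
    winner=0,
)
```

Averaging such per-match outcomes over many self-play matches from the same board state would give a win rate for that state.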

The deep learning neural network 340 may output a move probability value (p) and a value (V). The move probability value (p) may be a probability distribution that expresses numerically which action, given the Go-Stop board state (S), is a good action for winning the game. The value (V) represents the overall win rate in the state that the player currently faces. Here, the overall win rate may refer to data that expresses the player's win rate as a single value by comprehensively considering at least one action that the player can perform and the future situations that follow from performing it. For example, an action with a high move probability value (p) may be a good action. The deep learning neural network 340 may be trained so that the move probability value (p) becomes equal to the search probability value (π) and the value (V) becomes equal to the self-play value (z). The trained deep learning neural network 340 then guides the search unit 350, and the search unit 350 runs MCTS during the action preparation time to find better actions than those given by the previous search probability value (π), and outputs a new search probability value (π). For example, depending on the MCTS run time, the action preparation time may be any one of an average action preparation time, a first action preparation time, and a second action preparation time. By default, the action preparation time may be set to the average action preparation time.

The simulation unit 320 may output a new self-play value (z) for the Go-Stop board state (S) based on the new search probability value (π), and may provide the Go-Stop board state (S), the new search probability value (π), and the new self-play value (z) to the deep learning neural network 340. The deep learning neural network 340 may then be trained again so that the move probability value (p) and the value (V) match the new search probability value (π) and the new self-play value (z), respectively.

That is, by repeating this process, the operation model may be trained so that the deep learning neural network 340 finds better actions for winning a match. For example, the operation model may use an action loss (l). The action loss (l) is given by Equation 1.

[Equation 1]

l = (z − v)^2 − π^T log p + c‖θ‖^2

Here, v is the value (V) output by the deep learning neural network 340, θ denotes the parameters of the neural network, and c is a very small constant.

In the action loss (l) of Equation 1, making z and v equal corresponds to the mean squared loss term, making π and p equal corresponds to the cross-entropy loss term, and multiplying θ by c is a regularization term for preventing overfitting.
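
As a non-limiting illustration, the three terms of the action loss can be evaluated numerically. The sketch below assumes the loss takes the form l = (z − v)^2 − π·log p + c‖θ‖^2, which is consistent with the mean squared, cross-entropy, and regularization terms described above; the function name `action_loss`, the default value of `c`, and the clamp on `p` that avoids log(0) are assumptions.

```python
import math
from typing import Sequence


def action_loss(
    z: float,
    v: float,
    pi: Sequence[float],
    p: Sequence[float],
    theta: Sequence[float],
    c: float = 1e-4,
) -> float:
    """Value error + policy cross-entropy + weight regularization."""
    # Mean squared term: pulls the predicted value v toward the self-play value z.
    value_term = (z - v) ** 2
    # Cross-entropy term: pulls the move probabilities p toward the search probabilities pi.
    policy_term = -sum(t * math.log(max(q, 1e-12)) for t, q in zip(pi, p) if t > 0.0)
    # Regularization term: penalizes large parameters to limit overfitting.
    reg_term = c * sum(w * w for w in theta)
    return value_term + policy_term + reg_term


loss = action_loss(z=1.0, v=0.5, pi=[0.5, 0.5], p=[0.5, 0.5], theta=[1.0, 2.0], c=0.1)
```

In this example the value term is 0.25, the policy term is log 2, and the regularization term is 0.5.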

Meanwhile, the deep learning neural network 340 may have a neural network structure. For example, the deep learning neural network 340 may consist of one convolution block and 19 residual blocks. The convolution block may be a stack of several 3×3 convolution layers. Each residual block may be a stack of several 3×3 convolution layers that includes a skip connection. A skip connection is a structure in which the input of a given layer is added to the output of that layer, and the sum is passed as input to another layer.
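
As a non-limiting illustration, the skip connection described above amounts to adding a block's input to the output of its layers. In the sketch below, the function name `residual_block` is hypothetical and a plain Python function on a list of numbers stands in for the stacked 3×3 convolution layers, which are not implemented here.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, layers: Callable[[Vector], Vector]) -> Vector:
    """Return layers(x) + x, i.e. the layer output summed with the block input."""
    fx = layers(x)
    if len(fx) != len(x):
        raise ValueError("skip connection needs matching input and output sizes")
    # The element-wise sum is what gets passed on to the next layer.
    return [a + b for a, b in zip(x, fx)]


# Stand-in for the convolution layers: doubles every element.
out = residual_block([1.0, -2.0], lambda xs: [2.0 * a for a in xs])
```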

FIG. 3 is an example of a diagram for explaining the process by which the operation model according to an embodiment of the present invention performs a Monte Carlo Tree Search (MCTS) along the pipeline of the search unit 350.

Referring to FIG. 3, the operation model according to an embodiment of the present invention may perform a Monte Carlo Tree Search (MCTS) using the MCTS module 390, which includes the simulation unit 320, the deep learning neural network 340, and the search unit 350.

In detail, through the selection process (a), the operation model selects, in the current first Go-Stop board state (S1), a second Go-Stop board state (S1-2) reached by an action with a high action value function (Q) and a high confidence value (U) among the branches not yet searched by MCTS. The action value function (Q) is the average of the values (V) computed each time the branch is traversed. The confidence value (U) is inversely proportional to the number of visits (N) to the branch and proportional to the move probability value (p).
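
As a non-limiting illustration, the selection process (a) can be sketched as choosing the branch with the largest Q + U. The class name `Edge`, the function name `select_action`, the constant `c_u`, and the concrete form U = c_u · p / (1 + N) are assumptions; the description above fixes only that U is proportional to p and inversely proportional to N.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    """Statistics kept for one branch (action) of the search tree."""

    prior: float              # move probability value p
    visits: int = 0           # number of visits N
    value_sum: float = 0.0    # sum of the values V seen through this branch

    @property
    def q(self) -> float:
        # Q is the average of the values V computed on each traversal.
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: Dict[str, Edge], c_u: float = 1.0) -> str:
    """Pick the action whose branch maximizes Q + U."""

    def score(edge: Edge) -> float:
        u = c_u * edge.prior / (1 + edge.visits)  # assumed form of U
        return edge.q + u

    return max(edges, key=lambda action: score(edges[action]))


edges = {
    "play_1": Edge(prior=0.2, visits=10, value_sum=5.0),  # Q = 0.5, small U
    "play_2": Edge(prior=0.8),                            # unvisited, large U
}
chosen = select_action(edges)
```

Because U shrinks as a branch is visited, rarely visited branches with a high move probability are tried before the search settles on branches with a high Q.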

Through the expansion and evaluation process (b), the operation model may expand the selected action to a third Go-Stop board state (S1-2-1) and compute the move probability value (p).

The operation model may compute the value (V) of the expanded third Go-Stop board state (S1-2-1) and, through the backup process (c), store the action value function (Q), the number of visits (N), and the move probability value (p) of the branches traversed.
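
As a non-limiting illustration, the backup process (c) can be sketched as updating the statistics of every branch on the path from the root to the newly expanded state. The names `EdgeStats` and `backup` are hypothetical, and the sketch deliberately omits any change of perspective between the two players along the path, which the description above does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeStats:
    """Per-branch statistics stored during backup."""

    prior: float              # move probability value p
    visits: int = 0           # number of visits N
    value_sum: float = 0.0    # running sum of backed-up values V
    q: float = 0.0            # action value function Q (mean of V)


def backup(path: List[EdgeStats], value: float) -> None:
    """Propagate the evaluated value V along the traversed branches."""
    for edge in path:
        edge.visits += 1
        edge.value_sum += value
        edge.q = edge.value_sum / edge.visits  # Q stays the average of all V


root_edge = EdgeStats(prior=0.6)
child_edge = EdgeStats(prior=0.4)
backup([root_edge, child_edge], 1.0)  # first simulation evaluates to V = 1.0
backup([root_edge], 0.0)              # second simulation goes through the root branch only
```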

The operation model may repeat the selection (a), expansion and evaluation (b), and backup (c) processes during the action preparation time, and may build a probability distribution from the number of visits (N) to each action and output it as the search probability value (π). The operation model may then detect the action with the highest search probability value (π) among the actions.
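
As a non-limiting illustration, the search probability value can be obtained by normalizing the visit counts, after which the action with the highest probability is picked. The function names, the action labels, and the uniform fallback used when no visits have been recorded are assumptions; the search probability value is written `pi` here.

```python
from typing import Dict


def search_probabilities(visit_counts: Dict[str, int]) -> Dict[str, float]:
    """Turn per-action visit counts N into a probability distribution."""
    total = sum(visit_counts.values())
    if total == 0:
        # No search has been done yet: fall back to a uniform distribution (assumption).
        return {action: 1.0 / len(visit_counts) for action in visit_counts}
    return {action: n / total for action, n in visit_counts.items()}


def best_action(pi: Dict[str, float]) -> str:
    """Return the action with the highest search probability."""
    return max(pi, key=lambda action: pi[action])


pi = search_probabilities({"play_1": 6, "play_2": 3, "play_3": 1})
chosen = best_action(pi)
```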

Meanwhile, the state determination unit 310 of the operation model may take a given Go-Stop board state (S) as input and output state determination information (C) for that board state. Here, the state determination information (C) may be information identifying the activation state, which is the criterion for detecting the actions that can be performed in a specific Go-Stop board state (S). That is, the activation state determines at least one action that the player can perform in the specific Go-Stop board state (S). This is described in detail with reference to FIGS. 6 and 7.

In addition, the state determination unit 310 may undergo state determination training, based on the training dataset provided by the game service providing server 200, so that it outputs the state determination information (C) for a given Go-Stop board state (S). This is described in detail with reference to FIGS. 4 to 9.

On the other hand, the action determination unit 330 of the operation model may work with the MCTS module 390 to determine and perform, based on MCTS, the optimal action (A), which is the best action for a given Go-Stop board state (S). Here, the optimal action (A) may be the action judged to yield the highest win rate and score in the given Go-Stop board state (S).

In detail, the action determination unit 330 may cause MCTS to be performed based on the Go-Stop board state (S) received from the game service providing server 200; the win rate estimate (W), score estimate (G), and/or jokbo achievement probability value (L) received from the simulation unit 320; and the state determination information (C) received from the state determination unit 310. Through this, it may determine and perform the optimal action (A) for the Go-Stop board state (S).

Specifically, the action determination unit 330 may determine which of the at least one action possible in each activation state to perform. In detail, based on the current activation state determined for a given Go-Stop board state (S), the action determination unit 330 may judge which of the at least one action possible in that activation state yields the highest win rate and score.

For example, when the activation state determined for a given Go-Stop board state (S) is the hand state, in which the player decides which card from the hand to play, the action determination unit 330 may judge which of the actions possible in the hand state (for example, with three cards in hand, playing the first hand card, playing the second hand card, or playing the third hand card) yields the highest win rate and score.
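
As a non-limiting illustration of the three-card example above, an action could be chosen from per-action estimates of win rate and score. The function name `choose_optimal_action`, the action labels, the numeric estimates, and the rule of comparing win rate first and using score only to break ties are all assumptions; the description above does not specify how the two quantities are combined.

```python
from typing import Dict, Tuple

# action -> (win rate estimate W, score estimate G)
Estimates = Dict[str, Tuple[float, float]]


def choose_optimal_action(estimates: Estimates) -> str:
    """Pick the action with the highest win rate, breaking ties by score."""
    # Tuples compare element by element, so W is compared before G.
    return max(estimates, key=lambda action: estimates[action])


hand_state_estimates: Estimates = {
    "play_hand_1": (0.55, 3.0),
    "play_hand_2": (0.61, 2.0),
    "play_hand_3": (0.61, 4.0),
}
optimal = choose_optimal_action(hand_state_estimates)
```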

In addition, the action determination unit 330 may be trained, based on the training dataset provided by the game service providing server 200, to determine the optimal action (A) for a given Go-Stop board state (S). This is described in detail with reference to FIGS. 4 to 9.

- Game play server according to an embodiment: operation model training

FIG. 4 is a flowchart for explaining a method of training the operation model of the game play server 300 according to an embodiment of the present invention, and FIG. 5 is a conceptual diagram for explaining the method of training the operation model of the game play server 300 according to an embodiment of the present invention.

Referring to FIGS. 4 and 5, in the deep learning-based Go-Stop game service according to an embodiment of the present invention, the game play server 300 may train the operation model to perform the optimal action (A) for a specific Go-Stop board state (S). Specifically, the game play server 300 may train the operation model to determine the activation state for a specific Go-Stop board state (S). The game play server 300 may also train the operation model to determine the win rate estimate (W) based on a specific Go-Stop board state (S), to determine the score estimate (G) based on a specific Go-Stop board state (S), and to determine the jokbo (scoring-combination) achievement probability value (L) based on a specific Go-Stop board state (S).

In detail, referring further to FIG. 4, the deep learning-based Go-Stop game service method according to an embodiment of the present invention may include a step (S101) in which the game play server 300 receives a training dataset.

In more detail, the game play server 300 may work with the game service providing server 200 to receive a training dataset for training the operation model. Here, each item of training data in the training dataset may be play record data from a Go-Stop game (hereinafter, play data). The play data may include all information from the first action, which is the opening action of a match, to the final action, at which the match ends. That is, the plurality of play data may include history information on actions. In addition, the play data may include each Go-Stop board state (S) in the order of the actions performed in the Go-Stop game service. Here, the Go-Stop board state (S) is a set of data that follows the progress of the game, including the state of the Go-Stop cards placed on the Go-Stop board. In the embodiment, it may include, for each Go-Stop board state (S), the player's hand cards, the player's captured cards, the opponent player's captured cards, the cards laid on the floor (that is, the face-up cards), the score the player must reach to win, the score the opponent player must reach to win, the card the player selected in the hand state, the card the opponent player selected in the hand state, ppeok trigger information, the bonus card hostage type, the number of go declarations, first-player information, and/or information on whether the September yeolkkeut card is used as a pi card.

Depending on the embodiment, the game service providing server 200 may acquire a plurality of play data and perform a process of extracting input features (that is, in the embodiment, Go-Stop board state (S) data) from the acquired play data.

In detail, the game service providing server 200 may further include an input feature extraction unit that performs the input feature extraction process on the play data. For example, the input feature extraction unit may have a neural network structure and may include a kind of encoder.

In addition, the game service providing server 200 may use the input feature extraction unit to extract the Go-Stop board state (S) data described above from the play data.

The game service providing server 200 may thus build a training dataset for training the operation model based on the play data and Go-Stop board state (S) information obtained as above.

In addition, the deep learning-based Go-Stop game service method according to an embodiment of the present invention may include a step (S103) in which the game play server 300 performs supervised learning based on the received training dataset.

In detail, the game play server 300 may perform supervised learning on the operation model based on the received training dataset. More specifically, the game play server 300 may train the state determination unit 310 of the operation model by supervised learning based on the received training dataset.

In addition, the deep learning-based Go-Stop game service method according to an embodiment of the present invention may include a step (S105) in which the game play server 300 performs state determination training based on supervised learning.

FIG. 6 is a conceptual diagram for explaining a method of performing state determination training according to an embodiment of the present invention, and FIG. 7 is an example of a diagram showing the transition relationships between states according to an embodiment of the present invention.

Referring to FIG. 6, the game play server 300 may perform state determination training, which trains the state determination unit 310 of the operation model so that the operation model takes a given Go-Stop board state (S) as input and outputs the state determination information (C) for that board state.

Here, the state determination information (C) may be information identifying the activation state, which is the criterion for detecting the actions that can be performed in a specific Go-Stop board state (S). That is, the activation state determines at least one action that the player can perform in the specific Go-Stop board state (S).

Here, the activation states may include a hand state (HAND State), a shaking state (SHAKE State), a capture selection state (SELECT State), a sake cup position selection state (SWITCH State), a go/stop state (GOSTOP State), a game end state (END State), a hand dealing state (CHANCE_HAND State), and a deck flipping state (CHANCE_FLIP State). These states are distinguished in a form optimized for an imperfect information game.
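
As a non-limiting illustration, the eight activation states named above could be represented as an enumeration so that the state determination information (C) takes one of a fixed set of values. The state identifiers come from the description; the class name `ActivationState` and the helper `parse_activation_state` are assumptions introduced for this sketch.

```python
from enum import Enum


class ActivationState(Enum):
    """The activation states distinguished for a Go-Stop board state."""

    HAND = "HAND"                # decide which hand card to play
    SHAKE = "SHAKE"              # decide whether to shake
    SELECT = "SELECT"            # decide which floor card to capture
    SWITCH = "SWITCH"            # decide where to place the sake cup card
    GOSTOP = "GOSTOP"            # decide between go and stop
    END = "END"                  # the game ends
    CHANCE_HAND = "CHANCE_HAND"  # hand dealing state
    CHANCE_FLIP = "CHANCE_FLIP"  # deck flipping state


def parse_activation_state(name: str) -> ActivationState:
    """Map a state identifier such as 'gostop' to its enumeration member."""
    return ActivationState[name.strip().upper()]


current = parse_activation_state("gostop")
```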

이때, 상기 손패 상태(HAND State)는, 플레이어가 가지는 내 손패 중에서 어떤 패를 낼지 결정하는 상태로서, 플레이어의 손패 중 어느 하나를 고스톱 게임 그라운드(이하, 바닥)에 내는 액션이 가능한 상태일 수 있다. 예를 들어, 플레이어가 3장의 패를 가지고 있는 경우, 제1 손패를 내는 액션, 제2 손패를 내는 액션 또는 제3 손패를 내는 액션 등이 가능할 수 있다. 즉, 손패 상태에서는 내 손패가 바닥 상에 내어지는 선택 패(공개 패)로 전환되게 할 수 있다. At this time, the hand state (HAND State) is a state in which the player decides which hand to play out of his or her hand, and an action of playing one of the player's hands on the go-stop game ground (hereinafter referred to as the floor) may be possible. . For example, when a player has three hands, an action of playing a first hand, an action of playing a second hand, or an action of playing a third hand may be possible. That is, in the hand mode, my hand can be converted to a selected hand (public hand) played on the floor.

또한, 상기 흔들기 상태(SHAKE State)는, 고스톱 규칙에 따라서 흔들기가 가능한 경우 흔들기를 수행할지 여부를 결정하는 상태로서, 흔들기를 수행하는 액션 또는 흔들기를 수행하지 않는 액션이 가능한 상태일 수 있다. Further, the SHAKE State is a state for determining whether or not to perform shaking when shaking is possible according to the go-stop rule, and may be a state in which an action to perform shaking or an action not to perform shaking is possible.

The capture-card selection state (SELECT State) is a state in which, when at least one of the cards on the floor (hereinafter, floor cards) can be captured by a selected card played from the player's hand or by a card turned over from the deck (hereinafter, a deck card) and played onto the floor, the player decides which floor card to capture. It may be a state in which an action of selecting and capturing at least one of the floor cards placed on the floor is possible.

The sake cup position selection state (SWITCH State) is a state in which, when the player has captured the sake cup card (that is, the September yeol-kkeut card among the Go-Stop cards) and is able to score for the first time, the player decides where to place the sake cup card. It may be a state in which an action of placing the sake cup card in the pi group (that is, using it as a ssangpi, or double pi) or an action of placing it in the yeol-kkeut group (that is, using it as a yeol-kkeut card) is possible.

The go-stop state (GOSTOP State) is a state in which, when the player has scored and can declare go or stop, the player decides whether to go or to stop, and may be a state in which an action of declaring go and an action of declaring stop are possible.

The game end state (END State) is a state in which the game ends because a player has declared stop or because the round is a nagari (a round with no winner), and may be a state in which an action of ending the game is possible.

The hand dealing state (CHANCE_HAND State) is a state in which it is decided which cards to place in the opposing player's hand, and may be a state in which an action of substituting at least one of the hidden cards, that is, the cards in the Go-Stop game other than the public cards, into the opposing player's hand is possible.

The deck flip state (CHANCE_FLIP State) is a state in which, when a deck card is turned over or when a bonus card is played and a new hand card is drawn, it is decided which card to place in the player's hand. It is a state in which an action of placing any one of the hidden cards of the Go-Stop game in the player's hand is possible.

Meanwhile, the activation states described above may have transition relationships with one another as shown in FIG. 7, according to the Go-Stop game rules and the relationships or context between the activation states.
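For illustration only, the eight activation states and the kinds of action each one permits might be sketched as follows. The class name `ActivationState`, the function `legal_actions`, and the tuple encoding of actions are assumptions introduced here and do not appear in the disclosure; the transition relationships of FIG. 7 are deliberately not modeled.

```python
from enum import Enum


class ActivationState(Enum):
    """The eight activation states named in the description."""
    HAND = "HAND"
    SHAKE = "SHAKE"
    SELECT = "SELECT"
    SWITCH = "SWITCH"
    GOSTOP = "GOSTOP"
    END = "END"
    CHANCE_HAND = "CHANCE_HAND"
    CHANCE_FLIP = "CHANCE_FLIP"


def legal_actions(state, hand=(), capturable=(), hidden=()):
    """Return the actions available in `state` (hypothetical encoding).

    hand:       cards in the player's hand
    capturable: floor cards that may be captured
    hidden:     cards not yet public
    """
    if state is ActivationState.HAND:
        return [("play", card) for card in hand]
    if state is ActivationState.SHAKE:
        return [("shake", True), ("shake", False)]
    if state is ActivationState.SELECT:
        return [("capture", card) for card in capturable]
    if state is ActivationState.SWITCH:
        return [("cup_as", "pi"), ("cup_as", "yeol-kkeut")]
    if state is ActivationState.GOSTOP:
        return [("declare", "go"), ("declare", "stop")]
    if state is ActivationState.END:
        return [("end", None)]
    # CHANCE_HAND and CHANCE_FLIP both assign one of the hidden cards.
    return [("deal", card) for card in hidden]
```

With a three-card hand, `legal_actions(ActivationState.HAND, hand=["a", "b", "c"])` yields three play actions, matching the three-card example given for the hand state.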

Returning to the description, the game play server 300 may further include, in the state determination unit 310, a plurality of sub-layer modules that each output whether one of the plurality of activation states possible in the Go-Stop game is activated. Here, the sub-layer module for each activation state may take a given Go-Stop board state (S) as input and provide as output whether the corresponding activation state is activated.

In detail, the game play server 300 may train the state determination unit 310 to take as input data based on the Go-Stop board state (S) of the training data set (in an embodiment, the player's hand cards, the player's captured cards, the opposing player's captured cards, the cards laid on the floor (that is, public cards), the score the player must reach to win, the score the opposing player must reach to win, the card the player selected in the hand state, the card the opposing player selected in the hand state, ppeok-causing information, the bonus card hostage type, the number of go declarations, first-player information, and/or information on whether the September yeol-kkeut card is used as pi) and, based on this input, to output whether each activation state possible in the Go-Stop game is activated.

In addition, the game play server 300 may train the state determination unit 310 to output the state determination information (C) for the input Go-Stop board state (S) based on whether each activation state is activated. That is, the game play server 300 may cause the state determination unit 310 to detect, among the plurality of activation states, the one determined to be activated, that is, the activation state the player currently faces in the match, and to output the state determination information (C) for the Go-Stop board state (S) based on the detected activation state.
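As a rough sketch of the detection step only, suppose each sub-layer module emits an activation score between 0 and 1 for its own state. The score representation, the 0.5 threshold, and the function name `determine_state` are assumptions made for illustration; the disclosure does not specify how the per-state outputs are combined.

```python
def determine_state(activations, threshold=0.5):
    """Pick the activation state judged to be active.

    activations: dict mapping a state name to the score emitted by that
                 state's sub-layer module (hypothetical representation).
    threshold:   assumed cut-off above which a state counts as activated.
    """
    active = {name: score for name, score in activations.items()
              if score >= threshold}
    if not active:
        raise ValueError("no activation state exceeds the threshold")
    # If several states pass the threshold, keep the highest-scoring one.
    return max(active, key=active.get)
```

The returned state name plays the role of the state determination information (C) in this simplified picture.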

Accordingly, the apparatus for providing a deep learning-based imperfect information game service according to an embodiment of the present invention can determine, based on the current play state of an imperfect information game such as Go-Stop, the activation state that determines at least one action the player can perform.

In addition, the deep learning-based Go-Stop game service method according to an embodiment of the present invention may include a step (S107) in which the game play server 300 performs roll-out based on the received training data set.

FIG. 8 is a conceptual diagram for explaining roll-out-based training according to an embodiment of the present invention.

In detail, referring to FIG. 8, the game play server 300 may train the operation model based on the received training data set. In addition, the game play server 300 may perform MCTS-based self-play using the trained operation model.

In detail, the game play server 300 may train, based on the received training data set, the MCTS module 390, which includes the simulation unit 320, the deep learning neural network 340, and the search unit 350.

In addition, the game play server 300 may provide a win-rate estimate (W) and/or a score estimate (G) based on the operation model including the trained MCTS module 390.

FIG. 9 is an example of a diagram for explaining a method of estimating the win rate and the score according to an embodiment of the present invention.

Referring to FIG. 9, the deep learning-based Go-Stop game service method according to an embodiment of the present invention may include a step (S109) in which the game play server 300 performs roll-out-based win-rate estimation training.

In detail, the game play server 300 may operate the MCTS module 390 to perform roll-out based on the received training data set. In addition, the game play server 300 may perform win-rate estimation training, which trains the operation model to estimate the win rate (that is, the probability of winning) for the actions possible in the corresponding Go-Stop board state (S).

That is, the game play server 300 may train the operation model to take a given Go-Stop board state (S) as input and to output a win-rate estimate (W), which is information estimating the win rate resulting from the actions possible in the input Go-Stop board state (S).

Specifically, the game play server 300 may train the policy net and the value net of the operation model based on the received training data set. In addition, the game play server 300 may use the trained operation model to determine the winner resulting from a series of actions performed from a given Go-Stop board state (S).

In this case, when neither go nor stop has been declared, the game play server 300 may determine as the winner the player who first reaches go or stop in the future as roll-out is continued.

Alternatively, when go has been declared, the game play server 300 may determine as the winner the player who first reaches go or stop in the future as roll-out is continued. Alternatively, when stop has been declared, the game play server 300 may determine as the winner the player who declared that stop.
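The three cases of the winner criterion can be condensed into a small function. The sketch below is illustrative only: the function name `rollout_winner`, its parameters, and the string encoding of declarations are assumptions, and the continued roll-out itself is represented simply by its outcome.

```python
from typing import Optional


def rollout_winner(declaration: Optional[str],
                   declarer: Optional[str],
                   first_to_go_or_stop: str) -> str:
    """Apply the winner criterion for one roll-out.

    declaration:          None (nothing declared), "go", or "stop"
    declarer:             the player who made the declaration, if any
    first_to_go_or_stop:  the player who first reaches go or stop when the
                          roll-out is continued (assumed to be precomputed)
    """
    if declaration == "stop":
        if declarer is None:
            raise ValueError("a stop declaration needs a declarer")
        # The player who declared stop is the winner.
        return declarer
    if declaration in (None, "go"):
        # Undeclared or go: continue the roll-out and take whoever first
        # reaches go or stop.
        return first_to_go_or_stop
    raise ValueError(f"unknown declaration: {declaration!r}")
```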

Accordingly, the apparatus for providing a deep learning-based imperfect information game service according to an embodiment of the present invention can provide a clear criterion for estimating the winner in an imperfect information game such as Go-Stop.

In addition, the game play server 300 may estimate the win rate from the number of roll-outs performed while determining the winner as described above (that is, the total number of roll-outs) and the number of those roll-outs in which the player was determined to be the winner.

In detail, in an embodiment, the game play server 300 may calculate the ratio of the number of roll-outs in which the player was determined to be the winner to the total number of roll-outs (that is, number of roll-outs won / total number of roll-outs). In addition, based on the calculated ratio, the game play server 300 may determine the win-rate estimate (W) for the series of actions performed from the corresponding Go-Stop board state (S). Here, the win-rate estimate (W) may range from 0 to 1.
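The ratio described above is simple enough to state directly. In the following sketch, the function name `win_rate` and the list-of-winners input are assumptions chosen for illustration.

```python
def win_rate(rollout_winners, player):
    """Win-rate estimate W = roll-outs won by `player` / total roll-outs."""
    total = len(rollout_winners)
    if total == 0:
        raise ValueError("at least one roll-out is required")
    wins = sum(1 for winner in rollout_winners if winner == player)
    return wins / total
```

For example, if the player is judged the winner in three of four roll-outs, the estimate is 0.75, which lies in the stated range of 0 to 1.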

That is, in the manner described above, the game play server 300 may train the operation model to take a given Go-Stop board state (S) as input and to output a win-rate estimate (W), which is information estimating the win rate resulting from the actions possible in the input Go-Stop board state (S).

In addition, referring further to FIG. 9, the deep learning-based Go-Stop game service method according to an embodiment of the present invention may include a step (S111) in which the game play server 300 performs roll-out-based score estimation training.

In detail, the game play server 300 may operate the MCTS module 390 to perform roll-out based on the received training data set. In addition, the game play server 300 may perform score estimation training, which trains the operation model to estimate the score obtainable for each action possible in the corresponding Go-Stop board state (S).

That is, the game play server 300 may train the operation model to take a given Go-Stop board state (S) as input and to output a score estimate (G), which is information estimating the score resulting from the actions possible in the input Go-Stop board state (S).

In more detail, the game play server 300 may perform roll-out based on the received training data set to determine the score resulting from a series of actions performed from a given Go-Stop board state (S).

In this case, when neither go nor stop has been declared, the game play server 300 may determine as the score estimate (G) the score that would be obtained by declaring stop in the future as roll-out is continued.

Alternatively, when go has been declared, the game play server 300 may determine as the score estimate (G) the score that can be obtained in the future as roll-out is continued. Alternatively, when stop has been declared, the game play server 300 may determine as the score estimate (G) the score obtainable at the time the stop was declared.
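The score criterion differs from the winner criterion only in what is read off the roll-out. A minimal sketch, assuming the two candidate scores have already been computed and using the hypothetical names `score_target`, `score_at_stop`, and `rollout_score`:

```python
def score_target(declaration, score_at_stop, rollout_score):
    """Choose the score used as the estimate G for one roll-out.

    declaration:   None (nothing declared), "go", or "stop"
    score_at_stop: score obtainable at the moment stop was declared
    rollout_score: score obtained in the future when the roll-out is
                   continued (for the undeclared case, the score from a
                   later stop declaration)
    """
    if declaration == "stop":
        return score_at_stop
    if declaration in (None, "go"):
        return rollout_score
    raise ValueError(f"unknown declaration: {declaration!r}")
```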

Accordingly, the apparatus for providing a deep learning-based imperfect information game service according to an embodiment of the present invention can provide a clear criterion for estimating the score in an imperfect information game such as Go-Stop.

However, scores in a Go-Stop game generally span a much larger range (for example, -1000 to 1000) than the win rate, which lies in the range 0 to 1. Therefore, for more effective score estimation training, the game play server 300 may perform score estimation training using a score clipping method or a regression-to-classification method. Here, the score clipping method may be a method of performing the roll-out-based score estimation training described above while limiting the range of the score estimate (G) to a predetermined range (for example, -200 to 200). The regression-to-classification method may be a method of performing score estimation training through roll-out based on a classification model rather than a regression model. In detail, the regression-to-classification method may perform the roll-out-based score estimation training by representing the preset possible score estimates (G) in a one-hot vector scheme (for example, [-200, -199, …, 199, 200]), obtaining a probability distribution over the represented score estimates (G), and selecting the score estimate (G) with the highest probability.
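Both techniques can be illustrated with a few lines of plain Python. The sketch below assumes integer scores and the example range of -200 to 200; the names `clip_score`, `score_to_one_hot`, and `predicted_score` are hypothetical. Clipping the score before one-hot encoding is a choice made here so that every score maps to a class, and the disclosure does not state that the two methods are combined.

```python
SCORE_MIN, SCORE_MAX = -200, 200
# One class per integer score: [-200, -199, ..., 199, 200], 401 in total.
SCORE_CLASSES = list(range(SCORE_MIN, SCORE_MAX + 1))


def clip_score(score, lo=SCORE_MIN, hi=SCORE_MAX):
    """Score clipping: limit a raw score to the range [lo, hi]."""
    return max(lo, min(hi, score))


def score_to_one_hot(score):
    """Encode an integer score as a one-hot training target."""
    target = [0.0] * len(SCORE_CLASSES)
    target[clip_score(score) - SCORE_MIN] = 1.0
    return target


def predicted_score(probabilities):
    """Select the score whose class has the highest probability."""
    if len(probabilities) != len(SCORE_CLASSES):
        raise ValueError("expected one probability per score class")
    best = max(range(len(SCORE_CLASSES)), key=probabilities.__getitem__)
    return SCORE_CLASSES[best]
```

Treating the score as a 401-way classification target keeps every output in the same 0 to 1 range as the win rate, which is the motivation the paragraph above gives for avoiding direct regression.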

Accordingly, the apparatus for providing a deep learning-based imperfect information game service according to an embodiment of the present invention can perform score estimation training over a wide score range more efficiently and quickly.

In summary, the game play server 300 may perform roll-out based on a given training data set in the manner described above to carry out win-rate estimation training and score estimation training, and may thereby train the operation model to output the win-rate estimate (W) and the score estimate (G) for a given Go-Stop board state (S).

Accordingly, the apparatus for providing a deep learning-based imperfect information game service according to an embodiment of the present invention can estimate the win rate and the score resulting from a series of actions performed from a given play state of an imperfect information game such as Go-Stop.

In addition, the deep learning-based Go-Stop game service method according to an embodiment of the present invention may include a step (S113) in which the game play server 300 performs jokbo (scoring combination) achievement probability training.

Here, a jokbo (scoring combination) may mean a predefined set of cards that, once collected, constitutes a yak (約) conferring the privilege of earning special points in the Go-Stop game. In an embodiment, the jokbo may include Godori, Cheongdan, Hongdan, and/or Chodan under the Go-Stop rules.

In detail, referring further to FIG. 8, the game play server 300 may operate the MCTS module 390 to perform roll-out based on the received training data set. In addition, the game play server 300 may perform jokbo achievement probability training, which trains the operation model to predict the probability that a jokbo such as Godori, Cheongdan, Hongdan, and/or Chodan will have been completed when the roll-out match for the corresponding Go-Stop board state (S) ends.

That is, the game play server 300 may train the operation model to take a given Go-Stop board state (S) as input and to output a jokbo achievement probability value (L), which is information predicting the probability that a given jokbo (Godori, Cheongdan, Hongdan, and/or Chodan, etc.) will be achieved as a result of roll-out based on the input Go-Stop board state (S).

In more detail, the game play server 300 may train the operation model so that, on receiving a given Go-Stop board state (S), it performs roll-out based on the input Go-Stop board state (S) and calculates and outputs, for the point at which the match ends as a result of a series of actions possible from that Go-Stop board state (S), the probability of achieving Godori, the probability of achieving Cheongdan, the probability of achieving Hongdan, and/or the probability of achieving Chodan.
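One simple way to obtain such per-jokbo probabilities from roll-outs is to count how often each jokbo is completed at the end of a match. The following sketch takes that frequency view as an assumption; the disclosure states only that the model is trained to predict the probabilities. The names `JOKBO` and `jokbo_probabilities` are hypothetical.

```python
JOKBO = ("godori", "cheongdan", "hongdan", "chodan")


def jokbo_probabilities(rollout_results):
    """Estimate the achievement probability of each jokbo.

    rollout_results: one set per completed roll-out, holding the jokbo
                     that the player had completed when the match ended.
    Returns a dict mapping each jokbo name to its observed frequency.
    """
    total = len(rollout_results)
    if total == 0:
        raise ValueError("at least one roll-out is required")
    return {
        name: sum(1 for achieved in rollout_results if name in achieved) / total
        for name in JOKBO
    }
```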

In addition, the game play server 300 may use the jokbo achievement probability value (L) predicted as described above when estimating the win-rate estimate (W) and the score estimate (G). For example, when the game play server 300 determines, based on the jokbo achievement probability value (L), that the probability of earning special points in the Go-Stop match is high, it may increase a given win-rate estimate (W) and/or a given score estimate (G).
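The disclosure does not say by how much the estimates are raised or what counts as a high probability. Purely as an illustration, the sketch below uses an assumed threshold of 0.7 and assumed fixed increments; `adjust_estimates` and all of its default values are hypothetical.

```python
def adjust_estimates(w, g, jokbo_probs, threshold=0.7,
                     w_bonus=0.05, g_bonus=3.0):
    """Raise W and G when some jokbo is likely to be achieved.

    w:           win-rate estimate in [0, 1]
    g:           score estimate
    jokbo_probs: dict mapping a jokbo name to its achievement probability
    threshold, w_bonus, g_bonus: assumed values, not taken from the source
    """
    if max(jokbo_probs.values(), default=0.0) >= threshold:
        w = min(1.0, w + w_bonus)  # keep the win rate within [0, 1]
        g = g + g_bonus
    return w, g
```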

Accordingly, the apparatus for providing a deep learning-based imperfect information game service according to an embodiment of the present invention can improve the performance of an artificial intelligence computer playing an imperfect information game such as Go-Stop by also predicting the probability of achieving a jokbo under the game rules (in an embodiment, Godori, Cheongdan, Hongdan, and/or Chodan, etc.).

- Game play server according to another embodiment - Providing a deep learning-based Go-Stop service

FIG. 10 is a flowchart for explaining a method by which the operation model of the game play server 300 according to an embodiment of the present invention provides a deep learning-based imperfect information game service, and FIG. 11 is a conceptual diagram for explaining the same method.

Referring to FIGS. 10 and 11, in the deep learning-based Go-Stop game service according to an embodiment of the present invention, the game play server 300 may play the Go-Stop game by performing the optimal action (A) for a specific Go-Stop board state (S). Here, the optimal action (A) may be the action determined to yield the highest win rate and score in a given Go-Stop board state (S). Specifically, the game play server 300 may acquire a specific Go-Stop board state (S). In addition, the game play server 300 may acquire state determination information (C) identifying the activation state for the acquired Go-Stop board state (S), estimation information including a win-rate estimate (W) and/or a score estimate (G) based on that Go-Stop board state (S), and a jokbo achievement probability value (L) based on that Go-Stop board state (S). In addition, the game play server 300 may determine and perform the optimal action (A) for the specific Go-Stop board state (S) using the acquired Go-Stop board state (S), state determination information (C), and/or estimation information.

In detail, referring further to FIG. 10, the deep learning-based Go-Stop game service method according to an embodiment of the present invention may include a step (S201) in which the game play server 300 receives a Go-Stop board state (S).

In more detail, the game play server 300 may receive, in conjunction with the game service providing server 200, the current Go-Stop board state (S) of the Go-Stop game currently in progress.

Here, the current Go-Stop board state (S) is a set of data reflecting the current progress of the game, including the arrangement of Go-Stop cards on the board. In an embodiment, it may include public information such as the player's own hand cards, the player's own captured cards, the cards laid on the floor (that is, public cards), the score that must be reached to win, information on the player's own selected card, ppeok-causing information, the bonus card hostage type, the number of go declarations, and/or information on whether the September yeol-kkeut card is used as pi. The current Go-Stop board state (S) may exclude private information that is not disclosed during the Go-Stop game, such as the opponent's hand cards, the opponent's captured cards, and/or the score the opponent must reach to win.
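As a reading aid, the public information listed above can be pictured as a single record. The sketch below is an assumption about representation only: the class name `GoStopBoardState`, every field name, and every field type are hypothetical, and cards are represented as plain strings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class GoStopBoardState:
    """Public information visible to the acting player (hypothetical layout)."""
    my_hand: Tuple[str, ...]
    my_captured: Tuple[str, ...]
    floor: Tuple[str, ...]                    # public cards on the floor
    score_needed_to_win: int
    my_selected_card: Optional[str] = None
    ppeok_info: Optional[str] = None          # ppeok-causing information
    bonus_hostage_type: Optional[str] = None  # bonus card hostage type
    go_count: int = 0                         # number of go declarations
    cup_used_as_pi: bool = False              # September yeol-kkeut used as pi
    # Deliberately absent: the opponent's hand, the opponent's captured
    # cards, and the score the opponent needs, which are private.
```

Keeping the private fields out of the record makes it impossible for downstream code to read information the player would not have at inference time.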

In addition, the deep learning-based Go-Stop game service method according to an embodiment of the present invention may include a step (S203) in which the game play server 300 acquires state determination information (C) for the received Go-Stop board state (S).

Here, the state determination information (C) may be information identifying the activation state, which serves as the criterion for detecting the actions that can be performed in a specific Go-Stop board state (S). That is, the activation state determines at least one action that the player can perform in the specific Go-Stop board state (S). The activation states may include a hand state (HAND State), a shake state (SHAKE State), a capture-card selection state (SELECT State), a sake cup position selection state (SWITCH State), a go-stop state (GOSTOP State), a game end state (END State), a hand dealing state (CHANCE_HAND State), and a deck flip state (CHANCE_FLIP State). The states are distinguished in this way to give a form optimized for an imperfect information game.

In detail, the game play server 300 may acquire the state determination information (C) for the current Go-Stop board state (S) using the operation model. Specifically, the game play server 300 may perform deep learning, based on the state determination unit 310 of the operation model, that takes the current Go-Stop board state (S) as input and outputs the state determination information (C) for the current Go-Stop board state (S). Here, the state determination unit 310 may have undergone state determination training so that it takes a given Go-Stop board state (S) as input and outputs the state determination information (C) for that Go-Stop board state (S). A detailed description follows the description of the operation model given with reference to FIGS. 6 and 7.

In addition, the game play server 300 may acquire the state determination information (C) for the current Go-Stop board state (S) through the deep learning thus performed.

For example, the game play server 300 may acquire state determination information (C) indicating that the activation state for the current Go-Stop board state (S) is the hand state (HAND State), in which the player decides which card from his or her hand to play, and that an action of playing any one of the player's hand cards onto the Go-Stop board (that is, the floor) is therefore possible.

In addition, the deep learning-based Go-Stop game service method according to an embodiment of the present invention may include a step (S205) in which the game play server 300 acquires estimation information for the received Go-Stop board state (S).

Here, the estimation information may be information including a win-rate estimate (W) and/or a score estimate (G) estimated by deep learning for a given Go-Stop board state (S). The win-rate estimate (W) may be information estimating the win rate resulting from the actions possible in a given Go-Stop board state (S). The score estimate (G) may be information estimating the score resulting from the actions possible in a given Go-Stop board state (S).

In detail, the game play server 300 may acquire the estimation information for the current Go-Stop board state (S) using the operation model. In more detail, the game play server 300 may perform deep learning, based on the MCTS module 390 of the operation model, that takes the current Go-Stop board state (S) as input and outputs the win-rate estimate (W) and/or the score estimate (G) for the current Go-Stop board state (S). Here, the MCTS module 390 may have been trained to take a given Go-Stop board state (S) as input and to output the estimation information for that Go-Stop board state (S). A detailed description follows the description of the operation model given with reference to FIGS. 8 and 9.

In addition, the game play server 300 may obtain the estimation information for the current go-stop board state (S) through the deep learning thus performed.

In addition, the deep learning-based go-stop game service method according to an embodiment of the present invention may include a step (S207) in which the game play server 300 performs an action according to the obtained go-stop board state (S), state determination information (C), and estimation information.

In detail, the game play server 300 may use the operation model to perform an optimal action (A) based on the current go-stop board state (S), the state determination information (C), and the estimation information. Here, the optimal action (A) may be the action judged able to yield the highest win rate and score in a given go-stop board state (S).

Specifically, based on the action determination unit 330 of the operation model, the game play server 300 may perform deep learning that takes as input the estimation information, including the win rate estimate (W) and/or the score estimate (G) obtained for the current go-stop board state (S), and outputs the optimal action (A) for the current go-stop board state (S).

More specifically, the game play server 300 may perform deep learning that outputs the optimal action (A) for the current go-stop board state (S) based on the current go-stop board state (S), the state determination information (C) obtained for it, and the estimation information, including the win rate estimate (W) and/or the score estimate (G), obtained for it. Here, the action determination unit 330 may be trained to take the estimation information described above as input and to output the optimal action (A) for the corresponding go-stop board state (S). A detailed description follows the description of the operation model given with reference to FIGS. 4 to 9.

In addition, the game play server 300 may determine the optimal action (A) for the current go-stop board state (S) through the deep learning thus performed, and may carry out that action.

Here, the game play server 300 may determine the optimal action (A) based on a Monte Carlo Tree Search (MCTS) algorithm.

In detail, the game play server 300 may use an operation model in which the state determination unit 310, the MCTS module 390, and the action determination unit 330 interoperate to perform MCTS based on the current go-stop board state (S) and on the state determination information (C) and estimation information obtained for it. Based on the MCTS performed, the game play server 300 may then determine the optimal action (A) judged able to yield the highest win rate and score in the current go-stop board state (S), taking into account the activation state for the current go-stop board state (S), the win rate and score resulting from the actions available in that activation state, and the genealogy (jokbo, i.e., scoring combination) achievement probability resulting from the sequences of actions available from the current go-stop board state (S). The game play server 300 may then perform an operation according to the determined optimal action (A).

Accordingly, the apparatus for providing a deep learning-based imperfect information game service according to another embodiment of the present invention can perform the optimal action (A) for winning the game by considering the activation state for a given play state of an imperfect information game such as go-stop, together with the win rate resulting from a given action and/or the probability of achieving a score.

Meanwhile, in MCTS for an imperfect information game (in the embodiment, go-stop), in which some information needed to play the game (in the embodiment, the opponent player's hand, the deck cards, and so on) is hidden, behavior that depends on the unknown information (in the embodiment, the opponent player's action) is generally estimated on the basis of random probability.

In detail, when the opponent player's action (that is, the behavior of a chance node) is estimated while performing MCTS for go-stop in the conventional random manner, one of all the actions available to the opponent player is selected at random according to some algorithm (for example, a random policy), and MCTS proceeds on the assumption that the opponent player performs the randomly selected action.

FIG. 12 is an example of a diagram for explaining a method of estimating an opponent player's action in MCTS of an imperfect information game service according to an embodiment of the present invention.

As an example, referring to (a) of FIG. 12, suppose MCTS for go-stop is performed in the conventional random manner and the opponent player's action is estimated when the opponent player holds two hand cards and the deck holds three cards. The simulation then proceeds with one randomly selected action among all actions available to the opponent player. For instance, when the opponent player is in the hand state (HAND State), one card is chosen with random probability from all hidden cards not yet revealed in the game (that is, the five cards consisting of the opponent player's two hand cards and the three deck cards), and the action of the opponent player playing that card onto the floor is taken as the opponent player's action.
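The conventional random baseline of FIG. 12 (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the card labels and the function name `random_opponent_card` are hypothetical.

```python
import random


def random_opponent_card(opponent_hand_size, hidden_cards, rng=random):
    """Conventional baseline: pick one hidden card uniformly at random.

    hidden_cards holds every card not yet revealed (opponent hand plus deck);
    which of them the opponent actually holds is unknown to the searcher.
    """
    if opponent_hand_size <= 0 or not hidden_cards:
        raise ValueError("the opponent has no card to play")
    return rng.choice(list(hidden_cards))


# FIG. 12 (a): 2 opponent hand cards + 3 deck cards = 5 hidden cards.
hidden = ["card_a", "card_b", "card_c", "card_d", "card_e"]  # placeholder labels
played = random_opponent_card(2, hidden, random.Random(0))
```

Because the draw ignores which card a real opponent would prefer, the simulated move carries no strategic information, which is the weakness the following paragraphs address.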

In practice, however, the opponent player's action is not determined at random in this way, so performing MCTS on the basis of the conventional random method can reduce the reliability of its performance.

Therefore, when performing MCTS for go-stop, in which some information related to game progress (in the embodiment, the opponent player's hand, the deck cards, and so on) is hidden, the game play server 300 according to another embodiment of the present invention may perform a deep learning-based opponent action estimation simulation (hereinafter, opponent action estimation simulation), which uses deep learning to estimate the opponent player's action that depends on the unknown information.

That is, the game play server 300 according to another embodiment of the present invention may perform an opponent action estimation simulation that uses deep learning to estimate the opponent player's action in MCTS for an imperfect information game such as go-stop.

In detail, when estimating the opponent player's action while performing MCTS in the go-stop game service, the game play server 300 may first determine the activation state of the opponent player. For example, the game play server 300 may determine which one of the activation states possible in the go-stop game according to the embodiment the opponent player is in: the hand state (HAND State), the shake state (SHAKE State), the capture-card selection state (SELECT State), the sake-cup position selection state (SWITCH State), the go-stop state (GOSTOP State), the game end state (END State), the hand-dealing state (CHANCE_HAND State), or the deck flip state (CHANCE_FLIP State).
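The eight activation states listed above could be represented as an enumeration. The sketch below is illustrative only: the class name `ActivationState`, the helper `is_chance_state`, and the reading of the two `CHANCE_` states as chance nodes are assumptions drawn from the state names, not details confirmed by the specification.

```python
from enum import Enum, auto


class ActivationState(Enum):
    HAND = auto()         # play a card from the hand
    SHAKE = auto()        # shake state
    SELECT = auto()       # choose which card to capture
    SWITCH = auto()       # choose the position of the sake-cup card
    GOSTOP = auto()       # go-stop state
    END = auto()          # game end
    CHANCE_HAND = auto()  # hand cards are dealt
    CHANCE_FLIP = auto()  # a deck card is flipped


def is_chance_state(state):
    """Assumption: the CHANCE_ states are resolved by chance, not by a player."""
    return state in (ActivationState.CHANCE_HAND, ActivationState.CHANCE_FLIP)
```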

In addition, referring to (b) of FIG. 12, in an embodiment the game play server 300 may calculate the number of the opponent player's hand cards when the determined activation state of the opponent player is the hand state (HAND State). For example, when the opponent player holds two hand cards, the game play server 300 may calculate the number of the opponent player's hand cards as two. Here, by the nature of go-stop, the opponent player's hand cards may be hidden cards whose card information is not revealed.

In addition, the game play server 300 may detect, from all the cards of the go-stop game, the hidden cards that remain after excluding the revealed cards, that is, at least one hidden card that is either one of the opponent player's hand cards or a deck card. Here, the game play server 300 may obtain the card information (that is, information indicating which card it is) for each detected hidden card, matched to that card. In other words, because go-stop is an imperfect information game, it is possible to determine which cards have not yet been revealed, but which of those hidden cards belong to the opponent player's hand remains unknown.

In addition, the game play server 300 may randomly extract, from the detected hidden cards, as many cards as the calculated number of the opponent player's hand cards. For example, when the number of the opponent player's hand cards is calculated as two, the game play server 300 may randomly select and extract two cards from the detected hidden cards.
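The two steps just described, detecting the hidden cards and randomly extracting as many of them as the opponent holds, can be sketched as follows. The function names and card labels are hypothetical, and the eight-card toy deck only stands in for a real go-stop deck.

```python
import random


def find_hidden_cards(all_cards, revealed_cards):
    """Hidden cards = all cards minus revealed cards (identities are known)."""
    revealed = set(revealed_cards)
    return [card for card in all_cards if card not in revealed]


def extract_candidates(hidden_cards, opponent_hand_size, rng=random):
    """Randomly draw as many hidden cards as the opponent has in hand."""
    if opponent_hand_size > len(hidden_cards):
        raise ValueError("opponent hand size exceeds the number of hidden cards")
    return rng.sample(hidden_cards, opponent_hand_size)


all_cards = [f"card_{i}" for i in range(8)]   # toy deck, placeholder labels
revealed = ["card_0", "card_1", "card_2"]     # cards already visible
hidden = find_hidden_cards(all_cards, revealed)
extracted = extract_candidates(hidden, 2, random.Random(0))
```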

In addition, the game play server 300 may identify which card each extracted hidden card (hereinafter, extracted card) is, based on the card information matched to it. For example, the game play server 300 may determine which cards the randomly extracted first and second extracted cards are, based on the card information matched to each of them.

In addition, the game play server 300 may use deep learning to determine which of the extracted cards, whose identities are now known, is the best card, that is, the one able to yield the higher win rate and score.

In detail, based on the MCTS, the game play server 300 may perform self-play in which an action based on each extracted card is tried on the opponent player's side. For example, the game play server 300 may perform one self-play in which playing the first extracted card is the opponent player's action and another self-play in which playing the second extracted card is the opponent player's action.

In addition, as a result of performing self-play for each extracted card, the game play server 300 may obtain a win rate estimate (W) and/or a score estimate (G) for each extracted card.

In addition, based on the win rate estimate (W) and/or score estimate (G) obtained for each extracted card, the game play server 300 may determine which extracted card to apply to the opponent player's action. In an embodiment, the game play server 300 may select the extracted card with the higher win rate estimate (W) and/or score estimate (G) in self-play as the card to apply to the opponent player's action. Depending on the embodiment, the game play server 300 may also apply a softmax temperature method when running the MCTS algorithm to improve accuracy.
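The specification names a softmax temperature method without stating where in MCTS it is applied. As a generic illustration only, the sketch below shows how a temperature parameter sharpens or flattens a softmax distribution over per-card estimates; the function name and the sample values are assumptions.

```python
import math


def softmax_with_temperature(values, temperature=1.0):
    """Softmax over values; a low temperature sharpens, a high one flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in values]
    shift = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


estimates = [0.6, 0.4]  # hypothetical win rate estimates for two extracted cards
sharp = softmax_with_temperature(estimates, temperature=0.1)
flat = softmax_with_temperature(estimates, temperature=10.0)
```

With a low temperature the better-rated card is chosen almost deterministically, while a high temperature keeps the choice close to uniform.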

In this way, the game play server 300 may use deep learning to determine, among the extracted cards drawn in a number equal to the opponent player's hand cards, the extracted card judged to be the best card able to yield the higher win rate and score.

In addition, the game play server 300 may determine the action of playing the extracted card chosen as the best card to be the opponent player's action.
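The selection step of the opponent action estimation simulation can be sketched as below. The callable `evaluate` is a hypothetical stand-in for the self-play evaluation and is assumed to return a (win rate estimate, score estimate) pair for a card; comparing by win rate first and by score second is also an assumption, since the specification does not state how the two estimates are combined.

```python
def estimate_opponent_card(extracted_cards, evaluate):
    """Return the extracted card with the best (win rate, score) evaluation."""
    if not extracted_cards:
        raise ValueError("no extracted cards to evaluate")
    # Tuples compare lexicographically: win rate first, then score.
    return max(extracted_cards, key=evaluate)


# Toy evaluation table standing in for self-play results (made-up numbers).
toy_results = {"card_a": (0.45, 3.0), "card_b": (0.60, 1.0)}
best_card = estimate_opponent_card(["card_a", "card_b"], toy_results.__getitem__)
```

Here `best_card` is `"card_b"`, since its win rate estimate is higher even though its score estimate is lower.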

Accordingly, when performing MCTS for an imperfect information game such as go-stop, in which some information related to game progress (in the embodiment, the opponent player's hand, the deck cards, and so on) is hidden, the apparatus for providing a deep learning-based imperfect information game service according to another embodiment of the present invention can use deep learning to estimate the opponent player's action that depends on the unknown information.

Meanwhile, to search nodes efficiently, the MCTS described above generally uses a UCT method called PUCT, which uses a policy network and a value network. That is, conventional MCTS generally uses the UCT method, which depends heavily on the win rate estimate (W). In a game such as go-stop, however, factors other than the win rate, such as the score, are also very important indicators.

Therefore, the game play server 300 according to another embodiment of the present invention aims to provide a new form of UCT that performs MCTS by considering not only the win rate estimate (W) but also the score estimate (G).

In detail, the game play server 300 may perform MCTS that outputs the optimal action (A) based on the UCT (Upper Confidence Bound 1 applied to Trees) algorithm. Here, the UCT algorithm may be a known algorithm that combines MCTS with UCB1 (Upper Confidence Bound 1), and follows the formula below.

[Formula 1]

In Formula 1, w (winrate) may be the win rate estimate (W), b may denote a child node, N may denote the number of visits, and C is a weight that may be set to 1.
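The formula image itself is not reproduced in this text. For orientation only, the textbook UCB1-based UCT score is commonly written as below; mapping the symbols w, b, N, and C onto this form is an assumption and may differ from the formula in the original figure.

```latex
% Textbook UCT score for child node b (assumed form, not taken from the figure):
%   \bar{w}_b : mean win rate estimate of child b
%   N_b       : number of visits to child b
%   N         : number of visits to the parent node
%   C         : exploration weight (set to 1 in the text)
\mathrm{UCT}(b) = \bar{w}_b + C \sqrt{\frac{\ln N}{N_b}}
```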

Here, the game play server 300 may implement a G-UCT algorithm, a modification of the UCT algorithm that additionally applies the score estimate (G) as an indicator on top of the existing UCT algorithm. The G-UCT algorithm according to another embodiment of the present invention follows the formula below.

[Formula 2]

Specifically, the game play server 300 may add a score term according to another embodiment of the present invention to the existing UCT algorithm (Formula 1). The score term follows the formula below.

[Formula 3]

In Formulas 2 and 3, score in the score term may denote the score estimate (G), const may denote a predetermined constant, and n may denote the number of simulations.

That is, the game play server 300 may simulate a given go-stop board state (S) using a new form of the algorithm in which the above score term is added to the existing UCT.

Here, the sigmoid function included in the score term may serve to normalize the win rate estimate (W) and the score estimate (G) in the G-UCT according to the embodiment. In detail, unlike the win rate, scores in go-stop are generally distributed over a much wider range. For example, the win rate estimate (W) is usually output as a value between 0 and 1, whereas the score estimate (G) can in theory be output as a value between -1000 and 1000.

FIG. 13 is an example of a diagram showing the sigmoid function used in the G-UCT algorithm according to an embodiment of the present invention.

Thus, referring to FIG. 13, the game play server 300 according to another embodiment of the present invention may include a sigmoid function in the score term that applies the score estimate (G) in the G-UCT, in order to normalize the score estimate (G) against the win rate estimate (W) when performing G-UCT. That is, by including a sigmoid function in the score term, the game play server 300 may implement normalization that scales the distribution of the score estimate (G) into the range 0 to 1.

In addition, when implementing the G-UCT algorithm (Formula 2) by applying the score term that includes the sigmoid function, the game play server 300 may further include in the score term a predetermined constant (const) that adjusts the range of the score estimate (G) to within a predetermined range. This prevents score estimates (G) whose magnitude exceeds a certain threshold (for example, score estimates greater than 4 or less than -4) from converging to 1 (or a value close to 1) or 0 (or a value close to 0).
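The saturation problem and the role of the constant can be illustrated with a short sketch. It assumes that const divides the score estimate before the sigmoid is applied; the specification only says that const adjusts the range of the score estimate, so this form, the function names, and the sample values are all assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping any real x into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def normalized_score(score_estimate, const):
    """Assumed form of the scaling: squash score_estimate / const."""
    return sigmoid(score_estimate / const)


unscaled = sigmoid(40.0)                       # saturated, indistinguishable from 1
scaled = normalized_score(40.0, const=20.0)    # sigmoid(2.0), still informative
```

Without scaling, every large score collapses to nearly the same value, so the search could not tell a 40-point outcome from a 400-point one.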

In this way, the game play server 300 may implement the G-UCT algorithm (Formula 2), in which the score term configured as above is added to the existing UCT algorithm (Formula 1). By performing deep learning based on the implemented G-UCT algorithm (Formula 2), the game play server 300 may determine the optimal action (A) for a given current go-stop board state (S) while considering in greater detail not only the win rate resulting from a given action in go-stop but also the score that can be obtained.
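A child-selection rule in the spirit of G-UCT might look like the sketch below. Since Formulas 2 and 3 are not reproduced in this text, the exact combination is assumed: a textbook UCT score plus sigmoid(mean score / const), where the mean is taken over the child's visit count. The `Child` structure, field names, and default constants are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    visits: int        # number of simulations through this child
    win_sum: float     # accumulated win rate estimates
    score_sum: float   # accumulated score estimates


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def g_uct(child, parent_visits, c=1.0, const=20.0):
    """Assumed form: UCT exploitation + exploration + normalized score term."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    exploit = child.win_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    score_term = sigmoid((child.score_sum / child.visits) / const)
    return exploit + explore + score_term


def select_child(children, parent_visits):
    """Index of the child with the highest G-UCT value."""
    return max(range(len(children)), key=lambda i: g_uct(children[i], parent_visits))


# Equal win rates and visits; the second child has the higher mean score.
children = [Child(10, 5.0, 10.0), Child(10, 5.0, 60.0)]
chosen = select_child(children, parent_visits=20)
```

With win rate and visit count tied, the score term breaks the tie in favor of the higher-scoring child, which is the behavior the text attributes to G-UCT.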

Accordingly, the apparatus for providing a deep learning-based imperfect information game service according to another embodiment of the present invention can determine the best action for the current play state using a UCT algorithm improved to better suit imperfect information games such as go-stop (that is, the G-UCT algorithm according to the embodiment).

As described above, the method and apparatus for providing a deep learning-based imperfect information game service according to an embodiment of the present invention can perform the best action in an imperfect information game using at least one of the state determination information (C), the win rate estimate (W), and the score estimate (G) based on the current play state.

In addition, the method and apparatus for providing a deep learning-based imperfect information game service according to an embodiment of the present invention can estimate the opponent player's action in an imperfect information game, in which some information such as the opponent player's cards and the cards in the deck is hidden, on the basis of deep learning following a predetermined algorithm rather than purely at random.

In addition, the method and apparatus for providing a deep learning-based imperfect information game service according to an embodiment of the present invention can determine the best action for the current play state using a UCT algorithm improved to better suit imperfect information games.

In addition, the embodiments of the present invention described above may be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. A hardware device may be replaced by one or more software modules to perform the processing according to the present invention, and vice versa.

The specific implementations described in the present invention are examples and do not limit the scope of the present invention in any way. For brevity, descriptions of conventional electronic components, control systems, software, and other functional aspects of the systems may be omitted. In addition, the line connections or connecting members between components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device they may be replaced by, or supplemented with, various other functional, physical, or circuit connections. Furthermore, unless a component is specifically described with terms such as "essential" or "important", it may not be a component necessarily required for applying the present invention.

Although the detailed description of the present invention has been given with reference to preferred embodiments, those skilled in the art or having ordinary knowledge in the relevant technical field will understand that the present invention can be modified and changed in various ways without departing from the spirit and technical scope of the present invention set forth in the claims below. Therefore, the technical scope of the present invention should not be limited to the contents of the detailed description but should be determined by the claims.

Claims

A method of providing a deep learning-based imperfect information game service through a user terminal by a game play server, the method comprising:
obtaining a training data set including a go-stop board state for each of a plurality of actions; and
training a deep learning model of the imperfect information game service based on the obtained training data set,
wherein the training of the deep learning model includes at least one of:
performing state determination training based on the training data set;
performing win rate estimation training based on the training data set;
performing score estimation training based on the training data set; and
performing genealogy (jokbo) achievement probability training based on the training data set.

The method of claim 1,
wherein the state determination training
trains the deep learning model to take a first go-stop board state as input and to output state determination information, which is information determining at least one action that can be performed in the input first go-stop board state.

The method of claim 2,
wherein the state determination information
is information determining an activation state, which is a reference element for detecting at least one action that can be performed in the first go-stop board state.

The method of claim 3,
wherein the activation state
includes at least one of a hand state (HAND State), a shake state (SHAKE State), a capture-card selection state (SELECT State), a sake-cup position selection state (SWITCH State), a go-stop state (GOSTOP State), a game end state (END State), a hand-dealing state (CHANCE_HAND State), and a deck flip state (CHANCE_FLIP State).

The method of claim 1,
wherein the win rate estimation training
trains the deep learning model, based on roll-outs using the training data set, to receive a first Go-Stop board state as an input and to output a win rate estimate, which is information estimating the win rate of each action possible in the input first Go-Stop board state.
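
For illustration only, one way roll-outs could yield per-action win rate targets is plain Monte Carlo averaging: play each candidate action to the end of the game many times and record the fraction of wins. The sketch below assumes this approach; the function `estimate_win_rates`, its parameters, and the toy simulator `toy_rollout` are hypothetical and do not implement Go-Stop rules.

```python
import random
from typing import Callable, Dict, Hashable, List


def estimate_win_rates(
    state: object,
    actions: List[Hashable],
    rollout: Callable[[object, Hashable, random.Random], bool],
    n_rollouts: int = 200,
    seed: int = 0,
) -> Dict[Hashable, float]:
    """Average roll-out outcomes to get a win rate estimate for each action."""
    rng = random.Random(seed)
    rates: Dict[Hashable, float] = {}
    for action in actions:
        # Each roll-out returns True if the simulated game ended in a win.
        wins = sum(rollout(state, action, rng) for _ in range(n_rollouts))
        rates[action] = wins / n_rollouts
    return rates


def toy_rollout(state: object, action: float, rng: random.Random) -> bool:
    """Toy stand-in for a game simulator (NOT Go-Stop): the action value is
    simply treated as the probability that the simulated game is won."""
    return rng.random() < action


targets = estimate_win_rates(state=None, actions=[0.0, 0.5, 1.0], rollout=toy_rollout)
```

The resulting dictionary could serve as a regression target for the model's win rate output. The same averaging applied to final scores instead of win flags would give score estimation targets.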

The method of claim 1,
wherein the score estimation training
trains the deep learning model, based on roll-outs using the training data set, to receive a first Go-Stop board state as an input and to output a score estimate, which is information estimating the score of each action possible in the input first Go-Stop board state.

The method of claim 1,
wherein the genealogy achievement probability training
trains the deep learning model to receive a first Go-Stop board state as an input and to output a genealogy achievement probability value, which is information predicting, based on the input first Go-Stop board state, the probability of achieving a predetermined genealogy at the end of a roll-out.

The method of claim 1,
wherein the Go-Stop board state
includes at least part of: the player's hand, the player's lost hand, the opponent's lost hand, the floored hand, the number of points the player needs to update to win, the number of points the opponent needs to update to win, the player's choice hand, the opponent's choice hand, the bonus hand, the hostage type, the go execution count, previous player information, ditch trigger information, and 9 blood usage information.

A device for providing an imperfect information game service, the device comprising:
a communication unit configured to receive a training data set;
a memory configured to store a deep learning model; and
a processor configured to perform, on the deep learning model and based on the received training data set, at least one of state determination training, win rate estimation training, score estimation training, and genealogy achievement probability training.

The device of claim 9,
wherein the processor,
by performing the training, trains the deep learning model to output at least one of state determination information, a win rate estimate, a score estimate, and a genealogy achievement probability value for a first Go-Stop board state.