KR20220141979A - Method and device for deep-learning based simulation count management of go game service

Info

Publication number: KR20220141979A
Application number: KR1020210048151A
Authority: KR
Inventors: 이창율; 이상현
Original assignee: 엔에이치엔클라우드 주식회사
Priority date: 2021-04-14
Filing date: 2021-04-14
Publication date: 2022-10-21
Also published as: KR102572342B1

Abstract

According to an embodiment of the present invention, a device for managing the number of simulations of a deep learning-based Go game service comprises: a communication unit that receives at least one of a Go board state, a value, a number of visits, and a default number of simulations; a memory that stores a simulation count prediction unit and a simulation count adjustment model; and a processor that reads the simulation count prediction unit to determine an optimal number of simulations for the Go board state using at least one of the Go board state, the value, the number of visits, and the default number of simulations, trains the simulation count adjustment model based on the determined optimal number of simulations, and reads the simulation count adjustment model to determine the number of simulations for the current Go board state.

Description

Method and device for managing the number of simulations of Go game service based on deep learning

The present invention relates to a method and device for managing the number of simulations in a deep learning-based Go game service. More specifically, it relates to a method and device that manage the number of deep learning simulations according to the state of the game being played in the Go game service.

As user terminals such as smartphones, tablet PCs, personal digital assistants (PDAs), and notebook computers have become widespread and information processing technology has advanced, it has become possible to play Go, a type of board game, on a user terminal, and furthermore to play Go against a programmed artificial intelligence (AI) computer rather than a human.

Compared with other board games such as chess and janggi (Korean chess), Go has a far larger number of possible positions, which has limited the ability of AI computers to play at a human level, and research on raising the playing strength of AI computers is actively under way.

Recently, developers have applied the Monte Carlo Tree Search (MCTS) algorithm and deep learning technology to AI computers, raising their playing strength above the level of professional players.

Go is also a time-limited board game. Time limits differ from tournament to tournament, but each player is typically given between 1 and 5 hours. Once the allotted time is used up, byo-yomi (countdown) rules apply, and a player who exceeds the permitted number of byo-yomi periods loses. Therefore, keeping track of the remaining time and deciding how much time to spend on each move is an important factor in winning the game.

However, an AI computer always spends the same amount of time on each move, and as a result it may play a poor move in a critical phase of the game.

In addition, ordinary players, amateurs, and AI computers generally cannot predict how much of the game remains, and therefore cannot establish a time strategy.

Furthermore, when such an AI computer is trained by applying the MCTS algorithm and deep learning technology, most training setups use a fixed number of simulations.

However, using a fixed number of simulations in training constrains the performance of the AI computer.

Likewise, if the MCTS performed to select the next move from the Go board state of a game in progress relies on a fixed number of simulations, the next move must be decided using only the preset number of simulations and time, without taking into account the circumstances surrounding that board state (for example, the win rate and/or the available time).

JP 4392621 B2

An object of the present invention is to provide a method and device for managing the number of simulations of a deep learning-based Go game service, which manage the number of deep learning simulations according to the state of the game being played in the Go game service.

In detail, an object of the present invention is to provide a method and device that dynamically adjust, according to the Go board state, the number of simulations used when performing deep learning based on the Monte Carlo Tree Search (MCTS) algorithm.

Another object of the present invention is to provide a method and device that change the move preparation time in a critical phase of the game.

Another object of the present invention is to provide a method and device that can divide the move preparation time effectively using the predicted remaining length of the game.

However, the technical problems to be solved by the present invention and its embodiments are not limited to those described above, and other technical problems may exist.

A device for managing the number of simulations of a deep learning-based Go game service according to an embodiment of the present invention comprises: a communication unit that receives at least one of a Go board state, a value, a number of visits, and a default number of simulations; a memory that stores a simulation count prediction unit and a simulation count adjustment model; and a processor that reads the simulation count prediction unit to determine an optimal number of simulations for the Go board state using at least one of the Go board state, the value, the number of visits, and the default number of simulations, trains the simulation count adjustment model based on the determined optimal number of simulations, and reads the simulation count adjustment model to determine the number of simulations for the current Go board state.

Here, the optimal number of simulations includes the number of simulations that takes the least time within the threshold at which the next move selected by the Monte Carlo Tree Search (MCTS) simulation no longer changes.

In addition, the simulation count prediction unit pairs the determined optimal number of simulations with the Go board state to generate board state-optimal simulation count data, and the processor trains the simulation count adjustment model using the generated data as a training data set.
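The pairing described above can be pictured as a list of (board state, optimal count) records. The following is a minimal Python sketch under stated assumptions: the `TrainingExample` type, the tuple-based board encoding, and the `estimate_optimal` callable are hypothetical names introduced for illustration and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Assumed encoding: a board is a tuple of rows, 0 = empty, 1 = black, 2 = white.
Board = Tuple[Tuple[int, ...], ...]


@dataclass(frozen=True)
class TrainingExample:
    """One board state paired with the optimal simulation count found for it."""
    board: Board
    optimal_simulations: int


def build_training_set(
    boards: Iterable[Board],
    estimate_optimal: Callable[[Board], int],
) -> List[TrainingExample]:
    """Label each board with the count returned by the (hypothetical) estimator."""
    return [TrainingExample(board, estimate_optimal(board)) for board in boards]
```

A data set built this way could then be fed to whatever supervised learner plays the role of the adjustment model; the choice of learner is outside this sketch.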

In addition, the simulation count prediction unit obtains at least one of a first number of visits and a first value for each of a plurality of candidate moves from a first MCTS simulation performed on the Go board state with the default number of simulations, and obtains at least one of a second number of visits and a second value for each of the plurality of candidate moves from a second MCTS simulation performed on the Go board state with an additional unit count, that is, the default number of simulations plus a predetermined number.

In addition, based on the first number of visits and the second number of visits, the simulation count prediction unit determines whether the next move changes when the number of simulations is increased, and determines as the optimal number of simulations the number of simulations that takes the least time within the threshold at which the next move does not change.
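One way to read this search is as a loop that keeps raising the simulation count until the most-visited candidate move stops changing. The sketch below is an illustration only: `run_mcts` is a hypothetical callable returning a visit count per candidate move, and the step size, the upper bound `max_n`, and the use of the most-visited move as "the next move" are assumptions that this section does not fix.

```python
from typing import Callable, Dict, Hashable

Visits = Dict[Hashable, int]


def best_move(visits: Visits) -> Hashable:
    """Return the candidate move with the highest visit count."""
    return max(visits, key=visits.get)


def find_optimal_simulations(
    run_mcts: Callable[[object, int], Visits],
    board: object,
    default_n: int,
    step: int,
    max_n: int,
) -> int:
    """Smallest tried count whose best move survives one further increase by `step`."""
    n = default_n
    first = run_mcts(board, n)               # first MCTS (current count)
    while n + step <= max_n:
        second = run_mcts(board, n + step)   # second MCTS (count + step)
        if best_move(first) == best_move(second):
            return n                         # more simulations did not change the move
        n += step
        first = second
    return n                                 # assumed fallback: largest count tried
```

Stopping at the first count that survives a single increase keeps labeling cheap, but it can stop too early if the best move would flip at a much larger count; a stricter variant could require several consecutive stable steps.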

In addition, the simulation count prediction unit determines whether the next move changes based on a rate of change derived from the first number of visits and the second number of visits. The rate of change is calculated as the increase in the number of visits (the difference between the first and second numbers of visits) relative to the increase in the number of simulations (the difference between the default number of simulations and the additional unit count).
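Written out, the rate of change for one candidate move is (second visits - first visits) / (additional unit count - default count). A small Python sketch of that arithmetic follows; the function and parameter names are illustrative, and how the resulting rates are turned into a changed or unchanged decision is not specified in this section, so the sketch stops at computing them.

```python
from typing import Dict, Hashable


def visit_change_rates(
    first_visits: Dict[Hashable, int],
    second_visits: Dict[Hashable, int],
    default_n: int,
    extended_n: int,
) -> Dict[Hashable, float]:
    """Per-move increase in visits divided by the increase in simulations."""
    delta_n = extended_n - default_n
    if delta_n <= 0:
        raise ValueError("extended_n must be greater than default_n")
    moves = set(first_visits) | set(second_visits)
    return {
        move: (second_visits.get(move, 0) - first_visits.get(move, 0)) / delta_n
        for move in moves
    }
```

For example, if move A goes from 60 to 150 visits while the simulation count goes from 100 to 200, its rate is 0.9: most of the extra simulations were spent on A.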

In addition, the simulation count prediction unit determines whether the next move changes based on the ratio among the first numbers of visits of the plurality of candidate moves and the ratio among the second numbers of visits of the plurality of candidate moves.
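The ratio-based check can be illustrated by normalizing each set of visit counts into shares and comparing the two distributions. In the sketch below, the decision rule (same top move, and its share moving by no more than a tolerance) and the default `tolerance` of 0.1 are assumptions chosen for illustration; the text only states that the ratios are compared.

```python
from typing import Dict, Hashable


def visit_shares(visits: Dict[Hashable, int]) -> Dict[Hashable, float]:
    """Convert raw visit counts into each move's share of all visits."""
    total = sum(visits.values())
    if total == 0:
        raise ValueError("at least one visit is required")
    return {move: count / total for move, count in visits.items()}


def next_move_unchanged(
    first_visits: Dict[Hashable, int],
    second_visits: Dict[Hashable, int],
    tolerance: float = 0.1,
) -> bool:
    """Assumed rule: top move is identical and its share shifts by at most `tolerance`."""
    first = visit_shares(first_visits)
    second = visit_shares(second_visits)
    top_first = max(first, key=first.get)
    top_second = max(second, key=second.get)
    if top_first != top_second:
        return False
    return abs(second[top_first] - first[top_first]) <= tolerance
```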

In addition, the simulation count prediction unit determines whether the next move changes when the number of simulations is increased, based on a rate of increase or decrease derived from the first value and the second value. This rate is calculated as the increase in value (the difference between the first value and the second value) relative to the increase in the number of simulations (the difference between the default number of simulations and the additional unit count).

In addition, when the rate of increase or decrease satisfies a predetermined criterion, the simulation count prediction unit determines the smallest number of simulations used in calculating that rate as the optimal number of simulations.
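The value-based variant follows the same arithmetic, with values in place of visit counts. In the hedged sketch below, the "predetermined criterion" is assumed to be that the absolute rate falls below a small threshold `epsilon`; that threshold, its default, and the function names are illustrative assumptions rather than details from the specification.

```python
from typing import Optional


def value_change_rate(
    first_value: float,
    second_value: float,
    default_n: int,
    extended_n: int,
) -> float:
    """Change in value divided by the increase in simulations."""
    delta_n = extended_n - default_n
    if delta_n <= 0:
        raise ValueError("extended_n must be greater than default_n")
    return (second_value - first_value) / delta_n


def optimal_if_converged(
    first_value: float,
    second_value: float,
    default_n: int,
    extended_n: int,
    epsilon: float = 1e-3,
) -> Optional[int]:
    """Return the smaller count when the value has (by assumption) stopped moving."""
    rate = value_change_rate(first_value, second_value, default_n, extended_n)
    if abs(rate) < epsilon:
        return min(default_n, extended_n)
    return None  # criterion not met; the caller would try a larger count
```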

In addition, the simulation count adjustment model performs an adjustment process on the number of simulations for the current Go board state. The adjustment process includes at least one of: a process of setting at least one of an upper limit and a lower limit on the number of simulations; a process of increasing the number of simulations with a predetermined probability; a process of adjusting the number of simulations based on at least one of the win rate and the change in win rate for the current Go board state; and a process of adjusting the number of simulations based on a predetermined statistic of the number of simulations.
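Three of the four adjustment processes listed above can be sketched as a single post-processing function applied to the model's predicted count. Every numeric default here (the bounds, the boost factor, the win-rate swing threshold) is a made-up placeholder, and the rule that a large win-rate swing multiplies the count is an assumption; the statistic-based process is omitted because this section gives no detail on it.

```python
import random
from typing import Optional


def adjust_simulation_count(
    predicted: int,
    *,
    lower: int = 100,
    upper: int = 1600,
    boost_probability: float = 0.0,
    boost_factor: float = 2.0,
    win_rate_delta: float = 0.0,
    swing_threshold: float = 0.1,
    rng: Optional[random.Random] = None,
) -> int:
    """Apply a random boost, a win-rate-swing boost, then clamp to [lower, upper]."""
    rng = rng or random.Random()
    count = float(predicted)
    # Process: increase the count with a given probability.
    if rng.random() < boost_probability:
        count *= boost_factor
    # Process: spend more simulations when the win rate has swung sharply (assumed rule).
    if abs(win_rate_delta) >= swing_threshold:
        count *= boost_factor
    # Process: enforce the lower and upper limits.
    return int(min(max(count, lower), upper))
```

Clamping last ensures that neither boost can push the count past the configured limits.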

In addition, the processor provides the number of simulations for the current Go board state to a move model that performs the MCTS simulation, so that the move model performs the MCTS simulation based on that number of simulations.

Meanwhile, a method for managing the number of simulations of a deep learning-based Go game service according to an embodiment of the present invention is a method in which a time management model server manages the number of simulations of the deep learning-based Go game service, the method comprising: receiving a predetermined Go board state; performing a first MCTS based on the received Go board state; obtaining at least one of a first number of visits and a first value based on the first MCTS; performing a second MCTS based on the received Go board state; obtaining at least one of a second number of visits and a second value based on the second MCTS; determining an optimal number of simulations for the Go board state based on the first number of visits and the second number of visits, or based on the first value and the second value; generating board state-optimal simulation count data by pairing the determined optimal number of simulations with the Go board state; training a deep learning model based on the generated board state-optimal simulation count data; determining the number of simulations for the current Go board state based on the trained deep learning model; and providing the determined number of simulations to a move model.

The method and device for managing the number of simulations of a deep learning-based Go game service according to an embodiment of the present invention can learn the number of simulations to use when performing deep learning based on the Monte Carlo Tree Search (MCTS) algorithm.

In addition, the method and device according to an embodiment of the present invention can, through this learning, dynamically adjust the optimal number of simulations according to the Go board state.

In addition, the method and device according to an embodiment of the present invention can change the number of simulations in a critical phase of the game.

In addition, the method and device according to an embodiment of the present invention can determine the next move by performing MCTS based on the optimal number of simulations.

In addition, the method and device according to an embodiment of the present invention can change the move preparation time in a critical phase of the game.

In addition, the method and device according to an embodiment of the present invention can distribute the move preparation time effectively using the predicted remaining game time.

However, the effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below.

FIG. 1 is an exemplary diagram of a deep learning-based Go game service system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the structure of the move model of the move model server, which selects moves for the AI computer in the deep learning-based Go game service according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the move probability distribution over move points according to the policy of the move model.
FIG. 4 is a diagram illustrating the value and the number of visits for move points of the move model.
FIG. 5 is a diagram illustrating how the move model selects a move according to the pipeline of the search unit.
FIG. 6 is an exemplary view of a screen providing the position evaluation function of the deep learning-based Go game service according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the structure of the position evaluation model of the position evaluation model server of the present invention.
FIG. 8 is a diagram illustrating one block of the multi-block neural network structure of the position evaluation model of the present invention.
FIG. 9 is a diagram illustrating the first and second preprocessing steps for generating the ground-truth labels used to train the position evaluation model of the present invention.
FIG. 10 is a diagram illustrating the first and second preprocessing steps for generating the ground-truth labels used to train the position evaluation model of the present invention.
FIG. 11 is a diagram illustrating the third preprocessing step for generating the ground-truth labels used to train the position evaluation model of the present invention.
FIG. 12 is a diagram illustrating a position evaluation result of the position evaluation model of the present invention.
FIG. 13 is a view comparing the position evaluation result of the position evaluation model of the present invention with that of a deep learning model according to the prior art.
FIG. 14 is a view comparing the position evaluation result of the position evaluation model of the present invention with that of a deep learning model according to the prior art.
FIG. 15 is a view comparing the position evaluation result of the position evaluation model of the present invention with that of a deep learning model according to the prior art.
FIG. 16 is an exemplary diagram of the signal flow in the deep learning-based Go game service system according to an embodiment of the present invention.
FIG. 17 shows the position evaluation method within the deep learning-based Go game service method according to an embodiment of the present invention.
FIG. 18 shows the method of preprocessing training data to generate ground-truth labels within the position evaluation method of FIG. 17.
FIG. 19 is a diagram illustrating the time management unit of the time management model server according to an embodiment of the present invention.
FIGS. 20A and 20B are diagrams illustrating the variance calculation of the time management unit according to an embodiment of the present invention.
FIG. 21 is an exemplary diagram of the signal flow of the time management model server in the Go game service system according to an embodiment of the present invention.
FIG. 22 shows the method of determining the move preparation time within the deep learning-based Go game service method according to an embodiment of the present invention.
FIG. 23 is a diagram illustrating the simulation count prediction unit and the simulation count adjustment model of the time management model server according to another embodiment of the present invention.
FIG. 24 shows the method of determining the number of MCTS simulations within the deep learning-based Go game service method according to another embodiment of the present invention.
FIG. 25 is an example in which the first MCTS and the second MCTS are performed on a Go board state and the value and the number of visits are derived from each, according to another embodiment of the present invention.
FIG. 26 is an example diagram illustrating a method of determining the optimal number of simulations based on the number of visits according to another embodiment of the present invention.
FIG. 27 is an example diagram illustrating a method of determining the optimal number of simulations based on the ratio of the numbers of visits according to another embodiment of the present invention.
FIG. 28 is an example diagram illustrating a method of determining the optimal number of simulations based on the value according to another embodiment of the present invention.
FIG. 29 is an example diagram illustrating a method of adjusting the number of simulations based on the change in win rate according to another embodiment of the present invention.
FIG. 30 is a conceptual diagram illustrating a move model that performs MCTS based on the number of simulations according to another embodiment of the present invention.
FIG. 31 is a diagram illustrating the time management model of the time management model server according to yet another embodiment of the present invention.
FIGS. 32A and 32B are diagrams illustrating the change in territory count used to generate game time information according to yet another embodiment of the present invention.
FIGS. 33A and 33B are diagrams illustrating the change in territory count used to generate game time information according to yet another embodiment of the present invention.
FIGS. 34A and 34B are diagrams illustrating the number of neutral points (dame) used to generate game time information according to yet another embodiment of the present invention.
FIG. 35 is an exemplary diagram of the signal flow of the time management model server in the Go game service system according to yet another embodiment of the present invention.
FIG. 36 shows the method of generating game time information within the deep learning-based Go game service method according to yet another embodiment of the present invention.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms. In the following embodiments, terms such as first and second are used not in a limiting sense but to distinguish one component from another. A singular expression includes the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" mean that the features or components described in the specification are present, and do not preclude the possibility that one or more other features or components may be added. In the drawings, the sizes of components may be exaggerated or reduced for convenience of description. For example, the size and thickness of each component shown in the drawings are chosen arbitrarily for convenience of description, and the present invention is not necessarily limited to what is illustrated.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

----------------------------------------------------------

FIG. 1 is an exemplary diagram of a deep learning-based Go game service system according to an embodiment of the present invention.

Referring to FIG. 1, the deep learning-based Go game service system according to the embodiment may include a terminal 100, a Go server 200, a move model server 300, a position evaluation model server 400, and a network 500.

The components of FIG. 1 may be connected through the network 500. The network 500 refers to a connection structure through which nodes such as the terminal 100, the Go server 200, the move model server 300, the position evaluation model server 400, and the time management model server 500 can exchange information with one another. Examples of such a network include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

<Terminal (100)>

먼저, 단말기(100)는, 바둑 게임 서비스를 제공받고자 하는 유저의 단말기이다. 또한, 단말기(100)는 다양한 작업을 수행하는 애플리케이션들을 실행하기 위한 유저가 사용하는 하나 이상의 컴퓨터 또는 다른 전자 장치이다. 예컨대, 컴퓨터, 랩탑 컴퓨터, 스마트 폰, 모바일 전화기, PDA, 태블릿 PC, 혹은 바둑서버(200)와 통신하도록 동작 가능한 임의의 다른 디바이스를 포함한다. 다만 이에 한정되는 것은 아니고 단말기(100)는 다양한 머신들 상에서 실행되고, 다수의 메모리 내에 저장된 명령어들을 해석하여 실행하는 프로세싱로직을 포함하고, 외부 입력/출력 디바이스상에 그래픽 사용자 인터페이스(GUI)를 위한 그래픽 정보를 디스플레이하는 프로세스들과 같이 다양한 기타 요소들을 포함할 수 있다. 아울러 단말기(100)는 입력 장치(예를 들면 마우스, 키보드, 터치 감지 표면 등) 및 출력 장치(예를 들면 디스플레이장치, 모니터, 스크린 등)에 접속될 수 있다. 단말기(100)에 의해 실행되는 애플리케이션들은 게임 어플리케이션, 웹 브라우저, 웹 브라우저에서 동작하는 웹 애플리케이션, 워드 프로세서들, 미디어 플레이어들, 스프레드시트들, 이미지 프로세서들, 보안 소프트웨어 또는 그 밖의 것을 포함할 수 있다.First, the terminal 100 is a terminal of a user who wants to be provided with a Go game service. Also, the terminal 100 is one or more computers or other electronic devices used by a user to execute applications that perform various tasks. For example, it includes a computer, a laptop computer, a smart phone, a mobile phone, a PDA, a tablet PC, or any other device operable to communicate with the Go server 200 . However, not limited thereto, the terminal 100 is executed on various machines, and includes processing logic for interpreting and executing commands stored in a plurality of memories, and for a graphical user interface (GUI) on an external input/output device. It may include various other elements, such as processes for displaying graphical information. In addition, the terminal 100 may be connected to an input device (eg, a mouse, a keyboard, a touch-sensitive surface, etc.) and an output device (eg, a display device, a monitor, a screen, etc.). Applications executed by the terminal 100 may include a game application, a web browser, a web application running on a web browser, word processors, media players, spreadsheets, image processors, security software or the like. .

The terminal 100 may also include at least one memory 101 that stores instructions, at least one processor 102, and a communication unit 103.

The memory 101 of the terminal 100 may store a number of application programs or applications running on the terminal 100, as well as data and instructions for the operation of the terminal 100. The instructions are executable by the processor 102 to cause the processor 102 to perform operations, which may include transmitting a Go game execution request signal, transmitting and receiving game data, transmitting and receiving move information, transmitting a situation judgment request signal, receiving a situation judgment result, requesting game time information, receiving game time information, and receiving various other information. In hardware terms, the memory 101 may be any of various storage devices such as a ROM, RAM, EPROM, flash drive, or hard drive, and the memory 101 may also be web storage that performs the storage function of the memory 101 over the Internet.

The processor 102 of the terminal 100 may control overall operation and perform the data processing needed to receive the Go game service. When the Go game application is executed on the terminal 100, a Go game environment is set up on the terminal 100. The Go game application then exchanges Go game data with the Go server 200 through the network 500 so that the Go game service runs on the terminal 100. The processor 102 may be any of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), controllers, micro-controllers, microprocessors, or any other form of processor for performing functions.

The communication unit 103 of the terminal 100 may transmit and receive wireless signals to and from at least one of a base station, an external terminal, and a server over a network built according to communication schemes such as the following: GSM (Global System for Mobile communication), CDMA (Code Division Multiple Access), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), WLAN (Wireless LAN), Wi-Fi (Wireless Fidelity), Wi-Fi Direct, DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), and WiMAX (Worldwide Interoperability for Microwave Access).

The terminal 100 may also perform at least part of the functional operations performed by at least one of the Go server 200, the move model server 300, the situation judgment model server 400, and the time management model server 500, which are described below.

<Go server (200)>

The Go game service provided by the Go server 200 may be configured so that a virtual computer user provided by the Go server 200 and a real user take part in the game together. That is, one real user and one computer user play the game together in the Go game environment implemented on the user's terminal 100. In another aspect, the Go game service provided by the Go server 200 may be configured so that a plurality of user-side devices take part and a Go game is played among them.

The Go server 200 may include at least one memory 201 that stores instructions, at least one processor 202, and a communication unit 203.

The memory 201 of the Go server 200 may store a number of application programs or applications running on the Go server 200, as well as data and instructions for the operation of the Go server 200. The instructions are executable by the processor 202 to cause the processor 202 to perform operations, which may include receiving a game execution request signal, transmitting and receiving game data, transmitting and receiving move information, transmitting and receiving a situation judgment request signal, transmitting and receiving a situation judgment result, transmitting and receiving the move preparation time, transmitting and receiving game time information, and various other transmission operations. The memory 201 may also store a plurality of game records of games played on the Go server 200 or a plurality of previously published game records. Each game record may include all information from the first move, which opens the game, to the final move, at which the game ends. In other words, the game records may include history information about the moves. The Go server 200 can provide the stored game records to the situation judgment model server 400 for training of the situation judgment model server 400. In hardware terms, the memory 201 may be any of various storage devices such as a ROM, RAM, EPROM, flash drive, or hard drive, and the memory 201 may also be web storage that performs the storage function of the memory 201 over the Internet.
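As an illustration of the stored record described above, which holds every move of a game from the first to the final one, the following minimal sketch shows one way such a record could be represented. The class name `GameRecord`, the `Move` tuple layout, and the `history` helper are hypothetical; the source does not specify a record format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical move layout for illustration: (color, row, col), color is "B" or "W".
Move = Tuple[str, int, int]


@dataclass
class GameRecord:
    """One stored game: every move from the first to the final one, in order."""

    moves: List[Move] = field(default_factory=list)

    def add_move(self, color: str, row: int, col: int) -> None:
        """Append the next move to the record."""
        self.moves.append((color, row, col))

    def history(self, upto: int) -> List[Move]:
        """Return the first `upto` moves, i.e. the move history at that point."""
        return self.moves[:upto]
```

Keeping the moves as an ordered list means the history at any point in the game is simply a prefix of the record.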

The processor 202 of the Go server 200 may control overall operation and perform the data processing needed to provide the Go game service. The processor 202 may be any of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), controllers, micro-controllers, microprocessors, or any other form of processor for performing functions.

The Go server 200 may communicate with the terminal 100, the move model server 300, and the situation judgment model server 400 over the network 500 through the communication unit 203.

<Move model server (300)>

The move model server 300 may include a separate cloud server or computing device. The move model server 300 may also be a neural network system installed in the processor of the terminal 100 or in the data processing unit of the Go server 200. In the following, however, the move model server 300 is described as a device separate from the terminal 100 and the Go server 200.

The move model server 300 may include at least one memory 301 that stores instructions, at least one processor 302, and a communication unit 303.

The move model server 300 is an artificial intelligence computer that learns on its own according to the rules of Go to build a move model, which is a deep learning model, and that can play a game against the user of the terminal 100. On its own turn, it places a stone so as to win the game. How the move model server 300 trains the move model is described in detail in the description of the move model with reference to FIGS. 2 to 5.

The memory 301 of the move model server 300 may store a number of application programs or applications running on the move model server 300, as well as data and instructions for the operation of the move model server 300. The instructions are executable by the processor 302 to cause the processor 302 to perform operations, which may include move model learning (training), transmitting and receiving move information, receiving the move preparation time, receiving game time information, and various other transmission operations. The memory 301 may also store the move model, which is a deep learning model. In hardware terms, the memory 301 may be any of various storage devices such as a ROM, RAM, EPROM, flash drive, or hard drive, and the memory 301 may also be web storage that performs the storage function of the memory 301 over the Internet.

The processor 302 of the move model server 300 reads the move model stored in the memory 301 and, according to the neural network system that has been built, performs the move model training and stone placement described below. Depending on the embodiment, the processor 302 may be configured to include a main processor that controls all units and a plurality of graphics processing units (GPUs) that handle the large volume of computation required to run the neural network of the move model.

The move model server 300 may communicate with the Go server 200 over the network 500 through the communication unit 303.

<Situation judgment model server (400)>

The situation judgment model server 400 may include a separate cloud server or computing device. The situation judgment model server 400 may also be a neural network system installed in the processor of the terminal 100 or in the data processing unit of the Go server 200. In the following, however, the situation judgment model server 400 is described as a device separate from the terminal 100 and the Go server 200.

The situation judgment model server 400 may include at least one memory 401 that stores instructions, at least one processor 402, and a communication unit 403.

The situation judgment model server 400 may receive a training data set from the Go server 200 through the communication unit 403. The training data set may consist of a plurality of game records and the situation judgment information for those game records. Using the received training data set, the situation judgment model server 400 performs supervised learning so that it can judge the situation of a Go board on which stones have been placed, thereby building a situation judgment model, which is a deep learning model, and it may perform situation judgment in response to a request from the user of the terminal 100. How the situation judgment model server 400 trains the situation judgment model is described in detail in the description of the situation judgment model with reference to FIGS. 6 to 18.

The memory 401 of the situation judgment model server 400 may store a number of application programs or applications running on the situation judgment model server 400, as well as data and instructions for the operation of the situation judgment model server 400. The instructions are executable by the processor 402 to cause the processor 402 to perform operations, which may include situation judgment model learning (training), performing situation judgment, transmitting a situation judgment result, receiving information on a plurality of game records, transmitting information on the change in territory count, transmitting information on the number of neutral points (dame), and various other transmission operations. The memory 401 may also store the situation judgment model, which is a deep learning model. In hardware terms, the memory 401 may be any of various storage devices such as a ROM, RAM, EPROM, flash drive, or hard drive, and the memory 401 may also be web storage that performs the storage function of the memory 401 over the Internet.

The processor 402 of the situation judgment model server 400 reads the situation judgment model stored in the memory 401 and, according to the neural network system that has been built, performs the situation judgment model training and the judgment of the board situation during a game described below. Depending on the embodiment, the processor 402 may be configured to include a main processor that controls all units and a plurality of graphics processing units (GPUs) that handle the large volume of computation required to run the neural network of the situation judgment model.

The situation judgment model server 400 may communicate with the Go server 200 over the network 500 through the communication unit 403.

<Time management model server (500)>

The time management model server 500 may include a separate cloud server or computing device. The time management model server 500 may also be a neural network system installed in the processor of the terminal 100, the memory of the Go server 200, the memory of the move model server 300, or the memory of the situation judgment model server 400. In the following, however, the time management model server 500 is described as a device separate from the terminal 100, the Go server 200, the move model server 300, and the situation judgment model server 400.

The time management model server 500 may include at least one memory 501 that stores instructions, at least one processor 502, and a communication unit 503.

The time management model server 500 may also receive the number of visits, the search probability value, or the value from the move model server 300 through the communication unit 503. The time management model server 500 may determine the move preparation time using the received number of visits, search probability value, and value. The method by which the time management model server 500 determines the move preparation time is described in detail with reference to FIGS. 19 to 22.

The time management model server 500 may also receive a training data set from the Go server 200 through the communication unit 503. The training data set may consist of a plurality of game records, the situation judgment information for those game records, and the time information associated with each board state in those game records.

The time management model server 500 may also receive the board state (S), the default number of simulations (D), the number of visits (N), and/or the value (V) from the move model server 300 and/or the Go server 200 through the communication unit 503. The time management model server 500 may determine the number of MCTS simulations using the received board state (S), default number of simulations (D), number of visits (N), and/or value (V). The method by which the time management model server 500 determines the number of MCTS simulations is described in detail with reference to FIGS. 23 to 30.

The time management model server 500 may also receive situation judgment information from the situation judgment model server 400 through the communication unit 503. The situation judgment information may include information on the change in territory count, information on the number of neutral points (dame), and the like. Using the received training data set, situation judgment information, value, and so on, the time management model server 500 performs supervised learning so that it can generate game time information, thereby building a time management model, which is a deep learning model, and it may provide game time information for a given board state to the move model server 300 or the terminal 100. How the time management model server 500 trains the time management model is described in detail in the description of the time management model with reference to FIGS. 35 and 36.

The memory 501 of the time management model server 500 may store a number of application programs or applications running on the time management model server 500, as well as data and instructions for the operation of the time management model server 500. The instructions are executable by the processor 502 to cause the processor 502 to perform operations, which may include time management model learning (training), receiving the number of visits, the search probability value, or the value, determining the move preparation time, receiving information on a plurality of game records, receiving the default number of simulations, receiving information on the change in territory count, receiving information on the number of neutral points (dame), generating game time information, and various other transmission operations. The memory 501 may also store a time management unit, a simulation count prediction unit, a simulation count adjustment model, or a time management model. In hardware terms, the memory 501 may be any of various storage devices such as a ROM, RAM, EPROM, flash drive, or hard drive, and the memory 501 may also be web storage that performs the storage function of the memory 501 over the Internet.

The processor 502 of the time management model server 500 reads the time management unit stored in the memory 501 and determines the move preparation time as described below.

The processor 502 of the time management model server 500 also reads the simulation count prediction unit and/or the simulation count adjustment model stored in the memory 501 and determines the number of MCTS simulations as described later.

The processor 502 of the time management model server 500 also reads the time management model stored in the memory 501 and, according to the neural network system that has been built, generates the game time information described below.

Depending on the embodiment, the processor 502 may be configured to include a main processor that controls all units and a plurality of graphics processing units (GPUs) that handle the large volume of computation required to run the neural network of the time management model.

The time management model server 500 may communicate with the Go server 200, the move model server 300, and the situation judgment model server 400 over the network 500 through the communication unit 503.

<Move model>

FIG. 2 is a diagram illustrating the structure of the move model of the move model server, which is used for the artificial intelligence computer's moves in the deep learning-based Go game service according to an embodiment of the present invention. FIG. 3 is a diagram illustrating the move probability distribution over move points according to the policy of the move model. FIG. 4 is a diagram illustrating the value and the number of visits for the move points of the move model. FIG. 5 is a diagram illustrating the process by which the move model places a stone according to the pipeline of the search unit.

Referring to FIG. 2, the move model according to an embodiment of the present invention is the deep learning model of the move model server 300 and may include a search unit 310, a self-play unit 320, and a move neural network 330.

Using the search unit 310, the self-play unit 320, and the move neural network 330, the move model can be trained into a model that places stones so as to win the game. More specifically, the search unit 310 may perform a Monte Carlo Tree Search (MCTS) operation under the guidance of the move neural network 330. MCTS is a heuristic search algorithm for decision making. That is, the search unit 310 may perform MCTS based on the move probability value (p) and/or the value (V) provided by the move neural network 330. As an example, the search unit 310 guided by the move neural network 330 may perform MCTS and output a search probability value (π), which is a probability distribution over the move points. The self-play unit 320 may play Go against itself according to the search probability value (π). The self-play unit 320 plays on its own until the outcome of the game is decided, and when the self-play game ends it may provide the board state (S), the search probability value (π), and the self-play value (z) to the move neural network 330. The board state (S) is the state in which stones have been placed on the move points. The self-play value (z) is the win rate obtained when self-play is carried out from the board state (S). The move neural network 330 may output the move probability value (p) and the value (V). The move probability value (p) is a probability distribution that expresses numerically, for the move points in a given board state (S), which point is a good move for winning the game. The value (V) represents the win rate when a stone is placed at the corresponding move point. For example, a move point with a high move probability value (p) may be a good move. The move neural network 330 may be trained so that the move probability value (p) becomes equal to the search probability value (π) and the value (V) becomes equal to the self-play value (z). The trained move neural network 330 then guides the search unit 310, and the search unit 310 runs MCTS during the move preparation time to find a better move than the previous search probability value (π) indicated, and outputs a new search probability value (π). For example, depending on how long MCTS runs, the move preparation time may be any one of an average move preparation time, a first move preparation time, and a second move preparation time. The move preparation time may be provided by the time management model server 500 and may be set to the average move preparation time by default. Based on the new search probability value (π), the self-play unit 320 may output a new self-play value (z) for the board state (S) and provide the board state (S), the new search probability value (π), and the new self-play value (z) to the move neural network 330. The move neural network 330 may then be trained again so that the move probability value (p) and the value (V) are output as the new search probability value (π) and the new self-play value (z). By repeating this process, the move model can be trained so that the move neural network 330 finds better move points for winning the game. As an example, the move model may use a move loss (l). The move loss (l) is given by Equation 1.

(Equation 1)

$$ l = (z - v)^2 - \pi^{\top} \log p + c \lVert \theta \rVert^2 $$

Here θ denotes the parameters of the neural network, and c is a very small constant.

In the move loss (l) of Equation 1, making z and v equal corresponds to the mean square loss term, making π and p equal corresponds to the cross-entropy loss term, and multiplying ‖θ‖² by c is a regularization term that prevents overfitting.
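The three terms described for the move loss can be sketched in a few lines of Python. This is only an illustration under the assumption that the loss has the form (z − v)² − π·log p + c‖θ‖², which is what the mean square, cross-entropy, and regularization terms described above suggest. The function name `move_loss` and the default `c=1e-4` are hypothetical and do not come from the specification.

```python
import math

def move_loss(z, v, pi, p, theta, c=1e-4):
    """Assumed form of the move loss: value term + policy term + regularization.

    z, v   : self-play value and predicted value (floats)
    pi, p  : search probabilities and predicted move probabilities (same length)
    theta  : flat list of network parameters
    c      : small regularization constant (hypothetical default)
    """
    # Mean square loss term: pushes v toward z.
    value_term = (z - v) ** 2
    # Cross-entropy term: pushes p toward pi (entries with pi == 0 contribute nothing).
    policy_term = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    # Regularization term: c times the squared norm of the parameters.
    reg_term = c * sum(w * w for w in theta)
    return value_term + policy_term + reg_term
```

The sketch assumes every entry of `p` that is paired with a non-zero entry of `pi` is strictly positive, since the logarithm is undefined otherwise.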

For example, referring to FIG. 3, the trained move model may represent the move probability value (p) for each point as a probability distribution, as shown in FIG. 3. Referring to FIG. 4, the search probability value (π) of the trained move model may be represented by the upper value displayed at each point. The search probability value (π) may be the ratio of the number of visits to a candidate move to the total number of visits. As an example, if the total number of MCTS simulations is 1000 and the displayed value is 90.00, the candidate move was visited 900 times out of 1000. The value (V) of the trained move model may be represented by the lower value displayed at each point in FIG. 4. The move neural network 330 may have a neural network structure. As an example, the move neural network 330 may consist of one convolution block and 19 residual blocks. The convolution block may be a stack of several 3x3 convolution layers. Each residual block may be a stack of several 3x3 convolution layers with a skip connection. A skip connection is a structure in which the input of a given layer is added to the output of that layer and the sum is passed to another layer. In addition, the input of the move neural network 330 may be a 19*19*17 RGB image that includes the stone positions for the most recent 8 moves of the black player, the stone positions for the most recent 8 moves of the white player, and turn information indicating whether the current player is black or white.
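The visit-ratio reading of the search probability value can be checked with a short Python sketch. The point names and visit counts below are made up for illustration; only the 900-out-of-1000 case mirrors the example in the text.

```python
def search_probabilities(visit_counts):
    """Turn per-move visit counts into visit ratios (search probabilities)."""
    total = sum(visit_counts.values())
    if total == 0:
        raise ValueError("no simulations were run")
    return {move: n / total for move, n in visit_counts.items()}

# Hypothetical visit counts after 1000 MCTS simulations.
visits = {"D4": 900, "Q16": 70, "C3": 30}
pi = search_probabilities(visits)

# Values as they would be displayed on the board, in percent.
percent = {move: round(100 * prob, 2) for move, prob in pi.items()}
```

With these counts the point visited 900 times is displayed as 90.0, matching the reading of "90.00" given above.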

Referring to FIG. 5, the trained move model may make a move on its turn using the move neural network 330 and the search unit 310. Through the selection process (a), the move model selects, from the current first Go board state (S1), a second Go board state (S1-2) whose point has a high action value function (Q) and a high confidence value (U) among the branches not yet searched by MCTS. The action value function (Q) is the average of the values (V) computed each time the branch is traversed. The confidence value (U) is inversely proportional to the number of visits (N) through the branch and proportional to the move probability value (p). Through the expansion and evaluation process (b), the move model may expand to a third Go board state (S1-2-1) at the selected point and compute the move probability value (p). The move model may compute the value (V) of the expanded third Go board state (S1-2-1) and, through the backup process (c), store the action value function (Q), the number of visits (N), and the move probability value (p) of the branches traversed. The move model may repeat the selection (a), expansion and evaluation (b), and backup (c) processes during the move preparation time, build a probability distribution from the number of visits (N) to each point, and output the search probability value (π). The move model may make its move at the point with the highest search probability value (π).
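The selection and backup steps can be sketched as follows. The text only states that U is proportional to p and inversely proportional to N; the concrete formula below, the constant `c_puct`, and the dictionary layout are assumptions chosen for illustration, not the formula of the specification. The sketch also leaves out the change of perspective between the two players that a full search would need.

```python
import math

def select_child(children, c_puct=1.5):
    """Pick the move with the highest Q + U.

    children maps a move to {"N": visit count, "W": summed value, "P": move probability}.
    """
    total_visits = sum(ch["N"] for ch in children.values())

    def score(ch):
        # Q: average of the values seen on this branch so far.
        q = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0
        # U: grows with the move probability P, shrinks as the branch is visited more.
        u = c_puct * ch["P"] * math.sqrt(total_visits + 1) / (1 + ch["N"])
        return q + u

    return max(children, key=lambda move: score(children[move]))

def backup(path, value):
    """Update visit count and summed value on every branch along the path."""
    for ch in path:
        ch["N"] += 1
        ch["W"] += value

children = {
    "a": {"N": 0, "W": 0.0, "P": 0.6},
    "b": {"N": 0, "W": 0.0, "P": 0.4},
}
first = select_child(children)    # highest move probability wins while nothing is visited
backup([children[first]], 0.0)    # one neutral evaluation on that branch
second = select_child(children)   # the unvisited branch now has the larger U
```

After the loop has run for the whole move preparation time, the visit counts `N` are what the search probability value is built from.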

<Situation Judgment Model>

FIG. 6 is an exemplary view showing a screen that provides the situation judgment function of the deep learning-based Go game service according to an embodiment of the present invention, FIG. 7 is a diagram for explaining the structure of the situation judgment model of the situation judgment model server of the present invention, and FIG. 8 is a diagram for explaining one block of the multi-block neural network structure of the situation judgment model of the present invention.

Referring to FIG. 6, the deep learning-based Go game service according to an embodiment of the present invention may judge the situation of the current Go board state. As an example, as shown in FIG. 6, when a user requests a situation judgment by clicking the situation judgment menu (A) on the screen of the terminal 100 during a Go game, the deep learning-based Go game service may provide the situation judgment result in a pop-up window. Situation judgment means counting the opponent's territory and one's own territory during a game to determine who is winning and by how many points. For example, a user who judges the situation to be favorable may adopt a strategy of ending the game while preserving the current advantage rather than overplaying, and a user who judges it to be unfavorable may look for various strategies to turn the game around. The criteria for situation judgment are territory (jip), dead stones (saseok), stones, dame (gongbae), and seki (big), according to how the stones are placed on the board. A stone is a stone placed on the board and does not count as points under Korean rules. Territory is an area of empty points surrounded by stones of a single color and counts as points under Korean rules. Dame and seki are areas that belong to neither black nor white territory when the game ends and do not count as points under Korean rules. A dead stone on the board is a stone that cannot avoid capture however play continues; under Korean rules it is used to fill the opponent's territory, so it counts as points. Therefore, a situation judgment is accurate only when territory, dead stones, stones, dame, and seki are accurately distinguished or predicted from the Go board state on which the stones are placed. Here, accurately distinguishing territory, dead stones, stones, dame, and seki means distinguishing states in which they are fully formed, and accurately predicting them may mean predicting states that are likely to become territory, dead stones, stones, dame, or seki.

Referring to FIG. 7, the situation judgment model according to an embodiment of the present invention is a deep learning model of the situation judgment model server 400 and may include a situation judgment neural network 410, an input feature extraction unit 420, and a correct-answer label generation unit 430.

The situation judgment model may perform supervised learning so that the situation judgment neural network 410 can judge the situation of the current Go board state. More specifically, the situation judgment model may generate a training data set for Go board states (S) and use it to train the situation judgment neural network 410 to judge the situation of the current Go board state (S). The situation judgment model server 400 may receive a plurality of game records from the Go server 200. Each game record may include the Go board state (S) at each move, in move order.

The input feature extraction unit 420 may extract an input feature (IF) from the Go board states (S) of the game records and provide it to the situation judgment neural network 410 as input data for training. The input feature (IF) of a Go board state (S) may be a 19*19*18 RGB image that includes the stone positions for the most recent 8 moves of the black player, the stone positions for the most recent 8 moves of the white player, and turn information indicating whether the current player is black or white. As an example, the input feature extraction unit 420 may have a neural network structure and may include a kind of encoder.

The correct-answer label generation unit 430 may generate a correct-answer label (ground truth) by preprocessing the current Go board state (S) and may provide the label to the situation judgment neural network 410 as target data for training. The label generation by the correct-answer label generation unit 430 follows the description of FIGS. 9 to 11 below. As an example, the correct-answer label generation unit 430 may include a rollout of a neural network structure or an encoder.

Using a training data set in which the input feature (IF) is the input data and the correct-answer label is the target data, the situation judgment model may sufficiently train the situation judgment neural network 410 so that the output data (o) generated by the situation judgment neural network 410 becomes equal to the target data. As an example, the situation judgment model may use a situation judgment loss. The situation judgment loss may use the mean square error. For example, the situation judgment loss is given by Equation 2.

(Equation 2)

$$ l_{\text{situation}} = \frac{1}{B} \sum_{i=1}^{B} (y_i - o_i)^2 $$

B is the total number of intersections on the Go board. A board of 19 horizontal and 19 vertical lines has 361 intersections. The board is not limited to this size; a board of 9 horizontal and 9 vertical lines has 81 intersections. y_i is the situation value of a given intersection (i) according to the correct-answer label in the current Go board state (S); the situation value is described with reference to FIG. 11 below. o_i is the output data produced for the intersection (i) when the current Go board state (S) is input to the situation judgment neural network 410. The situation judgment model may train the situation judgment neural network 410 by adjusting the weights and biases in the network using gradient descent and backpropagation so that the situation judgment loss is minimized.
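A mean square error over the B intersections can be sketched as below. The function name `situation_loss` and the flat-list representation of the board are illustrative assumptions; the text only states that the loss may use the mean square error between the correct-answer label and the network output.

```python
def situation_loss(target, output):
    """Mean square error between label values and network outputs.

    target, output: flat sequences with one entry per intersection
    (361 entries for a 19x19 board, 81 for a 9x9 board).
    """
    if len(target) != len(output):
        raise ValueError("target and output must cover the same intersections")
    b = len(target)  # B: total number of intersections
    return sum((y - o) ** 2 for y, o in zip(target, output)) / b
```

The loss is zero only when every output matches its label, and each intersection contributes equally regardless of board size.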

The situation judgment neural network 410 may have a neural network structure. As an example, the situation judgment neural network 410 may consist of 19 residual blocks. Referring to FIG. 8, one residual block may be arranged in the following order: a convolution layer of 256 3x3 filters, a batch normalization layer, a ReLU activation layer, a convolution layer of 256 3x3 filters, a batch normalization layer, a skip connection, and a ReLU activation layer. The batch normalization layer prevents covariate shift, the phenomenon in which the input distribution of the current layer changes during training because the parameters of the previous layer change. The skip connection keeps the performance of the neural network from degrading as more blocks are stacked, so that stacking more blocks can raise the performance of the whole network. The skip connection may take the form in which the initial input of the residual block is added to the output of the second batch normalization layer and the sum is fed to the second ReLU activation layer.
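The layer order of one residual block, and in particular where the skip connection adds the block input back in, can be sketched without a deep learning library. The convolution and batch normalization layers are passed in as plain functions, and the stand-in layers used at the end are placeholders chosen only to make the data flow easy to follow; they are not the layers of the specification.

```python
def relu(xs):
    """Element-wise ReLU on a flat list of activations."""
    return [max(0.0, x) for x in xs]

def residual_block(x, conv1, bn1, conv2, bn2):
    """conv -> BN -> ReLU -> conv -> BN -> add block input -> ReLU."""
    h = relu(bn1(conv1(x)))              # first convolution, batch norm, ReLU
    h = bn2(conv2(h))                    # second convolution, batch norm
    h = [a + b for a, b in zip(h, x)]    # skip connection: add the block's initial input
    return relu(h)                       # second ReLU receives the sum

# Placeholder layers for illustration only.
double = lambda xs: [2.0 * v for v in xs]
identity = lambda xs: list(xs)

out = residual_block([1.0, -3.0], double, identity, double, identity)
```

Because the input is added after the second batch normalization, a block whose convolutions output zeros simply passes the rectified input through, which is why stacking many such blocks does not have to degrade the signal.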

FIGS. 9 and 10 are diagrams for explaining the first and second preprocessing steps for generating the correct-answer label used to train the situation judgment model of the present invention, and FIG. 11 is a diagram for explaining the third preprocessing step for generating that label.

The correct-answer label generation unit 430 may generate the correct-answer label used to train the situation judgment neural network 410 to judge the situation accurately.

More specifically, the correct-answer label generation unit 430 may receive as input the Go board state (S) on which the input data is based, and may generate a first preprocessing state (P1) by performing a first preprocessing that plays out the endgame from the current Go board state (S). The endgame, which is the first preprocessing, is the process of finishing the game by making the moves needed to make territory boundaries clear before territory is counted. As an example, referring to FIG. 9, the correct-answer label generation unit 430 may play out the endgame from the current Go board state (S) of FIG. 9(a) to generate the first preprocessing state (P1) of FIG. 9(b).

The correct-answer label generation unit 430 may generate a second preprocessing state (P2) by performing a second preprocessing that removes, from the first preprocessing state (P1), stones that lie inside territory boundaries and are unnecessary for delimiting territory. For example, such a stone may be a dead stone. As described above, a dead stone is an opponent's stone inside a territory that cannot avoid capture however play continues. Such a stone may also be one's own stone placed inside one's own territory. As an example, referring to FIG. 9, the correct-answer label generation unit 430 may remove the stones unnecessary for delimiting territory from the first preprocessing state (P1) of FIG. 9(b) to generate the second preprocessing state (P2) of FIG. 9(c).

As another example, referring to FIG. 10, the correct-answer label generation unit 430 may play at the points marked with a red x, as shown in FIG. 10(b), to carry out the endgame (the first preprocessing) from the current Go board state (S) of FIG. 10(a). To remove the dead stones marked with a blue x in FIG. 10(b), the correct-answer label generation unit 430 may play at the points marked with a green x, remove the dead stones, and also remove the stones played at the green x points, thereby performing the second preprocessing.

The correct-answer label generation unit 430 may perform a third preprocessing that converts each intersection of the second preprocessing state (P2) into a situation value (g, where g is an integer) from -1 to +1. That is, in the third preprocessing the correct-answer label generation unit 430 converts the second preprocessing state (P2), which is an image feature, into the third preprocessing state (P3), which is a numerical feature. As an example, in the second preprocessing state (P2), an intersection may be mapped to 0 if my stone is on it, +1 if it is in my territory, 0 if an opponent's stone is on it, and -1 if it is in the opponent's territory. In this case, the situation judgment neural network 410 may be trained to distinguish territory, stones, and dead stones when judging the situation. As another example, in the second preprocessing state (P2), an intersection may be mapped to 0 if my stone is on it, +1 if it is in my territory, 0 if an opponent's stone is on it, -1 if it is in the opponent's territory, and 0 if it is seki or dame. In this case, the situation judgment neural network 410 may be trained to distinguish seki or dame when judging the situation. For example, referring to FIG. 11, the correct-answer label generation unit 430 may convert the features of the second preprocessing state (P2) of FIG. 11(a) into the third preprocessing state (P3) of FIG. 11(b).
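The third preprocessing is a per-intersection lookup, which the following sketch illustrates. The category names and the nested-list board representation are hypothetical; only the numeric mapping (stones and seki or dame to 0, my territory to +1, opponent's territory to -1) comes from the examples above.

```python
# Situation value for each category found in the second preprocessing state (P2).
SITUATION_VALUE = {
    "my_stone": 0,
    "my_territory": 1,
    "opponent_stone": 0,
    "opponent_territory": -1,
    "seki_or_dame": 0,
}

def third_preprocessing(p2):
    """Convert a board of category names (P2) into a board of situation values (P3)."""
    return [[SITUATION_VALUE[cell] for cell in row] for row in p2]

# A made-up 2x3 corner of a board, for illustration only.
p2 = [
    ["my_territory", "my_stone", "seki_or_dame"],
    ["my_stone", "opponent_stone", "opponent_territory"],
]
p3 = third_preprocessing(p2)
```

The resulting grid of -1, 0, and +1 values is what serves as the target data for training.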

The third preprocessing state (P3) becomes the correct-answer label for the situation judgment of the Go board state (S) and may be used as target data when training the situation judgment neural network 410.

FIG. 12 is a diagram for explaining a situation judgment result of the situation judgment model of the present invention.

When a Go board state is input, the trained situation judgment model may provide a situation value for every intersection of the board. That is, it may provide a situation value, an integer from -1 to +1, for each of the 361 intersections of the board.

Referring to FIG. 12, the situation judgment model server 400 may judge the situation using the situation values provided by the situation judgment model, predetermined threshold values, and the presence or absence of a stone. As an example, for a point without a stone, the situation judgment model server 400 may judge the point likely to become my territory if the situation value exceeds a first threshold, and may judge it to be my territory if the value is close to +1. The situation judgment model server 400 may mark such a point with a square in the color of my stones that grows larger as the point is more likely to be my territory. For a point without a stone, the situation judgment model server 400 may judge the point likely to become the opponent's territory if the situation value is less than or equal to a second threshold, and may judge it to be the opponent's territory if the value is close to -1. The situation judgment model server 400 may mark such a point with a square in the color of the opponent's stones that grows larger as the point is more likely to be the opponent's territory. For a point without a stone, the situation judgment model server 400 may judge the point to be dame or seki if the situation value is within a third threshold range or close to 0. The situation judgment model server 400 may mark a point judged to be dame or seki with an X. For a point with a stone, the situation judgment model server 400 may judge the stone to be my stone or the opponent's stone if the situation value is within the third threshold range or close to 0. The situation judgment model server 400 may display no mark on a point judged to be my stone or the opponent's stone. For a point with a stone, the situation judgment model server 400 may judge the stone likely to become a dead stone of the opponent if the situation value exceeds the first threshold, and may judge it to be a dead stone of the opponent if the value is close to +1. The situation judgment model server 400 may mark such a point with a square in the color of my stones that grows larger as the stone is more likely to be a dead stone of the opponent. For a point with a stone, the situation judgment model server 400 may judge the stone likely to become my dead stone if the situation value is less than or equal to the second threshold, and may judge it to be my dead stone if the value is close to -1. The situation judgment model server 400 may mark such a point with a square in the color of the opponent's stones that grows larger as the stone is more likely to be my dead stone.
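The threshold rules of FIG. 12 can be condensed into a small decision function. The threshold values 0.5 and -0.5, the label strings, and the choice to treat everything between the two thresholds as the third threshold range are assumptions made for this sketch. It also reads the case of a stone with a value at or below the second threshold as my own dead stone, by symmetry with the opponent's dead stone case.

```python
def judge_point(value, has_stone, first_threshold=0.5, second_threshold=-0.5):
    """Classify one intersection from its situation value and stone presence."""
    if value > first_threshold:
        # Favourable to me: empty point is my territory, occupied point is a dead opponent stone.
        return "opponent_dead_stone" if has_stone else "my_territory"
    if value <= second_threshold:
        # Favourable to the opponent: empty point is their territory, occupied point is my dead stone.
        return "my_dead_stone" if has_stone else "opponent_territory"
    # Value near 0: a stone on the board, or dame/seki on an empty point.
    return "stone" if has_stone else "dame_or_seki"

def marker(label, value):
    """Return (shape, colour, size) for display, or None when nothing is drawn."""
    if label in ("my_territory", "opponent_dead_stone"):
        return ("square", "my_colour", abs(value))        # grows as the value nears +1
    if label in ("opponent_territory", "my_dead_stone"):
        return ("square", "opponent_colour", abs(value))  # grows as the value nears -1
    if label == "dame_or_seki":
        return ("X", None, None)
    return None  # stones with a value near 0 get no mark
```

For example, `judge_point(0.9, False)` gives `"my_territory"` and `judge_point(-0.8, True)` gives `"my_dead_stone"`.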

In addition, the situation judgment model server 400 may display the score counting result for the current Go board state using the situation judgment made at each intersection.

In addition, the situation judgment model server 400 may generate information on the change in the territory count and information on the number of dame according to the Go board state. For example, the situation judgment model server 400 may generate the information on the change in the territory count using the situation judgment result for the Go board state after the previous move and the situation judgment result for the current Go board state. In addition, the situation judgment model server 400 may generate the information on the number of dame using the situation judgment result for the Go board state.
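One way to derive the territory change and the dame count from two consecutive judgment results is sketched below. The per-intersection label strings and the function names are hypothetical, and the sketch counts seki points together with dame because the judgment described for FIG. 12 does not separate the two.

```python
from collections import Counter

def summarize(labels):
    """Count territory and dame points in a flat list of per-intersection labels."""
    counts = Counter(labels)
    return {
        "my_territory": counts["my_territory"],
        "opponent_territory": counts["opponent_territory"],
        "dame": counts["dame_or_seki"],
    }

def territory_change(previous_labels, current_labels):
    """Difference in territory counts between the previous and current board states."""
    prev, curr = summarize(previous_labels), summarize(current_labels)
    return {key: curr[key] - prev[key] for key in ("my_territory", "opponent_territory")}

# Made-up judgment results for a 5-point strip of the board, before and after one move.
before = ["my_territory", "dame_or_seki", "dame_or_seki", "stone", "opponent_territory"]
after = ["my_territory", "my_territory", "dame_or_seki", "stone", "opponent_territory"]

change = territory_change(before, after)
dame_now = summarize(after)["dame"]
```

Here the move turns one neutral point into my territory, so the change is +1 for me, 0 for the opponent, and one dame point remains.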

Accordingly, the deep learning-based Go game service apparatus according to the embodiment can judge the situation of a Go game using a deep learning neural network. In addition, the apparatus can judge the situation accurately by accurately distinguishing territory, dead stones, stones, dame, and seki according to the rules of Go. In addition, the apparatus can judge the situation accurately by predicting territory, dead stones, stones, dame, and seki according to the rules of Go. In addition, the apparatus can judge the situation quickly during a game.

FIGS. 13, 14, and 15 each compare a situation judgment result of the situation judgment model of the present invention with a situation judgment result of a deep learning model according to the prior art.

Referring to FIG. 13, the situation judgment model of the present invention judges the situation by distinguishing territory, stones, and dead stones at each intersection, as in region B of FIG. 13(a). By contrast, the prior-art deep learning model fails to distinguish territory, stones, and dead stones at the intersections of the corresponding region in FIG. 13(b).

Similarly, referring to FIG. 14, the situation judgment model of the present invention judges the situation by distinguishing territory, stones, and dead stones at each intersection, as in region C of FIG. 14(a). By contrast, the prior-art deep learning model fails to distinguish territory, stones, and dead stones at the intersections of the region in FIG. 14(b) that corresponds to FIG. 14(a).

Referring to FIG. 15, the situation judgment model of the present invention correctly recognizes white territory, as in region D of FIG. 15(a). By contrast, the prior-art deep learning model fails to recognize the white territory in the region of FIG. 15(b) that corresponds to FIG. 15(a).

FIG. 16 is an exemplary diagram of the signal flow in a deep learning-based Go game service system according to an embodiment of the present invention.

Referring to FIG. 16, the move model server 300, acting as the artificial intelligence computer, may train the move model, which is a deep learning model, by learning on its own according to the rules of Go so that it can place stones to win the game on its turn (S11). The Go server 200 may transmit a plurality of game records to the situation judgment model server 400. The situation judgment model server 400 may generate a training data set. First, the situation judgment model server 400 may extract input features from the Go board states of the game records (S13). The situation judgment model server 400 may generate correct-answer labels using the Go board states from which the input features were extracted (S14). The situation judgment model server 400 may train the situation judgment model using a training data set in which the input features are the input data and the correct-answer labels are the target data (S15). The terminal 100 may request from the Go server 200 a Go game against the artificial intelligence computer or against another user terminal (S16). When the terminal 100 requests a game against the artificial intelligence computer, the Go server 200 may request moves from the move model server 300 (S17). The Go server 200 runs the game, and the terminal 100 and the move model server 300 may each make a move on their own turn (S18 to S20). During the game, the terminal 100 may request a situation judgment from the Go server 200 (S21). The Go server 200 may request a situation judgment for the current Go board state from the situation judgment model server 400 (S22). The situation judgment model server 400 may extract the input features of the current Go board state, have the situation judgment model, which is a deep learning model, generate situation values from the input features, and perform the situation judgment using the Go board state and the situation values (S23). The situation judgment model server 400 may provide the situation judgment result to the Go server 200 (S24). The Go server 200 may provide the situation judgment result to the terminal 100 (S25).

FIG. 17 illustrates the situation determination method of the deep learning-based Go game service method according to an embodiment of the present invention, and FIG. 18 illustrates the training data preprocessing method used to generate ground-truth labels in the situation determination method of FIG. 17.

Referring to FIG. 17, the deep learning-based Go game service method according to an embodiment of the present invention may include a step (S100) in which the situation determination model server receives a plurality of game records from the Go server.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S200) in which the input feature extraction unit of the situation determination model in the situation determination model server extracts input features from the Go board states of the plurality of game records. The input features are extracted as described with reference to FIG. 7.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S300) in which the ground-truth label generation unit of the situation determination model generates a ground-truth label based on the Go board state from which the input features were extracted. As an example, referring to FIG. 18, the ground-truth label generation step (S300) may include a first preprocessing step (S301) in which the ground-truth label generation unit plays out the endgame from the current Go board state. The first preprocessing step (S301) follows the description of FIGS. 9 and 10. The ground-truth label generation step (S300) may include a second preprocessing step (S302) in which the ground-truth label generation unit removes unnecessary stones from the first-preprocessed Go board state. The second preprocessing step (S302) follows the description of FIGS. 9 and 10. The ground-truth label generation step (S300) may include a third preprocessing step (S303) in which the ground-truth label generation unit converts each intersection of the second-preprocessed Go board state into a situation value. The third preprocessing step (S303) follows the description of FIG. 11. The ground-truth label generation step (S300) may include a step (S304) of providing the third-preprocessed state, as the ground-truth label, to the situation determination neural network as target data. The step of providing the target data (S304) follows the description of FIGS. 7 and 11.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S400) of training the situation determination neural network of the situation determination model using the training data set. The situation determination neural network is trained as described with reference to FIG. 7.

The deep learning-based Go game service method according to an embodiment of the present invention includes a step (S500) of building the situation determination model once training of the situation determination neural network is complete. As an example, training may be regarded as complete when the situation determination loss of FIG. 7 falls to or below a predetermined value.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S600) in which the current Go board state is input to the situation determination model in response to a situation determination request from the terminal.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S700) in which the situation determination model performs situation determination on the input current Go board state. The situation determination step (S700) may follow the description, given with reference to FIG. 12, of how the situation determination model generates situation values for the current Go board state.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S800) in which the situation determination model server outputs the situation determination result. The output step (S800) may follow the description, given with reference to FIG. 12, of how the situation determination model server provides the situation determination result using the situation values, the Go board state, and a predetermined threshold.

Therefore, the deep learning-based Go game service method according to the embodiment can determine the situation of a Go game using a deep learning neural network. The method can also determine the situation accurately by correctly distinguishing territory, dead stones, stones, neutral points (dame), and seki according to the rules of Go. The method can likewise determine the situation accurately by predicting territory, dead stones, stones, neutral points (dame), and seki according to the rules of Go. Finally, the method can determine the situation quickly during a game.

<Time management model server according to an embodiment>

FIG. 19 is a diagram for explaining the time management unit of the time management model server according to an embodiment of the present invention, and FIG. 20 is a diagram for explaining the variance calculation performed by the time management unit according to an embodiment of the present invention.

In the deep learning-based Go game service according to an embodiment of the present invention, the time management model server 500 determines the move preparation time, which is one item of time management information, and the move model places its move based on a preset or determined move preparation time. When a game between a user and the artificial intelligence computer, or between artificial intelligence computers, is closely contested, the time management model server 500 may increase the move preparation time so that the artificial intelligence computer can search for a better move.

More specifically, referring to FIG. 19, the time management model server 500 may receive the search probability value, the number of visits (N), or the value (V) from the move model server 300, and may provide the move preparation time (PP) to the move model server 300. As an example, the move preparation time (PP) may include an average move preparation time, a first move preparation time, or a second move preparation time. The first move preparation time may be longer than the average move preparation time, and the second move preparation time may be longer than the first move preparation time. The time management model server 500 may include a time management unit 510. The time management unit 510 may determine the move preparation time (PP) based on the search probability value and the number of visits (N). The time management unit 510 may calculate a variance using the search probability value and the number of visits (N), and may determine the move preparation time by comparing the variance with a threshold variance value. That is, the lower the variance over the candidate moves, the more likely it is that a given candidate move is not the best move, which in turn suggests that the current game is closely contested. Accordingly, when the variance is lower than the threshold variance value, the time management unit may lengthen the move preparation time so that the move model can find a better move. The variance can be obtained using Equations 3 and 4.

(Equation 3)

In Equation 3, n is the number of intersections, and the remaining symbols denote the number of visits to each intersection, the search probability value for each intersection, and the average number of visits.

(Equation 4)

In Equation 4, Var is the variance.

The time management unit 510 may set the move preparation time to the first move preparation time if the variance (Var) is lower than the threshold variance value, and to the average move preparation time if the variance (Var) is not lower than the threshold variance value.

For example, referring to FIG. 20, the threshold variance value may be 0.05. In the case of FIG. 20(a), the time management unit 510 may calculate the variance (Var) as 0.2109. Because this variance is higher than the threshold variance value, the time management unit 510 may set the move preparation time to the average move preparation time. In the case of FIG. 20(b), the time management unit 510 may calculate the variance (Var) as 0.0145. Because this variance is lower than the threshold variance value, the time management unit 510 may set the move preparation time to the first move preparation time. The move model can then run MCTS simulations for longer, or run more of them, and select a better candidate move.

In addition, the time management unit 510 may determine the move preparation time (PP) based on the value (V). More specifically, after selecting the first move preparation time, the time management unit 510 may set the move preparation time (PP) to the second move preparation time if the value (V) is less than or equal to a threshold value. For example, the threshold value may be 50.0%. A low value for the candidate move means that the game is likely being lost, so the time management unit 510 lets the move model run MCTS simulations for longer, or run more of them, in order to find a better candidate move.
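The two threshold comparisons described above can be sketched as a small decision function. This is only an illustrative sketch: the function and enum names are hypothetical, the variance is taken as an already computed input (Equations 3 and 4 are not reproduced here), and the value (V) is assumed to be expressed as a fraction so that 50.0% becomes 0.5.

```python
from enum import Enum


class PrepTime(Enum):
    """The three move preparation times named in the text (names are illustrative)."""
    AVERAGE = "average"
    FIRST = "first"    # longer than AVERAGE
    SECOND = "second"  # longer than FIRST


def select_preparation_time(variance, value,
                            variance_threshold=0.05,
                            value_threshold=0.5):
    """Pick a move preparation time from the variance and the value (V).

    Assumption: `value` is a win-rate-like fraction in [0, 1].
    """
    # Variance not lower than the threshold: keep the average time.
    if variance >= variance_threshold:
        return PrepTime.AVERAGE
    # Variance is low (close game). If the value is also at or below
    # its threshold, escalate to the second, longest preparation time.
    if value <= value_threshold:
        return PrepTime.SECOND
    return PrepTime.FIRST
```

With the variances quoted for FIG. 20, `select_preparation_time(0.2109, 0.6)` selects the average time and `select_preparation_time(0.0145, 0.6)` selects the first time; the value 0.6 is a made-up input used only to exercise the function.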

Therefore, the deep learning-based Go game service device according to the embodiment can manage time in a Go game. The device can also change the move preparation time at critical points in the game.

FIG. 21 is an exemplary diagram of the signal flow of the time management model server in the Go game service system according to an embodiment of the present invention.

Referring to FIG. 21, during a game the move model server 300 may perform MCTS simulation to find a move that wins the game (S2101). The move model server 300 may transmit the search probability value, the number of visits (N), and the value (V) produced by the MCTS simulation to the time management model server 500 (S2102). The time management model server 500 may determine the move preparation time based on the received search probability value, number of visits (N), and value (V) (S2103). The time management model server 500 may transmit the determined move preparation time to the move model server 300 (S2104). The move model server 300 may place its move based on the received move preparation time or a preset move preparation time (S2105). Of the two, the move model server 300 may give precedence to the received move preparation time over the preset one.
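The exchange in S2102 to S2105, including the precedence of the received move preparation time over the preset one, might be sketched as follows. Everything here is a hypothetical illustration: the class, field, and function names are not from the source, the time management model server is reduced to a plain callable, and preparation times are assumed to be numbers in seconds.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class SearchStats:
    """Data sent in S2102 (field names are illustrative)."""
    search_probabilities: Sequence[float]
    visit_counts: Sequence[int]
    value: float


def choose_preparation_time(received: Optional[float], preset: float) -> float:
    """S2105: the received time takes precedence; fall back to the preset one."""
    return preset if received is None else received


def exchange(stats: SearchStats,
             decide: Callable[[SearchStats], Optional[float]],
             preset: float) -> float:
    """Run S2103/S2104 via `decide`, then apply the precedence rule of S2105."""
    received = decide(stats)  # stands in for the time management model server 500
    return choose_preparation_time(received, preset)
```

Passing `decide=lambda s: None` models the case in which no move preparation time is received, so the preset time is used.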

FIG. 22 illustrates the method of determining the move preparation time in the deep learning-based Go game service method according to an embodiment of the present invention.

Referring to FIG. 22, the deep learning-based Go game service method according to an embodiment of the present invention may include a step (S2201) in which the time management model server 500 receives the search probability value, the number of visits (N), or the value (V).

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S2202) in which the time management unit of the time management model server 500 calculates a variance based on the search probability value and the number of visits (N). The variance is calculated as described with reference to FIGS. 19 and 20.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S2203) in which the time management unit of the time management model server 500 determines whether the calculated variance is less than the threshold variance value.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S2204) in which the time management unit of the time management model server 500 sets the move preparation time to the average move preparation time if the variance is not less than the threshold variance value. The average move preparation time is selected as described with reference to FIGS. 19 and 20.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S2205) in which the time management unit of the time management model server 500 sets the move preparation time to the first move preparation time if the variance is less than the threshold variance value. The first move preparation time is selected as described with reference to FIGS. 19 and 20.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S2206) in which the time management unit of the time management model server 500, after setting the move preparation time to the first move preparation time, determines whether the value is less than or equal to the threshold value. This comparison is made as described with reference to FIGS. 19 and 20. If the value is not less than or equal to the threshold value, the time management unit may keep the first move preparation time as the move preparation time.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S2207) in which the time management unit of the time management model server 500 sets the move preparation time to the second move preparation time if the value is less than or equal to the threshold value. The second move preparation time is selected as described with reference to FIGS. 19 and 20.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S2208) in which the time management unit of the time management model server 500 transmits the determined move preparation time.

Therefore, the deep learning-based Go game service method according to an embodiment can manage time in a Go game. The method can also change the move preparation time at critical points in the game.

<Time management model server according to another embodiment>

FIG. 23 is a diagram for explaining the simulation count prediction unit and the simulation count adjustment model of the time management model server according to another embodiment of the present invention, and FIG. 24 illustrates the method of determining the number of MCTS simulations in the deep learning-based Go game service method according to another embodiment of the present invention.

Referring to FIG. 23, in the deep learning-based Go game service according to another embodiment of the present invention, the time management model server 500 may determine the number of MCTS simulations (C; hereinafter, the simulation count) performed by the move model described above. The service may then have the move model perform MCTS based on the determined simulation count (C) to decide the next move.

In detail, in the deep learning-based Go game service according to another embodiment of the present invention, the move model described above may perform MCTS and decide the next move based on the simulation count (C) determined by the simulation count prediction unit (551; hereinafter, the count prediction unit) and/or the simulation count adjustment model (552; hereinafter, the count adjustment model), which are deep learning models of the time management model server 500.

Here, the time management model server 500 may include the count prediction unit 551. The count prediction unit 551 may generate a plurality of Go board state-optimal simulation count (SC) data items based on a plurality of Go board states (S), the default simulation count (D; hereinafter, the default count), the number of visits (N), and/or the value (V).

The time management model server 500 may also include the count adjustment model 552. The count adjustment model 552 may be trained using the plurality of Go board state-optimal simulation count (SC) data items provided by the count prediction unit 551 as a training data set. Once trained in this way, the count adjustment model 552 may take a given Go board state (S) (in the embodiment, the current Go board state (S)) as input and output the simulation count (C) for that Go board state (S).

In more detail, the deep learning-based Go game service according to another embodiment of the present invention may use the count prediction unit 551 of the time management model server 500 to determine the simulation count (C) optimized for a given Go board state (S).

The deep learning-based Go game service according to another embodiment may also train the count adjustment model 552 using the at least one Go board state-optimal simulation count (SC) data item obtained in this way as a training data set.

The deep learning-based Go game service according to another embodiment may also input the current Go board state (S) to the trained count adjustment model 552 as input data, and have the trained count adjustment model 552 output, as output data, the simulation count (C) optimized for the current Go board state (S).
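The data flow between the count prediction unit 551 and the count adjustment model 552 can be illustrated with a deliberately simplified sketch. The source describes the count adjustment model as a trained deep learning model; the stand-in below replaces it with a lookup table purely to show how (state, optimal count) pairs are produced, used for fitting, and then queried. All names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

State = Hashable  # placeholder for an encoded Go board state (S)


def build_training_set(states: Iterable[State],
                       predict_optimal_count: Callable[[State], int]
                       ) -> List[Tuple[State, int]]:
    """Pair each board state with the count chosen by the prediction unit (SC data)."""
    return [(state, predict_optimal_count(state)) for state in states]


class LookupCountModel:
    """Stand-in for the count adjustment model 552 (NOT a neural network)."""

    def __init__(self, default_count: int = 800) -> None:
        self.default_count = default_count
        self._table: Dict[State, int] = {}

    def fit(self, pairs: Iterable[Tuple[State, int]]) -> None:
        # "Training" here is plain memorization of the SC pairs.
        self._table = dict(pairs)

    def predict(self, state: State) -> int:
        # Unseen states fall back to the default count (an assumption).
        return self._table.get(state, self.default_count)
```

A real model would generalize to unseen board states rather than fall back to the default count; the fallback is only there to keep the sketch self-contained.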

The deep learning-based Go game service according to another embodiment may also have the move model described above perform MCTS simulation based on the output simulation count (C).

In this way, the deep learning-based Go game service according to another embodiment may have the move model perform MCTS based on the simulation count (C) determined as above to decide the next move.

Specifically, referring to FIG. 24, the deep learning-based Go game service method according to another embodiment of the present invention may include a step (S2301) in which the time management model server 500 receives a given Go board state (S).

That is, the time management model server 500 may receive a plurality of game records from the Go server 200. Each of the game records may include the Go board state (S) at each point in the move order.

FIG. 25 is an example of performing a first MCTS and a second MCTS on a Go board state (S) according to another embodiment of the present invention and deriving the value (V) and the number of visits (N) from each.

Referring also to FIG. 25, the deep learning-based Go game service method according to another embodiment of the present invention may include a step (S2302) in which the time management model server 500 performs a first MCTS on the received Go board state (S).

In detail, the time management model server 500 may operate in conjunction with the move model of the move model server 300 to have MCTS performed on the received Go board state (S) with a preset default count (D) of simulations (for example, 800). More specifically, the time management model server 500 may provide the received Go board state (S) to the move model. The move model, given the Go board state (S), may then perform MCTS on it with the predetermined default count (D) of simulations.

The deep learning-based Go game service method according to another embodiment of the present invention may also include a step (S2303) in which the time management model server 500 stores the first number of visits (N1) and/or the first value (V1) resulting from the first MCTS performed as above.

That is, the time management model server 500 may receive from the move model, and store, the first number of visits (N1) and/or the first value (V1) for each of a plurality of candidate moves (that is, a plurality of nodes in the MCTS), obtained as the output of the MCTS performed on the Go board state (S) with the default count (D).

The deep learning-based Go game service method according to another embodiment of the present invention may also include a step (S2304) in which the time management model server 500 performs a second MCTS on the received Go board state (S).

In detail, the time management model server 500 may operate in conjunction with the move model of the move model server 300 to perform the second MCTS at least once on the received Go board state (S) while increasing the default count (D) (for example, 800) in units of a predetermined increment (for example, 10).

Specifically, the time management model server 500 may operate in conjunction with the move model to perform the second MCTS at least once on the received Go board state (S), using at least one incremented count (for example, 810, 820, 830) obtained by increasing the default count (D) (for example, 800) in units of the predetermined increment (for example, 10).
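The sequence of incremented counts used for the second MCTS can be written as a simple generator. This is a minimal sketch; the function name and parameter names are illustrative, and the defaults of 800 and 10 are the example values given in the text.

```python
from itertools import count, islice


def second_mcts_counts(default_count=800, step=10):
    """Yield default_count + step, default_count + 2 * step, ... without end."""
    return count(default_count + step, step)


# First three incremented counts for the example values in the text.
first_three = list(islice(second_mcts_counts(), 3))
```

The generator is unbounded on purpose, because the text stops the repetition only once the simulation count (C) has been determined.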

In addition, the deep learning-based Go game service method according to another embodiment of the present invention may include a step (S2305) in which the time management model server 500 stores at least one second number of visits (N2) and/or second value (V2) based on the performed second MCTS.

In detail, the time management model server 500 may receive from the move model, and store, the second number of visits (N2) and/or the second value (V2) for each of at least one candidate move (that is, a plurality of nodes in the MCTS), obtained as the output of the MCTS performed at least once on the Go board state (S) based on the additional unit counts. That is, the second number of visits (N2) may include number-of-visits (N) data for each of the at least one candidate move, and the second value (V2) may include value (V) data for each of the at least one candidate move.

Here, the time management model server 500 may stop the repeatedly performed second MCTS once the number of simulations (C) is determined as described below.

In addition, the deep learning-based Go game service method according to another embodiment of the present invention may include a step (S2306) in which the time management model server 500 determines the optimal number of simulations based on the number of visits (N) and/or the value (V).

In detail, the time management model server 500 may operate the count prediction unit 551 to determine the number of simulations optimized for a given Go board state (S) (hereinafter, the optimal number of simulations) based on the first number of visits (N1) and the second number of visits (N2) and/or the first value (V1) and the second value (V2) for that Go board state (S). Here, the optimal number of simulations may be a number of simulations including the number of simulations (C) that takes the least time within the threshold at which the next move point does not change during MCTS simulation. In an embodiment, the optimal number of simulations may be the number of simulations (C) that requires the minimum move preparation time (PP) within the threshold at which the next move point does not change during MCTS simulation.

In general, MCTS performs multiple simulations and determines the next move point by selecting, from among a plurality of candidate moves (that is, a plurality of nodes in the MCTS), the candidate move with the largest number of visits (N). If the next move point, that is, the candidate move with the largest number of visits (N), does not change even when the number of simulations (C) increases, there may be no need to increase the number of simulations (C).

Accordingly, the time management model server 500 according to another embodiment may determine whether the next move point (that is, the selected node) changes as the number of simulations (C) of the MCTS performed on a given Go board state (S) increases, and may predict the optimal number of simulations for the Go board state (S) according to the result of that determination.

In detail, the time management model server 500 may determine whether the next move point (that is, the selected node) changes as the number of simulations (C) for a given Go board state (S) increases.

Specifically, the time management model server 500 may determine whether the next move point changes based on 1) the number of visits (N) for each candidate move.

FIG. 26 is an example of a diagram for explaining a method of determining the optimal number of simulations based on the number of visits (N) according to another embodiment of the present invention.

Referring to FIG. 26, the time management model server 500 may calculate the rate of change (gradient) between the first number of visits (N1) and the second number of visits (N2), based on the first number of visits (N1) per candidate move obtained from the first MCTS, that is, simulation performed the default number of times (D), and the at least one second number of visits (N2) per candidate move obtained from the second MCTS, that is, simulation performed based on each of the at least one additional unit count.

In an embodiment, the time management model server 500 may calculate the rate of change using the increase in the number of simulations, based on the difference between the default number of times (D) and the additional unit count, and the increase in the number of visits, based on the difference between the first number of visits (N1) and the second number of visits (N2). In detail, the time management model server 500 may obtain the rate of change by calculating the increase in the number of visits relative to the increase in the number of simulations (that is, increase in the number of visits / increase in the number of simulations). However, this is only one embodiment; the method of calculating the rate of change is not limited to it, and various embodiments are possible.

Referring further to FIG. 26, for example, when, for a first candidate move, the first number of simulations (C1), that is, the default number of times (D), is 500, the second number of simulations (C2), that is, a given additional unit count, is 1000, the first number of visits (N1) is 400, and the second number of visits (N2) is 800, the time management model server 500 may calculate the rate of change (that is, increase in the number of visits (N2-N1) / increase in the number of simulations (C2-C1)) as 0.8.

In addition, when the calculated rate of change satisfies a predetermined criterion (for example, being at or above a predetermined constant such as 0.5), the time management model server 500 may stop increasing the number of simulations (C) (that is, stop the second MCTS).

That is, when the rate of change (in the embodiment, increase in the number of visits (N2-N1) / increase in the number of simulations (C2-C1)) satisfies a predetermined criterion (e.g., being at or above a predetermined rate of change), the time management model server 500 may determine that the next move point (that is, the selected node) will not change even if the number of simulations (C) for the given Go board state (S) increases. When it determines that the next move point will not change, the time management model server 500 may stop increasing the number of simulations (C) (that is, stop the second MCTS). The time management model server 500 may then determine, as the optimal number of simulations, the minimum number of simulations at the time the rate of change was calculated (that is, the number of simulations (C) at which the first number of visits (N1) was reached).
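
The visit-count criterion above can be illustrated with a short sketch. The function names are hypothetical; the formula and the 0.5 threshold follow the example values in the description, and the description itself notes that other ways of computing the rate of change are possible.

```python
def visit_gradient(c1: int, n1: int, c2: int, n2: int) -> float:
    """Rate of change (N2 - N1) / (C2 - C1) for one candidate move."""
    if c2 <= c1:
        raise ValueError("c2 must be greater than c1")
    return (n2 - n1) / (c2 - c1)


def meets_stop_criterion(gradient: float, threshold: float = 0.5) -> bool:
    """True when the gradient is at or above the predetermined constant."""
    return gradient >= threshold


# Example values from the description: C1 = 500, C2 = 1000, N1 = 400, N2 = 800.
gradient = visit_gradient(500, 400, 1000, 800)
stop = meets_stop_criterion(gradient)
```

With these inputs the gradient is 400 / 500 = 0.8, which meets the example threshold of 0.5, so the second MCTS would be stopped.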

Alternatively, the time management model server 500 may determine whether the next move point changes based on 1) the ratio of the number of visits (N) for each candidate move.

FIG. 27 is an example of a diagram for explaining a method of determining the optimal number of simulations based on the ratio of the number of visits (N) according to another embodiment of the present invention.

In detail, referring to FIG. 27, the time management model server 500 may calculate a first ratio for a plurality of candidate moves (in the embodiment, a plurality of child nodes in a given first layer of the MCTS) based on the ratio among the first numbers of visits (N1) of those candidate moves.

Referring further to FIG. 27, for example, when the plurality of candidate moves includes a first candidate move and a second candidate move, the first number of visits (N1) of the first candidate move is 400, and the first number of visits (N1) of the second candidate move is 100, the first ratio may be 'first candidate move: 0.8 (400), second candidate move: 0.2 (100)'.

In addition, the time management model server 500 may sort the plurality of candidate moves in order of ratio based on the first ratio calculated as above.

For example, based on the first ratio in the above example, the time management model server 500 may sort them as 'rank 1: first candidate move, rank 2: second candidate move'.

In addition, the time management model server 500 may calculate a second ratio for the plurality of candidate moves based on the ratio among the second numbers of visits (N2) of those candidate moves.

Referring further to FIG. 27, for example, when the plurality of candidate moves includes the first candidate move and the second candidate move, the second number of visits (N2) of the first candidate move is 800, and the second number of visits (N2) of the second candidate move is 200, the second ratio may be 'first candidate move: 0.8 (800), second candidate move: 0.2 (200)'.

In addition, the time management model server 500 may sort the plurality of candidate moves in order of ratio based on the calculated second ratio.

For example, based on the second ratio in the above example, the time management model server 500 may sort them as 'rank 1: first candidate move, rank 2: second candidate move'.

Here, if the rankings according to the plurality of second ratios obtained by repeatedly performing the second MCTS remain unchanged, compared with the ranking according to the first ratio, a predetermined number of times or more, the time management model server 500 may determine that the next move point (that is, the selected node) will not change even if the number of simulations (C) for the given Go board state (S) increases.

For example, when the ranking according to the first ratio is 'rank 1: first candidate move, rank 2: second candidate move' and the ranking according to the second ratio remains 'rank 1: first candidate move, rank 2: second candidate move' a predetermined number of times (e.g., 10) or more, the time management model server 500 may determine that the next move point (that is, the selected node) will not change even if the number of simulations (C) for the given Go board state (S) increases.

In addition, when it determines as above that the next move point will not change, the time management model server 500 may stop increasing the number of simulations (C) (that is, stop the second MCTS). The time management model server 500 may then determine, as the optimal number of simulations, the minimum number of simulations at the time of the ranking comparison (that is, the number of simulations (C) at which the first number of visits (N1) was reached).
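
The ratio-and-ranking comparison can be sketched as below. The function names and the dictionary representation of per-move visit counts are hypothetical. The sketch also assumes that "unchanged a predetermined number of times" means consecutive unchanged runs, which the description does not state explicitly.

```python
from typing import Dict, Iterable, List


def visit_ratios(visits: Dict[str, int]) -> Dict[str, float]:
    """Share of total visits held by each candidate move."""
    total = sum(visits.values())
    if total == 0:
        raise ValueError("total visit count must be positive")
    return {move: n / total for move, n in visits.items()}


def ranking(visits: Dict[str, int]) -> List[str]:
    """Candidate moves sorted from highest to lowest visit ratio."""
    ratios = visit_ratios(visits)
    return sorted(ratios, key=ratios.get, reverse=True)


def ranking_is_stable(first_visits: Dict[str, int],
                      second_visits_runs: Iterable[Dict[str, int]],
                      required_runs: int) -> bool:
    """True once the second-MCTS ranking matches the first-MCTS ranking
    for required_runs consecutive runs (consecutiveness is an assumption)."""
    baseline = ranking(first_visits)
    unchanged = 0
    for visits in second_visits_runs:
        if ranking(visits) == baseline:
            unchanged += 1
            if unchanged >= required_runs:
                return True
        else:
            unchanged = 0
    return False


first = {"move_1": 400, "move_2": 100}
second_runs = [{"move_1": 800, "move_2": 200}] * 10
stable = ranking_is_stable(first, second_runs, required_runs=10)
```

Here `first` reproduces the 0.8 / 0.2 split from the description, and ten matching second-MCTS runs satisfy the example count of 10.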

Accordingly, the deep learning-based Go game service apparatus according to another embodiment may track how the number of visits (N) changes as the number of simulations (C) for a given Go board state (S) is increased, and on that basis determine the number of simulations (C) optimized for that Go board state (S).

Meanwhile, in general, when multiple simulations are performed in MCTS, if the value (V) (that is, the win rate) of any one of a plurality of candidate moves (that is, a plurality of nodes in the MCTS) tends to increase in a biased manner, there may be no further need to increase the number of simulations (C).

Accordingly, in another embodiment, the time management model server 500 may predict the optimal number of simulations for the Go board state (S) based on the first value (V1) and the second value (V2).

FIG. 28 is an example of a diagram for explaining a method of determining the optimal number of simulations based on the value (V) according to another embodiment of the present invention.

In detail, referring to FIG. 28, when the rate of increase or decrease (gradient) between the first value (V1) and at least one second value (V2) satisfies a predetermined criterion (for example, being at or above a predetermined constant), the time management model server 500 may stop increasing the number of simulations (C) (that is, stop the second MCTS).

Here, the time management model server 500 may calculate the rate of increase or decrease using the increase in the number of simulations, based on the difference between the default number of times (D) and the additional unit count, and the increase in value, based on the difference between the first value (V1) and the second value (V2). In detail, the time management model server 500 may obtain the rate of increase or decrease by calculating the increase in value relative to the increase in the number of simulations (that is, increase in value / increase in the number of simulations). However, this is only one embodiment; the method of calculating the rate of increase or decrease is not limited to it, and various embodiments are possible.

For example, referring further to FIG. 28, when, for a first candidate move, the first number of simulations (C1), that is, the default number of times (D), is 500, the second number of simulations (C2), that is, a given additional unit count, is 1000, the first value (V1) is 40, and the second value (V2) is 100, the time management model server 500 may calculate the rate of increase or decrease (that is, increase in value (V2-V1) / increase in the number of simulations (C2-C1)) as 0.12.

In addition, when the calculated rate of increase or decrease satisfies a predetermined criterion (for example, being at or above a predetermined constant such as 0.1), the time management model server 500 may stop increasing the number of simulations (C) (that is, stop the second MCTS).

In addition, the time management model server 500 may determine, as the optimal number of simulations, the minimum number of simulations at the time the rate of increase or decrease was calculated (that is, the number of simulations (C) at which the first value (V1) was reached).
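
A sketch of the value-based criterion, scanning successive second-MCTS runs until the criterion is met, might look like this. The function names and the `(additional_unit_count, v2)` pair representation are hypothetical. The sketch only reports at which additional unit count the criterion is first met; which count is then recorded as the optimal number of simulations is governed by the description above.

```python
from typing import Iterable, Optional, Tuple


def value_gradient(c1: int, v1: float, c2: int, v2: float) -> float:
    """Rate of increase or decrease (V2 - V1) / (C2 - C1)."""
    if c2 <= c1:
        raise ValueError("c2 must be greater than c1")
    return (v2 - v1) / (c2 - c1)


def first_count_meeting_criterion(default_count: int,
                                  v1: float,
                                  second_runs: Iterable[Tuple[int, float]],
                                  threshold: float) -> Optional[int]:
    """Return the first additional unit count whose gradient is at or above
    the threshold, or None if no run meets it.

    second_runs holds (additional_unit_count, v2) pairs in increasing order.
    """
    for c2, v2 in second_runs:
        if value_gradient(default_count, v1, c2, v2) >= threshold:
            return c2
    return None


# Example values from the description: C1 = 500, C2 = 1000, V1 = 40, V2 = 100.
example = value_gradient(500, 40, 1000, 100)
met_at = first_count_meeting_criterion(500, 40, [(600, 42), (1000, 100)], 0.1)
```

The example gradient is 60 / 500 = 0.12, which meets the example threshold of 0.1.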

Accordingly, the deep learning-based Go game service apparatus according to another embodiment may track how the value (V) changes as the number of simulations (C) for a given Go board state (S) is increased, and on that basis determine the number of simulations (C) optimized for that Go board state (S).

In addition, the deep learning-based Go game service method according to another embodiment of the present invention may include a step (S2307) in which the time management model server 500 generates Go board state-optimal simulation count (SC) data based on the optimal number of simulations determined as above.

That is, the time management model server 500 may generate a plurality of Go board state-optimal simulation count (SC) pairs by matching a plurality of Go board states (S) with the optimal number of simulations for each of those Go board states (S).
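
One way to picture the SC data is as a list of (board state, optimal count) pairs. The sketch below is only an illustration: the `StateCountPair` type, the `build_sc_dataset` function, and the encoding of a board as a tuple of integer rows are all assumptions, since the description does not fix a data format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed board encoding: one tuple per row, e.g. 0 = empty, 1 = black, 2 = white.
Board = Tuple[Tuple[int, ...], ...]


@dataclass(frozen=True)
class StateCountPair:
    board_state: Board   # Go board state (S)
    optimal_count: int   # optimal number of simulations for that state


def build_sc_dataset(states: Sequence[Board],
                     optimal_counts: Sequence[int]) -> List[StateCountPair]:
    """Match each board state with its optimal simulation count."""
    if len(states) != len(optimal_counts):
        raise ValueError("states and optimal_counts must have the same length")
    return [StateCountPair(s, c) for s, c in zip(states, optimal_counts)]


empty_3x3: Board = ((0, 0, 0), (0, 0, 0), (0, 0, 0))
one_stone: Board = ((0, 0, 0), (0, 1, 0), (0, 0, 0))
dataset = build_sc_dataset([empty_3x3, one_stone], [800, 820])
```

A list like `dataset` could then serve as the training data set for a model that maps a board state to a simulation count.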

In addition, the deep learning-based Go game service method according to another embodiment of the present invention may include a step (S2308) in which the time management model server 500 trains the count adjustment model 552 based on the generated Go board state-optimal simulation count (SC) data.

In detail, based on the plurality of Go board state-optimal simulation count (SC) data generated by the count prediction unit 551, the time management model server 500 may train the count adjustment model 552, which takes a given Go board state (S) (in the embodiment, the current Go board state (S)) as input and outputs the number of simulations (C) for that Go board state (S). That is, the count adjustment model 552 may be trained using the plurality of Go board state-optimal simulation count (SC) pairs provided by the count prediction unit 551 as a training data set, and through this training, when it receives a given Go board state (S) (in the embodiment, the current Go board state (S)), it may output the number of simulations (C) optimized for that Go board state (S).

Accordingly, the deep learning-based Go game service apparatus according to another embodiment may train a deep learning model, that is, the count adjustment model 552, to output the number of simulations (C) optimized for the input Go board state (S).

In addition, the deep learning-based Go game service method according to another embodiment of the present invention may include a step (S2309) in which the time management model server 500 determines the number of simulations (C) for the current Go board state (S) based on the count adjustment model 552 trained as above.

That is, the time management model server 500 may operate the trained count adjustment model 552 to determine the number of simulations (C) optimized for the current Go board state (S) in the game currently in progress.

In detail, the time management model server 500 may input the current Go board state (S) to the count adjustment model 552 as input data. The time management model server 500 may then obtain, as output data from the count adjustment model 552 that received the current Go board state (S), the number of simulations (C) optimized for the current Go board state (S). That is, the time management model server 500 may determine the optimal number of simulations (C) for the current Go board state (S) using the count adjustment model 552 trained on the plurality of Go board state-optimal simulation count (SC) data.

Accordingly, the deep learning-based Go game service apparatus according to another embodiment may dynamically adjust the number of simulations (C) used when performing MCTS so that it is optimized for the Go board state (S). In addition, the deep learning-based Go game service apparatus according to another embodiment may change the number of simulations (C) at critical phases of the game based on the Go board state (S).

Here, to improve the accuracy and performance of the number of simulations (C) determined by the count adjustment model 552, the time management model server 500 according to another embodiment may 1) preset an upper limit and a lower limit for the number of simulations (C).

In detail, the time management model server 500 may cause the number of simulations (C) to be determined within the preset upper and lower limits. For example, the time management model server 500 may preset the upper limit of the number of simulations (C) to 3200 and the lower limit to 800, which is the default number of times (D).

In addition, the time management model server 500 may 2) randomly increase the number of simulations (C) with a certain probability. That is, by probabilistically increasing the number of simulations (C) by a predetermined amount, the time management model server 500 may improve the reliability of the number of simulations (C) determined through deep learning.
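
The two safeguards above, bounding the count and occasionally raising it at random, can be sketched together. The function name `adjust_count`, the boost probability of 0.1, the boost size of 100, and the choice to apply the boost before clamping are all assumptions made for illustration; only the bounds of 800 and 3200 come from the description's example.

```python
import random
from typing import Optional


def adjust_count(predicted: int,
                 lower: int = 800,
                 upper: int = 3200,
                 boost_probability: float = 0.1,
                 boost: int = 100,
                 rng: Optional[random.Random] = None) -> int:
    """Optionally add a random boost, then clamp to [lower, upper].

    boost_probability and boost are assumed values for this sketch.
    """
    rng = rng or random.Random()
    count = predicted
    # With probability boost_probability, raise the count by a fixed amount.
    if rng.random() < boost_probability:
        count += boost
    # Keep the result within the preset lower and upper limits.
    return max(lower, min(upper, count))


clamped_low = adjust_count(500, boost_probability=0.0)
clamped_high = adjust_count(5000, boost_probability=0.0)
boosted = adjust_count(1000, boost_probability=1.0)
```

Passing a seeded `random.Random` instance as `rng` makes the behavior reproducible in tests.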

In addition, the time management model server 500 may 3) adjust the number of simulations (C) based on the win rate and/or the change in win rate for the current Go board state (S).

In detail, when the win rate (that is, the value (V)) for the current Go board state (S) satisfies a predetermined criterion (e.g., being at or below a predetermined value), the time management model server 500 may increase the number of simulations (C). That is, when it determines from the current win rate that it is in a losing position, the time management model server 500 may increase the number of simulations (C) by a predetermined amount.

Alternatively, the time management model server 500 may increase the number of simulations (C) based on the change in win rate for the current Go board state (S).

FIG. 29 is an example of a diagram for explaining a method of adjusting the number of simulations (C) based on a change in win rate according to another embodiment of the present invention.

In detail, referring to FIG. 29, the time management model server 500 may sequentially receive from the move model server 300, accumulate, and store the plurality of values (V) obtained for each move in the game currently in progress. The time management model server 500 may then determine the change in win rate based on the accumulated values (V).

Specifically, the time management model server 500 may identify the trend and/or magnitude of fluctuation of the value (V) based on the accumulated values (V). The time management model server 500 may then determine the change in win rate for the accumulated values (V) based on the identified trend and/or magnitude of fluctuation.

Here, the fluctuation trend of the value (V) may mean the rate of change of the value (V), indicating whether the value (V) (that is, the win rate) is gradually rising or falling. In detail, the fluctuation trend of the value (V) represents how a plurality of values (V) change from a given move after the start of the game up to the current move. More specifically, when the rate of change for each of the plurality of values (V) is negative a predetermined number of times or more, that is, when the value (V) decreases a predetermined number of times or more, the value (V) may be judged to be gradually falling. Conversely, when the rate of change for each of the plurality of values (V) is positive a predetermined number of times or more, that is, when the value (V) increases a predetermined number of times or more, the value (V) may be judged to be gradually rising. The fluctuation trend of the value (V) may also be judged from the slope of the rate of change of the value (V). For example, referring to (a) of FIG. 29, the fluctuation trend of the value (V) may be the rate of change and/or slope of the value (V) over the five most recent moves (that is, whether the value (V) is rising or falling).

Meanwhile, the magnitude of change of the value (V) may mean the amount by which the value (V) changes. It may be obtained from the difference between the value (V) at a predetermined first move and the value (V) at a second move that follows the first move. When the difference obtained by subtracting the value (V) at the first move from the value (V) at the second move is negative, the win rate may be judged to have fallen. Further, when the magnitude of this negative difference is at or above a predetermined value, the win rate may be judged to have fallen sharply, and this may be detected as a critical phase of the game. For example, referring to (b) of FIG. 29, the time management model server 500 may calculate the difference in the value (V) between the two most recent moves (for example, move 64 and move 66) as '35% - 58% = -23%'. The time management model server 500 may then determine, based on the calculated difference, whether the drop in the value (V) is at or above a predetermined threshold.

In addition, when the time management model server 500 determines that the change in win rate over the plurality of values (V), as judged from the trend and/or the magnitude of change of the value (V), satisfies a predetermined criterion (in an embodiment, when the slope of the rate of change of the value (V) is negative a predetermined number of times or more, or when the drop in the value (V) is at or above a predetermined threshold), it may increase the number of simulations (C) by a predetermined amount.

That is, when the time management model server 500 judges the change in win rate from the values (V) obtained during the game and finds that the win rate is gradually falling or has fallen sharply, it may treat this as a critical phase and increase the number of simulations (C) by a predetermined amount.
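The rule described above can be sketched in a few lines. This is a minimal illustration only: the function names, the five-move window, the decline count, the 20% drop threshold, and the bonus of 200 simulations are all assumptions chosen for the example, since the description refers only to "predetermined" numbers.

```python
from typing import Sequence


def should_increase_simulations(
    values: Sequence[float],
    window: int = 5,           # assumed: inspect the five most recent values
    min_declines: int = 3,     # assumed "predetermined number" of declines
    sharp_drop: float = 0.20,  # assumed threshold for a sharp drop
) -> bool:
    """Return True when the win rate trends down or has just dropped sharply."""
    recent = list(values[-window:])
    # Change of the value (win rate) between consecutive recorded moves.
    deltas = [later - earlier for earlier, later in zip(recent, recent[1:])]
    if not deltas:
        return False
    declines = sum(1 for d in deltas if d < 0)
    # A negative last delta means the win rate fell at the latest move.
    latest_drop = -deltas[-1]
    return declines >= min_declines or latest_drop >= sharp_drop


def adjust_simulation_count(
    base_count: int, values: Sequence[float], bonus: int = 200  # assumed bonus
) -> int:
    """Add a fixed bonus to the simulation count in a critical phase."""
    if should_increase_simulations(values):
        return base_count + bonus
    return base_count
```

With the values from the FIG. 29 (b) example, where the win rate falls from 58% to 35%, `adjust_simulation_count(500, [0.55, 0.57, 0.58, 0.35])` returns 700 under these assumed thresholds, while a steadily rising sequence leaves the count unchanged.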

Accordingly, the deep learning-based Go game service device according to another embodiment may increase the number of simulations (C) in a critical phase in which the win rate is falling, so that more MCTS simulations are performed to determine a better next move.

In addition, 4) after a game is completed, the time management model server 500 may increase the number of simulations (C) for the Go board state (S) at the point where the win rate fell sharply.

In detail, after the game is over, the time management model server 500 may detect the point at which the value (V) fell by a large margin. The time management model server 500 may then increase the number of simulations (C) for the Go board state (S) at the detected point by a predetermined amount (for example, by a preset number or more), and may determine the increased number of simulations (C) to be the optimal number of simulations for that Go board state (S). The time management model server 500 may generate board state-optimal simulation count (SC) data that pairs the Go board state (S) with the determined optimal number of simulations, and may train the simulation count adjustment model 552 described above using the generated SC data as training data.

That is, the time management model server 500 may train the simulation count adjustment model 552 on a training data set in which the number of simulations (C) has been increased at the points where the win rate fell sharply in completed games, so that the number of simulations (C) is increased when the corresponding Go board state (S) is encountered later.
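The post-game labelling step above could look roughly like the following. It is a sketch under stated assumptions: board states are represented by opaque strings, the drop threshold and the bonus are illustrative values, and the state paired with the increased count is taken to be the one at which the drop is observed, which the description does not pin down precisely.

```python
from typing import List, Sequence, Tuple


def build_sc_training_pairs(
    states: Sequence[str],         # board state S at each recorded move (placeholder encoding)
    values: Sequence[float],       # value V recorded for each state
    counts: Sequence[int],         # simulation count C used for each state
    drop_threshold: float = 0.20,  # assumed threshold for a "large" drop
    bonus: int = 200,              # assumed "predetermined" increase
) -> List[Tuple[str, int]]:
    """Pair each state where the win rate fell sharply with an increased count."""
    pairs: List[Tuple[str, int]] = []
    for t in range(1, len(values)):
        drop = values[t - 1] - values[t]
        if drop >= drop_threshold:
            # Assumption: label the state at which the drop is observed.
            pairs.append((states[t], counts[t] + bonus))
    return pairs
```

The resulting (S, C) pairs would then serve as supervised training data for the simulation count adjustment model 552.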

In addition, 5) the time management model server 500 may adjust the number of simulations (C) based on statistics from a plurality of games played on the Go game service.

In detail, the time management model server 500 may work with the Go server 200 and/or the move model server 300 to obtain statistics on the relationship between the total number of simulations (C) and the win rate for each of a plurality of games played on the Go game service. For example, based on the total number of simulations (C) in each game and the corresponding win or loss data, the time management model server 500 may obtain a statistic indicating that the win rate increases by a predetermined percentage or more when the total number of simulations (C) is at or above a predetermined number.

The time management model server 500 may also adjust the number of simulations (C) calculated by the simulation count adjustment model 552 based on the obtained statistics. In an embodiment, the time management model server 500 may adjust the calculated number of simulations (C) so that it matches the number of simulations (C) associated with the highest win rate in the statistics. For example, if the statistics indicate that the win rate increases the most when the total number of simulations (C) in a single game is between 100,000 and 200,000 inclusive, the time management model server 500 may increase or decrease the number of simulations (C) calculated by the simulation count adjustment model 552 so that this total is met.
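One way to picture this adjustment is to rescale the per-move count so that the projected game total lands inside the statistically favourable band. The sketch below assumes the band from the example (100,000 to 200,000) and introduces a hypothetical `expected_moves` parameter for the number of moves the engine expects to play in a game; neither the function name nor that parameter comes from the description.

```python
def fit_to_target_band(
    per_move_count: int,
    expected_moves: int,    # hypothetical: engine moves expected in one game
    low: int = 100_000,     # lower bound of the band from the example
    high: int = 200_000,    # upper bound of the band from the example
) -> int:
    """Raise or lower the per-move count so the game total falls within [low, high]."""
    if expected_moves <= 0:
        raise ValueError("expected_moves must be positive")
    projected_total = per_move_count * expected_moves
    if projected_total < low:
        # Ceiling division so the projected total reaches the lower bound.
        return -(-low // expected_moves)
    if projected_total > high:
        # Floor division so the projected total stays under the upper bound.
        return high // expected_moves
    return per_move_count
```

For instance, with 120 expected moves, a model output of 500 simulations per move projects to 60,000 in total, so `fit_to_target_band(500, 120)` raises it to 834; an output of 1,000 already projects into the band and is returned unchanged.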

Accordingly, the deep learning-based Go game service device according to another embodiment may predict the optimal number of simulations (C) from multiple perspectives.

In addition, the deep learning-based Go game service method according to another embodiment of the present invention may include a step (S2310) in which the time management model server 500 provides the number of simulations (C) determined as above to the move model.

FIG. 30 is a conceptual diagram for explaining a move model that performs MCTS based on the number of simulations (C) according to another embodiment of the present invention.

In detail, referring to FIG. 30, the time management model server 500 may provide the number of simulations (C) determined by the simulation count adjustment model 552 to the move model. The time management model server 500 may thereby cause the move model to perform MCTS simulations based on the number of simulations (C) determined by the trained simulation count adjustment model 552 and to determine the next move. For example, when the number of simulations (C) determined by the simulation count adjustment model 552 is 500, the time management model server 500 may provide this number to the move model, and the move model may perform 500 MCTS simulations to determine the next move.
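The hand-off to the move model can be illustrated with a deliberately simplified loop. The sketch below does not implement tree search itself: `run_one_simulation` is a hypothetical callable standing in for a single MCTS simulation that reports which root move it explored, and choosing the most visited move at the end is a common MCTS convention assumed here rather than something this passage specifies.

```python
from collections import Counter
from typing import Callable, Hashable


def select_move(
    simulation_count: int,
    run_one_simulation: Callable[[], Hashable],  # hypothetical stand-in for one MCTS simulation
) -> Hashable:
    """Run exactly `simulation_count` simulations and return the most visited root move."""
    if simulation_count < 1:
        raise ValueError("simulation_count must be at least 1")
    visits: Counter = Counter()
    for _ in range(simulation_count):
        # Each simulation reports the root move it descended through.
        visits[run_one_simulation()] += 1
    move, _ = visits.most_common(1)[0]
    return move
```

The point of the sketch is only that the count supplied by the time management model server 500 bounds the search budget for the current move.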

Accordingly, the deep learning-based Go game service device according to another embodiment may determine the next move by performing MCTS based on the optimal number of simulations (C) for the Go board state (S).

<Time management model server according to still another embodiment>

FIG. 31 is a diagram for explaining the time management model of the time management model server 500 according to still another embodiment of the present invention. FIGS. 32 and 33 are diagrams for explaining the change in territory count used to generate game time information according to still another embodiment of the present invention. FIG. 34 is a diagram for explaining the number of dame (neutral points) used to generate game time information according to still another embodiment of the present invention.

Referring to FIG. 33, the deep learning-based Go game service according to an embodiment of the present invention may generate game time information for the current Go board state (S) using the time management model of the time management model server 500. The game time information may include the remaining game length predicted from the Go board state (S) until the end of the game. The time management model is a deep learning model of the time management model server 500 and may include a first time management neural network 520, a second time management neural network 530, and a first input feature generator 540.

The time management model may be trained by supervised learning to predict the remaining game length, which is the game time information for the current Go board state (S). More specifically, the time management model server 500 may generate a training data set for predicting the remaining game length from the Go board state (S) and use it to train the time management model. The time management model server 500 may receive a plurality of game records from the Go server 200. Each game record may include the Go board state (S) at each move in the order played. Each game record may also include game time information for each Go board state (S), in particular the remaining game length until the end of the game. In addition, the time management model server 500 may receive position evaluation information from the position evaluation model server 400. The position evaluation information is based on the plurality of game records that the Go server 200 provides to the time management model server 500, and may include information on the change in territory count and on the number of dame. The time management model server 500 may also receive values from the move model server 300. These values may likewise be based on the plurality of game records that the Go server 200 provides to the time management model server 500.

The first input feature extractor 540 may extract a first input feature (IF1) from the Go board states (S) of the plurality of game records and provide it to the first time management neural network 520 as input data for training. The first input feature (IF1) of a Go board state (S) may be a 19*19*18 RGB image containing the stone positions for the black player's last eight moves, the stone positions for the white player's last eight moves, and turn information indicating whether the current player is black or white. As an example, the first input feature extractor 540 may have a neural network structure and may include a kind of encoder.
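A stack of 18 planes of size 19*19 can be built as in the sketch below. The plane layout is an assumption made for illustration: eight history planes for black stones, eight for white stones, and two turn planes (one set to all ones when black is to move, the other when white is to move). The description gives the overall 19*19*18 shape but not how the turn information fills the remaining planes, and the `Position` type is likewise hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

BOARD = 19      # board size
HISTORY = 8     # number of recent positions kept per colour

# Hypothetical position encoding: (row, col) -> "B" or "W".
Position = Dict[Tuple[int, int], str]


def encode_first_input_feature(
    history: Sequence[Position],  # most recent position first
    black_to_move: bool,
) -> List[List[List[int]]]:
    """Return 18 binary planes of 19x19 under the assumed layout described above."""
    planes = [[[0] * BOARD for _ in range(BOARD)] for _ in range(2 * HISTORY + 2)]
    for i, position in enumerate(history[:HISTORY]):
        for (row, col), colour in position.items():
            # Planes 0-7 hold black stones, planes 8-15 hold white stones.
            offset = 0 if colour == "B" else HISTORY
            planes[offset + i][row][col] = 1
    # Assumed: plane 16 marks black to move, plane 17 marks white to move.
    turn_plane = 2 * HISTORY + (0 if black_to_move else 1)
    planes[turn_plane] = [[1] * BOARD for _ in range(BOARD)]
    return planes
```

Missing history early in a game simply leaves the corresponding planes at zero.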

In addition, the time management model may provide a second input feature (IF2) and a third input feature (IF3) to the first time management neural network 520 as input data for training. The second input feature (IF2) may be information on the change in territory count for the Go board state (S). The third input feature (IF3) may be information on the number of dame for the Go board state (S). The time management model may provide a fourth input feature (IF4) to the second time management neural network 530 as input data for training. The fourth input feature (IF4) may be the value for the Go board state (S).

The first time management neural network 520 may take the first to third input features (IF1 to IF3) as input and provide its output to the second time management neural network 530. The first time management neural network 520 may have a neural network structure. As an example, it may consist of 20 residual blocks. Referring to FIG. 8, one residual block may be arranged in the following order: a convolution layer of 256 3x3 filters, a batch normalization layer, a ReLU activation layer, a convolution layer of 256 3x3 filters, a batch normalization layer, a skip connection, and a ReLU activation layer. The batch normalization layer is intended to prevent covariate shift, a phenomenon in which the distribution of a layer's input changes during training because the parameters of the preceding layer change. The skip connection prevents the performance of the neural network from degrading as the stack of blocks grows deeper, which allows the network to be made deeper to improve overall performance. The skip connection may take the form of adding the original input of the residual block to the output of the second batch normalization layer and feeding the sum into the second ReLU activation layer.
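The ordering of operations inside one residual block, including where the skip connection is added, can be made concrete with a framework-free sketch. The convolution and batch normalization layers are passed in as plain callables on a flat list of numbers, which is an illustrative simplification and not how a real 256-filter convolution would be represented.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]  # stand-in for a convolution or batch-norm layer


def relu(x: Vector) -> Vector:
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, conv1: Layer, bn1: Layer, conv2: Layer, bn2: Layer) -> Vector:
    """Apply conv -> BN -> ReLU -> conv -> BN, add the block input, then ReLU."""
    out = relu(bn1(conv1(x)))
    out = bn2(conv2(out))
    # Skip connection: the original block input is added before the second ReLU.
    out = [a + b for a, b in zip(out, x)]
    return relu(out)
```

Because the block input is added back before the final activation, a block whose layers contribute nothing still passes its (rectified) input forward, which is the property that lets many such blocks be stacked.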

The second time management neural network 530 may take the output of the first time management neural network 520 and the fourth input feature (IF4) as input and generate game time information on the predicted remaining game length. The second time management neural network 530 may have a neural network structure. As an example, it may be a fully connected layer structure.

The time management model may sufficiently train the first time management neural network 520 and the second time management neural network 530 on a training data set whose input data are the first to fourth input features (IF1 to IF4) and whose target data are the remaining game length information, so that the output data (r) generated by the second time management neural network 530 via the first time management neural network 520 becomes equal to the target data. As an example, the time management model may be trained so that a remaining game length prediction loss is minimized. For example, the remaining game length prediction loss may follow Equation 5.

(Equation 5)

Equation 5 includes a territory change loss; that is, the remaining game length prediction loss uses the territory change loss. The change in territory count is used for predicting the remaining game length because the territory count is likely to change by relatively large amounts early in a game and by relatively small amounts late in a game, so it can serve as a factor in the prediction. As an example, FIG. 32 shows position evaluation information generated by the position evaluation model server 400 for a Go board state (S) in the early part of an arbitrary game record. FIG. 32(a) shows the position after 104 moves, where Black has 46 points and White has 63.5 points. FIG. 32(b) shows the position after Black plays one more move, move 105, where Black has 46 points and White has 59.5 points, a difference of 4 points for White. That is, in early Go board states (S), the territory count changes by a large amount with each move. FIG. 33 shows position evaluation information generated by the position evaluation model server 400 for a Go board state (S) in the latter part of an arbitrary game record. FIG. 33(a) shows the position after 222 moves, where Black has 64 points and White has 63.5 points. FIG. 33(b) shows the position after White plays one more move, move 223, where Black has 63 points and White has 64.5 points, a difference of 1 point for White. That is, in late Go board states (S), the territory count changes by a small amount with each move.

Equation 5 also includes a dame count loss; that is, the remaining game length prediction loss uses the dame count loss. The number of dame is used for predicting the remaining game length because it gradually decreases as a game moves toward its latter part, so it can serve as a factor in the prediction. As an example, FIG. 34 shows position evaluation information generated by the position evaluation model server 400 for Go board states (S) in the early and latter parts of an arbitrary game record. Dame are the points marked with a red X in the position evaluation results of FIG. 34. FIG. 34(a) is from the early part of the game, so there are many dame. FIG. 34(b) is from the latter part of the game, so there are few dame.

Equation 5 also includes a value loss; that is, the remaining game length prediction loss uses the value loss. The value is used for predicting the remaining game length because the value of one side tends to rise as a game moves toward its latter part, so it can serve as a factor in the prediction. Because the value changes little in the early part of a game, the value loss may be used in the latter part of the game. For example, the latter part of the game may be a Go board state (S) in which 250 or more moves have been played in an arbitrary game record, but is not limited thereto.

Equation 5 also contains hyperparameters. The user may adjust these hyperparameters to control the relative importance of each loss. For example, a hyperparameter may be set to a higher value toward the latter part of the game, where its importance increases, and to a lower value in the early part of the game, where its importance is low.
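The way hyperparameters trade off the three losses can be illustrated as follows. Equation 5 itself is not reproduced in this text, so the weighted-sum form below is an assumption, as are the function names, the default weights, and the choice to tie the phase-dependent weight to the value loss with a switch at move 250 (taken from the example of a late-game board state above).

```python
from typing import Tuple


def remaining_length_loss(
    territory_change_loss: float,
    dame_count_loss: float,
    value_loss: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # assumed hyperparameters
) -> float:
    """Assumed form: a weighted sum of the three component losses."""
    w_territory, w_dame, w_value = weights
    return (
        w_territory * territory_change_loss
        + w_dame * dame_count_loss
        + w_value * value_loss
    )


def value_weight(
    move_number: int,
    late_game_start: int = 250,  # example threshold for the latter part of a game
    early: float = 0.1,          # assumed low weight early in the game
    late: float = 1.0,           # assumed higher weight late in the game
) -> float:
    """Give the value loss a low weight early and a higher weight late."""
    return late if move_number >= late_game_start else early
```

Under this assumed form, `remaining_length_loss(2.0, 1.0, 0.5, weights=(1.0, 0.5, 2.0))` evaluates to 3.5, and raising a weight makes the corresponding loss dominate training.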

When the current Go board state (S) is input during a game, the trained time management model may generate game time information on the predicted remaining game length. More specifically, when the current Go board state (S) is input during a game, the first input feature may be extracted by the first input feature extractor 540. The time management model may use the change in territory count, which the position evaluation model server provides as part of its evaluation of the current Go board state (S), as the second input feature, and the number of dame as the third input feature. The time management model may take the first to third input features as input data and provide the output generated by the first time management neural network 520 to the second time management neural network 530. The time management model may use, as the fourth input feature, the value that the move model server provides for the candidate moves in the current Go board state (S). The time management model may then take the output of the first time management neural network 520 and the fourth input feature as input data and generate, in the second time management neural network 530, game time information on the predicted remaining game length.

In addition, the time management model server 500 may adjust the move preparation time according to the predicted remaining game length in the game time information. The time management model server 500 may provide the adjusted move preparation time to the move model server 300.
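A simple allocation rule shows how a predicted remaining game length could drive the move preparation time. The sketch assumes the remaining game length is expressed as a number of moves by both players, that the engine plays about half of them, and that the remaining clock is split evenly across its own moves; the function name, the parameters, and the one-second floor are all hypothetical.

```python
def move_preparation_time(
    remaining_clock_seconds: float,
    predicted_remaining_moves: int,  # assumed unit: moves by both players
    min_seconds: float = 1.0,        # assumed floor per move
) -> float:
    """Split the remaining clock evenly over the engine's expected remaining moves."""
    # The engine plays roughly half of the remaining moves, and at least one.
    own_moves = max(1, (predicted_remaining_moves + 1) // 2)
    share = remaining_clock_seconds / own_moves
    # Never allot less than the floor or more than what is left on the clock.
    return min(remaining_clock_seconds, max(min_seconds, share))
```

With 600 seconds left and 120 moves predicted to remain, the rule allots 10 seconds to the next move; as the predicted remaining length shrinks, each move receives a larger share.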

Accordingly, the deep learning-based Go game service device according to still another embodiment may predict the remaining game length, and may use the predicted remaining game length to allocate move preparation time effectively.

FIG. 35 is an exemplary diagram of signal flow in the Go game service system involving the time management model server 500 according to still another embodiment of the present invention.

Referring to FIG. 35, the Go server 200 may transmit a plurality of game records to the move model server 300, the position evaluation model server 400, and the time management model server 500 (S2701). The time management model server 500 may extract the first input feature from the Go board states (S) of the received game records (S2702). The position evaluation model server 400 may generate position evaluation information for the received game records and transmit it to the time management model server 500 (S2703, S2704). The position evaluation information may be information on the change in territory count and on the number of dame. The time management model server 500 may input the change in territory count as the second input feature and the number of dame as the third input feature to the first time management neural network of the time management model (S2705). The move model server 300 may generate values for the Go board states (S) of the game records and transmit the generated values to the time management model server 500 (S2706, S2707). The time management model server 500 may input the received values as the fourth input feature to the second time management neural network of the time management model (S2708). The time management model server 500 may train the time management model using the first to fourth input features as input data and the remaining game length information for the Go board states (S) of the game records as target data (S2709). The Go server 200 may run a Go game in which the terminal 100 and the move model server 300 each play a move on their own turn (S2710 to S2712). The position evaluation model server 400 may extract input features from the current Go board state (S), generate a position value from those input features using the position evaluation model, which is a deep learning model, and perform position evaluation using the Go board state (S) and the position value (S2713). The time management model server 500 may extract input features from the current Go board state (S) as the first input feature, use the position evaluation information as the second and third input features, and use the value provided by the move model server 300 as the fourth input feature, so that the time management model, which is a deep learning model, generates game time information (S2714).

FIG. 36 illustrates a method for generating game time information in the deep learning-based Go game service method according to yet another embodiment of the present invention.

Referring to FIG. 36, the deep learning-based Go game service method according to yet another embodiment of the present invention may include a step in which the time management model server 500 receives a plurality of game records (S2801). The description of the game records follows that given for FIG. 31.

The method may include a step in which the time management model server 500 extracts a first input feature from the Go board state (S) of the received game records (S2802). The method of extracting the first input feature follows the description of FIG. 31. The time management model server 500 may provide the extracted first input feature as input data to the time management first neural network.

The method may include a step in which the time management model server 500 receives situation judgment information for the Go board state (S) of the game records (S2803). The method of receiving the situation judgment information follows the description of FIG. 31. The situation judgment information may be information on the change in score (territory count) and information on the number of neutral points (dame). The time management model server 500 may take the score-change information as a second input feature and the neutral-point count information as a third input feature.

The method may include a step in which the time management model server 500 inputs the second input feature and the third input feature to the time management first neural network (S2804).

The method may include a step in which the time management model server 500 receives a value from the move model server (S2805). The time management model server 500 may take the value as a fourth input feature.

The method may include a step in which the time management model server 500 inputs the fourth input feature to the time management second neural network (S2806).

The method may include a step in which the time management model server 500 trains the time management model using the first to fourth input features as input data and the remaining game length for each Go board state (S) of the game records as target data (S2807). The training method of the time management model follows the description of FIGS. 31 to 34.
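
The pairing of board states with remaining-game-length targets can be sketched as below. This is a minimal sketch under an assumption that this passage does not confirm: the remaining game length is measured in moves and equals the total number of moves in the game record minus the number of moves already played. The function name `make_training_pairs` is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

State = TypeVar("State")


def make_training_pairs(states: Sequence[State]) -> List[Tuple[State, int]]:
    """Pair each board state of one finished game with its remaining length.

    states[t] is the board state before move t + 1 is played, so a game of
    N moves yields N states and the target for states[t] is N - t moves.
    """
    total = len(states)
    return [(state, total - t) for t, state in enumerate(states)]
```

In practice each state would first be converted into the first to fourth input features before being paired with its target.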

The method may include a step in which the time management model server 500 receives, during a game, the Go board state (S), the situation judgment information, and the value (S2809).

The method may include a step in which the time management model server 500 generates game time information using the received Go board state (S), situation judgment information, and value (S2810). The method of generating the game time information follows the description of FIG. 31.

Accordingly, the deep learning-based Go game service method according to this embodiment can predict the remaining game time. In addition, the method can use the predicted remaining game time to allocate move preparation time effectively.
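
One simple way to divide move preparation time using a prediction of how much of the game remains is shown below. This is a hedged sketch, not the allocation rule of the embodiment: it assumes the prediction is expressed as a number of remaining moves by both players, that the engine plays every other move, and that the remaining clock time is split evenly. The function name `per_move_budget` and the parameter `min_seconds` are hypothetical.

```python
import math


def per_move_budget(
    clock_seconds: float,
    predicted_remaining_moves: int,
    min_seconds: float = 1.0,
) -> float:
    """Split the remaining clock time evenly over the engine's own remaining moves."""
    # Assumption: moves alternate, so the engine plays about half of them.
    own_moves = max(1, math.ceil(predicted_remaining_moves / 2))
    budget = max(min_seconds, clock_seconds / own_moves)
    # Never budget more time than is left on the clock.
    return min(clock_seconds, budget)
```

For example, with 600 seconds left and 100 predicted remaining moves, the engine has about 50 moves of its own and the sketch assigns 12 seconds to the next move.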

The embodiments of the present invention described above may be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the computer software field. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. A hardware device may be replaced by one or more software modules to perform the processing according to the present invention, and vice versa.

The specific implementations described in the present invention are examples and do not limit the scope of the present invention in any way. For brevity of the specification, descriptions of conventional electronic components, control systems, software, and other functional aspects of such systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device they may be replaced by, or supplemented with, various other functional, physical, or circuit connections. Furthermore, unless a component is specifically described with a term such as "essential" or "important", it may not be a component that is necessarily required for applying the present invention.

Although the detailed description of the present invention has been given with reference to preferred embodiments, those skilled in the art, or those having ordinary knowledge in the relevant technical field, will understand that the present invention may be modified and changed in various ways without departing from the spirit and technical scope of the present invention as set forth in the claims below. Accordingly, the technical scope of the present invention should not be limited to the contents of the detailed description, but should be defined by the claims.

100 Terminal
200 Go server
300 Move model server
310 Search unit
320 Self-play unit
330 Move neural network
400 Situation judgment model server
410 Situation judgment neural network
420 Input feature extraction unit
430 Ground-truth label generation unit
500 Time management model server
510 Time management unit
520 Time management first neural network
530 Time management second neural network
540 First input feature extraction unit
551 Simulation count prediction unit
552 Simulation count adjustment model

Claims

1. A simulation count management device comprising:
a communication unit configured to receive at least one of a Go board state, a value, a number of visits, and a default number of simulations;
a memory configured to store a simulation count prediction unit and a simulation count adjustment model; and
a processor configured to read the simulation count prediction unit and determine an optimal number of simulations for the Go board state using at least one of the Go board state, the value, the number of visits, and the default number of simulations,
to train the simulation count adjustment model based on the determined optimal number of simulations, and
to read the simulation count adjustment model and determine a number of simulations according to a current Go board state.

2. The simulation count management device of claim 1,
wherein the optimal number of simulations comprises
the number of simulations that takes the minimum time within a threshold range in which the next move selected during a Monte Carlo Tree Search (MCTS) simulation does not change.

3. The simulation count management device of claim 2,
wherein the simulation count prediction unit
generates Go board state-optimal simulation count data by matching the determined optimal number of simulations with the Go board state, and
wherein the processor
trains the simulation count adjustment model using the generated Go board state-optimal simulation count data as a training data set.

4. The simulation count management device of claim 2,
wherein the simulation count prediction unit
obtains at least one of a first number of visits and a first value for each of a plurality of move candidates, based on a first MCTS simulation performed on the Go board state with the default number of simulations, and
obtains at least one of a second number of visits and a second value for each of the plurality of move candidates, based on a second MCTS simulation performed on the Go board state with an additional-unit number obtained by adding a predetermined number to the default number of simulations.

5. The simulation count management device of claim 4,
wherein the simulation count prediction unit
determines, based on the first number of visits and the second number of visits, whether the next move changes when the number of simulations increases, and
determines, as the optimal number of simulations, the number of simulations that takes the minimum time within a threshold range in which the next move does not change.

6. The simulation count management device of claim 5,
wherein the simulation count prediction unit
determines whether the next move changes based on a rate of change derived from the first number of visits and the second number of visits, and
wherein the rate of change is
calculated as the increase in the number of visits, given by the difference between the first number of visits and the second number of visits, relative to the increase in the number of simulations, given by the difference between the default number of simulations and the additional-unit number.

7. The simulation count management device of claim 5,
wherein the simulation count prediction unit
determines whether the next move changes based on the ratio among the first numbers of visits of the plurality of move candidates and the ratio among the second numbers of visits of the plurality of move candidates.

8. The simulation count management device of claim 4,
wherein the simulation count prediction unit
determines, based on an increase/decrease rate derived from the first value and the second value, whether the next move changes when the number of simulations increases, and
wherein the increase/decrease rate is
calculated as the increase in the value, given by the difference between the first value and the second value, relative to the increase in the number of simulations, given by the difference between the default number of simulations and the additional-unit number.

9. The simulation count management device of claim 8,
wherein the simulation count prediction unit,
when the increase/decrease rate satisfies a predetermined criterion, determines the minimum number of simulations used in calculating that increase/decrease rate as the optimal number of simulations.

10. The simulation count management device of claim 2,
wherein the simulation count adjustment model
performs an adjustment process on the number of simulations according to the current Go board state, and
wherein the adjustment process comprises
at least one of: a process of setting at least one of an upper limit and a lower limit on the number of simulations; a process of increasing the number of simulations with a predetermined probability; and a process of adjusting the number of simulations based on a predetermined statistic of the number of simulations.

11. The simulation count management device of claim 10,
wherein the processor
provides the number of simulations according to the current Go board state to a move model that performs the MCTS simulation, so that the move model performs the MCTS simulation based on that number of simulations.

12. A method of managing the number of simulations of a deep learning-based Go game service in a time management model server, the method comprising:
receiving a predetermined Go board state;
performing a first MCTS based on the received Go board state;
obtaining at least one of a first number of visits and a first value based on the performed first MCTS;
performing a second MCTS based on the received Go board state;
obtaining at least one of a second number of visits and a second value based on the performed second MCTS;
determining an optimal number of simulations for the Go board state based on the first number of visits and the second number of visits, or based on the first value and the second value;
generating Go board state-optimal simulation count data by matching the determined optimal number of simulations with the Go board state;
training a deep learning model based on the generated Go board state-optimal simulation count data;
determining a number of simulations according to a current Go board state based on the trained deep learning model; and
providing the determined number of simulations to a move model.