KR20210009588A

KR20210009588A - Deep-learning based baduk game service method and apparatus thereof

Info

Publication number: KR20210009588A
Application number: KR1020190086287A
Authority: KR
Inventors: 박정훈
Original assignee: 엔에이치엔 주식회사
Priority date: 2019-07-17
Filing date: 2019-07-17
Publication date: 2021-01-27
Also published as: JP2021018819A; KR102316930B1; JP7051946B2

Abstract

A deep learning-based Go game service method and apparatus according to an embodiment of the present invention relate to a method and apparatus for judging a Go position using a deep learning neural network. A position judgment model server according to an embodiment of the present invention comprises: a communication unit that receives a plurality of game records; a storage unit that stores a position judgment model; and a processor that reads the position judgment model, trains it, and judges the position of a board state using the trained position judgment model. The position judgment model includes: an input feature extraction unit that extracts input features from an input board state; and a position judgment neural network that generates position values for the intersections of the input board state based on the extracted input features. The method and apparatus can accurately judge the position of a Go game by accurately distinguishing territory, dead stones, stones, neutral points (dame), and seki according to the rules of Go.

Description

Deep learning-based Go game service method and its device {DEEP-LEARNING BASED BADUK GAME SERVICE METHOD AND APPARATUS THEREOF}

The present disclosure relates to a deep learning-based Go game service method and apparatus. More specifically, it relates to a method and apparatus for judging a Go position using a deep learning neural network.

As user terminals such as smartphones, tablet PCs, PDAs (Personal Digital Assistants), and notebook computers have become widespread and information processing technology has advanced, it has become possible to play Go, a kind of board game, on a user terminal, and further to play a game of Go against a programmed artificial intelligence computer rather than a human.

Compared with other board games such as chess or janggi (Korean chess), Go has a very large number of possible positions, which has limited the ability of artificial intelligence computers to play at a human level, and research on raising the playing strength of artificial intelligence computers is being actively pursued. Recently, developers have applied the Monte Carlo Tree Search (MCTS) algorithm and deep learning technology to artificial intelligence computers and raised their playing strength to or above the level of professional players.

In addition, judging the position during a game of Go, that is, who is ahead and by how much, is an important factor in forming a game strategy. However, because the rules of Go allow a very large number of possible positions, even ordinary players and amateurs find it difficult to judge the position accurately, and position judgment by artificial intelligence computers has also had low accuracy. One recent approach to judging the position during a game implements an influence function. However, position judgment using an influence function does not reflect the rules of Go, so its accuracy is low. Another approach uses patterns: it predicts the position by computing the influence of the patterns formed by the stones on the board. This approach fails to reach a judgment in certain situations and performs poorly, so it cannot judge the position accurately.

Yet another approach uses a rollout neural network, which is a simple neural network, to simulate the game from the current board state to the end a certain number of times and then predicts the position from the average state of each stone on the finished boards. This approach requires a long simulation time, and its accuracy is also poor. A further approach (GoCNN) uses a model that has learned Go positions with a CNN, a deep learning technique. It has low accuracy in predicting the position, low accuracy in predicting dead stones on the board, and low accuracy in predicting neutral points (dame) and seki.

Patent Document 1: Unexamined Patent Publication No. 10-2015-0129265

To solve the problems described above, the present invention relates to a deep learning-based Go game service method and apparatus. More specifically, it proposes a method and apparatus for judging a Go position using a deep learning neural network.

In detail, an object of the present invention is to provide a deep learning-based Go game service method and apparatus that can accurately judge the position of a Go game by accurately distinguishing territory, dead stones, stones, neutral points (dame), and seki according to the rules of Go.

Another object of the present invention is to provide a deep learning-based Go game service method and apparatus that can accurately judge the position of a Go game by predicting territory, dead stones, stones, neutral points (dame), and seki according to the rules of Go.

Another object of the present invention is to provide a deep learning-based Go game service method and apparatus that can quickly judge the position during a game of Go.

A position judgment model server according to an embodiment includes: a communication unit that receives a plurality of game records; a storage unit that stores a position judgment model; and a processor that reads the position judgment model, trains the position judgment model, and judges the position of a board state using the trained position judgment model. The position judgment model may include: an input feature extraction unit that extracts input features from an input board state; and a position judgment neural network that generates position values for the intersections of the input board state based on the extracted input features.

In the position judgment model server, the position judgment model may further include a ground-truth label generation unit that generates, based on the input board state, the ground-truth label used in training. The ground-truth label generation unit may perform a first preprocessing that plays out the endgame from the input board state to generate a first preprocessed state, and may perform a second preprocessing that removes, from the first preprocessed state, stones that lie within territory boundaries and are unnecessary for delimiting territory, to generate a second preprocessed state.

In the position judgment model server, the ground-truth label generation unit may perform a third preprocessing that converts each intersection of the second preprocessed state into a position value to generate a third preprocessed state, and may provide the third preprocessed state to the position judgment neural network as the ground-truth label.

In the position judgment model server, the third preprocessing may convert each intersection of the second preprocessed state into a position value as follows: 0 where the player's own stone is placed, +1 where the intersection is the player's own territory, 0 where an opponent's stone is placed, and -1 where the intersection is the opponent's territory.

In the position judgment model server, the third preprocessing may convert each intersection of the second preprocessed state into a position value as follows: 0 where the player's own stone is placed, +1 where the intersection is the player's own territory, 0 where an opponent's stone is placed, -1 where the intersection is the opponent's territory, and 0 where the intersection is seki or a neutral point (dame).
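As an illustration only, the third preprocessing described above can be sketched in Python as a lookup from a per-intersection classification to a position value. The `PointState` names, the `to_label` helper, and the nested-list board representation are hypothetical and are not taken from the specification; only the numeric values (0, +1, 0, -1, 0) follow the text.

```python
from enum import Enum


class PointState(Enum):
    """Hypothetical classification of one intersection after the second preprocessing."""
    MY_STONE = "my_stone"
    MY_TERRITORY = "my_territory"
    OPP_STONE = "opp_stone"
    OPP_TERRITORY = "opp_territory"
    SEKI = "seki"
    NEUTRAL = "neutral"  # neutral point (dame)


# Position values as stated in the text: stones, seki, and neutral points
# map to 0; own territory to +1; opponent territory to -1.
POSITION_VALUE = {
    PointState.MY_STONE: 0,
    PointState.MY_TERRITORY: 1,
    PointState.OPP_STONE: 0,
    PointState.OPP_TERRITORY: -1,
    PointState.SEKI: 0,
    PointState.NEUTRAL: 0,
}


def to_label(board):
    """Convert a 2D grid of PointState into a 2D grid of position values."""
    return [[POSITION_VALUE[point] for point in row] for row in board]


# Tiny 2x3 example (a real board would be 19x19).
example = [
    [PointState.MY_TERRITORY, PointState.MY_STONE, PointState.NEUTRAL],
    [PointState.OPP_STONE, PointState.OPP_TERRITORY, PointState.SEKI],
]
label = to_label(example)  # [[1, 0, 0], [0, -1, 0]]
```

Because stones, seki, and neutral points all map to 0, the resulting grid carries only the territory assignment, which is what the network is trained to reproduce in this sketch.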

In the position judgment model server, the position judgment neural network may include a plurality of residual blocks, and each of the residual blocks may include a convolution layer, a batch normalization layer, a ReLU activation function layer, and a skip connection.
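The components of one residual block listed above can be illustrated with a minimal pure-Python sketch. The ordering used here (convolution, batch normalization, ReLU, convolution, batch normalization, add the input, ReLU) is the common ResNet arrangement and is an assumption, since the text only names the components. The sketch also assumes a single channel, a single sample, 3x3 kernels, and no learned scale or shift in the normalization; all function names are hypothetical.

```python
import math


def conv3x3(x, kernel):
    """Single-channel 3x3 convolution (cross-correlation, as in deep learning
    frameworks) with zero padding, so the output keeps the input size."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += x[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = acc
    return out


def batch_norm(x, eps=1e-5):
    """Normalize the channel to zero mean and unit variance.
    With one sample and one channel, the statistics are taken over all
    board positions; learned scale and shift are omitted (assumption)."""
    values = [v for row in x for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [[(v - mean) / math.sqrt(var + eps) for v in row] for row in x]


def relu(x):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x, kernel1, kernel2):
    """conv -> BN -> ReLU -> conv -> BN -> add skip connection -> ReLU."""
    y = relu(batch_norm(conv3x3(x, kernel1)))
    y = batch_norm(conv3x3(y, kernel2))
    # Skip connection: add the block input to the transformed features.
    y = [[a + b for a, b in zip(row_y, row_x)] for row_y, row_x in zip(y, x)]
    return relu(y)
```

The skip connection lets the block fall back to passing its input through unchanged: with all-zero kernels and a non-negative input, `residual_block` returns the input itself.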

In the position judgment model server, the processor may train the position judgment neural network using a position judgment loss.

In the position judgment model server, the input features may include, for the input board state, stone position information for the last 8 moves of the Black player, stone position information for the last 8 moves of the White player, and turn information indicating whether the current player is Black or White.
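One possible reading of these input features, sketched in Python for illustration: 8 binary planes of Black stone positions, 8 binary planes of White stone positions, and 1 turn plane, for 17 planes in total. The plane layout, the zero padding when fewer than 8 board states exist, the encoding of the turn plane (1 for Black, 0 for White), and the string-based board representation are all assumptions, and `extract_features` is a hypothetical name.

```python
def extract_features(history, to_play, size=19, steps=8):
    """Build input planes from a game history.

    history: list of board states, oldest first; each board is a list of
             `size` strings over 'B' (Black), 'W' (White), '.' (empty).
    to_play: 'B' or 'W', the player whose turn it is.
    Returns a list of 2 * steps + 1 planes, each a size x size grid of 0/1.
    """
    recent = history[-steps:][::-1]  # most recent board state first
    black_planes, white_planes = [], []
    for t in range(steps):
        if t < len(recent):
            board = recent[t]
            black_planes.append([[1 if ch == "B" else 0 for ch in row] for row in board])
            white_planes.append([[1 if ch == "W" else 0 for ch in row] for row in board])
        else:
            # Not enough history yet: pad with empty planes (assumption).
            black_planes.append([[0] * size for _ in range(size)])
            white_planes.append([[0] * size for _ in range(size)])
    # Turn plane: all ones if Black is to play, all zeros otherwise (assumption).
    turn_plane = [[1 if to_play == "B" else 0] * size for _ in range(size)]
    return black_planes + white_planes + [turn_plane]


# Usage on a 3x3 toy board with three recorded states.
toy_history = [
    ["...", "...", "..."],
    ["B..", "...", "..."],
    ["B..", ".W.", "..."],
]
planes = extract_features(toy_history, to_play="B", size=3)
```

Stacking several recent states rather than only the current one gives the network access to move order, which a single board snapshot cannot express.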

A deep learning-based Go game service method according to an embodiment is a method of judging the position of a board state by a position judgment model server that includes a communication unit, a storage unit storing a position judgment model, and a processor that runs the position judgment model. The method includes: receiving, by the communication unit, a plurality of game records; extracting, by the processor, input features from the board states of the plurality of game records using an input feature extraction unit of the position judgment model; generating, by the processor, ground-truth labels from the board states of the plurality of game records using a ground-truth label generation unit of the position judgment model; training, by the processor, the position judgment model using the input features and the ground-truth labels; building, by the processor, the position judgment model by completing the training; and performing, by the processor, position judgment that, when a current board state requiring position judgment is input, generates position values for the intersections of the current board state using the trained position judgment model. The generating of the ground-truth labels may include: a first preprocessing step of generating a first preprocessed state by playing out the endgame from the board states of the plurality of game records; a second preprocessing step of generating a second preprocessed state by removing, from the first preprocessed state, stones that lie within territory boundaries and are unnecessary for delimiting territory; and a third preprocessing step of generating a third preprocessed state by converting each intersection of the second preprocessed state into a position value.

In the deep learning-based Go game service method, the third preprocessing may convert each intersection of the second preprocessed state into a position value as follows: 0 where the player's own stone is placed, +1 where the intersection is the player's own territory, 0 where an opponent's stone is placed, and -1 where the intersection is the opponent's territory.

In the deep learning-based Go game service method, the third preprocessing may convert each intersection of the second preprocessed state into a position value as follows: 0 where the player's own stone is placed, +1 where the intersection is the player's own territory, 0 where an opponent's stone is placed, -1 where the intersection is the opponent's territory, and 0 where the intersection is seki or a neutral point (dame).

The deep learning-based Go game service method and apparatus according to an embodiment can judge a Go position using a deep learning neural network.

In addition, the method and apparatus according to an embodiment can accurately judge the position of a Go game by accurately distinguishing territory, dead stones, stones, neutral points (dame), and seki according to the rules of Go.

In addition, the method and apparatus according to an embodiment can accurately judge the position of a Go game by predicting territory, dead stones, stones, neutral points (dame), and seki according to the rules of Go.

In addition, the method and apparatus according to an embodiment can quickly judge the position during a game of Go.

FIG. 1 is an exemplary diagram of a deep learning-based Go game service system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the structure of the move model of the move model server, which selects the moves of the artificial intelligence computer in the deep learning-based Go game service according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the move probability distribution over candidate points according to the policy of the move model.
FIG. 4 is a diagram illustrating the value and the visit count for candidate points of the move model.
FIG. 5 is a diagram illustrating the process by which the move model selects a move according to the pipeline of the search unit.
FIG. 6 is an exemplary view of a screen providing the position judgment function of the deep learning-based Go game service according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the structure of the position judgment model of the position judgment model server of the present invention.
FIG. 8 is a diagram illustrating one block of the neural network structure, composed of a plurality of blocks, of the position judgment model of the present invention.
FIG. 9 is a diagram illustrating the first and second preprocessing steps for generating the ground-truth label used to train the position judgment model of the present invention.
FIG. 10 is a diagram illustrating the first and second preprocessing steps for generating the ground-truth label used to train the position judgment model of the present invention.
FIG. 11 is a diagram illustrating the third preprocessing step for generating the ground-truth label used to train the position judgment model of the present invention.
FIG. 12 is a diagram illustrating a position judgment result of the position judgment model of the present invention.
FIG. 13 is a view comparing a position judgment result of the position judgment model of the present invention with a position judgment result of a deep learning model according to the prior art.
FIG. 14 is a view comparing a position judgment result of the position judgment model of the present invention with a position judgment result of a deep learning model according to the prior art.
FIG. 15 is a view comparing a position judgment result of the position judgment model of the present invention with a position judgment result of a deep learning model according to the prior art.
FIG. 16 is an exemplary diagram of the signal flow in the deep learning-based Go game service system according to an embodiment of the present invention.
FIG. 17 shows a position judgment method within the deep learning-based Go game service method according to an embodiment of the present invention.
FIG. 18 shows a method of preprocessing training data to generate the ground-truth label in the position judgment method of FIG. 17.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms. In the following embodiments, terms such as first and second are not used in a limiting sense but to distinguish one component from another. Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" mean that the features or components described in the specification are present, and do not exclude in advance the possibility that one or more other features or components may be added. In the drawings, the size of components may be exaggerated or reduced for convenience of description. For example, the size and thickness of each component shown in the drawings are shown arbitrarily for convenience of description, and the present invention is not necessarily limited to what is shown.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is an exemplary diagram of a deep learning-based Go game service system according to an embodiment of the present invention.

Referring to FIG. 1, the deep learning-based Go game service system according to an embodiment may include a terminal 100, a Go server 200, a move model server 300, a position judgment model server 400, and a network 500.

The components of FIG. 1 may be connected through the network 500. The network 500 refers to a connection structure that allows information to be exchanged among nodes such as the terminal 100, the Go server 200, the move model server 300, and the position judgment model server 400. Examples of such a network include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a Wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

<Terminal (100)>

First, the terminal 100 is the terminal of a user who wants to receive the Go game service. The terminal 100 is one or more computers or other electronic devices that the user uses to run applications that perform various tasks. Examples include a computer, a laptop computer, a smartphone, a mobile phone, a PDA, a tablet PC, or any other device operable to communicate with the Go server 200. The terminal 100 is not limited to these, however, and may include various other elements, such as processing logic that runs on various machines and interprets and executes instructions stored in a plurality of memories, and processes that display graphical information for a graphical user interface (GUI) on an external input/output device. In addition, the terminal 100 may be connected to an input device (for example, a mouse, a keyboard, or a touch-sensitive surface) and an output device (for example, a display device, a monitor, or a screen). Applications executed by the terminal 100 may include a game application, a web browser, a web application running in a web browser, word processors, media players, spreadsheets, image processors, security software, and others.

In addition, the terminal 100 may include at least one memory 101 that stores instructions, at least one processor 102, and a communication unit 103.

The memory 101 of the terminal 100 may store a plurality of application programs or applications that run on the terminal 100, as well as data and instructions for the operation of the terminal 100. The instructions are executable by the processor 102 to cause the processor 102 to perform operations, and the operations may include transmitting a Go game execution request signal, transmitting and receiving game data, transmitting and receiving move information, transmitting a position judgment request signal, receiving a position judgment result, and receiving various other information. In terms of hardware, the memory 101 may be any of various storage devices such as a ROM, a RAM, an EPROM, a flash drive, or a hard drive, and the memory 101 may also be web storage that performs the storage function of the memory 101 over the Internet.

The processor 102 of the terminal 100 may control overall operation and perform the data processing needed to receive the Go game service. When the Go game application is executed on the terminal 100, a Go game environment is set up on the terminal 100. The Go game application then exchanges Go game data with the Go server 200 through the network 500 so that the Go game service runs on the terminal 100. The processor 102 may be any of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), controllers, micro-controllers, microprocessors, or any other form of processor for performing functions.

The communication unit 103 of the terminal 100 may transmit and receive wireless signals to and from at least one of a base station, an external terminal, and a server over a network built according to communication schemes such as GSM (Global System for Mobile communication), CDMA (Code Division Multiple Access), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), WLAN (Wireless LAN), Wi-Fi (Wireless Fidelity), Wi-Fi Direct, DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), and WiMAX (Worldwide Interoperability for Microwave Access).

<Go Server (200)>

The Go game service provided by the Go server 200 may be configured so that a virtual computer user provided by the Go server 200 and a real user take part in the game together. In this case, one real user and one computer user play the game together in the Go game environment implemented on the user's terminal 100. In another aspect, the Go game service provided by the Go server 200 may be configured so that a plurality of user-side devices take part and play the Go game.

The Go server 200 may include at least one memory 201 that stores instructions, at least one processor 202, and a communication unit 203.

The memory 201 of the Go server 200 may store a plurality of application programs or applications running on the Go server 200, as well as data and instructions for the operation of the Go server 200. The instructions are executable by the processor 202 to cause the processor 202 to perform operations, and the operations may include receiving a game execution request signal, transmitting and receiving game data, transmitting and receiving move information, transmitting and receiving a situation determination request signal, transmitting and receiving a situation determination result, and various other transmission operations. In addition, the memory 201 may store a plurality of game records (notations) of games played on the Go server 200 or a plurality of previously published game records. Each game record may include all information from the first move, which opens the game, to the final move, which ends it. That is, the game records may include move history information. The Go server 200 may provide the stored game records to the situation determination model server 400 for its training. In hardware terms, the memory 201 may be any of various storage devices such as ROM, RAM, EPROM, a flash drive, or a hard drive, and the memory 201 may also be web storage that performs the storage function of the memory 201 over the Internet.

The processor 202 of the Go server 200 may control overall operation and perform the data processing needed to provide the Go game service. The processor 202 may be any of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), controllers, micro-controllers, microprocessors, or any other type of processor for performing functions.

The Go server 200 may communicate with the terminal 100, the move model server 300, and the situation determination model server 400 via the network 500 through the communication unit 203.

<Move Model Server (300)>

The move model server 300 may include a separate cloud server or computing device. The move model server 300 may also be a neural network system installed in the processor of the terminal 100 or in the data processing unit of the Go server 200, but in the following description the move model server 300 is described as a device separate from the terminal 100 and the Go server 200.

The move model server 300 may include at least one memory 301 storing instructions, at least one processor 302, and a communication unit 303.

The move model server 300 is an artificial intelligence computer that learns by itself according to the rules of Go to build a move model, which is a deep learning model, and that can play against the user of the terminal 100. On its own turn it may place a stone so as to win the game. The training of the move model by the move model server 300 is described in detail with reference to the move model of FIGS. 2 to 5.

The memory 301 of the move model server 300 may store a plurality of application programs or applications running on the move model server 300, as well as data and instructions for the operation of the move model server 300. The instructions are executable by the processor 302 to cause the processor 302 to perform operations, and the operations may include a move model learning (training) operation, transmitting and receiving move information, and various other transmission operations. In addition, the memory 301 may store the move model, which is a deep learning model. In hardware terms, the memory 301 may be any of various storage devices such as ROM, RAM, EPROM, a flash drive, or a hard drive, and the memory 301 may also be web storage that performs the storage function of the memory 301 over the Internet.

The processor 302 of the move model server 300 reads the move model stored in the memory 301 and, according to the constructed neural network system, performs the move model training and stone placement described below. Depending on the embodiment, the processor 302 may be configured to include a main processor that controls all units and a plurality of graphics processing units (GPUs) that handle the large volume of computation required to run the neural network according to the move model.

The move model server 300 may communicate with the Go server 200 via the network 500 through the communication unit 303.

<Situation Determination Model Server (400)>

The situation determination model server 400 may include a separate cloud server or computing device. The situation determination model server 400 may also be a neural network system installed in the processor of the terminal 100 or in the data processing unit of the Go server 200, but in the following description the situation determination model server 400 is described as a device separate from the terminal 100 and the Go server 200.

The situation determination model server 400 may include at least one memory 401 storing instructions, at least one processor 402, and a communication unit 403.

The situation determination model server 400 may receive a training data set from the Go server 200 through the communication unit 403. The training data set may consist of a plurality of game records and the situation determination information for those game records. Using the received training data set, the situation determination model server 400 performs supervised learning so that it can determine the situation of a board state in which stones have been placed, thereby building a situation determination model, which is a deep learning model, and it may perform situation determination in response to a situation determination request from the user of the terminal 100. The training of the situation determination model by the situation determination model server 400 is described in detail with reference to the situation determination model of FIGS. 6 to 18.

The memory 401 of the situation determination model server 400 may store a plurality of application programs or applications running on the situation determination model server 400, as well as data and instructions for the operation of the situation determination model server 400. The instructions are executable by the processor 402 to cause the processor 402 to perform operations, and the operations may include a situation determination model learning (training) operation, performing situation determination, transmitting a situation determination result, receiving a plurality of game records, and various other transmission operations. In addition, the memory 401 may store the situation determination model, which is a deep learning model. In hardware terms, the memory 401 may be any of various storage devices such as ROM, RAM, EPROM, a flash drive, or a hard drive, and the memory 401 may also be web storage that performs the storage function of the memory 401 over the Internet.

The processor 402 of the situation determination model server 400 reads the situation determination model stored in the memory 401 and, according to the constructed neural network system, performs the situation determination model training and the determination of the board situation during a game described below. Depending on the embodiment, the processor 402 may be configured to include a main processor that controls all units and a plurality of graphics processing units (GPUs) that handle the large volume of computation required to run the neural network according to the situation determination model.

The situation determination model server 400 may communicate with the Go server 200 via the network 500 through the communication unit 403.

<Move Model>

FIG. 2 is a diagram illustrating the structure of the move model of the move model server, which places stones for the artificial intelligence computer in the deep learning-based Go game service according to an embodiment of the present invention. FIG. 3 is a diagram illustrating the move probability distribution over move points according to the policy of the move model. FIG. 4 is a diagram illustrating the value and the visit count for move points of the move model. FIG. 5 is a diagram illustrating the process by which the move model places a stone according to the pipeline of the search unit.

Referring to FIG. 2, the move model according to an embodiment of the present invention is a deep learning model of the move model server 300 and may include a search unit 310, a self-play unit 320, and a move neural network 330.

Using the search unit 310, the self-play unit 320, and the move neural network 330, the move model may learn to place stones so as to win a game. More specifically, the search unit 310 may perform a Monte Carlo Tree Search (MCTS) operation under the guidance of the move neural network 330. MCTS is a heuristic search algorithm for decision making. That is, the search unit 310 may perform MCTS based on the move probability value (p) and/or the value (v) provided by the move neural network 330. As an example, the search unit 310 guided by the move neural network 330 may perform MCTS and output a search probability value (π), which is a probability distribution over the move points. The self-play unit 320 may play Go against itself according to the search probability value (π). The self-play unit 320 plays until the outcome of the game is decided and, when the self-play game ends, may provide the board state (S), the search probability value (π), and the self-play value (z) to the move neural network 330. The board state (S) is the state in which stones are placed on the move points. The self-play value (z) is the win rate obtained by self-play from the board state (S). The move neural network 330 may output the move probability value (p) and the value (v). The move probability value (p) is a probability distribution that expresses numerically, for the move points in the board state (S), which point is a good move for winning the game. The value (v) represents the win rate when a stone is placed at the corresponding move point. For example, a move point with a high move probability value (p) may be a good move. The move neural network 330 may be trained so that the move probability value (p) becomes equal to the search probability value (π) and the value (v) becomes equal to the self-play value (z). The trained move neural network 330 then guides the search unit 310, and the search unit 310 runs MCTS to find better moves than the previous search probability value (π) and outputs a new search probability value (π). Based on the new search probability value (π), the self-play unit 320 outputs a new self-play value (z) for the board state (S) and may provide the board state (S), the new search probability value (π), and the new self-play value (z) to the move neural network 330. The move neural network 330 may then be trained again so that the move probability value (p) and the value (v) match the new search probability value (π) and the new self-play value (z). That is, by repeating this process, the move model may train the move neural network 330 to find better move points for winning the game. As an example, the move model may use a move loss (l). The move loss (l) is given by Equation 1.

(Equation 1)

$$ l = (z - v)^2 - \pi^{T} \log p + c \lVert \theta \rVert^2 $$

Here θ denotes the parameters of the neural network, and c is a very small constant.

In the move loss (l) of Equation 1, making z and v equal corresponds to the mean square loss term, making π and p equal corresponds to the cross entropy loss term, and the term in θ multiplied by c is a regularization term for preventing overfitting.
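The three terms described above (a squared error between z and v, a cross entropy between the search probability value and p, and a regularization term weighted by c) can be sketched as follows. This is only an illustration under stated assumptions: the function name `move_loss`, the flat parameter list `theta`, the squared-L2 form of the regularization term, and the default value of `c` are not taken from the source.

```python
import math


def move_loss(z, v, pi, p, theta, c=1e-4):
    """Sketch of l = (z - v)^2 - sum_a pi[a] * log(p[a]) + c * sum_k theta[k]^2.

    z, v  : self-play value and network value for one board state
    pi, p : search probabilities and network move probabilities (same length)
    theta : flat list of network parameters
    c     : small regularization constant (the default is an assumed value)
    """
    value_term = (z - v) ** 2
    # Cross entropy term; moves with pi[a] == 0 contribute nothing.
    policy_term = -sum(pi_a * math.log(p_a) for pi_a, p_a in zip(pi, p) if pi_a > 0.0)
    # Assumed squared-L2 form of the regularization term.
    reg_term = c * sum(w * w for w in theta)
    return value_term + policy_term + reg_term


# Usage: the value prediction is off by 1 while p equals pi.
example_loss = move_loss(z=1.0, v=0.0, pi=[0.5, 0.5], p=[0.5, 0.5], theta=[1.0], c=0.1)
```

Note that the cross entropy term does not vanish when p equals π; it reduces to the entropy of π, which is the smallest value the term can take for that π.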

For example, referring to FIG. 3, the trained move model may express the move probability value (p) at the move points as a probability distribution, as shown in FIG. 3. Referring to FIG. 4, the value (v) of the trained move model may be expressed as the number shown in the lower part of each move point in FIG. 4. The move neural network 330 may be configured as a neural network structure. As an example, the move neural network 330 may consist of one convolution block and 19 residual blocks. The convolution block may take the form of several stacked 3X3 convolution layers. A residual block may take the form of several stacked 3X3 convolution layers together with a skip connection. A skip connection is a structure in which the input of a given layer is added to the output of that layer, and the sum is passed on as the input of another layer. The input of the move neural network 330 may be a 19*19*17 RGB image that contains the stone positions for the last 8 moves of the black player, the stone positions for the last 8 moves of the white player, and turn information indicating whether the current player is Black or White.
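The 17 input planes described above (8 for Black's recent stone positions, 8 for White's, and 1 for the turn) might be assembled as in the sketch below. The plane ordering, the 0/1 encoding, the padding with empty boards when fewer than 8 snapshots exist, and the names `encode_planes`, `SIZE`, and `HISTORY` are all assumptions made for illustration; the result is laid out as 17 planes of 19 x 19 values.

```python
SIZE = 19     # board lines per side
HISTORY = 8   # number of recent board snapshots per colour


def encode_planes(history, black_to_play, size=SIZE):
    """Build 17 binary planes from recent board snapshots.

    history is a list of board snapshots, oldest first. Each snapshot is a
    size x size grid holding 'B', 'W' or '.'.
    """
    empty = [['.'] * size for _ in range(size)]
    recent = history[-HISTORY:][::-1]             # most recent snapshot first
    recent += [empty] * (HISTORY - len(recent))   # pad when history is short
    planes = []
    for colour in ('B', 'W'):
        for board in recent:
            planes.append([[1 if board[r][c] == colour else 0
                            for c in range(size)] for r in range(size)])
    turn = 1 if black_to_play else 0              # assumed turn encoding
    planes.append([[turn] * size for _ in range(size)])
    return planes


# Usage: a single snapshot with one black stone, White to play.
board = [['.'] * SIZE for _ in range(SIZE)]
board[3][3] = 'B'
planes = encode_planes([board], black_to_play=False)
```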

Referring to FIG. 5, the trained move model may place a stone on its turn using the move neural network 330 and the search unit 310. Through a selection process (a), the move model selects, from the current first board state (S1), a move point with a high action value (Q) and confidence value (U) in a second board state (S1-2), which is a branch not yet explored by MCTS. The action value (Q) is the mean of the values (v) computed each time the branch is traversed. The confidence value (U) is proportional to the number of visits (N) through the branch. Through an expansion and evaluation process (b), the move model may expand to a third board state (S1-2-1) at the selected move point and compute the move probability value (p). Through a backup process (c), the move model may compute the value of the expanded third board state (S1-2-1) and store the action value (Q), the visit count (N), and the move probability value (p) of the branches traversed. The move model may repeat the selection (a), expansion and evaluation (b), and backup (c) processes, build a probability distribution from the visit count (N) of each move point, and output the search probability value (π). The move model may then place its stone at the move point with the highest search probability value (π).
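The bookkeeping in the backup step and the final move choice can be sketched as follows. This is a minimal illustration, not the search unit itself: the dictionary layout, the helper field `W` (a running sum used to keep Q as a mean), and the plain normalization of visit counts into a distribution are assumptions.

```python
def backup(edge, v):
    """Update one branch after a simulation passes through it."""
    edge["N"] += 1                      # visit count
    edge["W"] += v                      # running sum of values (hypothetical field)
    edge["Q"] = edge["W"] / edge["N"]   # Q is the mean of the values seen so far


def search_probabilities(visit_counts):
    """Turn per-move visit counts into a probability distribution."""
    total = sum(visit_counts.values())
    if total == 0:
        raise ValueError("no simulations have been run")
    return {move: n / total for move, n in visit_counts.items()}


def choose_move(visit_counts):
    """Pick the move point with the highest search probability."""
    pi = search_probabilities(visit_counts)
    return max(pi, key=pi.get)


# Usage: two simulations through one branch, then a move choice.
edge = {"N": 0, "W": 0.0, "Q": 0.0}
backup(edge, 1.0)
backup(edge, 0.0)
counts = {(3, 3): 30, (15, 15): 10, (3, 15): 60}
pi = search_probabilities(counts)
best = choose_move(counts)
```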

<Situation Determination Model>

FIG. 6 is an exemplary view showing a screen that provides the situation determination function of the deep learning-based Go game service according to an embodiment of the present invention. FIG. 7 is a diagram illustrating the structure of the situation determination model of the situation determination model server of the present invention. FIG. 8 is a diagram illustrating one block of the neural network structure, which consists of a plurality of blocks, of the situation determination model of the present invention.

Referring to FIG. 6, the deep learning-based Go game service according to an embodiment of the present invention can determine the situation of the current board state. As an example, as shown in FIG. 6, when the user requests a situation determination by clicking the situation determination menu (A) on the screen of the terminal 100 during a game, the deep learning-based Go game service may present the situation determination result in a pop-up window. Situation determination means counting the opponent's territory and one's own during a game to judge who is ahead and by how many points. For example, a user who judges the situation to be favorable will plan to end the game while preserving the current advantage without overreaching, whereas a user who judges it to be unfavorable may look for various strategies to turn the game around. The criteria for situation determination are territory, dead stones, stones, neutral points, and seki, according to how the stones are arranged on the board. Stones are the stones placed on the board and are not points under Korean rules. Territory is an area of empty points surrounded by stones of one color and counts as points under Korean rules. Neutral points and seki are areas that are neither Black's nor White's territory when the game ends and are not points under Korean rules. Dead stones on the board are stones that have died because they cannot avoid capture however they are played; under Korean rules they count as points because they are used to fill in the opponent's territory. Therefore, an accurate situation determination requires accurately classifying or predicting territory, dead stones, stones, neutral points, and seki in the board state on which the stones are placed. Here, accurately classifying territory, dead stones, stones, neutral points, and seki means identifying states in which they are fully established, and accurately predicting them may mean predicting states that are likely to become territory, dead stones, stones, neutral points, or seki.
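As a rough illustration of how such a per-intersection classification turns into a score, the sketch below counts a margin from a list of labels. The label names and the function `estimate_margin` are hypothetical. The weighting is also an assumption: each territory point is worth 1 point to its owner, and each dead stone is worth 2 points to the opponent (1 for the vacated point and 1 for filling in the owner's territory). Stones captured earlier in the game and komi are left out.

```python
NON_SCORING = {"stone", "neutral", "seki"}


def estimate_margin(labels):
    """Return Black's score minus White's score from per-point labels."""
    black = 0
    white = 0
    for label in labels:
        if label == "black_territory":
            black += 1
        elif label == "white_territory":
            white += 1
        elif label == "dead_white":
            black += 2   # assumed: vacated point plus one prisoner
        elif label == "dead_black":
            white += 2
        elif label not in NON_SCORING:
            raise ValueError("unknown label: " + label)
    return black - white


# Usage: a toy position, not a full 361-point board.
labels = (["black_territory"] * 10 + ["white_territory"] * 7 + ["dead_white"]
          + ["stone"] * 5 + ["neutral"] * 2 + ["seki"])
margin = estimate_margin(labels)
```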

Referring to FIG. 7, the situation determination model according to an embodiment of the present invention is a deep learning model of the situation determination model server 400 and may include a situation determination neural network 410, an input feature extraction unit 420, and a correct answer label generation unit 430.

The situation determination model may perform supervised learning so that it can determine the situation of the current board state using the situation determination neural network 410. More specifically, the situation determination model may generate a training data set for the board state (S) and use the generated training data set to train the situation determination neural network 410 to determine the situation for the current board state (S). The situation determination model server 400 may receive a plurality of game records from the Go server 200. Each of the game records may include the board state (S) at each step of the move sequence.

The input feature extraction unit 420 may extract an input feature (IF) from the board state (S) of the game records and provide it to the situation determination neural network 410 as input data for training. The input feature (IF) of the board state (S) may be a 19*19*18 RGB image that contains the stone positions for the last 8 moves of the black player, the stone positions for the last 8 moves of the white player, and turn information indicating whether the current player is Black or White. As an example, the input feature extraction unit 420 may have a neural network structure and may include a kind of encoder.

The correct answer label generation unit 430 may preprocess the current board state (S) to generate a correct answer label (ground truth) and provide the correct answer label to the situation determination neural network 410 as target data (y) for training. The generation of the correct answer label by the correct answer label generation unit 430 is described with reference to FIGS. 9 to 11 below. As an example, the correct answer label generation unit 430 may include a rollout or an encoder having a neural network structure.

Using a training data set in which the input feature (IF) is the input data and the correct answer label is the target data (y), the situation determination model may train the situation determination neural network 410 sufficiently that the output data (o) generated by the situation determination neural network 410 becomes equal to the target data (y). As an example, the situation determination model may use a situation determination loss ($l_S$). The situation determination loss ($l_S$) may use the mean square error. For example, the situation determination loss ($l_S$) is given by Equation 2.

(Equation 2)

$$ l_S = \frac{1}{B} \sum_{i=1}^{B} \left( y_i - o_i \right)^2 $$

B is the total number of intersections on the board. On a board with 19 horizontal and 19 vertical lines, the lines cross at 361 intersections. The board is not limited to this size; a board with 9 horizontal and 9 vertical lines has 81 intersections. $y_i$ is the situation value for a given intersection (i) according to the correct answer label in the current board state (S). The situation value is described with reference to FIG. 11 below. $o_i$ is the output data produced when the given intersection (i) of the current board state (S) is input to the situation determination neural network 410. The situation determination model may train the situation determination neural network 410 by adjusting the weights and bias values in the situation determination neural network 410 using gradient descent and backpropagation so that the situation determination loss ($l_S$) is minimized.
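The mean square error over the B intersections, and the gradient with respect to the outputs from which backpropagation would start, can be sketched as follows. The function names and the toy target values are assumptions made for illustration; the actual situation values are defined later in the description, and a real board would have B = 361 entries.

```python
def situation_loss(target, output):
    """Mean square error between target and output over all intersections."""
    if not target or len(target) != len(output):
        raise ValueError("target and output must be non-empty and equal in length")
    b = len(target)
    return sum((y - o) ** 2 for y, o in zip(target, output)) / b


def loss_gradient(target, output):
    """Gradient of the loss with respect to each output value."""
    b = len(target)
    return [2.0 * (o - y) / b for y, o in zip(target, output)]


# Usage: four intersections with placeholder values; one output has the wrong sign.
target = [1.0, 0.0, -1.0, 0.0]
output = [1.0, 0.0, 1.0, 0.0]
loss = situation_loss(target, output)
grad = loss_gradient(target, output)
```

Only the third intersection disagrees with its target, so it is the only one with a non-zero gradient, which is where a gradient descent step would concentrate its correction.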

The situation determination neural network 410 may be configured as a neural network structure. As an example, the situation determination neural network 410 may consist of 19 residual blocks. Referring to FIG. 8, one residual block may be arranged in the following order: a convolution layer of 256 3X3 filters, a batch normalization layer, a ReLU activation function layer, a convolution layer of 256 3X3 filters, a batch normalization layer, a skip connection, and a ReLU activation function layer. The batch normalization layer is intended to prevent covariate shift, a phenomenon in which the distribution of the inputs to the current layer changes during training because the parameters of the preceding layer change. The skip connection prevents the performance of the neural network from degrading as the stack of blocks gets deeper, so that the stack can be made deeper to improve the performance of the whole network. The skip connection may take the form in which the initial input data of the residual block is added to the output of the second batch normalization layer, and the sum is input to the second ReLU activation function layer.
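The layer order of one residual block can be sketched as below. This shows the data flow only: the convolution and batch normalization layers are passed in as plain Python callables operating on a flat list of numbers, which are hypothetical stand-ins for the real 256-filter layers, and the function names are assumptions.

```python
def relu(xs):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in xs]


def residual_block(x, conv1, bn1, conv2, bn2):
    """Apply conv, BN, ReLU, conv, BN, skip connection, ReLU in that order."""
    h = relu(bn1(conv1(x)))             # first convolution, batch norm, ReLU
    h = bn2(conv2(h))                   # second convolution, batch norm
    h = [a + b for a, b in zip(h, x)]   # skip connection: add the block input
    return relu(h)                      # second ReLU


def identity(xs):
    """Stand-in layer that passes its input through unchanged."""
    return list(xs)


def zero(xs):
    """Stand-in layer whose output is all zeros."""
    return [0.0] * len(xs)


# Usage: with the second convolution zeroed, only the skip path carries signal.
out = residual_block([1.0, -2.0, 3.0], identity, identity, zero, identity)
```

In this toy run the block output is just the ReLU of its own input, which illustrates why stacking more blocks need not degrade the signal: a block can fall back to passing its input through.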

FIGS. 9 and 10 are diagrams for explaining the first and second preprocessing steps for generating the correct answer label used to train the situation determination model of the present invention, and FIG. 11 is a diagram for explaining the third preprocessing step for generating that correct answer label.

The correct answer label generation unit 430 may generate the correct answer label used to train the situation determination neural network 410 so that it can determine the situation accurately.

More specifically, the correct answer label generation unit 430 may receive as input the board state (S) on which the input data is based, and may generate a first preprocessing state (P1) by performing a first preprocessing that plays out the endgame from the current board state (S). The endgame, which constitutes the first preprocessing, is the process of finishing the game by playing predetermined moves so that the boundaries of each territory become clear before the territory is counted. For example, referring to FIG. 9, the correct answer label generation unit 430 may play out the endgame from the current board state (S) of FIG. 9(a) to generate the first preprocessing state (P1) of FIG. 9(b).

The correct answer label generation unit 430 may generate a second preprocessing state (P2) by performing a second preprocessing that removes, from the first preprocessing state (P1), stones that lie within a territory boundary and are unnecessary for delimiting the territory. For example, such a stone may be a dead stone. As described above, a dead stone is an opponent's stone placed inside a territory that cannot avoid capture however it is played and is therefore dead. A stone that lies within a territory boundary and is unnecessary for delimiting the territory may also be the player's own stone placed inside the player's own territory. For example, referring to FIG. 9, the correct answer label generation unit 430 may remove the stones unnecessary for delimiting territory from the first preprocessing state (P1) of FIG. 9(b) to generate the second preprocessing state (P2) of FIG. 9(c).

As another example, referring to FIG. 10, the correct answer label generation unit 430 may play moves at the points marked with a red x, as shown in FIG. 10(b), to carry out the endgame that constitutes the first preprocessing from the current board state (S) of FIG. 10(a). To remove the dead stones marked with a blue x in FIG. 10(b), the correct answer label generation unit 430 may perform the second preprocessing by playing moves at the points marked with a green x, removing the dead stones, and then also removing the stones that were played at the green x points to capture them.

The correct answer label generation unit 430 may perform a third preprocessing that converts each intersection of the second preprocessing state (P2) into a situation value (g, where g is an integer) from -1 to +1. That is, in the third preprocessing the correct answer label generation unit 430 converts the second preprocessing state (P2), which is an image feature, into a third preprocessing state (P3), which is a numerical feature. For example, an intersection in the second preprocessing state (P2) may be mapped to 0 if it holds my stone, +1 if it lies in my territory, 0 if it holds an opponent stone, and -1 if it lies in the opponent's territory. In this case, the situation determination neural network 410 may be trained to distinguish territory, stones, and dead stones when determining the situation. As another example, an intersection in the second preprocessing state (P2) may be mapped to 0 if it holds my stone, +1 if it lies in my territory, 0 if it holds an opponent stone, -1 if it lies in the opponent's territory, and 0 if it is a seki (stalemate) or a neutral point. In this other example, the situation determination neural network 410 may also be trained to distinguish seki and neutral points when determining the situation. For example, referring to FIG. 11, the correct answer label generation unit 430 may convert the features of the second preprocessing state (P2) of FIG. 11(a) into the third preprocessing state (P3) of FIG. 11(b).
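A minimal sketch of the third preprocessing is given below. The text does not state how the territory regions of the second preprocessing state are identified, so the flood fill used here is an assumption: each connected group of empty intersections is labeled +1 if it touches only my stones, -1 if it touches only opponent stones, and 0 otherwise (neutral point or seki). The function name `make_labels`, the string board encoding, and the small 3×5 board are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple


def make_labels(board: List[str], me: str = "B") -> List[List[int]]:
    """Map a cleaned-up board (P2) to per-intersection labels (P3).

    board: rows of 'B' (black stone), 'W' (white stone), '.' (empty).
    Returns +1 for my territory, -1 for opponent territory, 0 otherwise.
    """
    rows, cols = len(board), len(board[0])
    labels = [[0] * cols for _ in range(rows)]  # stones keep the value 0
    seen: Set[Tuple[int, int]] = set()
    for r in range(rows):
        for c in range(cols):
            if board[r][c] != "." or (r, c) in seen:
                continue
            # Flood-fill one connected empty region and record which
            # stone colors border it.
            region, border, stack = [], set(), [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if board[ny][nx] == ".":
                        if (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                    else:
                        border.add(board[ny][nx])
            if len(border) == 1:
                value = 1 if border == {me} else -1
            else:
                value = 0  # touches both colors (or none): neutral point or seki
            for y, x in region:
                labels[y][x] = value
    return labels


p2 = [
    ".B.W.",
    ".B.W.",
    ".BWW.",
]
labels = make_labels(p2)
for row in labels:
    print(row)  # each row prints [1, 0, 0, 0, -1]
```

The sketch relies on the first and second preprocessing having already closed the territory boundaries and removed dead stones; on a board where that has not been done, a plain flood fill would mislabel open regions.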

The third preprocessing state (P3) serves as the correct answer label for the situation determination of the board state (S), and may be used as the target data when training the situation determination neural network 410.

FIG. 12 is a diagram for explaining a situation determination result of the situation determination model of the present invention.

When a board state is input, the trained situation determination model may provide a situation value for every intersection of the board. That is, it may provide a situation value, an integer from -1 to +1, for each of the 361 intersections of the board.

Referring to FIG. 12, the situation determination model server 400 may determine the situation using the situation values provided by the situation determination model, predetermined threshold values, and the presence or absence of a stone. For example, at an intersection with no stone, the situation determination model server 400 may determine that the intersection is likely to become my territory if the situation value exceeds a first threshold, and that it is my territory if the value is close to +1. The situation determination model server 400 may mark such an intersection with a square in the color of my stones that grows larger as the likelihood of my territory increases. At an intersection with no stone, the situation determination model server 400 may determine that the intersection is likely to become the opponent's territory if the situation value is less than or equal to a second threshold, and that it is the opponent's territory if the value is close to -1. The situation determination model server 400 may mark such an intersection with a square in the color of the opponent's stones that grows larger as the likelihood of the opponent's territory increases. At an intersection with no stone, the situation determination model server 400 may determine that the intersection is a neutral point or a seki if the situation value is within a third threshold range or close to 0, and may mark it with an X. At an intersection with a stone, the situation determination model server 400 may determine that the stone is my stone or an opponent stone if the situation value is within the third threshold range or close to 0, and in that case may display no mark. At an intersection with a stone, the situation determination model server 400 may determine that the stone is likely to be a dead opponent stone if the situation value exceeds the first threshold, and that it is a dead opponent stone if the value is close to +1. The situation determination model server 400 may mark such an intersection with a square in the color of my stones that grows larger as the likelihood of a dead opponent stone increases. At an intersection with a stone, the situation determination model server 400 may determine that the stone is likely to be my dead stone if the situation value is less than or equal to the second threshold, and that it is my dead stone if the value is close to -1. The situation determination model server 400 may mark such an intersection with a square in the color of the opponent's stones that grows larger as the likelihood of my dead stone increases.
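The threshold rules of FIG. 12 can be sketched as a small decision function. The reading adopted here is that values near +1 favor me and values near -1 favor the opponent. The function name `judge_point`, the threshold values 0.5 and -0.5, and the choice to treat everything between the two thresholds as the third threshold range are assumptions made for illustration; the text only calls the thresholds "predetermined".

```python
from typing import Optional


def judge_point(value: float, stone: Optional[str], me: str = "B",
                t1: float = 0.5, t2: float = -0.5) -> str:
    """Classify one intersection from its situation value and stone presence.

    value: situation value in [-1, +1] for this intersection.
    stone: None for an empty intersection, otherwise 'B' or 'W'.
    t1, t2: hypothetical first and second thresholds.
    """
    if stone is None:
        if value > t1:
            return "my territory"
        if value <= t2:
            return "opponent territory"
        return "neutral point or seki"
    # Occupied intersection: a strong value means the stone is dead.
    if value > t1:
        return "opponent dead stone"
    if value <= t2:
        return "my dead stone"
    return "my stone" if stone == me else "opponent stone"


print(judge_point(0.9, None))   # my territory
print(judge_point(-0.8, None))  # opponent territory
print(judge_point(0.1, None))   # neutral point or seki
print(judge_point(0.95, "W"))   # opponent dead stone
print(judge_point(0.0, "W"))    # opponent stone
```

A display layer could additionally scale the size of the square marker with the absolute value of `value`, which would match the description of markers that grow with the likelihood.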

In addition, the situation determination model server 400 may display the counting result for the current board state using the determination made at each intersection.
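One way the per-intersection determinations could be turned into a counting result is sketched below. The text does not specify the counting rule, so this sketch assumes territory scoring in which each dead stone is worth two points to the other side (one for the captured stone and one for the point it occupied), and it ignores komi and stones captured earlier in the game. The label strings and the function name `count_margin` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_margin(judgments: Iterable[str]) -> int:
    """Return my points minus the opponent's points.

    Assumes territory scoring: an empty territory point counts 1, and a
    dead stone counts 2 for the side that captures it.
    """
    tally = Counter(judgments)
    mine = tally["my territory"] + 2 * tally["opponent dead stone"]
    theirs = tally["opponent territory"] + 2 * tally["my dead stone"]
    return mine - theirs


judgments = (
    ["my territory"] * 10
    + ["opponent territory"] * 7
    + ["opponent dead stone"] * 2
    + ["my dead stone"]
    + ["neutral point or seki"] * 3
)
margin = count_margin(judgments)
print(margin)  # 5
```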

Accordingly, the deep learning-based Go game service apparatus according to the embodiment can determine the situation of a Go game using a deep learning neural network. In addition, the apparatus can determine the situation accurately by correctly distinguishing territory, dead stones, stones, neutral points, and seki according to the rules of Go. In addition, the apparatus can determine the situation accurately by predicting territory, dead stones, stones, neutral points, and seki according to the rules of Go. In addition, the apparatus can determine the situation quickly during a game of Go.

FIGS. 13, 14, and 15 each show a comparison between a situation determination result of the situation determination model of the present invention and a situation determination result of a deep learning model according to the prior art.

Referring to FIG. 13, the situation determination model of the present invention determines the situation by distinguishing territory, stones, and dead stones at each intersection, as in area B of FIG. 13(a). The deep learning model according to the prior art, however, fails to distinguish territory, stones, and dead stones at the intersections of the area in FIG. 13(b) that corresponds to area B of FIG. 13(a).

Similarly, referring to FIG. 14, the situation determination model of the present invention determines the situation by distinguishing territory, stones, and dead stones at each intersection, as in area C of FIG. 14(a). The deep learning model according to the prior art, however, fails to distinguish territory, stones, and dead stones at the intersections of the area in FIG. 14(b) that corresponds to area C of FIG. 14(a).

Referring to FIG. 15, the situation determination model of the present invention correctly recognizes white territory, as in area D of FIG. 15(a). The deep learning model according to the prior art, however, fails to identify the white territory in the area of FIG. 15(b) that corresponds to area D of FIG. 15(a).

FIG. 16 is an exemplary diagram of the signal flow in a deep learning-based Go game service system according to an embodiment of the present invention.

Referring to FIG. 16, the move model server 300, acting as an artificial intelligence computer, may train a move model, which is a deep learning model, by learning on its own according to the rules of Go so that it can place stones on its turn in a way that wins the game (S11). The Go server 200 may transmit a plurality of game records to the situation determination model server 400. The situation determination model server 400 may generate a training data set. First, the situation determination model server 400 may extract input features from the board states of the plurality of game records (S13). The situation determination model server 400 may generate correct answer labels using the board states from which the input features were extracted (S14). The situation determination model server 400 may train the situation determination model using a training data set in which the input features are the input data and the correct answer labels are the target data (S15). The terminal 100 may request from the Go server 200 a game of Go against the artificial intelligence computer or against another user terminal (S16). When the terminal 100 requests a game against the artificial intelligence computer, the Go server 200 may request moves from the move model server 300 (S17). The Go server 200 runs the game, and the terminal 100 and the move model server 300 may each make a move on their own turn (S18 to S20). During the game, the terminal 100 may request a situation determination from the Go server 200 (S21). The Go server 200 may request from the situation determination model server 400 a situation determination for the current board state (S22). The situation determination model server 400 may extract the input features of the current board state, generate situation values from the input features using the situation determination model, which is a deep learning model, and perform the situation determination using the board state and the situation values (S23). The situation determination model server 400 may provide the situation determination result to the Go server 200 (S24). The Go server 200 may provide the situation determination result to the terminal 100 (S25).

FIG. 17 shows the situation determination method within the deep learning-based Go game service method according to an embodiment of the present invention, and FIG. 18 shows the method of preprocessing training data to generate the correct answer label within the situation determination method of FIG. 17.

Referring to FIG. 17, the deep learning-based Go game service method according to an embodiment of the present invention may include a step (S100) in which the situation determination model server receives a plurality of game records from the Go server.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S200) in which the input feature extraction unit of the situation determination model in the situation determination model server extracts input features from the board states of the plurality of game records. The method of extracting the input features follows the description of FIG. 7.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S300) in which the correct answer label generation unit of the situation determination model generates a correct answer label based on the board state from which the input features were extracted. For example, referring to FIG. 18, the correct answer label generation step (S300) may include a first preprocessing step (S301) in which the correct answer label generation unit plays out the endgame from the current board state. The first preprocessing step (S301) follows the description of FIGS. 9 and 10. The correct answer label generation step (S300) may include a second preprocessing step (S302) in which the correct answer label generation unit removes unnecessary stones from the first preprocessed board state. The second preprocessing step (S302) follows the description of FIGS. 9 and 10. The correct answer label generation step (S300) may include a third preprocessing step (S303) in which the correct answer label generation unit converts each intersection of the second preprocessed board state into a situation value. The third preprocessing step (S303) follows the description of FIG. 11. The correct answer label generation step (S300) may include a step (S303) of providing the third preprocessing state to the situation determination neural network as target data, with the third preprocessing state serving as the correct answer label. This step of providing the target data follows the description of FIGS. 7 and 11.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S400) of training the situation determination neural network of the situation determination model using the training data set. The method of training the situation determination neural network follows the description of FIG. 7.

The deep learning-based Go game service method according to an embodiment of the present invention includes a step (S500) of building the situation determination model once training of the situation determination neural network is complete. For example, training of the situation determination neural network may be regarded as complete when the situation determination loss of FIG. 7 falls to or below a predetermined value.
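The stopping rule of step S500 can be illustrated with a toy gradient-descent loop. This is not the situation determination neural network: a two-parameter linear model stands in for it, and the loss is assumed to be a mean squared error between output and target, which this passage does not itself state. The function name `train`, the learning rate, and the loss limit are hypothetical.

```python
from typing import List, Tuple


def train(xs: List[float], ys: List[float], lr: float = 0.1,
          loss_limit: float = 1e-4,
          max_steps: int = 10_000) -> Tuple[float, float, float]:
    """Fit y = w*x + b by gradient descent until the loss <= loss_limit."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for _ in range(max_steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n  # assumed mean squared error
        if loss <= loss_limit:
            break  # training is regarded as complete
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss


# Targets in the label range -1 to +1, as in the third preprocessing state.
w, b, final_loss = train([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
print(final_loss <= 1e-4)  # True
```

In a real network the two gradient lines would be replaced by backpropagation through all layers, but the loop structure and the loss-threshold test would stay the same.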

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S600) in which the current board state is input to the situation determination model in response to a situation determination request from the terminal.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S700) in which the situation determination model performs the situation determination of the input current board state. The step of performing the situation determination (S700) may follow the description given with reference to FIG. 12, in which the situation determination model generates situation values for the current board state.

The deep learning-based Go game service method according to an embodiment of the present invention may include a step (S800) in which the situation determination model server outputs the situation determination result. The step of outputting the situation determination result (S800) may follow the description given with reference to FIG. 12, in which the situation determination model server provides the situation determination result using the situation values, the board state, and predetermined threshold values.

Accordingly, the deep learning-based Go game service method according to the embodiment can determine the situation of a Go game using a deep learning neural network. In addition, the method can determine the situation accurately by correctly distinguishing territory, dead stones, stones, neutral points, and seki according to the rules of Go. In addition, the method can determine the situation accurately by predicting territory, dead stones, stones, neutral points, and seki according to the rules of Go. In addition, the method can determine the situation quickly during a game of Go.

The embodiments of the present invention described above may be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. A hardware device may be replaced by one or more software modules to perform the processing according to the present invention, and vice versa.

The specific implementations described in the present invention are examples and do not limit the scope of the present invention in any way. For brevity of the specification, descriptions of conventional electronic configurations, control systems, software, and other functional aspects of such systems may be omitted. In addition, the connecting lines or connecting members between the components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device they may be represented by a variety of replaceable or additional functional, physical, or circuit connections. Furthermore, unless a component is specifically described with terms such as "essential" or "important", it may not be a component that is strictly necessary for applying the present invention.

Although the detailed description above refers to preferred embodiments of the present invention, those skilled in the art, or those having ordinary knowledge in the relevant technical field, will understand that the present invention may be modified and changed in various ways without departing from the spirit and technical scope of the present invention as set forth in the claims below. Accordingly, the technical scope of the present invention should not be limited to the content of the detailed description but should be determined by the claims.

100 terminal
200 Go server
300 move model server
310 search unit
320 self-play unit
330 move neural network
400 situation determination model server
410 situation determination neural network
420 input feature extraction unit
430 correct answer label generation unit

Claims

A situation determination model server comprising:
a communication unit for receiving a plurality of game records;
a storage unit for storing a situation determination model; and
a processor that reads the situation determination model, trains the situation determination model, and determines the situation of a board state using the trained situation determination model,
wherein the situation determination model includes:
an input feature extraction unit for extracting input features from an input board state; and
a situation determination neural network that generates a situation value for each intersection of the input board state based on the extracted input features.

The situation determination model server of claim 1,
wherein the situation determination model further includes a correct answer label generation unit for generating, based on the input board state, a correct answer label used in the training, and
wherein the correct answer label generation unit
generates a first preprocessing state by performing a first preprocessing that plays out the endgame from the input board state, and
generates a second preprocessing state by performing a second preprocessing that removes, from the first preprocessing state, stones that lie within a territory boundary and are unnecessary for delimiting the territory.

The situation determination model server of claim 2,
wherein the correct answer label generation unit
generates a third preprocessing state by performing a third preprocessing that converts each intersection of the second preprocessing state into a situation value, and
provides the third preprocessing state to the situation determination neural network as the correct answer label.

The situation determination model server of claim 3,
wherein the third preprocessing converts a given intersection of the second preprocessing state to 0 if my stone is placed there, +1 if it lies in my territory, 0 if an opponent stone is placed there, and -1 if it lies in the opponent's territory.

The situation determination model server of claim 3,
wherein the third preprocessing converts a given intersection of the second preprocessing state to 0 if my stone is placed there, +1 if it lies in my territory, 0 if an opponent stone is placed there, -1 if it lies in the opponent's territory, and 0 if it is a seki or a neutral point.

The situation determination model server of claim 1,
wherein the situation determination neural network includes a plurality of residual blocks, and
each of the plurality of residual blocks includes a convolution layer, a batch normalization layer, a ReLU activation function layer, and a skip connection.
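
For illustration only, one residual block with the four listed elements can be sketched in plain Python on a single 2-D feature map. The claim does not fix the order of the layers or the number of convolutions, so the arrangement below (convolution, then batch normalization, then adding the skip connection, then ReLU) is an assumption, as are the hypothetical names `conv3x3`, `batch_norm`, `relu`, and `residual_block`. With a single sample and a single channel, the batch normalization here simply normalizes over all positions of the map.

```python
import math


def conv3x3(x, kernel):
    """Zero-padded 3x3 convolution (cross-correlation) on one 2-D feature map."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        total += kernel[dr + 1][dc + 1] * x[rr][cc]
            out[r][c] = total
    return out


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize the map to zero mean and unit variance, then scale and shift."""
    values = [v for row in x for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = gamma / math.sqrt(var + eps)
    return [[(v - mean) * scale + beta for v in row] for row in x]


def relu(x):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x, kernel):
    """Assumed layout: relu(batch_norm(conv(x)) + x); the '+ x' is the skip connection."""
    f = batch_norm(conv3x3(x, kernel))
    summed = [[fv + xv for fv, xv in zip(f_row, x_row)] for f_row, x_row in zip(f, x)]
    return relu(summed)
```

Because of the skip connection, a block whose convolution kernel is all zeros passes its input through unchanged apart from the final ReLU, which is the property that makes stacks of such blocks easy to train.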

The situation determination model server of claim 2,
wherein the processor trains the situation determination neural network using a situation determination loss.

The situation determination model server of claim 1,
wherein the input feature includes location information of the black player's stones for the last 8 moves, location information of the white player's stones for the last 8 moves, and turn information indicating whether the current player is black or white in the input checkerboard state.
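
For illustration only, such an input feature can be sketched as a stack of 17 binary planes: 8 planes of black stone locations, 8 planes of white stone locations, and 1 turn plane. The plane layout is an assumption of this sketch (one black plane and one white plane per recent board state, most recent first, zero-padded when fewer than 8 states exist, and a turn plane of all 1s when Black is to play). The name `extract_features` and the `"B"`/`"W"`/`"."` board encoding are hypothetical.

```python
def extract_features(history, black_to_play, steps=8):
    """Build 2 * steps + 1 binary planes from a list of board states.

    history: list of boards (oldest first); each board is a list of strings
             using "B", "W", and "." (hypothetical encoding).
    Returns planes in the assumed order: black[0..steps-1], white[0..steps-1], turn.
    """
    rows, cols = len(history[-1]), len(history[-1][0])

    def plane(board, color):
        return [[1 if cell == color else 0 for cell in row] for row in board]

    def zeros():
        return [[0] * cols for _ in range(rows)]

    recent = history[-steps:][::-1]  # most recent state first
    black = [plane(board, "B") for board in recent]
    white = [plane(board, "W") for board in recent]
    # Pad with empty planes when the game has fewer than `steps` states.
    for _ in range(steps - len(recent)):
        black.append(zeros())
        white.append(zeros())
    turn = [[1 if black_to_play else 0] * cols for _ in range(rows)]
    return black + white + [turn]
```

For a 19x19 board this yields a 17x19x19 input, which is then passed to the situation determination neural network.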

A deep learning-based Go game service method in which a situation determination model server, including a communication unit, a storage unit storing a situation determination model, and a processor that drives the situation determination model, determines the situation of a checkerboard state, the method comprising:
receiving, by the communication unit, a plurality of notations;
extracting, by the processor, an input feature from a checkerboard state of the plurality of notations using an input feature extraction unit of the situation determination model;
generating, by the processor, a correct answer label from the checkerboard state of the plurality of notations using a correct answer label generation unit of the situation determination model;
training, by the processor, the situation determination model using the input feature and the correct answer label;
constructing, by the processor, the situation determination model by completing the training; and
generating, by the processor, when a current checkerboard state requiring situation determination is input, a situation value for an intersection of the current checkerboard state using the trained situation determination model,
wherein the generating of the correct answer label includes:
a first pre-processing step of generating a first pre-processing state by playing the endgame to completion in the checkerboard state of the plurality of notations;
a second pre-processing step of generating a second pre-processing state by removing, from the first pre-processing state, stones that lie within the boundary of a territory (house) and are unnecessary for classifying the territory; and
a third pre-processing step of generating a third pre-processing state by changing each intersection of the second pre-processing state to a situation value.

The method of claim 9,
wherein the third pre-processing changes a predetermined intersection in the second pre-processing state to 0 if the player's own stone is placed there, +1 if it belongs to the player's own territory, 0 if an opponent's stone is placed there, and -1 if it belongs to the opponent's territory.

The method of claim 9,
wherein the third pre-processing changes a predetermined intersection in the second pre-processing state to 0 if the player's own stone is placed there, +1 if it belongs to the player's own territory, 0 if an opponent's stone is placed there, -1 if it belongs to the opponent's territory, and 0 if it is a seki (stalemate) or a neutral point.