KR102315842B1

KR102315842B1 - System for providing carom billiards guidance service using reinforcement learning of deep learning model

Info

Publication number: KR102315842B1
Application number: KR1020210057446A
Authority: KR
Inventors: 김종우
Original assignee: 김종우
Priority date: 2021-05-03
Filing date: 2021-05-03
Publication date: 2021-10-21

Abstract

Provided is a system for providing a billiards guidance service using reinforcement learning of a deep learning model. The system comprises a user terminal and a guidance service providing server. The user terminal recognizes the coordinates of a target and at least one ball as position vectors, and outputs the result of simulating the trajectories of the target and the at least one ball according to a hit point, which is the struck part of the surface of the target, and the force with which the hit point is struck. The guidance service providing server includes: a building unit which builds a billiards simulation environment using at least one open-source package for building a physics-engine-based simulation environment; an initialization unit which randomly sets the initial coordinates of the target and the at least one ball in the built billiards simulation environment and sets those coordinates as position vectors; a training unit which trains at least one deep learning model by reinforcement learning, repeating a simulation that draws the trajectories of the target and the at least one ball according to the hit point on the target and the force striking the hit point, and providing a reward according to each result; and a transmission unit which, when the user terminal recognizes or specifies the coordinates of the target and the at least one ball, transmits to the user terminal the simulation results for at least one case of the hit point of the target and the force striking the hit point. With this billiards guidance service, even a beginner can intuitively predict the trajectory or direction of the ball.

Description

Billiard guide service provision system using reinforcement learning of deep learning model

The present invention relates to a system for providing a billiards guide service using reinforcement learning of a deep learning model, and provides a platform for building an artificial intelligence that, without a dataset, can guide the trajectory of the ball for each possible case by rewarding the results.

Billiards is a sport governed by physical laws, and it takes a great deal of training and adaptation to become familiar with it. The path of a ball changes with how the ball is struck. The forces that act most strongly on a moving ball include friction and the repulsive (rebound) force. The magnitude of each force depends on the hit point on the ball, the striking force, the friction coefficient of the billiard table, the friction coefficient of the cushions, and so on. In billiards, a point is scored only when both path design and cueing, that is, striking the ball with the cue, are carried out successfully. Even when a beginner designs a path and cues accordingly, sending the ball along that path is difficult in itself, so scoring through skill rather than luck is not easy. Billiards is usually taken up as play rather than as systematic training, and sustaining learning as play requires psychological reward. In billiards, scoring is that reward; given this barrier to entry, a newcomer must endure a long period of training without a clear psychological reward until becoming proficient.

To guide billiards play, methods of showing a simulation and a trajectory in advance have been researched and developed. As related prior art, Korean Laid-open Patent Publication No. 2019-0068949 (published June 19, 2019), Korean Laid-open Patent Publication No. 2008-0062456 (published July 3, 2008), and Korean Registered Patent No. 10-0970924 (published July 21, 2010) respectively disclose the following: a configuration that, in order to indicate the expected or recommended path of a billiard ball through image recognition and a billiards simulation application, uses a camera photographing the billiard table to analyze the coordinates of the balls and a recommended path from image data, runs a virtual simulation on the analyzed data, and then recommends a path; a configuration that recognizes objects in image information from a camera detecting changes in the positions of the billiard balls, compares, judges, and computes on stored data and current data, and displays the processed data; and a configuration that continuously detects three-dimensional coordinate information of the billiard table, the billiard balls, and the cue, stores the course information of the balls and the stroke information of the cue corresponding to the coordinate information, converts the shapes of the balls and the cue into three-dimensional video simulation information and reproduces it on a monitor, displays the course information output under control on the upper surface of the billiard table, and outputs the stroke information as voice guidance.

However, the configurations described above focus only on the trajectory or path and disclose no guidance at all on the direction in which, and the force with which, the target should be struck. In addition, creating an artificial intelligence that can compete against a player requires training it through simulation, and this requires labeling all the results of previously accumulated games and building them into a dataset, so the time and cost of building the dataset alone are enormous. Accordingly, there is a need for research and development of a platform that provides simulation results using artificial intelligence when offering billiards guidance, yet can perform modeling without a dataset.

An embodiment of the present invention can provide a method of providing a billiards guide service using reinforcement learning of a deep learning model. Simulation is performed through reinforcement learning that is based on a deep learning model and requires no dataset. Without any dataset for training and testing, learning proceeds by recognizing the current state within the environment in which the billiard balls are placed and maximizing the reward through actions, so that training revises the actions in the direction of better reward. Beyond simply showing the user a trajectory, the service tells the user the position of the hit point and how to adjust the force and direction with which the hit point is pushed, so that even a beginner can intuitively predict the trajectory or direction of the ball, and an artificial intelligence capable of competing against a player can be created. However, the technical problem to be solved by the present embodiment is not limited to the technical problem described above, and other technical problems may exist.

As a technical means for achieving the technical problem described above, an embodiment of the present invention includes a user terminal and a guide service providing server. The user terminal recognizes the coordinates of a target and at least one ball as position vectors, and then outputs the result of simulating the trajectories of the target and the at least one ball according to a hit point, which is the struck part of the surface of the target, and the force striking the hit point. The guide service providing server includes: a building unit that builds a billiards simulation environment using at least one open-source package for building a physics-engine-based simulation environment; an initialization unit that randomly sets the initial coordinates of the target and the at least one ball in the built billiards simulation environment and then sets those coordinates as position vectors; a training unit that repeats a simulation drawing the trajectories of the target and the at least one ball according to the hit point on the target and the force striking the hit point, and trains at least one deep learning model by reinforcement learning by providing a reward according to the result; and a transmission unit that, when the user terminal recognizes or specifies the coordinates of the target and the at least one ball, transmits to the user terminal the simulation results for at least one case of the hit point of the target and the force striking the hit point.
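The initialization and reward steps described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the patented implementation: the class `ToyBilliardsEnv`, the `Action` fields, the table size, the ball radius, and the constant-deceleration physics are all hypothetical, the other balls stay fixed and do not deflect the target, and a plain random search stands in for the deep learning model.

```python
import math
import random
from dataclasses import dataclass

TABLE_W, TABLE_H = 2.84, 1.42  # assumed playing surface, metres
BALL_R = 0.03                  # assumed ball radius, metres


@dataclass
class Action:
    tip_x: float   # hit-point offset on the target; ignored by this toy physics
    tip_y: float
    force: float   # modelled here as initial speed in m/s (an assumption)
    angle: float   # strike direction in radians


class ToyBilliardsEnv:
    """Hypothetical stand-in for the physics-engine-based environment."""

    def __init__(self, n_balls=2, seed=None):
        self.rng = random.Random(seed)
        self.n_balls = n_balls
        self.state = []

    def reset(self):
        # Initialization unit: random, non-overlapping position vectors.
        self.state = []
        while len(self.state) < 1 + self.n_balls:
            p = (self.rng.uniform(BALL_R, TABLE_W - BALL_R),
                 self.rng.uniform(BALL_R, TABLE_H - BALL_R))
            if all(math.dist(p, q) > 2 * BALL_R for q in self.state):
                self.state.append(p)
        return list(self.state)  # index 0 is the target

    def step(self, action, dt=0.002, decel=0.5):
        # Roll the target with constant deceleration and mirror-like cushion bounces.
        x, y = self.state[0]
        speed = action.force
        vx, vy = speed * math.cos(action.angle), speed * math.sin(action.angle)
        touched, path = set(), [(x, y)]
        while speed > 0:
            x, y = x + vx * dt, y + vy * dt
            if not BALL_R <= x <= TABLE_W - BALL_R:
                vx, x = -vx, min(max(x, BALL_R), TABLE_W - BALL_R)
            if not BALL_R <= y <= TABLE_H - BALL_R:
                vy, y = -vy, min(max(y, BALL_R), TABLE_H - BALL_R)
            new_speed = max(0.0, speed - decel * dt)
            vx, vy = vx * new_speed / speed, vy * new_speed / speed
            speed = new_speed
            for i, ball in enumerate(self.state[1:], start=1):
                if math.dist((x, y), ball) <= 2 * BALL_R:
                    touched.add(i)
            path.append((x, y))
        # Reward: fraction of the other balls the target reached.
        return path, len(touched) / self.n_balls


def random_search(env, trials=200):
    """Try random actions from the current state and keep the best-rewarded one."""
    best_action, best_reward = None, -1.0
    for _ in range(trials):
        a = Action(env.rng.uniform(-1, 1), env.rng.uniform(-1, 1),
                   env.rng.uniform(0.5, 3.0), env.rng.uniform(0, 2 * math.pi))
        _, r = env.step(a)
        if r > best_reward:
            best_action, best_reward = a, r
    return best_action, best_reward
```

Calling `env.reset()` followed by `random_search(env)` returns the best-rewarded action found for one random layout. In a real system the random search would be replaced by a learned policy, and the hit point would feed into spin in the physics engine.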

According to any one of the means for solving the problem described above, simulation is performed through reinforcement learning that is based on a deep learning model and requires no dataset. Without any dataset for training and testing, learning proceeds by recognizing the current state within the environment in which the billiard balls are placed and maximizing the reward through actions, so that training revises the actions in the direction of better reward. Beyond simply showing the user a trajectory, the service tells the user the position of the hit point and how to adjust the force and direction with which the hit point is pushed, so that even a beginner can intuitively predict the trajectory or direction of the ball, and an artificial intelligence capable of competing against a player can be created.

FIG. 1 is a diagram illustrating a system for providing a billiards guide service using reinforcement learning of a deep learning model according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the guide service providing server included in the system of FIG. 1.
FIGS. 3 and 4 are diagrams illustrating an embodiment in which the billiards guide service using reinforcement learning of a deep learning model according to an embodiment of the present invention is implemented.
FIG. 5 is an operation flowchart illustrating a method of providing a billiards guide service using reinforcement learning of a deep learning model according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily implement them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts irrelevant to the description are omitted from the drawings, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element interposed between them. In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that this does not preclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The terms of degree "about", "substantially", and the like used throughout the specification are used to mean at, or close to, the stated numerical value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent an unscrupulous infringer from unfairly exploiting a disclosure in which exact or absolute figures are mentioned to aid understanding of the present invention. The term "step of (doing) ~" or "step of ~" as used throughout the specification does not mean "step for ~".

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A "~ unit" is not limited to software or hardware; a "~ unit" may be configured to reside in an addressable storage medium or may be configured to operate one or more processors. Thus, as an example, a "~ unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided within the components and "~ units" may be combined into a smaller number of components and "~ units" or further separated into additional components and "~ units". Furthermore, the components and "~ units" may be implemented to operate one or more CPUs within a device or a secure multimedia card.

In this specification, some of the operations or functions described as being performed by a terminal, apparatus, or device may instead be performed by a server connected to that terminal, apparatus, or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal, apparatus, or device connected to that server.

In this specification, some of the operations or functions described as mapping or matching with a terminal may be interpreted as mapping or matching the terminal's unique number or an individual's identification information, which is the identifying data of the terminal.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a system for providing a billiards guide service using reinforcement learning of a deep learning model according to an embodiment of the present invention. Referring to FIG. 1, the system (1) for providing a billiards guide service using reinforcement learning of a deep learning model may include at least one user terminal (100), a guide service providing server (300), at least one camera (400), and at least one display (500). However, the system (1) of FIG. 1 is only one embodiment of the present invention, and the present invention is not to be construed as limited by FIG. 1.

The components of FIG. 1 are generally connected through a network (200). For example, as shown in FIG. 1, the at least one user terminal (100) may be connected to the guide service providing server (300) through the network (200). The guide service providing server (300) may be connected through the network (200) to the at least one user terminal (100), the at least one camera (400), the at least one display (500), and a manager terminal (600). The at least one camera (400) may be connected to the guide service providing server (300) through the network (200). The at least one display (500) may be connected through the network (200) to the at least one user terminal (100), the guide service providing server (300), and the at least one camera (400).

Here, the network means a connection structure in which information can be exchanged between nodes such as a plurality of terminals and servers. Examples of such a network include a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), 5GPP (5th Generation Partnership Project), LTE (Long Term Evolution), WiMAX (Worldwide Interoperability for Microwave Access), Wi-Fi, the Internet, LAN (Local Area Network), Wireless LAN (Wireless Local Area Network), WAN (Wide Area Network), PAN (Personal Area Network), RF (Radio Frequency), Bluetooth networks, NFC (Near-Field Communication) networks, satellite broadcast networks, analog broadcast networks, and DMB (Digital Multimedia Broadcasting) networks.

In the following, the term "at least one" is defined as a term covering both the singular and the plural, and it is self-evident that, even where the term "at least one" is absent, each component may exist in the singular or the plural and may be meant in the singular or the plural. Whether each component is provided in the singular or the plural may vary according to the embodiment.

The at least one user terminal (100) may be a terminal that uses a web page, app page, program, or application related to the billiards guide service using reinforcement learning of a deep learning model to photograph the target and balls on a billiard table or to set the positions of the target and balls, and then outputs, as a simulation result, trajectories including the positions or paths of the target and balls according to the hit point on the target, the force, and the direction.
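One way to picture the per-case output described above is a small routine that scores each combination of hit point, force, and direction and keeps the best few. The sketch below is a hedged illustration only: `rank_cases` and `score_fn` are hypothetical names, and `score_fn` stands in for whatever simulation or trained model actually evaluates a shot.

```python
from itertools import product


def rank_cases(score_fn, tips, forces, angles, top_k=3):
    """Score every (tip, force, angle) case and return the top_k best.

    score_fn is a hypothetical callable (tip, force, angle) -> float;
    higher scores are treated as better shots.
    """
    cases = [(score_fn(t, f, a), t, f, a)
             for t, f, a in product(tips, forces, angles)]
    # Python's sort is stable, so ties keep their enumeration order.
    cases.sort(key=lambda c: c[0], reverse=True)
    return cases[:top_k]


# Placeholder scoring rule for demonstration only (not from the patent):
# prefers a force near 2 and an angle near 90 degrees.
def demo_score(tip, force, angle):
    return -(abs(force - 2) + abs(angle - 90) / 90)


best = rank_cases(demo_score,
                  tips=[(0.0, 0.0), (0.0, 0.5)],
                  forces=[1, 2, 3],
                  angles=[0, 90, 180])
```

Each returned tuple holds the score followed by the hit point, force, and angle, which is the kind of per-case information a terminal could render alongside the simulated trajectory.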

Here, the at least one user terminal (100) may be implemented as a computer capable of accessing a remote server or terminal through a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop equipped with a web browser. The at least one user terminal (100) may also be implemented as a terminal capable of accessing a remote server or terminal through a network. The at least one user terminal (100) is, for example, a wireless communication device that ensures portability and mobility, and may include all kinds of handheld wireless communication devices such as navigation devices, PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals, smartphones, smart pads, and tablet PCs.

The guide service providing server (300) may be a server that provides a web page, app page, program, or application for the billiards guide service using reinforcement learning of a deep learning model. The guide service providing server (300) may be a server that builds an environment reproducing the simulation of balls on a billiard table using at least one kind of open-source package. The guide service providing server (300) may also be a server that, for training based on a deep learning model, initially sets the target and balls at random, expresses the position coordinates of the target and balls as vectors, simulates the trajectories of the balls according to the coordinates of the hit point on the target and the force and direction with which the hit point is struck, and gives a reward according to the result, thereby training by reinforcement learning without a dataset.
When the positions of the target and balls are specified or photographed on the user terminal (100), the guide service providing server (300) may be a server that identifies the target and balls on the captured screen, simulates the expected path of each ball according to the hit point on the target, the force, and the direction, and transmits the result to the user terminal (100). When the camera (400) photographs the billiard table from directly above, the table is captured as in a plan view, so the guide service providing server (300) may be a server that uses this view to extract the positions of the target and balls and outputs the simulation result for each case to the display (500). If AR glasses are linked with the user terminal (100) or the like, the guide service providing server (300) may be a server that provides guidance by overlaying the trajectory or the coordinates of the hit point for each case on the AR glasses. In addition, even when the magnitude of the force is computed physically, the user must apply that much force to the cue and may be unable to gauge it. For such cases, the guide service providing server (300) may be a server that is provided with at least one device (not shown), compares the magnitude of the force to be applied to the hit point with the magnitude of the force applied to the device, and gives the user an increase or decrease indication so that the user can gauge the magnitude of the force.
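The increase or decrease indication described above amounts to comparing a measured force against a required one. The following sketch shows one possible form of that comparison; the function name `force_hint`, the relative `tolerance` band, and the three return labels are assumptions for illustration and are not specified in the source.

```python
def force_hint(required, applied, tolerance=0.1):
    """Compare the applied force with the required force.

    Returns "increase" if the user should push harder, "decrease" if
    softer, and "ok" when the applied force lies within the assumed
    relative tolerance band around the required force.
    """
    if required <= 0:
        raise ValueError("required force must be positive")
    ratio = (applied - required) / required  # signed relative error
    if ratio < -tolerance:
        return "increase"
    if ratio > tolerance:
        return "decrease"
    return "ok"
```

For example, with a required force of 10 units, an applied force of 7 yields "increase", 12 yields "decrease", and 10.5 falls inside the 10% band and yields "ok". A relative band is used here so the same tolerance works for soft and hard shots; an absolute band would be an equally plausible choice.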

Here, the guide service providing server (300) may be implemented as a computer capable of accessing a remote server or terminal through a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop equipped with a web browser.

The at least one camera (400) may be a device that uses a web page, app page, program, or application related to the billiards guide service using reinforcement learning of a deep learning model to transmit captured video to the guide service providing server (300) in real time. The camera (400) may be provided as an intelligent CCTV, in which case it may identify the target and balls as subjects, track their trajectories, and provide the result to the guide service providing server (300).

Here, the at least one camera (400) may be implemented as a computer capable of accessing a remote server or terminal through a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop equipped with a web browser. The at least one camera (400) may also be implemented as a terminal capable of accessing a remote server or terminal through a network. The at least one camera (400) is, for example, a wireless communication device that ensures portability and mobility, and may include all kinds of handheld wireless communication devices such as navigation devices, PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals, smartphones, smart pads, and tablet PCs.

The at least one display (500) may be a device that uses a web page, app page, program, or application related to the billiards guide service using reinforcement learning of a deep learning model to output a screen containing the trajectory transmitted from the guide service providing server (300).

Here, the at least one display 500 may be implemented as a computer capable of connecting to a remote server or terminal through a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop equipped with a web browser. The at least one display 500 may also be implemented as a terminal capable of connecting to a remote server or terminal through a network. For example, the at least one display 500 may be a wireless communication device that ensures portability and mobility, including all kinds of handheld wireless communication devices such as a navigation device, a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal, a smartphone, a smartpad, and a tablet PC.

FIG. 2 is a block diagram illustrating the guide service providing server included in the system of FIG. 1, and FIGS. 3 and 4 are diagrams illustrating an embodiment in which the billiard guide service using reinforcement learning of a deep learning model according to an embodiment of the present invention is implemented.

Referring to FIG. 2, the guide service providing server 300 may include a building unit 310, an initialization unit 320, a training unit 330, a transmission unit 340, and a determination unit 350.

When the guide service providing server 300 according to an embodiment of the present invention, or another server (not shown) operating in conjunction with it, transmits an application, program, app page, web page, or the like for the billiard guide service using reinforcement learning of a deep learning model to the at least one user terminal 100, the at least one camera 400, and the at least one display 500, these devices may install or open that application, program, app page, or web page. A service program may also be run on the at least one user terminal 100, the at least one camera 400, and the at least one display 500 using a script executed in a web browser. Here, a web browser is a program that enables use of the World Wide Web (WWW) service by receiving and displaying hypertext written in HTML (Hyper Text Mark-up Language); examples include Netscape, Explorer, and Chrome. An application means an application program on a terminal and includes, for example, an app executed on a mobile terminal (smartphone).

Referring to FIG. 2, the building unit 310 may build a billiard simulation environment using at least one open source for building a physics engine-based simulation environment. Collisions between objects behave differently depending on the shape and rigidity of the objects, and can be divided into elastic and inelastic collisions according to the change in kinetic energy after the collision. Billiard balls are treated here as unbreakable circular objects undergoing elastic collisions. Three types of collision computation may be used. The first is a collision between objects. When two objects collide, a force acts on the contact surface, so the velocity of each object after the collision can be found by calculating the force acting on it. When two objects collide elastically in two-dimensional space, a new reference axis perpendicular to the contact surface is set, the one-dimensional elastic collision is calculated along that axis, and the new velocities are then obtained, as in Equation 1. Equation 1 is the velocity formula for a one-dimensional elastic collision.
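The ball-to-ball case described above can be sketched as follows. This is a minimal illustration, not the patent's Equation 1 (which is not reproduced in this text): it assumes the standard textbook formula for a one-dimensional elastic collision, applied along the line joining the two ball centres, and the function name `elastic_collision_2d` and its parameters are hypothetical.

```python
import math

def elastic_collision_2d(p1, v1, m1, p2, v2, m2):
    """Return the post-collision velocities of two discs that collide elastically.

    p1, p2: centre positions (x, y); v1, v2: velocities (x, y); m1, m2: masses.
    Assumes the discs are touching and their centres do not coincide.
    """
    # Reference axis: unit vector along the line of centres,
    # i.e. perpendicular to the contact surface.
    nx, ny = p2[0] - p1[0], p2[1] - p1[1]
    dist = math.hypot(nx, ny)
    nx, ny = nx / dist, ny / dist

    # Velocity components along the reference axis.
    u1 = v1[0] * nx + v1[1] * ny
    u2 = v2[0] * nx + v2[1] * ny

    # Textbook 1D elastic collision (momentum and kinetic energy conserved).
    w1 = ((m1 - m2) * u1 + 2 * m2 * u2) / (m1 + m2)
    w2 = ((m2 - m1) * u2 + 2 * m1 * u1) / (m1 + m2)

    # Only the normal component changes; the tangential part is kept.
    v1_new = (v1[0] + (w1 - u1) * nx, v1[1] + (w1 - u1) * ny)
    v2_new = (v2[0] + (w2 - u2) * nx, v2[1] + (w2 - u2) * ny)
    return v1_new, v2_new
```

For equal masses in a head-on hit, the moving ball stops and the resting ball leaves with the incoming velocity, which is the familiar behaviour of a full-ball shot.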

The second is a collision between an object and a line. When an elastically colliding object hits a wall, the velocity component relative to the struck surface is reversed; that is, the angle of incidence and the angle of reflection with respect to the collision surface are equal. The third is gravity simulation. In billiards, when an object rests over a rail, gravity pushes it toward the rail, and this force can be calculated. To calculate the new velocity, the velocity produced by gravity must be computed. When the two-dimensional space is extended to three dimensions to account for gravity, the x and y axes form the two-dimensional plane and the z axis is the depth. The force acting along the tangent does not affect the speed of the object and is ignored. Therefore, the force parallel to the tangent must be considered. Because the object moves in three-dimensional space, the force must be converted into a three-dimensional force, and the acceleration computed from that force gives the new velocity.
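The object-to-line case can be sketched with the usual mirror-reflection formula v' = v - 2(v·n)n, where n is the unit normal of the wall. The function name `reflect` is hypothetical, and the sketch assumes a perfectly elastic cushion with no spin or friction.

```python
def reflect(velocity, wall_normal):
    """Reflect a 2D velocity off a wall whose unit normal is wall_normal.

    The component along the normal is reversed and the component along
    the wall is kept, so the angle of incidence equals the angle of reflection.
    """
    vx, vy = velocity
    nx, ny = wall_normal
    dot = vx * nx + vy * ny
    return (vx - 2 * dot * nx, vy - 2 * dot * ny)
```

A ball moving down and to the right at `(1.0, -2.0)` that hits the bottom cushion (normal `(0.0, 1.0)`) leaves with velocity `(1.0, 2.0)`.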

Reinforcement learning requires an environment. Simulators such as Gym, MuJoCo, Pygame, Unity, and Unreal Engine can be used to build this environment. Among these, Unity currently supports the Unity ML-Agents Toolkit, which makes it easy to apply reinforcement learning algorithms to the environment, and because it provides parallel training by default, it can ensure accelerated and stable learning. Alternatively, Unreal Engine can be used. It is based on C++, all of its source code is available on GitHub, and it uses NVIDIA's PhysX 3.3 as its physics engine. Using it offers several advantages. First, photorealistic graphics and UnrealCV make it possible to experiment with image recognition algorithms, and the latest Unreal Engine also provides real-time ray tracing. Second, interfaces for various devices such as VR devices and joysticks, as well as keyboards and mice, are built in, so complex inputs can be mapped and controlled easily. Third, selective collisions can be implemented by setting collision levels, and there is a built-in function that automatically generates a collision model as a k-DOP (Discrete Oriented Polytope) suited to the physics engine. Various other settings, such as simple collision and complex collision, are also available. Fourth, it can be linked with ROS through a plug-in. However, code integrated with Unreal Engine uses C++ with many Unreal macros, so the Unreal coding standard should take precedence over standard C++ style. Otherwise, unpredictable function behavior or memory leaks may result.

The initialization unit 320 may randomly set the initial coordinates of the target and the at least one ball within the built billiard simulation environment, and then set those coordinates as position vectors. As shown in (a) and (d) of FIG. 3, the initialization unit 320 may express the randomly set position of the target as (xt, yt, zt), the position of the first ball as (xball1, yball1, zball1), and the position of the second ball as (xball2, yball2, zball2), and use these vectors as observation information. In three-ball billiards, the number of the at least one ball may be two.
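A minimal sketch of this random initialization is shown below. The table size, ball radius, the non-overlap check, and the names `random_layout` and `to_observation` are all assumptions made for illustration; the source only states that the coordinates are set randomly and expressed as position vectors.

```python
import math
import random

TABLE_W, TABLE_H = 2.84, 1.42   # assumed playing-surface size in metres
BALL_R = 0.03075                # assumed ball radius in metres

def random_layout(n_balls=3, rng=None):
    """Place the target and the other balls at random, non-overlapping positions."""
    rng = rng or random.Random()
    positions = []
    while len(positions) < n_balls:
        candidate = (
            rng.uniform(BALL_R, TABLE_W - BALL_R),
            rng.uniform(BALL_R, TABLE_H - BALL_R),
            BALL_R,  # z: ball centre sits one radius above the cloth
        )
        # Assumed constraint: reject positions that overlap an existing ball.
        if all(math.dist(candidate, p) >= 2 * BALL_R for p in positions):
            positions.append(candidate)
    return positions

def to_observation(positions):
    """Flatten [(xt, yt, zt), (xball1, ...), (xball2, ...)] into one vector."""
    return [coord for p in positions for coord in p]
```

With one target and two balls, `to_observation` yields the nine continuous values that the training unit takes as input.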

The training unit 330 may conduct training by repeating a simulation that draws the trajectories of the target and the at least one ball according to a striking point within the target and the force striking that point, and then providing a reward according to the result, using reinforcement learning with at least one deep learning model. The reinforcement learning may use a MODEL-BASED method or a MODEL-FREE method. Referring to FIG. 3b, MODEL-BASED algorithms learn the dynamics of the environment itself and select the optimal action from it, that is, they obtain a policy or value function indirectly. MODEL-FREE algorithms directly approximate a policy or value function without information about the environment and select the optimal action through it. PPO, described later, is a MODEL-FREE method, and MCTS is one way of obtaining a policy function indirectly within the MODEL-BASED approach. MCTS itself, however, is not a MODEL-BASED algorithm. The reinforcement learning according to an embodiment of the present invention may be the MuZero algorithm, which takes the advantages of both the MODEL-BASED and MODEL-FREE approaches: it learns the dynamics of the environment itself while also directly approximating the policy function and value function internally. MuZero is the successor to the AlphaZero algorithm and is generally known as a MODEL-BASED method, but strictly speaking it cannot be assigned to either category. PPO will be described later with reference to (c) of FIG. 3.

Through reinforcement learning, the training unit 330 may learn without any dataset for training and testing, by recognizing the current state within the environment in which the billiard balls are placed and maximizing the reward obtained through its actions. As shown in (a) of FIG. 4, the training unit 330 may express the striking point, which is the struck part on the surface of the target, as a vector (x1, y1, z1) relative to the center of the target. The training unit 330 may express the force striking the point as f (Force) and the striking direction as a vector (x2, y2, z2). The training unit 330 may train, by reinforcement learning through repeated simulation, a deep learning model that takes as input the nine continuous values contained in (xt, yt, zt), (xball1, yball1, zball1), and (xball2, yball2, zball2), and outputs the seven continuous values contained in (x1, y1, z1), (x2, y2, z2), and f.
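The seven-value output layout described above can be illustrated by a small decoding step that turns raw model outputs into a physically valid shot. Everything beyond the layout itself is an assumption: the `Shot` type, the function `decode_action`, the projection of the striking point onto the ball surface, the normalization of the direction, and the clipping of the force are hypothetical choices, not steps stated in the source.

```python
import math
from dataclasses import dataclass

@dataclass
class Shot:
    contact: tuple    # (x1, y1, z1): striking point relative to the target centre
    direction: tuple  # (x2, y2, z2): unit vector of the striking direction
    force: float      # f: striking force

def _scaled(vec, length):
    """Rescale vec to the given length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(c * c for c in vec)) or 1.0
    return tuple(length * c / norm for c in vec)

def decode_action(raw, ball_radius=1.0, max_force=1.0):
    """Map the 7 raw outputs (x1, y1, z1, x2, y2, z2, f) to a Shot."""
    if len(raw) != 7:
        raise ValueError("expected 7 continuous values")
    contact = _scaled(raw[0:3], ball_radius)    # assumed: snap onto the ball surface
    direction = _scaled(raw[3:6], 1.0)          # assumed: unit direction
    force = min(max(raw[6], 0.0), max_force)    # assumed: clip to [0, max_force]
    return Shot(contact, direction, force)
```

A policy network with a nine-value input and a seven-value output could pass its output through such a decoder before handing the shot to the simulator.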

When the user terminal 100 recognizes or specifies the coordinates of the target and the at least one ball, the transmission unit 340 may transmit to the user terminal 100 the simulation results for at least one case according to the striking point of the target and the force striking that point. After recognizing the coordinates of the target and the at least one ball as position vectors, the user terminal 100 may output the result of simulating the trajectories of the target and the at least one ball according to the striking point, which is the struck part on the surface of the target, and the force striking that point.

The system 1 for providing a billiard guide service using reinforcement learning of a deep learning model according to an embodiment of the present invention may further include at least one camera 400, installed facing the billiard table so as to capture the table on which the user of the user terminal 100 intends to play, and at least one display 500 that overlays the simulation result output from the transmission unit on the image of the billiard table captured by the at least one camera 400 and outputs it. The image input through the camera 400 is analyzed to calculate the poses of the billiard table and the billiard balls relative to the camera, and these are converted into absolute poses using the user's head pose (camera position and rotation) obtained from the position tracking function of augmented reality equipment, for example AR glasses such as Google Glass. Once the positions of the billiard table and the billiard balls are known, a physics simulation can be performed based on them. This can be done by simulating strikes of the ball the player is about to hit in all 360 degrees of direction and at various speeds, and then identifying every path that scores. The visualization function can display the most reasonable path, chosen according to the user's head position and the path weights, using the various particle systems of the Unity engine. These three processes can be performed in real time while the system is running.

<Image processing>

It is assumed that the felt of the billiard table has a color clearly distinguishable from other objects in the image. When the image is converted into the HSV color space, consisting of hue, saturation (the intensity of the color), and value (brightness), the uniformly colored felt surface has stable hue and saturation values. Simply selecting every pixel in that hue and saturation range, whatever its value, locates the table area quite accurately. Here, inRange(), an OpenCV function that finds all pixels whose channel values lie within given ranges, can be used to mark as True every pixel that falls within the specified hue and saturation ranges.
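The range test that inRange() performs can be sketched in plain Python as follows. This is a dependency-free illustration rather than the OpenCV call itself: it uses the standard library's `colorsys`, whose hue and saturation are in the range 0 to 1 (OpenCV's 8-bit HSV images use a hue range of 0 to 179), and the function name `felt_mask` and the example thresholds are hypothetical.

```python
import colorsys

def felt_mask(rgb_image, hue_range, sat_range):
    """Return a boolean mask of pixels whose hue and saturation are in range.

    rgb_image: rows of (r, g, b) tuples with components in 0..255.
    hue_range, sat_range: (low, high) pairs on the 0..1 scale.
    The value (brightness) channel is deliberately ignored.
    """
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            in_range = (hue_range[0] <= h <= hue_range[1]
                        and sat_range[0] <= s <= sat_range[1])
            mask_row.append(in_range)
        mask.append(mask_row)
    return mask
```

For a blue felt, `felt_mask([[(0, 100, 200), (200, 200, 200)]], (0.5, 0.7), (0.4, 1.0))` marks the saturated blue pixel and rejects the grey one.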

From the boundary image, computed by applying an erosion operation to the filtering result and subtracting it from the original, every shape present in the image can be found. Each shape consists of a list of the coordinates of its vertices. OpenCV provides the findContours() function for this, but because its algorithm judges very strictly whether the segment between vertices is straight, a rectangular shape is reported with far more than four vertices. The vertices of each shape therefore need to be approximated by the four vertices with the most distinct characteristics. Here, OpenCV's approxPolyDP() function, which implements the Ramer-Douglas-Peucker line approximation algorithm, can be applied to reduce the many vertices to four corners. OpenCV's convexHull() function is then applied to turn every table area into a convex polygon, so that the table area can still be detected when another object such as a cue intrudes on it, as long as the four corners of the table are within the screen.
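The Ramer-Douglas-Peucker idea behind approxPolyDP() can be sketched for an open polyline as below. This is a plain-Python illustration of the algorithm, not OpenCV's implementation, and it does not handle the closed-contour case; the names `rdp` and `_point_line_distance` are hypothetical.

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length

def rdp(points, epsilon):
    """Simplify a polyline, dropping points closer than epsilon to the chord."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord joining the two end points.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify both halves recursively.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right
```

Applied to a noisy L-shaped edge such as `[(0, 0), (1, 0.01), (2, 0), (2, 1), (2, 2)]` with `epsilon=0.1`, only the two end points and the corner survive.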

The process above yields a set of shape vertices for every region in the image that has the same color as the billiard table. The table obviously has four corners, and if the user is looking at the table, the table area also occupies at least a certain portion of the screen. Each shape is therefore examined in turn to find one that satisfies these conditions, and its vertices are taken to be the four corners of the table. Because the table is a rigid body that does not deform and its dimensions are already known, an object with the same dimensions as the table is assumed to be located at the origin (0, 0, 0), where the origin denotes the position of the camera. The algorithm that determines the pose (position and rotation) of an object by matching each vertex of a 3D model in model space against its projected coordinates on the screen is called the PnP (Perspective-n-Point) algorithm, and OpenCV implements it in the solvePnP() function. The OpenCV implementation, however, assumes that the order of the 2D vertices detected on the screen matches the order of the 3D vertices in model space, whereas in practice the order of the long-side and short-side vertices of the table detected on the screen may differ from the order in model space. The PnP algorithm is therefore applied once, the index list is rotated by one position in either direction, the PnP algorithm is applied again, and the candidate with the smaller error is selected as the final pose of the table.

The table pose obtained this way is relative to the camera, so the absolute pose of the camera returned by the position tracking function of the AR glasses is applied to it to compute the absolute pose of the table. Even if the table pose cannot be estimated from the image, the current camera pose is still updated by the position tracking function of the AR glasses, so tracking can be maintained based on the stored absolute pose of the table. As a result, any recognition of the table after the first serves to correct the error of the AR glasses position tracker. Once the table pose has been obtained, the 3D center of a billiard ball can be calculated by casting a ray from the camera toward the 2D center of the ball detected on the screen and intersecting it with the table plane. Because all the balls lie on a single plane, the center position of a ball can be obtained as long as the UV position of its center in the image is obtained accurately. The real difficulty is finding that center: although a billiard ball is easy to spot with the naked eye, the position of the lighting, ambient light, and similar factors make it hard to identify mechanically.
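The ray-casting step can be sketched as a standard ray-plane intersection. The function name `ray_plane_intersection` and its parameters are hypothetical; in practice the plane might be offset by one ball radius above the cloth so that the hit point is the ball center, which is an assumption not stated in the source.

```python
def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Return the point where a ray meets a plane, or None if it never does.

    origin, direction: ray start (camera position) and ray direction.
    plane_point, plane_normal: any point on the plane and the plane normal.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    denom = dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    offset = [p - o for p, o in zip(plane_point, origin)]
    t = dot(offset, plane_normal) / denom
    if t < 0:
        return None  # plane lies behind the camera
    return tuple(o + t * d for o, d in zip(origin, direction))
```

A camera 2 units above a horizontal table, looking along `(1, 0, -1)`, hits the table plane at `(2, 0, 0)`.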

A modified form of template matching is therefore used to balance the accuracy of the center search against performance. At the estimated 2D center of a ball, template matching on color is performed with a kernel whose size is the pixel radius of the ball in the image at that position, and the template compares only color and luminance, excluding brightness. Unlike ordinary template matching, however, the same kernel cannot be applied across the whole image, because the kernel radius must change dynamically with the actual distance to the ball. Since the kernel size changes dynamically, a separate kernel operation must be applied to every candidate that could be the center of a ball, and this causes a performance loss even with the GPU acceleration provided by OpenCV.

Performance can therefore be optimized by introducing the concept of a sparse template kernel, which performs template matching on a sample of only about 5 to 10% of the pixels. Because the sparse template kernel performs a simple weighted sum, a weight field to which the kernel is applied must first be generated. This is done by subtracting the representative color-luminance value of each ball from the image in the color-luminance color space to obtain the error in each channel, and then using the negative of the Euclidean distance of the per-channel errors as the argument of an exponential function, which yields a fitness value distributed between 0 and 1. The fitness computation for each pixel is expressed in Equations 2 to 4 below.

Here, h_n and s_n in Equation 2 are the color and luminance values of each pixel, and h_r and s_r are the representative color and luminance values of the ball to be detected. In Equation 3, w_h and w_s are the weights applied to the color and luminance errors, respectively, when computing the Euclidean distance of the error. In Equation 4, β is a coefficient that determines the influence of the error: the higher it is, the more sensitive the fitness is to error, which raises accuracy but makes the result more vulnerable to environmental effects such as ambient light and illuminance. Every channel is computed in 32-bit floating point and normalized to a value between 0 and 1 for this purpose, so the distance d takes very small values in practice. To compensate, a sufficiently large value can be assigned to β. Once the fitness field for each color has been obtained, the sparse template kernel is multiplied with this field to estimate the center of the ball. However, the region where a ball can exist is only a very small part of the table ROI, so applying the expensive template operation to the entire image is inefficient. The image is therefore first binarized using a fixed fitness value as a threshold, the indices are extracted, and the template operation is applied only to those indices.
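The per-pixel fitness described above can be sketched as follows. Equations 2 to 4 are not reproduced in this text, so the exact form is an assumption reconstructed from the description: per-channel errors, a weighted Euclidean distance d, and a fitness of exp(-β·d). The placement of the weights inside the square root, the default values, and the function name `fitness` are hypothetical.

```python
import math

def fitness(h_n, s_n, h_r, s_r, w_h=1.0, w_s=1.0, beta=20.0):
    """Fitness in (0, 1] of one pixel against a ball's representative color.

    h_n, s_n: the pixel's two channel values, normalized to 0..1.
    h_r, s_r: the representative channel values of the ball to detect.
    w_h, w_s: per-channel weights; beta: sensitivity to the error.
    """
    # Per-channel errors (assumed form of Equation 2).
    # Hue wrap-around is ignored in this sketch.
    dh = h_n - h_r
    ds = s_n - s_r
    # Weighted Euclidean distance (assumed form of Equation 3).
    d = math.sqrt(w_h * dh * dh + w_s * ds * ds)
    # Negative distance as the exponent (assumed form of Equation 4).
    return math.exp(-beta * d)
```

A pixel that matches the representative color exactly scores 1.0, and raising `beta` makes the score fall off faster as the error grows, which matches the trade-off described for β.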

If the ball fitness is simply binarized, then depending on the lighting the field is likely to form only along the outline of the ball rather than at its actual center. After the initial binarization, dilation and erosion operations are therefore applied repeatedly so that the center of the ball is included in the matching field. For each point in the matching field, the kernel is applied to the fitness field to compute the fitness sum at that point. Conceptually, the more pixels of the target color lie inside the radius and the fewer lie outside it, the higher the fitness. The point with the maximum value in the fitness-sum field is then selected as the estimated 2D center of the ball. When two balls of the same color exist, for example two red balls, the process is performed once more; since the fitness-sum field around the ball already detected is no longer meaningful, it is erased over the radius of that ball. By casting a ray from the origin through the ball center computed in this way, the 3D center coordinates of the ball relative to the camera are obtained, and these are multiplied by the camera pose matrix to convert them into an absolute position.
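The "pick the maximum, erase its neighbourhood, pick again" step for two same-colored balls can be sketched as below. The function name `top_two_centres` is hypothetical, and erasing with negative infinity over a circular region is one assumed way of clearing the field around the first detection.

```python
def top_two_centres(fit_sum, ball_radius_px):
    """Return the (row, col) positions of the two strongest, separated peaks.

    fit_sum: 2D list holding the fitness sum at each candidate pixel.
    ball_radius_px: radius, in pixels, erased around each detected centre.
    """
    field = [row[:] for row in fit_sum]  # work on a copy
    rows, cols = len(field), len(field[0])
    centres = []
    for _ in range(2):
        # The point with the maximum fitness sum is the estimated centre.
        r, c = max(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda rc: field[rc[0]][rc[1]],
        )
        centres.append((r, c))
        # Erase the field within one ball radius of the detected centre.
        for i in range(rows):
            for j in range(cols):
                if (i - r) ** 2 + (j - c) ** 2 <= ball_radius_px ** 2:
                    field[i][j] = float("-inf")
    return centres
```

Without the erase step, the second search would typically return a pixel adjacent to the first peak rather than the second ball.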

Alternatively, billiard balls can be recognized as follows. Gaussian blurring is first applied to remove noise generated by normal and probability distributions, and circles are then detected using the Hough transform, a method for extracting geometric components from a pixel-based image. Because the Hough transform function in OpenCV takes the grayscale image itself as input, it internally performs the transform through a Sobel operation, which detects the amount of change around a center point after a matrix operation. For cue recognition, the image data received from the camera is converted to a black-and-white image and then to a binary image through image binarization. Through a labeling process that groups adjacent regions having the same value in the processed image, the position of the cue is recognized and its center point is detected. The end point and center point of the cue are then connected to form a straight line, and a straight trajectory is drawn along that line. When the trajectory reaches the outline of the defined table area, the law of reflection can be used so that it is reflected at an angle equal to the angle of incidence.

<Path simulation>

The detected positions of the billiard table and of each ball are passed to an AR billiards application implemented with the Unity Engine. When the absolute positions of the table and all four balls are known, a physics simulation can trace the path of the ball struck by the player and determine the cases in which a point is scored. Physics simulation is generally driven by a discrete method called time division, or time stepping. For example, when the acceleration of an object at step $n$ is $\vec{a}_n$ and the simulation interval is $\Delta t$, the next velocity $\vec{v}_{n+1}$ and position $\vec{s}_{n+1}$ can be computed at each simulation step as in Equation 5 below.

In a system that performs the physics simulation discretely every unit time $\Delta t$, Equation 5 simplifies the integral that gives the change in an object's velocity over the unit time to the product of the acceleration and the unit time, and accumulates the result onto the current velocity. Equation 6 simplifies the integral of velocity used to compute the change in position in the same way.
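A minimal sketch of such a time-stepped update is shown below. It assumes the common semi-implicit Euler form, $\vec{v}_{n+1}=\vec{v}_n+\vec{a}_n\Delta t$ and $\vec{s}_{n+1}=\vec{s}_n+\vec{v}_{n+1}\Delta t$; whether the position update uses $\vec{v}_n$ or $\vec{v}_{n+1}$ is an assumption, and the function name `euler_step` is hypothetical.

```python
import numpy as np

def euler_step(s, v, a, dt):
    """Advance position s and velocity v by one time step dt.

    The velocity integral is replaced by a * dt and the position
    integral by v * dt (assumed semi-implicit ordering).
    """
    s = np.asarray(s, dtype=float)
    v = np.asarray(v, dtype=float)
    a = np.asarray(a, dtype=float)
    v_next = v + a * dt
    s_next = s + v_next * dt
    return s_next, v_next
```

A smaller `dt` improves accuracy at the cost of more steps per simulated second, which is the linear cost growth discussed in the text.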

Because it simplifies integration, the method above, used in discrete time-stepped physics simulation, has the advantage that complex physical formulas and recurrence relations can be introduced freely. The accuracy and degree of optimization of the simulation can also be controlled by adjusting the simulation interval $\Delta t$. However, the computational cost of a time-stepped simulation grows linearly with the total length of the simulation and with the required accuracy. This makes it rather unsuitable for reinforcement learning, where path search requires simulating strokes in all directions over 360 degrees and across a range of speeds, thousands of times or more.

Accordingly, an embodiment of the present invention may use an event-based physics simulation rather than a time-stepped one. The event-based model omits the simulation of collision-free periods entirely, which account for most of the computation in a time-stepped simulation. A typical time-stepped simulation computes, at every step, the distance each object moves during $\Delta t$ and then checks all objects for collisions. The event-based model instead checks all objects for collisions over a very long time interval using sphere-trajectory intersection, advances all objects to the time of the earliest collision, and then computes the collision physics for the objects that collided at that point. In other words, the event-based approach performs the $O(n^2)$ collision check only as many times as collisions occur during the simulated period, which is a substantial gain in efficiency over the time-stepped approach.
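The sphere-trajectory intersection test can be illustrated for the simplest case of two equal-radius balls moving at constant velocity, where the contact time is the smaller root of a quadratic in $t$. The function name `time_to_contact` and the constant-velocity assumption are introduced here for illustration only.

```python
import math
import numpy as np

def time_to_contact(p1, v1, p2, v2, radius):
    """Earliest t >= 0 at which two equal balls touch, or None.

    Solves |(p2 - p1) + (v2 - v1) t| = 2 * radius for t, assuming
    both balls keep a constant velocity until the event.
    """
    dp = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    dv = np.asarray(v2, dtype=float) - np.asarray(v1, dtype=float)
    a = float(dv @ dv)
    b = 2.0 * float(dp @ dv)
    c = float(dp @ dp) - (2.0 * radius) ** 2
    if a == 0.0 or b >= 0.0:
        return None  # no relative motion, or the balls are separating
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # trajectories never come within two radii
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else None
```

An event-based loop would evaluate this for every pair (and for every ball-cushion pair), take the minimum time, advance all balls to it, and resolve that single collision.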

Of course, when the velocity is neither constant nor naturally damped ($v(t)=V_0 e^{-\alpha t}$), the complexity of the trajectory check grows enormously, so the approach has the major drawback that the simulation itself becomes infeasible when an acceleration function is present. In billiards, however, no external force acts apart from the initial velocity, so an event-based simulation may be appropriate. Once the sphere-trajectory check has established whether a collision occurs, the actual physical response to the collision must be computed. Instead of modeling the physics acting on the balls and cushions in detail, the model directly specifies the elasticity coefficient $\varepsilon$ between ball and ball and between ball and cushion, and implements the spin imparted to a ball in a ball-cushion collision through an experience-based heuristic. Equations 7 to 9 below may be applied to compute ball-ball and ball-cushion collisions.

$P_{ct}$ in Equation 7 is the collision point between the two objects, and $V_1'$ in Equation 9 is the velocity of object 1 after the collision. The same expression applies to $V_2$, the other object in the collision. When a ball collides with a cushion, $m_2$ is taken to be infinite, giving $V_{1p}'=-\varepsilon V_{1p}$. The value of $\varepsilon$ is 0.77 for ball-ball collisions and 0.73 for ball-cushion collisions. The effect of spin is also taken into account to some extent. To imitate the fact that, some time after a collision, a billiard ball's travel speed and rolling direction come to agree, a minimum rolling start time $t_{roll}$ is set, and the rotational velocity is linearly interpolated between the current rotational velocity $\vec{\omega}_n$ and the maximum rotational velocity $\vec{\omega}_{n(\max)}$ according to the time $\Delta t$ elapsed since the last collision, so that it comes to match the direction of travel, as in Equations 10 to 13 below.
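A collision response with a restitution-style coefficient can be sketched as below. The normal-component formula is the standard one for two point masses and is an assumption here, chosen because it reduces to $V_{1p}'=-\varepsilon V_{1p}$ when $m_2\to\infty$ and the second object is at rest; the function names are hypothetical, and only the values 0.77 and 0.73 come from the text.

```python
import numpy as np

def collide_balls(v1, v2, normal, m1=1.0, m2=1.0, eps=0.77):
    """Update two velocities along the collision normal (ball 1 -> ball 2)."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    v1p, v2p = float(v1 @ n), float(v2 @ n)  # normal components
    v1p_new = ((m1 - eps * m2) * v1p + (1.0 + eps) * m2 * v2p) / (m1 + m2)
    v2p_new = ((m2 - eps * m1) * v2p + (1.0 + eps) * m1 * v1p) / (m1 + m2)
    # Tangential components are left unchanged in this sketch.
    return v1 + (v1p_new - v1p) * n, v2 + (v2p_new - v2p) * n

def bounce_off_cushion(v, normal, eps=0.73):
    """Cushion as an infinitely heavy object: v_p' = -eps * v_p."""
    v = np.asarray(v, dtype=float)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    vp = float(v @ n)
    return v - (1.0 + eps) * vp * n
```

With equal masses the total momentum along the normal is preserved, while the relative normal speed after impact is `eps` times the speed before impact.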

$\alpha$ in Equation 10 indicates how closely the ball object's linear velocity and rotational velocity agree, and is expressed simply as the ratio of $\Delta t$ to the constant $t_{roll}$. The maximum value of $\alpha$ is limited to 1. If the ball is rolling in its direction of travel without slipping, the velocity of the contact point between the ball and the table surface is opposite to the ball's linear velocity. The angular velocity vector is then computed by dividing the cross product of the reversed velocity and the normal of the contact plane (the up vector) by the radius $r$, as in Equation 11. In Equation 12, $\hat{u}_{up}$ is the up vector and $r$ is the radius of the ball. $\vec{\omega}_{n+1}$ in Equation 13 is the rotational velocity at the collision time $t_{n+1}$. When two balls collide, the velocity $v_0$ computed from the collision formula based on the elasticity coefficient above is first adjusted to account for spin. The proportion of the spin that is reflected is specified directly by the friction constant $f_{s(ball)}=0.23$. This process is given by Equation 14 below and is applied to each of the two colliding balls.
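The interpolation toward rolling without slipping can be sketched as follows, using the quantities named in the text: $\alpha=\min(1,\Delta t/t_{roll})$ and a target spin $(-\vec{v}\times\hat{u}_{up})/r$. The function names and the blend form $(1-\alpha)\vec{\omega}_n+\alpha\,\vec{\omega}_{n(\max)}$ are assumptions consistent with "linear interpolation".

```python
import numpy as np

UP = np.array([0.0, 0.0, 1.0])  # assumed table normal (z up)

def rolling_spin(v, radius, up=UP):
    """Angular velocity of a ball rolling without slipping at velocity v."""
    return np.cross(-np.asarray(v, dtype=float), up) / radius

def relax_spin(omega, v, dt_since_hit, t_roll, radius):
    """Blend the current spin toward the rolling spin as time passes."""
    alpha = min(1.0, dt_since_hit / t_roll)
    omega = np.asarray(omega, dtype=float)
    return (1.0 - alpha) * omega + alpha * rolling_spin(v, radius)
```

Once `dt_since_hit` reaches `t_roll`, the result equals the pure rolling spin regardless of the spin the ball had right after the collision.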

Taking the cross product of the angular velocity $\vec{\omega}_{n+1}$ and the table normal $\hat{u}_{up}$ and multiplying by the radius $r$ expresses the ball's angular velocity as a linear velocity acting on the table surface. Equation 14 uses the friction constant $f_{s(ball)}$ to add only a portion of this to the final linear velocity of the ball object. The collision between a ball and a cushion is computed in a more involved way. As in Equation 14, $v_0$ is computed first, and the frictional force produced by the ball's rotation at the contact point between the cushion plane and the ball is then modeled. This corresponds to the velocity in the direction of the cross product of the ball's rotation vector $\vec{\omega}$ and the normal $\hat{n}$ of the cushion plane in contact. Because gravity is not considered, the vertical component of the velocity is removed, and the component along the normal can be removed as well. This is equivalent to scaling the velocity vector of the contact point by the cross product of the up vector and the normal vector. To decide how much of the rotation is reflected in the velocity, a friction constant $f_{s(cushion)}=0.67$ may be applied, as in Equations 15 and 16.

In Equation 15, the expression to the left of the element-wise product (the ◎ symbol) computes, as in Equation 14, the linear-velocity representation of the angular velocity acting on the table surface, and scales it by the cushion friction constant $f_{s(cushion)}$. Taking the element-wise product of this linear-velocity component with the expression on the right, $\hat{u}_{up}\times\hat{n}$, which is the horizontal direction vector of the cushion, leaves only the velocity acting in the tangential direction. Equation 16 adds the tangential velocity computed in Equation 15 to the ball's final linear velocity after the collision. So that the collision also affects the ball's rotation, the velocity vector $\vec{v}_0$ is subtracted from the ball's velocity vector toward the cushion, $\vec{v}_p$, to obtain the velocity vector in the friction direction, $\vec{v}_f$; its cross product with the cushion normal gives an angular velocity vector, which is added to the angular velocity in the proportion given by the constant $c=0.2$. Equation 17 below expresses this.

A simulation engine is implemented on the basis of the heuristics above. Each time the arrangement of the balls changes, the full 360 degrees around the ball are divided into 0.5° steps, and a simulation may be run for each direction at starting speeds of 0.15, 0.3, 0.6, and 0.9 m/s. From the simulation results for all paths, the scoring paths are obtained first, and the list is then filtered down to the best paths by considering several factors, such as the angle of collision with the ball at the moment of scoring and whether a ball was moved by an external factor before the collision. From this list, the path closest to the direction of the user's head is selected and visualized.
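The sweep over directions and speeds amounts to 720 directions times 4 speeds, or 2,880 simulations per ball arrangement. The enumeration can be sketched as below; the generator name `candidate_shots` and the use of 2D velocity tuples are assumptions.

```python
import math

START_SPEEDS = (0.15, 0.3, 0.6, 0.9)  # m/s, as listed in the text

def candidate_shots(step_deg=0.5, speeds=START_SPEEDS):
    """Yield an initial (vx, vy) velocity for every direction and speed."""
    n_dirs = round(360.0 / step_deg)
    for i in range(n_dirs):
        theta = math.radians(i * step_deg)
        for speed in speeds:
            yield (speed * math.cos(theta), speed * math.sin(theta))
```

Each yielded velocity would be passed to the simulation engine, and only the shots whose simulated outcome scores would be kept for filtering.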

With AR glasses used in this way, users can take an interest in the visual effects applied to the balls themselves, and even someone who has never played billiards can score by following the path suggested by the application and gain considerable psychological reward and motivation. Because the collision calculation is based on heuristics, it does not achieve ideal accuracy in every situation. In many cases, however, the error is at a level that is hard to notice for beginners, who often cannot reproduce the conditions suggested by the algorithm anyway, so the implementation can be considered acceptable given its purpose as a learning aid for billiards. Above all, AR glasses extend the visualization into three dimensions, which offers a significant advantage in immersion and can be expected to be more effective than learning from video on flat media. In the long term, as AR devices become smaller and more widespread, the approach may also serve as a new learning platform.

When the user terminal 100 selects one of the simulation results for the at least one case transmitted by the transmission unit 340, the determination unit 350 may receive pressure applied by the user of the user terminal 100 to a device that measures the force needed to realize the selected simulation result, and may then determine whether the force corresponding to that pressure is the same as or similar to the force needed to realize the simulation result.

In addition, an embodiment of the present invention may develop the deep learning model trained by reinforcement learning into an artificial intelligence that plays against the player. Many real-world problems, including playing billiards, require a planning ability that selects actions one after another in a continuous action space. Monte-Carlo Tree Search (MCTS) is an efficient online planning algorithm for problems with large search spaces, and has been applied very successfully where the action space is discrete, as in Go and real-time games. For problems with a continuous action space, however, MCTS has not been the planning algorithm of first choice. Because the number of selectable actions is infinite, the action space must be coarsely discretized for tree search, which can make the method ineffective for real-world problems that demand very precise control.

Existing MCTS algorithms for continuous action spaces take broadly two approaches. Progressive widening gradually increases the number of selectable actions at each tree node according to the node's visit count. The other approach, hierarchical optimistic optimization, progressively partitions the action space to increase the precision of the selected action. These approaches can perform well enough for choosing a reasonably good action, but obtaining very precise control as the planning result requires an enormous number of simulations. Gradient-based optimization, by contrast, is very effective at finding an accurate local optimum but has great difficulty reaching the global optimum. Accordingly, an embodiment of the present invention may use a method that combines MCTS with gradient-based fine-tuning of actions. This algorithm can perform a global search through MCTS together with a local search based on the action-value gradient. It rests on the observation that many real-world problems with continuous action spaces have environment dynamics that are locally differentiable with respect to state and action, and it assumes that gradient-ascent optimization can be very effective in such cases.

Hereinafter, the operation of the guidance service providing server of FIG. 2 described above is explained in detail with reference to FIGS. 3 and 4 as examples. The embodiment is only one of the various embodiments of the present invention, and the invention is not limited to it.

Referring to FIG. 3, (a) the position vectors of each ball and of the target are set, and (b) a simulation environment is then built and reinforcement learning begins. Reinforcement learning, a branch of machine learning, is a method in which an agent defined for a specific environment perceives the current state and, from the actions available to it, selects the action or sequence of actions that maximizes the reward. On this basis, deep reinforcement learning can also be used to control real hardware while performing path planning at the same time, in order to address complex problems.

<Value-based deep reinforcement learning>

The goal of reinforcement learning is to obtain a high reward while interacting with the environment. Two functions are used here to characterize the reward. One of them is the value function, which defines the expected value of the reward to be received in the future. The value function $v(s)$ is built on the idea that the agent obtains the state $s$, that is, the observed data, at time $t$ and sums the series of rewards received from that point on. Rewards that are further away in time, and therefore less important, are multiplied by a discount factor, as expressed in Equations 18 and 19.
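The discounted sum of rewards can be illustrated with a short sketch. It assumes the usual form $G_t=\sum_{k\ge 0}\gamma^k r_{t+k+1}$; the function name `discounted_return` and the default value of `gamma` are assumptions.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum rewards from time t onward, discounting later rewards by gamma."""
    g = 0.0
    # Work backwards so each step applies G = r + gamma * G_next.
    for r in reversed(list(rewards)):
        g = r + gamma * g
    return g
```

The state-value function is the expectation of this quantity over the trajectories that start in state $s$.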

Because this value function does not consider the action taken toward a state, it is called the state-value function. The expected value obtained by also considering the action $a$ is called the Q-function (quality function), and can be expressed as in Equation 20.

As Equation 20 shows, the Q-function takes a state $s$ and an action $a$ and gives the expected reward, with the aim of making it high. The Q-function then receives the next state $s'$ and action $a'$ and is updated continually through the Bellman optimality equation so that a high reward is obtained. An algorithm that replaces this value function with a neural network based on network parameters $\theta$, as in Equation 21, is called DQN (Deep Q-Network). As in Equation 21, it computes a cost function against the target Q-function and, on that basis, updates the network in combination with techniques such as off-policy learning.
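The target and cost used in a DQN-style update can be sketched as below. This assumes the standard one-step target $r+\gamma\max_{a'}Q(s',a')$ and a squared-error cost; the function names are hypothetical and no neural network is involved in the sketch.

```python
import numpy as np

def td_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Bellman target: r + gamma * max_a' Q(s', a')."""
    if done:
        return float(reward)  # no future reward after a terminal state
    return float(reward + gamma * np.max(next_q_values))

def dqn_cost(q_sa, target):
    """Squared error between the predicted Q(s, a) and the target."""
    return float((target - q_sa) ** 2)
```

In an actual DQN the `next_q_values` would come from a separate target network, and the cost would be averaged over a batch sampled from a replay memory.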

<Policy-based deep reinforcement learning>

In policy-based deep reinforcement learning, the policy is written $\pi_\theta(a|s)$; it is a function that receives a state, performs a computation based on the policy network parameters $\theta$, and outputs an action. In other words, policy-based reinforcement learning selects actions according to the state. The parameters $\theta$ are updated based on the resulting change in the value function, and the update moves in the gradient-ascent direction, toward a higher value function. This process is called policy gradient (PG) and is expressed as in Equation 22 below.

In $\nabla_\theta J(\theta)$, $J(\theta)$ is the objective function, which PG defines as in Equation 23.

Here, to reduce the variance of the estimate, the baseline $V(s)$ is subtracted to give the advantage function $A(s,a)=Q(s,a)-V(s)$, which leads to Equation 24; this is called Monte Carlo PG.
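For reference, the standard textbook forms of the quantities discussed above are given below. They are stated as an assumption about what the update rule, the policy gradient, and its advantage-based variant look like, and may differ in notation from Equations 22 to 24; the step size $\eta$ is a symbol introduced here.

```latex
\begin{align}
\theta &\leftarrow \theta + \eta \, \nabla_\theta J(\theta) \\
\nabla_\theta J(\theta) &= \mathbb{E}_{\pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s) \, Q(s,a) \right] \\
\nabla_\theta J(\theta) &= \mathbb{E}_{\pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s) \, A(s,a) \right],
\qquad A(s,a) = Q(s,a) - V(s)
\end{align}
```

Subtracting $V(s)$ leaves the expected gradient unchanged because the baseline does not depend on the action.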

The PG described above is also called SPG (Stochastic Policy Gradient). An algorithm arranged to compute a continuous action value instead is called DPG. Combining DPG with the DQN described earlier and with Actor-Critic gives DDPG, a model-free, off-policy, actor-critic algorithm, and adding the distributional concept on top of that gives D4PG. In Actor-Critic, the policy and the value function within the algorithm are called the Actor and the Critic respectively, and each helps the other's updates. As the policy is updated, the value of the value function rises, and the policy is in turn updated in a better direction on the basis of that higher value.

TRPO (Trust Region Policy Optimization) and PPO (Proximal Policy Optimization) are algorithms that establish a trust region, that is, a bound on the update, to prevent the problems caused by excessive policy updates in PG and so allow the network to be updated stably. In TRPO and PPO, as in conventional policy-based reinforcement learning, the policy network $\pi_\theta(a|s)$ interacts with the environment and collects data in memory. The algorithm gathers reward values at each timestep, computes the chosen advantage $A$, and multiplies it by the importance-sampling ratio to express the loss for the network update as in Equation 25.

$\hat{A}$ is the term for the predicted reward, and the value $Q-V$ can be used for $A$. In TRPO, the predecessor of PPO, the surrogate function that bounds the loss uses the KL divergence and can be expressed as in Equation 26.

PPO simplifies the surrogate function while achieving performance similar to TRPO. Unlike the TRPO expression above, PPO bounds $r_t(\theta)$ of Equation 27 with a constant $\varepsilon$, and the resulting $L^{CLIP}$ can be expressed as in Equation 28.
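The clipped objective can be sketched in NumPy as below, assuming the commonly published PPO form $\min\big(r_t\hat{A}_t,\ \mathrm{clip}(r_t,1-\varepsilon,1+\varepsilon)\hat{A}_t\big)$ averaged over samples. The function name `ppo_clip_objective` and the default `eps=0.2` are assumptions, not values from the text.

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate objective (to be maximized)."""
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    adv = np.asarray(advantages, dtype=float)
    ratio = np.exp(new_logp - old_logp)  # r_t(theta), the probability ratio
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    # The element-wise minimum removes any gain from moving the ratio
    # outside [1 - eps, 1 + eps].
    return float(np.mean(np.minimum(unclipped, clipped)))
```

A training loop would minimize the negative of this value with respect to the policy parameters.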

Then, as when updating an ordinary neural network, the data accumulated in memory is split into batches, the network is updated K times in order on the selected data using the loss, and the whole process described so far is repeated continually. To run the training code based on this algorithm, constants such as PPO's bound $\varepsilon$ must be specified. After that, the reward is optimized by repeating the cycle of updating the network with the algorithm and loss built on these constants and collecting new data. The training code uses the Actor-Critic-based PPO algorithm described above, in which the two networks, Actor and Critic, are both updated during learning. Neural network computation generally demands high computing power, but to keep the network lightweight for updating, the network may be arranged to accept a specified number of state values from the state and pass them through light computations in which Linear Regression and the Tanh activation function alternate in order through the nodes. In addition, because the Fully Connected Layer used in typical neural networks is absent, the data computation in the training and testing processes can be doubly reduced.

Unity provides ML-Agent (Machine Learning-Agent), a service that uses Unity's own software as the reinforcement learning environment while the neural network itself is trained in Python. This allows code written in Python to be tested against an environment running in the Unity simulator. The reinforcement learning code computes on the observed state data and feeds the resulting action, that is, the control data, to the simulator. In other words, the artificial intelligence supplies the input that a person would otherwise provide. The result is shared with the user as in (b), so that the user can see which point on the ball to strike, with how much force, and in which direction.

Matters not described for the method of providing a billiards guidance service using reinforcement learning of a deep learning model of FIGS. 2 to 4 are the same as, or can be readily inferred from, what was described for that method with reference to FIG. 1, and their description is therefore omitted.

FIG. 5 is a diagram showing how data is transmitted and received among the components included in the system of FIG. 1 for providing a billiards guidance service using reinforcement learning of a deep learning model, according to an embodiment of the present invention. An example of this data exchange is described below with reference to FIG. 5, but the present application is not to be construed as limited to this embodiment, and it is apparent to those skilled in the art that the data exchange shown in FIG. 5 may be changed in accordance with the various embodiments described above.

Referring to FIG. 5, the guidance service providing server builds a billiards simulation environment using at least one open-source package for building a physics-engine-based simulation environment (S5100).

The guidance service providing server then randomly initializes the coordinates of the target and of at least one ball within the built billiards simulation environment, and sets those coordinates as position vectors (S5200).

Next, the guidance service providing server carries out training by repeatedly simulating the trajectories of the target and the at least one ball according to the striking point on the target and the force with which that point is struck, and then providing a reward according to the result, using reinforcement learning with at least one deep learning model (S5300).

Finally, when the user terminal recognizes or specifies the coordinates of the target and the at least one ball, the guidance service providing server transmits to the user terminal the simulation results for at least one case, each case determined by a striking point on the target and the force with which that point is struck (S5400).

The order of the steps described above (S5100 to S5400) is only an example, and the invention is not limited to it. That is, the order of the steps (S5100 to S5400) may be changed, and some of the steps may be executed simultaneously or omitted.

Matters not described with respect to the method of FIG. 5 for providing a billiards guide service using reinforcement learning of a deep learning model are the same as, or can be readily inferred from, what was described above with reference to FIGS. 1 to 4, and their description is therefore omitted.

The method of providing a billiards guide service using reinforcement learning of a deep learning model according to the embodiment described with reference to FIG. 5 may also be implemented in the form of a recording medium containing computer-executable instructions, such as an application or a program module executed by a computer. Computer-readable media may be any available media that can be accessed by a computer, and include volatile and nonvolatile media as well as removable and non-removable media. Computer-readable media may also include any computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The method of providing a billiards guide service using reinforcement learning of a deep learning model according to the embodiment of the present invention described above may be executed by an application installed by default on a terminal (which may include a program included in a platform or operating system installed by default on the terminal), or by an application (that is, a program) that the user installs directly on the master terminal through an application providing server such as an application store server or a web server associated with the application or the corresponding service. In this sense, the method may be implemented as an application (that is, a program) installed by default on the terminal or installed directly by the user, and may be recorded on a computer-readable recording medium of the terminal or the like.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can be readily modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

1. A billiards guide service providing system using reinforcement learning of a deep learning model, comprising:
a user terminal that recognizes the coordinates of a target and at least one ball as position vectors and then outputs a result of simulating the trajectories of the target and the at least one ball according to a hit point, which is the struck location on the surface of the target, and the force applied at the hit point;
a guide service providing server including: a building unit that builds a billiards simulation environment using at least one open-source tool for building a physics-engine-based simulation environment; an initialization unit that randomly initializes the coordinates of the target and the at least one ball within the built billiards simulation environment and then sets those coordinates as position vectors; a training unit that repeats a simulation drawing the trajectories of the target and the at least one ball according to the hit point on the target and the force applied at the hit point, and conducts training by reinforcement learning with at least one deep learning model, providing a reward according to the result; a transmission unit that, when the user terminal recognizes or designates the coordinates of the target and the at least one ball, transmits to the user terminal simulation results for at least one case, each case being determined by a hit point on the target and the force applied at the hit point; and a determination unit that, when any one of the simulation results transmitted by the transmission unit is selected at the user terminal, receives the pressure applied by the user of the user terminal to a force-measuring device for realizing the selected simulation result, and determines whether the force corresponding to that pressure is the same as or similar to the force required to realize the simulation result; and
AR glasses that interwork with the user terminal and the guide service providing server,
wherein the training unit,
through the reinforcement learning, learns by recognizing the current state in the environment in which the billiards are placed and maximizing the reward through actions, without a dataset for training and testing,
wherein the guide service providing server
overlays, on the AR glasses, the trajectory or the hit-point coordinates for each case included in the simulation result, and
wherein the force-measuring device
compares the magnitude of the force to be applied at the hit point with the magnitude of the force applied to the force-measuring device, and then provides an increase/decrease indication so that the user can gauge the magnitude of the force.
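
By way of non-limiting illustration only, the comparison performed by the force-measuring device and the "same or similar" determination could be sketched as follows. The function name `force_hint`, the relative `tolerance`, and the returned labels are assumptions made for this sketch and are not recited in the claim.

```python
def force_hint(required, measured, tolerance=0.1):
    """Compare the measured force with the required force.

    Returns "ok" when the two are within the (assumed) relative tolerance,
    otherwise "increase" or "decrease" as the indication shown to the user.
    """
    if required <= 0:
        raise ValueError("required force must be positive")
    if abs(measured - required) <= tolerance * required:
        return "ok"
    return "increase" if measured < required else "decrease"
```

For example, with a required force of 10 units, a measured force of 5 units would produce an "increase" indication and a measured force of 12 units a "decrease" indication.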

2. The system of claim 1,
wherein the reinforcement learning uses a model-based method or a model-free method.

3. The system of claim 1,
wherein the number of the at least one ball is two, and
the initialization unit
uses, as observation information, vectors expressing the randomly set position of the target as (xt, yt, zt), the position of the first ball as (xball1, yball1, zball1), and the position of the second ball as (xball2, yball2, zball2).

4. The system of claim 3,
wherein the training unit
expresses the hit point, which is the struck location on the surface of the target, as a vector (x1, y1, z1) relative to the center point of the target.

5. The system of claim 4,
wherein the training unit
expresses the force applied at the hit point as f and the direction in which the hit point is struck as a vector (x2, y2, z2).

6. The system of claim 5,
wherein the training unit
trains, by reinforcement learning through repeated simulation, a deep learning model that takes as input the nine continuous values contained in (xt, yt, zt), (xball1, yball1, zball1), and (xball2, yball2, zball2) and outputs the seven continuous values contained in (x1, y1, z1), (x2, y2, z2), and f.
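
By way of non-limiting illustration only, the nine-input, seven-output mapping recited above could be sketched as a small fully connected network. The hidden-layer size, the tanh activation, the weight initialization, and the function names `init_mlp`, `forward`, and `split_action` are assumptions made for this sketch; the claim does not specify a network architecture, and the network here is untrained.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Random weights and zero biases for consecutive layer sizes, e.g. [9, 32, 7]."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def forward(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for i, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(params) - 1:
            x = [math.tanh(v) for v in x]
    return x


def split_action(output):
    """Split the 7 outputs into hit point (x1, y1, z1), direction (x2, y2, z2), and force f."""
    return {
        "hit_point": tuple(output[0:3]),
        "direction": tuple(output[3:6]),
        "force": output[6],
    }
```

Calling `split_action(forward(init_mlp([9, 32, 7]), observation))` on a nine-value observation yields one candidate hit point, strike direction, and force.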

7. (Deleted)

8. The system of claim 1,
wherein the billiards guide service providing system using reinforcement learning of a deep learning model
further comprises at least one camera installed facing the billiard table so as to capture the billiard table on which the user of the user terminal plays billiards.

9. The system of claim 8,
wherein the billiards guide service providing system using reinforcement learning of a deep learning model
further comprises at least one display that overlays the simulation result output from the transmission unit on the image of the billiard table captured by the at least one camera and outputs the overlaid image.

10. (Deleted)