KR102258128B1

KR102258128B1 - User motion analysis method for dance training using AI-based image recognition

Info

Publication number: KR102258128B1
Application number: KR1020200154751A
Authority: KR
Inventors: 이상기
Original assignee: 주식회사 큐랩
Priority date: 2020-11-18
Filing date: 2020-11-18
Publication date: 2021-05-31

Abstract

The present invention relates to a user motion analysis method for dance training using artificial intelligence based image recognition and, more particularly, to a user motion analysis method for dance training, which analyzes a dance motion of a user performing dance training in real time, comprising the steps of: (1) receiving image data of a user, who performs dance training, taken by a camera; (2) recognizing the user from the received image data and separating a background from a user area; (3) extracting a skeleton from the separated user area based on recognition of body portions of the user by using joint information and estimating a pose by using the extracted skeleton; and (4) analyzing the user motion for the estimated pose, wherein in step (3), the pose is estimated by using a deep learning based pretrained pose estimation model. According to the user motion analysis method for dance training using the artificial intelligence based image recognition proposed in the present invention, the pose is estimated and the user motion is analyzed with the pose estimation model pre-trained on the basis of the deep learning by using the image data photographed by a general camera, thereby efficiently and accurately analyzing the motion of the user, who performs the dance training, even without a special camera like Kinect.

Description

User motion analysis method for dance training using artificial intelligence-based image recognition {USER MOTION ANALYSIS METHOD FOR DANCE TRAINING USING AI-BASED IMAGE RECOGNITION}

The present invention relates to a user motion analysis method for dance training and, more particularly, to a user motion analysis method for dance training using artificial intelligence-based image recognition.

In general, the Korean Wave (Hallyu) refers to the phenomenon in which Korean popular culture spreads abroad and is widely consumed there. Having passed through its formative period (1997 to 2000) and a second, deepening stage (mid-2000s), the Korean Wave is now in a third stage of diversification (since the mid-2000s). In this third stage in particular, the Korean Wave is evolving from a culture of consuming content such as dramas, music, movies, and games into a culture in which people want to participate in or experience that content directly.

With the spread of the Korean Wave, a growing number of users want to train in K-Pop dance on their own, not only at K-Pop academies but also through video streaming services such as YouTube. However, in this video-based approach the user merely imitates an idol's dance moves from an uploaded video, so proper dance training cannot take place. The user receives video information one-way, without any feedback, and must judge alone how accurate the current dance moves are and which parts do not match.

In particular, because the user cannot receive any feedback on his or her movements in video-based dance training, it is very difficult for the user to keep up the training. Without a professional instructor to help sustain the training, the user must repeat the prescribed movements while only watching a video, and therefore gains no sense of accomplishment from the training.

To address these problems, Korean Registered Patent No. 10-1989447 (title of invention: Dance motion feedback system providing video feedback to the user using augmented reality; registration date: June 10, 2019) has been disclosed.

According to this prior art, feedback is provided to the user as video through augmented reality, so the user can follow the dance moves and learn the dance accurately, and can judge whether the moves being performed are correct. Because the feedback is applied to the user's motion video in real time, it can be given even while the user is dancing, and the image information rendered in augmented reality helps the user correct his or her own dance moves.

However, because the prior art detects movement with a motion recognition camera, it requires a special camera such as Kinect as well as separate software that supports it. To broaden the range of applications, a technology that can analyze user motion for dance training more efficiently therefore needs to be developed.

The present invention has been proposed to solve the above problems of previously proposed methods. Its object is to provide a user motion analysis method for dance training using artificial intelligence-based image recognition that estimates a pose from image data captured by a general camera using a deep learning-based pre-trained pose estimation model and analyzes the user motion, thereby analyzing the motion of a user performing dance training efficiently and accurately without a special camera such as Kinect.

To achieve the above object, a user motion analysis method for dance training using artificial intelligence-based image recognition according to a feature of the present invention is

a user motion analysis method for dance training

that analyzes, in real time, the dance motion of a user performing dance training, the method comprising the steps of:

(1) receiving image data of the user performing the dance training, captured by a camera;

(2) recognizing the user in the received image data and separating a user region from the background;

(3) extracting, from the separated user region, a skeleton using joint information based on recognition of the user's body parts, and estimating a pose using the extracted skeleton; and

(4) analyzing the user motion for the estimated pose,

wherein, in step (3), the pose is estimated using a deep learning-based pre-trained pose estimation model.

Preferably,

in step (1), the two-dimensional image data captured by the camera may be processed and converted into a depth map, and

in step (2), the user region may be detected in and separated from the depth map.

Preferably, step (2) may include:

(2-1) detecting, from the received image data, a plurality of candidate regions containing the user's figure; and

(2-2) specifying, in consideration of the relationship with the user region of the previous frame, the user region of the current frame among the plurality of candidate regions and separating it from the background.

More preferably, in step (2-2),

a matching score between each of the plurality of candidate regions and the user region of the previous frame may be calculated, and the candidate region with the highest matching score may be specified as the user region of the current frame.

Preferably, in step (2),

if a plurality of users are recognized in the received image data, each of the plurality of user regions may be separated from the background, and each user may be identified by tagging the separated user regions.

Preferably, step (3) may include:

(3-1) estimating a coordinate distribution map for each joint, using the user region as the input of the pose estimation model;

(3-2) obtaining the coordinates of each joint from the coordinate distribution map estimated in step (3-1);

(3-3) extracting a skeleton by connecting the joint coordinates obtained in step (3-2); and

(3-4) estimating the user pose from the extracted skeleton.

Preferably, in step (4),

the user motion may be analyzed by comparing the estimated pose with a human body reference model.

More preferably, in step (4),

the comparison may be made by measuring the similarity between the skeleton of the human body reference model and the skeleton extracted in step (3).
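
The skeleton comparison can be illustrated with a minimal sketch. The text does not name a similarity measure, so the mean cosine similarity between corresponding bone direction vectors used here is an assumption, as are the function name `skeleton_similarity` and the joint and bone layout.

```python
import numpy as np

def skeleton_similarity(ref_joints, user_joints, bones) -> float:
    """Mean cosine similarity between corresponding bone directions.

    ref_joints, user_joints: sequences of (x, y) joint coordinates in the
    same joint order. bones: list of (start_index, end_index) pairs.
    The measure itself is an assumption; the source only says "similarity".
    """
    ref = np.asarray(ref_joints, dtype=np.float64)
    usr = np.asarray(user_joints, dtype=np.float64)
    scores = []
    for a, b in bones:
        u = ref[b] - ref[a]          # bone vector in the reference skeleton
        v = usr[b] - usr[a]          # same bone in the user's skeleton
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        if denom > 0:                # skip degenerate zero-length bones
            scores.append(float(u @ v / denom))
    return float(np.mean(scores)) if scores else 0.0
```

Because only bone directions are compared, a user who stands farther from the camera or in a different part of the frame is not penalized under this particular measure; other measures would behave differently.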

According to the user motion analysis method for dance training using artificial intelligence-based image recognition proposed in the present invention, a pose is estimated from image data captured by a general camera using a deep learning-based pre-trained pose estimation model and the user motion is analyzed, so that the motion of a user performing dance training can be analyzed efficiently and accurately without a special camera such as Kinect.

FIG. 1 is a diagram illustrating a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention, in comparison with the prior art.
FIG. 2 is a diagram showing a system configuration for implementing a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention.
FIG. 3 is a diagram showing the flow of a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention.
FIG. 4 is a diagram showing the detailed flow of step S200 in a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention.
FIG. 5 is a diagram showing the detailed flow of step S300 in a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention.
FIG. 6 is a diagram showing the process of steps S200 and S300, in which a pose is estimated from input image data, in a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example of image data collected to generate a pose estimation model in a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention.

Hereinafter, preferred embodiments will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice the invention. In describing the preferred embodiments in detail, however, a detailed description of a related known function or configuration is omitted where it is judged that it would unnecessarily obscure the gist of the present invention. In addition, the same reference numerals are used throughout the drawings for parts having similar functions and operations.

In addition, throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another element in between. Further, "including" a certain component means that other components may be further included, rather than excluded, unless specifically stated otherwise.

FIG. 1 is a diagram illustrating a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention, in comparison with the prior art. As shown in FIG. 1, the method overcomes the conventional technology, which performs image and motion analysis with complex and expensive equipment such as multiple sensors, 3D depth cameras, and 3D devices. By applying AI technology to the video of a general camera, it can analyze the motion of a user performing dance training efficiently and accurately without special equipment such as a depth camera or Kinect.

FIG. 2 is a diagram showing a system configuration for implementing a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention. As shown in FIG. 2, the method may be implemented by a motion analysis device 100 that receives the user's dance video from a camera 200 and analyzes it. More specifically, a dance training system 10 that provides dance training content may include the camera 200, the motion analysis device 100, and a dance training providing device 300. The dance training providing device 300 may provide the dance training content to the user while also providing various kinds of feedback based on the user motion analyzed by the motion analysis device 100.

The present invention relates to a user motion analysis method for dance training using artificial intelligence-based image recognition, and may be configured as software recorded on hardware that includes a memory and a processor. For example, the method may be stored and implemented on a personal computer, a notebook computer, a server computer, a PDA, a smartphone, a tablet PC, or the like. In the following, for convenience of description, the subject performing each step may be omitted.

FIG. 3 is a diagram showing the flow of a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention. As shown in FIG. 3, the method analyzes, in real time, the dance motion of a user who receives dance training content through the dance training system 10 and performs dance training. It may be implemented to include: receiving image data of the user performing the dance training, captured by the camera 200 (S100); recognizing the user in the received image data and separating the user region from the background (S200); extracting a skeleton from the separated user region using joint information based on recognition of the user's body parts, and estimating a pose using the extracted skeleton (S300); and analyzing the user motion for the estimated pose (S400).

In step S100, image data of the user performing dance training, captured by the camera 200, may be received. Here, the camera 200 is a camera that captures ordinary two-dimensional images, and may be any such camera, including a webcam or a camera built into a laptop computer or mobile phone.

Depending on the embodiment, in step S100 the two-dimensional image data captured by the camera 200 may be processed and converted into a depth map. In 3D computer graphics, a depth map is a single image containing information about the distance from the viewpoint to the surfaces of objects. In step S100, the depth map may be generated by converting the image data captured by the camera 200 to grayscale and then transforming it so that depth is displayed in graded levels according to differences in brightness. Through the depth map, three-dimensional user motion can be identified from two-dimensional image data.
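
The grayscale-and-brightness conversion described above can be sketched as follows. This is a minimal illustration only: the ITU-R BT.601 luma weights, the min-max normalization, and the function name `brightness_depth_map` are assumptions, and the text does not state whether brighter pixels correspond to nearer or farther surfaces.

```python
import numpy as np

def brightness_depth_map(rgb: np.ndarray) -> np.ndarray:
    """Return a pseudo-depth map in [0, 1] from an H x W x 3 RGB frame."""
    rgb = rgb.astype(np.float64)
    # Grayscale conversion; the BT.601 luma weights are an assumption.
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        # A uniformly bright frame carries no brightness differences.
        return np.zeros_like(gray)
    # Grade the values by brightness difference (min-max normalization).
    return (gray - lo) / (hi - lo)
```

A map produced this way encodes relative brightness rather than measured distance, so it is only a stand-in for what a depth camera would report.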

In step S200, the user may be recognized in the received image data and the user region may be separated from the background. In this step, the user may be recognized, and the user region separated from the background, by a deep learning-based pre-trained object recognition model. If the image data was converted into a depth map in step S100, the user region may be detected in and separated from the depth map in step S200.

The pre-trained object recognition model of step S200 is a deep learning model trained on a large amount of training data to recognize people and delimit them with bounding boxes. The user may be recognized and the user region separated using, for example, ResNet, ResNet50-FPN, Fast R-CNN, Faster R-CNN, or Mask R-CNN.

In addition, in step S200, if a plurality of users are recognized in the received image data, each of the plurality of user regions may be separated from the background, and each user may be identified by tagging the separated user regions. That is, the image data may contain several users, in which case step S200 separates a user region for each user and identifies each user individually.

Hereinafter, the detailed flow of step S200 will be described with reference to FIG. 4.

FIG. 4 is a diagram showing the detailed flow of step S200 in a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention. As shown in FIG. 4, step S200 may be implemented to include: detecting, from the received image data, a plurality of candidate regions containing the user's figure (S210); and specifying, in consideration of the relationship with the user region of the previous frame, the user region of the current frame among the plurality of candidate regions and separating it from the background (S220).

In step S210, a plurality of candidate regions containing the user's figure may be detected from the received image data. The candidate regions may overlap at least in part. When a user region is detected with an object recognition model, candidate regions are produced in the form of bounding boxes, and a plurality of candidate regions of various sizes and shapes may be detected for a single target user region.

In step S220, the user region of the current frame may be specified among the plurality of candidate regions, in consideration of the relationship with the user region of the previous frame, and separated from the background. More specifically, in step S220, a matching score between each candidate region and the user region of the previous frame may be calculated, and the candidate region with the highest matching score may be specified as the user region of the current frame.
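
The highest-score selection in step S220 can be sketched as follows. The text does not define the matching score, so intersection over union (IoU) between bounding boxes is used here purely as an assumed example; the names `iou` and `select_user_region` and the `(x1, y1, x2, y2)` box layout are likewise hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def select_user_region(candidates: List[Box], previous: Box) -> Box:
    """Pick the candidate with the highest matching score (IoU here)."""
    return max(candidates, key=lambda c: iou(c, previous))
```

For example, with a previous region `(0, 0, 10, 10)`, the candidate `(1, 1, 11, 11)` overlaps it far more than `(5, 5, 15, 15)` and would be chosen.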

Depending on the embodiment, in step S220 the feature map of each candidate region detected in step S210 may be compared for similarity with the feature map of the user region specified in the previous frame, and the candidate region with the highest similarity may be specified as the user region of the current frame. In this way, in a dance training video where continuous image data is input frame by frame, the figure of the user performing dance training can be tracked and analyzed accurately.
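
A feature-map comparison of this kind might look like the sketch below. Cosine similarity over flattened feature maps is an assumption (the text only says "similarity"), and the sketch further assumes that all feature maps have already been brought to the same shape; the function names are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two same-shaped feature maps."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def most_similar_candidate(candidate_features, previous_feature) -> int:
    """Index of the candidate whose feature map best matches the previous frame."""
    scores = [cosine_similarity(f, previous_feature) for f in candidate_features]
    return int(np.argmax(scores))
```

The returned index identifies which candidate region to carry forward as the user region of the current frame.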

In step S300, a skeleton may be extracted from the separated user region using joint information based on recognition of the user's body parts, and a pose may be estimated using the extracted skeleton. In step S300, the pose may be estimated using a deep learning-based pre-trained pose estimation model, which may be Mask R-CNN, Faster R-CNN, MNC, or the like.

Hereinafter, the detailed flow of step S300 will be described with reference to FIG. 5.

FIG. 5 is a diagram showing the detailed flow of step S300 in a user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention. As shown in FIG. 5, step S300 may be implemented to include: estimating a coordinate distribution map for each joint, using the user region as the input of the pose estimation model (S310); obtaining the coordinates of each joint from the estimated coordinate distribution maps (S320); extracting a skeleton by connecting the joint coordinates (S330); and estimating the user pose from the extracted skeleton (S340).

In step S310, a coordinate distribution map may be estimated for each joint, using the user region as the input of the pose estimation model. That is, a feature map is extracted from the user region using a pre-trained CNN- or RNN-based pose estimation model as the backbone network, and an inference model connected to it derives the coordinate distribution map of each joint within the user region image.

In step S320, the coordinates of each joint may be obtained from the coordinate distribution maps estimated in step S310. The coordinate distribution map of a joint represents a plurality of prediction results (a plurality of predicted coordinates) for that single joint. By taking the coordinate with the highest probability in the coordinate distribution map as the coordinate of the joint, one coordinate can be obtained per joint. Depending on the embodiment, a single joint coordinate may instead be obtained by applying non-maximum suppression (NMS) to the coordinate distribution map.
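
Taking the highest-probability coordinate from each map can be sketched as follows, assuming the coordinate distribution maps are stacked into a single array of shape `(J, H, W)` with one map per joint. The array layout and the function name `joints_from_heatmaps` are assumptions, and the NMS variant is not shown.

```python
import numpy as np

def joints_from_heatmaps(heatmaps: np.ndarray) -> np.ndarray:
    """Return a (J, 2) array of (x, y) coordinates, one per joint.

    heatmaps: array of shape (J, H, W); each slice holds the predicted
    probability of one joint at every pixel of the user region.
    """
    j, h, w = heatmaps.shape
    # Flatten each map and find the index of its highest-probability cell.
    flat_idx = heatmaps.reshape(j, -1).argmax(axis=1)
    # Convert flat indices back to (row, column) = (y, x) positions.
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)
```

The output has exactly as many coordinates as there are joints, which matches the input expected by the skeleton extraction in step S330.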

In step S330, a skeleton may be extracted by connecting the joint coordinates obtained in step S320. Step S320 outputs as many coordinates as there are joints, and step S330 connects them to extract the skeleton. For example, the skeleton may be extracted by connecting 18 joints, such as the head, neck, left and right shoulders, elbows, wrists, backs of the hands, chest, waist, pelvis, knees, and ankles, with straight lines.
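
Connecting joints with straight lines amounts to looking up pairs of coordinates in a fixed bone list, as in the sketch below. The joint names and the connections in `BONES` are hypothetical and cover only part of a body; the text lists joints but does not specify which pairs are connected.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical, partial bone list: the source does not define the pairs.
BONES: List[Tuple[str, str]] = [
    ("head", "neck"),
    ("neck", "left_shoulder"),
    ("neck", "right_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_wrist"),
]

def extract_skeleton(joints: Dict[str, Point]) -> List[Tuple[Point, Point]]:
    """Return line segments for every bone whose two joints were detected."""
    return [(joints[a], joints[b]) for a, b in BONES if a in joints and b in joints]
```

Skipping bones with a missing endpoint is a design choice of this sketch, so that a partially occluded user still yields a partial skeleton.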

In step S340, the user pose may be estimated from the extracted skeleton, that is, from the shape of the skeleton extracted in step S330.

Meanwhile, steps S200 and S300 may both be processed using a deep learning-based pre-trained pose estimation model. For example, Mask R-CNN is a pose estimation model that can recognize a user, specify the user region, and estimate the user's pose, and can therefore perform both step S200 and step S300.

FIG. 6 is a diagram illustrating the process of steps S200 and S300, in which a pose is estimated from input image data, in the user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention. As shown in FIG. 6, the method according to an embodiment of the present invention may perform steps S200 and S300 using Mask R-CNN. More specifically, in step S200, a plurality of candidate regions containing the user's appearance may be derived using a ResNet50-FPN pre-trained to recognize the user.

In step S300, a feature map may be extracted from the candidate regions derived in step S200 using RoIAlign. RoIAlign uses bilinear interpolation to align the candidate region (region of interest, RoI) of the feature map accurately, thereby generating a corrected feature map. The corrected feature map is input to a CNN-based pose estimation model such as ResNet, which classifies the class to specify the user region, detects the key points of the user's body, that is, the joint coordinates, and connects the coordinates of the joints to extract the skeleton. Mask R-CNN can perform class classification and joint coordinate detection simultaneously.
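
The role of bilinear interpolation in RoIAlign can be illustrated with a simplified sketch. It assumes a single-channel feature map, a box already expressed in feature-map coordinates, and one sample point at the centre of each output bin; practical RoIAlign implementations typically take several samples per bin and pool them. The names `bilinear_sample` and `roi_align` are hypothetical and do not refer to any library function.

```python
import numpy as np


def bilinear_sample(feature: np.ndarray, y: float, x: float) -> float:
    """Interpolate a 2D feature map at a fractional (y, x) position."""
    h, w = feature.shape
    y = min(max(y, 0.0), h - 1.0)   # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0, x0] * (1 - dx) + feature[y0, x1] * dx
    bottom = feature[y1, x0] * (1 - dx) + feature[y1, x1] * dx
    return float(top * (1 - dy) + bottom * dy)


def roi_align(feature: np.ndarray, box, out_size: int = 2) -> np.ndarray:
    """Resample a box (x1, y1, x2, y2) to out_size x out_size without rounding the box."""
    x1, y1, x2, y2 = box
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Sample at the centre of bin (i, j); simplification of real RoIAlign.
            out[i, j] = bilinear_sample(
                feature, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w
            )
    return out


feature_map = np.arange(16, dtype=float).reshape(4, 4)
print(roi_align(feature_map, (0.5, 0.5, 2.5, 2.5)))
```

Because the box coordinates are never rounded to integer cells, the extracted features stay aligned with the candidate region, which is the correction the paragraph above attributes to RoIAlign.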

FIG. 7 is a diagram illustrating an example of image data collected to generate the pose estimation model in the user motion analysis method for dance training using artificial intelligence-based image recognition according to an embodiment of the present invention. Generating the pose estimation model used in the method requires a large amount of training data covering a variety of postures. As shown in FIG. 7, image data of various postures that can serve as training material for learning human body movements may be collected, classified by posture, and used as data for pre-training the pose estimation model.

In step S400, the user motion for the estimated pose may be analyzed. That is, in step S400, the user's motion is analyzed using the estimated pose and the analysis result is transmitted to the dance training providing apparatus 300, so that feedback on the user's dance training or various visual effects can be provided.

Here, in step S400, the user motion may be analyzed by comparing a human body reference model with the estimated pose. More specifically, in step S400, the similarity between the skeleton of the human body reference model and the skeleton extracted in step S300 may be measured and compared. That is, a skeleton-based human body reference model is constructed to represent human dance movements, and the user's dance movements can be verified and evaluated by comparing the reference model with the user's skeleton extracted in step S300.
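
The text does not specify how the similarity between the two skeletons is measured. As one possible illustration, and purely as an assumption, the sketch below averages the cosine similarity of corresponding bone direction vectors, which makes the score insensitive to the user's position and body size in the image. The function name `skeleton_similarity` and the joint-index convention are hypothetical.

```python
import numpy as np


def skeleton_similarity(ref_joints, user_joints, bones) -> float:
    """Mean cosine similarity between corresponding bone directions.

    ref_joints, user_joints: sequences of (x, y) coordinates in the same joint order.
    bones: list of (start_index, end_index) pairs into those sequences.
    This metric is an assumed example, not the one defined by the method.
    """
    ref = np.asarray(ref_joints, dtype=float)
    usr = np.asarray(user_joints, dtype=float)
    sims = []
    for a, b in bones:
        v_ref = ref[b] - ref[a]
        v_usr = usr[b] - usr[a]
        denom = np.linalg.norm(v_ref) * np.linalg.norm(v_usr)
        if denom > 0:                      # skip degenerate zero-length bones
            sims.append(float(np.dot(v_ref, v_usr) / denom))
    return float(np.mean(sims)) if sims else 0.0


reference = [(0, 0), (0, 1), (1, 1)]
bones = [(0, 1), (1, 2)]
user = [(3, 3), (3, 5), (5, 5)]            # same pose, shifted and scaled
print(skeleton_similarity(reference, user, bones))
```

A score near 1 indicates that every bone points in the same direction as in the reference pose; lower scores could be mapped to feedback on the parts of the body that deviate.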

Depending on the embodiment, the human body reference model may be configured as a 3D model, and the user pose may be reconstructed into 3D data using the result of comparing the skeleton extracted in step S300 with the human body reference model.

As described above, according to the user motion analysis method for dance training using artificial intelligence-based image recognition proposed in the present invention, a pose is estimated from image data captured by a general camera 200 using a pose estimation model pre-trained based on deep learning, and the user motion is analyzed, so that the motion of a user performing dance training can be analyzed efficiently and accurately even without a special camera 200 such as Kinect.

Meanwhile, the present invention may include a computer-readable medium containing program instructions for performing operations implemented by various communication terminals. For example, the computer-readable medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Such a computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable medium may be specially designed and configured to implement the present invention, or may be known and available to those skilled in computer software. For example, they may include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The present invention described above may be modified or applied in various ways by those of ordinary skill in the technical field to which the present invention belongs, and the scope of the technical idea of the present invention should be determined by the claims below.

10: dance training system
100: motion analysis device
200: camera
300: dance training providing device
S100: receiving image data, captured by a camera, of a user performing dance training
S200: recognizing the user in the received image data and separating the user region from the background
S210: detecting a plurality of candidate regions containing the user's appearance from the received image data
S220: specifying the user region of the current frame from among the plurality of candidate regions, in consideration of the relationship with the user region of the previous frame, and separating it from the background
S300: extracting a skeleton from the separated user region using joint information based on recognition of the user's body parts, and estimating a pose using the extracted skeleton
S310: estimating a coordinate distribution map of each joint using the user region as an input of the pose estimation model
S320: obtaining the coordinates of each joint from the estimated joint coordinate distribution map
S330: extracting a skeleton by connecting the joint coordinates
S340: estimating the user pose from the extracted skeleton
S400: analyzing the user motion for the estimated pose

Claims

A user motion analysis method for dance training using artificial intelligence-based image recognition,
which analyzes in real time the dance motion of a user performing dance training, the method comprising:
(1) receiving image data, captured by a camera (200), of the user performing dance training;
(2) recognizing the user in the received image data and separating a user region from the background;
(3) extracting a skeleton from the separated user region using joint information based on recognition of the user's body parts, and estimating a pose using the extracted skeleton; and
(4) analyzing the user motion for the estimated pose,
wherein, in step (3), the pose is estimated using a pose estimation model pre-trained based on deep learning,
wherein, in step (1), the two-dimensional image data captured by the camera (200) is processed and converted into a depth map, the two-dimensional image data being converted to black and white such that depth is displayed distinguishably to generate the depth map, so that a three-dimensional user motion can be grasped from the two-dimensional image data through the depth map,
wherein, in step (2), the user region is detected and separated from the depth map,
wherein step (2) comprises:
(2-1) detecting, using a ResNet50-FPN pre-trained to recognize the user, a plurality of candidate regions containing the user's appearance from the received image data, the plurality of candidate regions differing in the size and shape of the user region each targets and at least partially overlapping one another; and
(2-2) specifying the user region of the current frame from among the plurality of candidate regions, in consideration of the relationship with the user region of the previous frame, and separating it from the background,
wherein, in step (2-2),
a matching score between each of the plurality of candidate regions and the user region of the previous frame is calculated, and the candidate region with the highest matching score is specified as the user region of the current frame,
wherein step (3) comprises:
(3-1) estimating a coordinate distribution map of each joint using the user region as an input of the pose estimation model, wherein a corrected feature map is extracted using RoIAlign, which uses bilinear interpolation to align the candidate region (region of interest, RoI) of the feature map and thereby generates the corrected feature map, the corrected feature map is used as an input of a CNN- or RNN-based pose estimation model pre-trained with a backbone network to specify the user region, and an inference model is connected to derive the coordinate distribution map of each joint in the image of the user region;
(3-2) obtaining the coordinates of each joint from the joint coordinate distribution map estimated in step (3-1), wherein one joint coordinate is obtained for each joint by applying non-maximum suppression (NMS) to the coordinate distribution map;
(3-3) extracting a skeleton by connecting the joint coordinates obtained in step (3-2), wherein the skeleton is extracted by connecting, with straight lines, 18 joints consisting of the head, neck, left and right shoulders, elbows, wrists, backs of the hands, chest, waist, pelvis, knees, and ankles; and
(3-4) estimating the user pose from the extracted skeleton,
wherein, in step (4),
the human body reference model and the estimated pose are compared, the similarity between the skeleton of the human body reference model and the skeleton extracted in step (3) being measured and compared to analyze the user motion, and the analysis result is transmitted to the dance training providing device (300), and
wherein the human body reference model is configured as a 3D model, and the user pose is reconstructed into 3D data using the result of comparing the skeleton extracted in step (3) with the human body reference model.

delete

The method of claim 1, wherein, in step (2),
when a plurality of users are recognized in the received image data, a plurality of user regions are separated from the background, and each user is identified by tagging the separated user regions.

delete