KR20180094532A

KR20180094532A - a method to learn and analyze motion of the user by using composed sequential depth images

Info

Publication number: KR20180094532A
Application number: KR1020170020513A
Authority: KR
Inventors: 박순찬; 김영희; 박지영; 심광현; 유문욱; 이재호; 장주용; 정혁
Original assignee: 한국전자통신연구원
Priority date: 2017-02-15
Filing date: 2017-02-15
Publication date: 2018-08-24

Abstract

According to the present invention, a system for learning and analyzing a user motion by composing continuous depth images comprises a composite image generation module generating a second depth image by synthesizing first depth images obtained from a depth image camera; a general image analysis module analyzing the first depth image and outputting first motion analysis data corresponding to the first depth image; a composite image analysis module analyzing the second depth image and outputting second motion analysis data corresponding to the second depth image; and a posture analysis result generation module generating a posture analysis result by combining the first motion analysis data and the second motion analysis data.

Description

Method and system for learning and analyzing user motion by synthesizing consecutive depth images

The present invention relates to a method and system for learning and analyzing user motion by synthesizing consecutive depth images, and more specifically to computer vision and machine learning techniques for learning user motion.

Computer vision and machine learning are fields that extract meaningful data by analyzing images from various sensors, and they are used in many industries, for example user posture analysis, lane detection for driverless cars, and face recognition. In addition, the recent commercialization of depth sensors such as Microsoft Kinect has made more stable user posture analysis possible, and algorithms that learn from such data are being actively researched.

FIG. 2 is an exemplary diagram comparing depth images with motion capture data. Studies on user posture analysis generally begin by building a training data set in which images from a depth sensor (operating 30 times per second) are synchronized with user posture data from a motion capture system (which can operate 120 or more times per second). Because the data must be organized around the relatively slow depth sensor, no more than 30 samples per second are used. When training relies on insufficient data, accuracy drops for the analysis of fast motions such as a golf swing or a baseball swing, and the desired analysis result cannot be obtained because no data exists at the instant the user wants to analyze.

To solve the above problems, an object of the present invention is to provide a method and system for learning and analyzing user motion by synthesizing consecutive depth images.

Another object of the present invention is to provide a method of synthesizing multiple consecutive depth images for user posture estimation and learning from them. The invention provides a method of simply synthesizing consecutive depth images, using the result as training data, and then analyzing the user's posture. Typically, for user posture training data, depth images are acquired from a depth sensor operating 30 times per second while ground-truth posture values are acquired simultaneously from a motion capture system that can operate 120 or more times per second. Because existing methods use only the acquired depth images for training, they train on only 30 samples per second. For fast user motion such as a golf swing, this is not enough data, and posture analysis does not work well.

By amplifying depth images output at the conventional rate of 30 per second through data synthesis to produce 60 or more samples per second, the present invention yields a learned result that can be used for posture analysis in sports involving fast motion.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above objects, a system according to the present invention for learning and analyzing user motion by synthesizing consecutive depth images comprises: a composite image generation module that generates a second depth image by synthesizing first depth images obtained from a depth camera; a general image analysis module that analyzes the first depth image and outputs first motion analysis data corresponding to the first depth image; a composite image analysis module that analyzes the second depth image and outputs second motion analysis data corresponding to the second depth image; and a posture analysis result generation module that generates a posture analysis result by combining the first motion analysis data and the second motion analysis data.

Preferably, the system may further comprise a general image learning module and a composite image learning module.

The general image learning module learns the relationship between a first depth image obtained from the depth camera and first motion capture data corresponding to the first depth image, and the composite image learning module learns the relationship between a second depth image synthesized from the first depth images and second motion capture data corresponding to the second depth image.

The general image analysis module performs its analysis using the learning result of the general image learning module, and the composite image analysis module performs its analysis using the learning result of the composite image learning module.

A method according to another aspect of the present invention for learning and analyzing user motion by synthesizing consecutive depth images comprises: a general image learning step of mapping a first depth image obtained from a depth camera to first motion capture data obtained from a motion capture camera and learning their relationship; a composite image learning step of mapping a second depth image synthesized from the first depth images to second motion capture data and learning their relationship; and an image analysis step of generating a user posture analysis result from temporally consecutive first depth images using the general image learning result and the composite image learning result.

In the image analysis step, using the general image learning result and the composite image learning result, a second depth image is synthesized from temporally consecutive first depth images, the first depth image is analyzed to produce first motion analysis data, the second depth image is analyzed to produce second motion analysis data, and the first and second motion analysis data are combined to generate the user posture analysis result.
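The analysis step above can be outlined in code. This is a minimal sketch under stated assumptions, not the patented implementation: `synthesize`, `analyze_general`, and `analyze_composite` are hypothetical callables standing in for the image synthesis and the two trained analyzers, and combining the two result streams by interleaving them in time order is an assumption.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]  # depth image as rows of depth values
Pose = List[float]         # hypothetical pose vector, e.g. joint coordinates


def analyze_sequence(
    first_frames: Sequence[Frame],
    synthesize: Callable[[Frame, Frame], Frame],
    analyze_general: Callable[[Frame], Pose],
    analyze_composite: Callable[[Frame], Pose],
) -> List[Pose]:
    """Return poses for captured frames interleaved with poses for synthesized frames."""
    results: List[Pose] = []
    for i, frame in enumerate(first_frames):
        # First motion analysis data: from a captured (first) depth image.
        results.append(analyze_general(frame))
        if i + 1 < len(first_frames):
            # Second depth image: synthesized from two consecutive first depth images.
            second = synthesize(frame, first_frames[i + 1])
            # Second motion analysis data: from the synthesized (second) depth image.
            results.append(analyze_composite(second))
    return results
```

Keeping the two analyzers as separate arguments mirrors the text's split between the general and composite analysis modules.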

The first motion analysis data is calculated from the first depth image using the result learned in the general image learning step, and it has a data structure similar to that of the motion capture data generated by the motion capture system used in the learning step.

The second motion analysis data is calculated from the second depth image using the result learned in the composite image learning step, and it has a data structure similar to that of the motion capture data generated by the motion capture system used in the learning step.

According to the present invention, high-frame-rate depth images are generated using depth image data synthesized from low-frame-rate depth images, and the mapping between the generated high-frame-rate depth images and motion capture data is learned, so that accurate posture analysis at a fine temporal resolution can be performed on the basis of the learned high-frame-rate data.

According to the present invention, the user's posture can be analyzed by temporally amplifying low-frame-rate depth images into high-frame-rate depth images, so that fast motions such as a baseball or golf swing can be analyzed in finer detail.

FIG. 1 is an exemplary diagram illustrating the configuration of a computer system in which the method of learning and analyzing user motion by synthesizing consecutive depth images according to the present invention is implemented.
FIG. 2 is an exemplary diagram comparing depth images with motion capture data.
FIG. 3 is an exemplary diagram illustrating a method of synthesizing images using depth images and motion capture data according to a partial embodiment of the present invention.
FIG. 4 is a block diagram illustrating a learning method according to a partial embodiment of the present invention.
FIG. 5 is an exemplary diagram illustrating a depth image synthesis method according to a partial embodiment of the present invention.
FIG. 6 is an exemplary diagram illustrating how the posture analysis module is executed according to a partial embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the invention pertains; the invention is defined only by the scope of the claims. The terms used in this specification are for describing the embodiments and are not intended to limit the invention. In this specification, the singular form includes the plural unless the text specifically states otherwise. As used in this specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those stated.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is an exemplary diagram illustrating the configuration of a computer system in which the method of learning and analyzing user motion by synthesizing consecutive depth images according to the present invention is implemented.

The method of learning and analyzing user motion by synthesizing consecutive depth images according to an embodiment of the present invention may be implemented in a computer system or recorded on a recording medium. As shown in FIG. 1, the computer system may include at least one processor 110, a memory 120, a user input device 150, a data communication bus 130, a user output device 160, and a storage 140. These components communicate with one another through the data communication bus 130.

The computer system may further include a network interface 170 connected to a network 180. The processor 110 may be a central processing unit (CPU) or a semiconductor device that processes instructions stored in the memory 120 and/or the storage 140.

The memory 120 and the storage 140 may include various forms of volatile or non-volatile storage media. For example, the memory 120 may include a ROM 123 and a RAM 126.

Accordingly, the method of learning and analyzing user motion by synthesizing consecutive depth images according to an embodiment of the present invention may be implemented as a computer-executable method. When the method is performed on a computer device, computer-readable instructions can carry out the operating method according to the present invention.

The method described above may also be implemented as computer-readable code on a computer-readable recording medium. Computer-readable recording media include every kind of recording medium that stores data readable by a computer system, for example ROM (Read Only Memory), RAM (Random Access Memory), magnetic tape, magnetic disks, flash memory, and optical data storage devices. The computer-readable recording medium may also be distributed across computer systems connected by a computer network, so that the code is stored and executed in a distributed manner.

FIG. 2 is an exemplary diagram comparing depth images with motion capture data.

Existing methods for analyzing depth images and user motion capture data learn on the basis of the relatively slow depth images (for example, 30 FPS, frames per second) and so use no more than 30 samples per second for training. The present invention amplifies this, generating and learning from at least 60 samples per second and analyzing posture on that basis, so posture can be analyzed at a finer temporal resolution.

Studies on user posture analysis generally carry out a learning stage that uses depth images generated by a depth camera (30 FPS) and motion capture data generated by a motion capture system (120 FPS or more) to build the data for learning user posture. For training, the depth images and motion capture data are synchronized: to match the 30 FPS depth images, only 30 frames per second are selected from the 120 FPS motion capture data. That is, 90 frames of motion capture data per second are discarded. In existing methods, although the motion capture system outputs 120 or more samples per second, the depth sensor's capture rate limits use to only 30 samples per second, a quarter of what is available.
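The conventional synchronization described above amounts to keeping every fourth motion capture frame. The following sketch illustrates that bookkeeping; the function name and the assumption that the two frame rates have an integer ratio and share a common start time are illustrative, not taken from the source.

```python
def select_synced_mocap(mocap_frames, mocap_fps=120, depth_fps=30):
    """Keep only the motion capture frames that coincide with a depth frame.

    Returns (used_frames, number_of_discarded_frames).
    """
    if mocap_fps % depth_fps != 0:
        raise ValueError("this sketch assumes an integer frame-rate ratio")
    step = mocap_fps // depth_fps  # 4 when going from 120 FPS to 30 FPS
    used = mocap_frames[::step]
    return used, len(mocap_frames) - len(used)
```

For one second of 120 FPS data this keeps 30 frames and discards 90, matching the figures in the text.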

The numerical values given above are illustrative and do not limit the scope of the invention.

This problem stems from a difference between the two devices. A motion capture system attaches motion capture sensors to the user and senses the user's movement, which makes it inconvenient to use, and it is relatively expensive, but it can generate motion data at a high frame rate. A depth camera is relatively inexpensive, but it needs considerable time to combine images with parallax in order to extract depth information, so it generates depth images at a low frame rate.

The present invention analyzes depth images using a database learned by mapping depth images obtained from a low-frame-rate depth camera to the motion capture data of a motion capture system, and outputs posture analysis data similar to motion capture data. When enough training data is available, the user's posture can be analyzed from the depth images of the depth camera alone.

FIG. 2 shows depth images together with the corresponding motion capture data. The upper images are depth images, and the lower images visualize the motion capture data as lines connecting feature points. FIG. 2 assumes 30 FPS depth images and 120 FPS motion capture data. This is not intended to limit the scope of the invention, which is not restricted to the stated values. The interval between consecutive depth images is 1/30 second, and the motion capture data are drawn as red lines to indicate the 120 FPS motion capture data. The three motion capture frames lying between those that correspond to depth images have no corresponding depth image and are therefore discarded during training.

In FIG. 2, all depth images are used. Motion capture data with no corresponding depth image are shown as lines between the motion capture data that are used.

FIG. 3 is a block diagram illustrating a system for learning and analyzing user motion using consecutive depth images according to the present invention.

The system for learning and analyzing user motion using consecutive depth images according to the present invention includes a composite image learning module 300, a general image learning module 350, and a posture analysis module 800.

FIG. 3 shows a block diagram of the composite image learning module and the general image learning module.

The data synthesis module 310 includes an image synthesis module 313 and a composite data mapping module 316.

FIG. 4 is an exemplary diagram illustrating a depth image synthesis method according to a partial embodiment of the present invention.

As described above, depth images have a low frame rate while motion capture data has a high frame rate, so motion capture frames are wasted and results analyzed on the basis of the low frame rate are less accurate. To address this, depth images at intermediate time points can be generated by synthesizing the low-frame-rate depth images.

For example, 30 FPS depth imaging produces 30 frames per second, and a depth image at an intermediate time point can be synthesized from the two depth images of consecutive frames. In FIG. 4, the left and right images are depth images generated directly by the depth camera, and the middle image is a depth image newly generated from the left and right depth images. A depth image obtained from the depth camera is called a first depth image, and a depth image newly generated by synthesizing first depth images is called a second depth image.

For example, if synthesized depth images between frames (second depth images) are generated from 30 FPS depth images and combined with the first depth images, 60 FPS depth images are obtained. When these are synchronized with 120 FPS motion capture data, they can be mapped to 60 frames of motion capture data per second for learning. Because a second depth image is synthesized from temporally consecutive depth images, the expected position and shape of an object at the intermediate time point can be determined from the positions of the matching object in those images.

Generating depth images by synthesizing consecutive depth images amplifies the set of depth images available for user posture learning and analysis. Mapping the corresponding motion capture data to the amplified depth images can improve the quality of user posture learning and analysis.

The image synthesis module 313 synthesizes a new depth image (second depth image) from the depth images obtained from the camera (first depth images). For example, given 30 frames of 30 fps depth images covering one second, denoted f1, f2, ..., f30 in order, the depth image between f5 and f6 is synthesized from f5 and f6. Using the images synthesized between each pair of frames, the 30 fps depth images are amplified to 60 fps depth images.
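The amplification described above can be sketched as inserting one synthesized frame between every pair of consecutive captured frames. The function below is an illustrative outline; `synthesize` is a hypothetical callable supplied by the caller, and the names are not from the source.

```python
def amplify(frames, synthesize):
    """Insert one synthesized frame between each pair of consecutive frames."""
    out = []
    for current, following in zip(frames, frames[1:]):
        out.append(current)                         # captured frame, e.g. f5
        out.append(synthesize(current, following))  # in-between frame from f5 and f6
    if frames:
        out.append(frames[-1])                      # last captured frame has no successor
    return out
```

For an isolated block of 30 frames this yields 59 frames (30 captured plus 29 in-between); on a continuous stream the rate approaches twice the input rate, which is the 30 fps to 60 fps amplification the text describes.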

However, first depth images obtained from the depth camera and second depth images synthesized from them are generated differently and have different characteristics, and it was confirmed that quality degrades somewhat when the same learning and analysis methods are used for both. Therefore, first depth images are learned and analyzed using the general image learning module and the general image analysis module, while second depth images are learned and analyzed using the composite image learning module and the composite image analysis module. The learned results and analysis results are combined to build the database.

FIG. 5 is an exemplary diagram illustrating a method of mapping depth images to motion capture data according to a partial embodiment of the present invention.

Using the depth images amplified as in FIG. 4, more of the motion capture data can be used: not only the motion capture data mapped to the originally captured depth images but also the motion capture data mapped to the synthesized depth images. For example, if 30 fps depth images (first depth images) are amplified by the image synthesis module 313 into 60 fps depth images (second depth images) and mapped to 120 fps motion capture data, 60 of the 120 motion capture samples per second are used, and the unused share of motion capture data falls from 3/4 (90 samples) to 1/2 (60 samples).
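The arithmetic in this example can be checked with a few lines of code. This is only a worked calculation; the function name is hypothetical, and it assumes every depth frame is matched to exactly one motion capture frame.

```python
from fractions import Fraction


def unused_mocap(mocap_fps, depth_fps):
    """Return (unused samples per second, unused share) of motion capture data."""
    used = min(mocap_fps, depth_fps)  # one motion capture sample per depth frame
    unused = mocap_fps - used
    return unused, Fraction(unused, mocap_fps)
```

With 120 fps motion capture, 30 fps depth images leave 90 samples (3/4) unused, and 60 fps depth images leave 60 samples (1/2) unused.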

Depth images can also be amplified from 30 FPS to 120 FPS. The depth images can be amplified according to the FPS of the motion capture data used for learning.

A single second depth image may also be generated from three or more first depth images. When one second depth image is generated from two first depth images, velocity and torque can be used; when three or more first depth images are used, acceleration and angular acceleration can also be used, with the advantage that the rotational direction of the torque can be predicted. The drawback is increased computation time.
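Why a third frame adds information can be seen from finite differences: two positions give one velocity estimate, while three positions also give an acceleration estimate. The sketch below uses standard finite-difference formulas on a scalar object position; treating the tracked quantity as a single coordinate sampled at a fixed interval `dt` is an illustrative assumption, and the source does not specify how these quantities are computed.

```python
def velocity(p0, p1, dt):
    """First-order difference: average velocity between two consecutive frames."""
    return (p1 - p0) / dt


def acceleration(p0, p1, p2, dt):
    """Second-order central difference: needs three consecutive frames."""
    return (p2 - 2.0 * p1 + p0) / (dt * dt)
```

The same differences applied to an orientation angle would give angular velocity and angular acceleration.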

The image synthesis module 313 synthesizes multiple depth images (first depth images) to generate one depth image (second depth image). A new depth image can be generated by synthesizing temporally consecutive depth images. The data synthesis module performs the four basic arithmetic operations between the pixel data of the images, or uses the maximum or minimum of the pixel data, to calculate the synthesized pixel data.
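The pixel-wise operations mentioned above can be sketched as follows. This is a minimal illustration with depth images represented as nested lists; the operation names and the added "mean" option are assumptions for illustration and are not specified in the source.

```python
import operator

# Pixel-wise operations; "mean" is an added illustrative option.
PIXEL_OPS = {
    "sum": operator.add,
    "max": max,
    "min": min,
    "mean": lambda a, b: (a + b) / 2.0,
}


def synthesize(frame_a, frame_b, op="sum"):
    """Combine two equally sized depth images pixel by pixel."""
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("depth images must have the same shape")
    combine = PIXEL_OPS[op]
    return [
        [combine(pa, pb) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]
```

For example, `synthesize(a, b, "max")` keeps the larger depth value at each pixel, while the default combines the two frames by addition.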

FIG. 5 outlines a method of generating a synthesized image (center image) by combining temporally consecutive images (left and right images) using the 'sum' method. This is illustrative and is not intended to limit how the first depth images are synthesized.

When synthesizing images, the image synthesis module 313 may apply weights to each image or to the pixel data in each image. The weights are applied to the individual pixel data and thereby influence the synthesized image data.
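One simple way to apply per-image weights is a weighted blend of the two frames. The sketch below assumes the two weights sum to 1 (a convex combination); that constraint and the function name are illustrative assumptions, since the source only states that weights may be applied.

```python
def weighted_synthesize(frame_a, frame_b, weight_a=0.5):
    """Blend two equally sized depth images with weights weight_a and 1 - weight_a."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    return [
        [weight_a * pa + weight_b * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]
```

A weight of 0.5 gives the plain average; a larger `weight_a` pulls the result toward the first frame.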

Typically, when frames f5 and f6 are combined to synthesize the depth image between them, only f5 and f6 may be used, but f4 and f7 can also be included. The object's velocity is calculated from f4 and f5 and again from f6 and f7, which allows the object's position to be predicted more accurately. In a similar way, more frames can be used to generate the synthesized depth image, and the additional information yields a more accurate result.
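To illustrate why the neighboring frames f4 and f7 help, the sketch below compares a two-frame midpoint with a four-point cubic interpolation of an object's position. Note that the four-point cubic formula is a standard technique substituted here for illustration; the source describes a velocity-based prediction without giving a formula. The positions are assumed to be scalar coordinates sampled at uniform frame intervals.

```python
def midpoint_two_point(p5, p6):
    """Linear estimate of the position halfway between f5 and f6."""
    return (p5 + p6) / 2.0


def midpoint_four_point(p4, p5, p6, p7):
    """Four-point cubic estimate halfway between f5 and f6 (uniform spacing)."""
    return (-p4 + 9.0 * p5 + 9.0 * p6 - p7) / 16.0
```

For an object accelerating as position = t² at t = 0, 1, 2, 3, the true position at t = 1.5 is 2.25; the four-point estimate recovers it, whereas the two-point midpoint gives 2.5.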

The functions described above can be implemented using an artificial neural network method. However, depth image data is usually two-dimensional and each node of the artificial neural network would correspond to pixel data, so each layer of the network would have to be organized as two-dimensional data. A random forest method or a support vector machine method may therefore be used as the machine learning method applied to depth image synthesis.

The composite data mapping module 316 maps each synthesized depth image to the most suitable motion capture data. It estimates the time point of the synthesized data from the time points of the two source data and maps the synthesized data to the motion data closest to that time point. For example, an image C obtained by synthesizing, without weights, an image A captured at 1 second and an image B captured at 1.1 seconds can be estimated to be data for 1.05 seconds, and the motion capture data temporally closest to that time is mapped as the ground-truth value.
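The mapping step can be sketched as a timestamp estimate followed by a nearest-neighbor lookup in time. The sample format `(timestamp, pose)`, the pose labels, and the function names are hypothetical. Only the unweighted case from the example above is covered.

```python
from typing import Sequence, Tuple

MocapSample = Tuple[float, str]  # hypothetical format: (timestamp in s, pose)


def composite_timestamp(t_a: float, t_b: float) -> float:
    """Unweighted synthesis: place the composite midway between its sources."""
    return (t_a + t_b) / 2


def nearest_mocap(target_time: float,
                  mocap: Sequence[MocapSample]) -> MocapSample:
    """Return the motion capture sample closest in time to target_time."""
    if not mocap:
        raise ValueError("no motion capture samples")
    return min(mocap, key=lambda sample: abs(sample[0] - target_time))


# Image A at 1.0 s and image B at 1.1 s, as in the example in the text.
mocap_samples = [(1.00, "pose_0"), (1.04, "pose_1"), (1.08, "pose_2")]
t_c = composite_timestamp(1.0, 1.1)
ground_truth = nearest_mocap(t_c, mocap_samples)
```

Here the composite is assigned a time of about 1.05 s, so the sample at 1.04 s is selected as its ground truth.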

The composite data learning module 320 may learn by means of an artificial neural network module that takes the depth image synthesized by the data synthesis module 310 (the second depth image) as its input value and the motion capture data corresponding to the second depth image as its output value. This is an example; the learning method is not limited to an artificial neural network module, and any of various methods may be used so long as the relationship between the second depth image and the motion capture data is learned. For example, the learning method may be a random forest method or a support vector machine method.

The general image learning module 350 consists of a general data mapping module 360 and a general data learning module 370.

The general data learning module and the composite data learning module differ in that their input values are the first depth image and the second depth image, respectively, and their output values are the motion capture data corresponding to the first depth image and the motion capture data corresponding to the second depth image, respectively. The learning method itself may be the same, but when a learning method based on an artificial neural network is used, the learned results (the inter-layer matrices) will differ because the first and second depth images have different characteristics.

The general data mapping module 360 maps each depth image obtained from the depth image camera to the motion capture data nearest to that image's time point. The general data learning module 370 is similar to the composite data learning module 320 of the composite image learning module 300. The composite image learning module 300 and the general image learning module may use different learning methods or the same one. For example, a random forest method may be used to learn composite images while a support vector machine method is used to learn general images.

FIG. 6 is an exemplary diagram for explaining a posture analysis module that does not synthesize depth images.

Motion capture data 640 corresponding to a depth image 610 is extracted, and the posture analysis module 620 associates the depth image 610 with the motion capture data 640 to produce a posture analysis result 630.

The posture analysis method of FIG. 6 is substantially similar to the method used by the general image analysis module of FIGS. 7 and 8.

FIG. 7 is an exemplary diagram for explaining how the posture analysis module operates according to a partial embodiment of the present invention.

FIG. 8 is a block diagram showing in detail how the posture analysis module according to the present invention operates.

The composite image learning module 300 and the general image learning module 350 learn the relationship between depth images and motion capture data. Based on what those two modules have learned, the posture analysis module 800 analyzes the user's posture, taking the incoming sequence of consecutive depth images as its input.

The analyzed user posture may be output in a data structure similar to that of the motion capture data.

A system for learning and analyzing user motion by synthesizing continuous depth images according to one aspect of the present invention includes: a composite image generation module that synthesizes consecutive depth images to generate a depth image for a new time point; a composite image analysis module that calculates and analyzes motion capture data corresponding to the generated depth image; a general image analysis module that calculates and analyzes motion capture data corresponding to a depth image obtained from a depth image camera; and a posture analysis result generation module that generates a posture analysis result by combining the analysis results of the composite image analysis module and the general image analysis module.

The posture analysis module 800 includes a composite image generation module 810, a general image analysis module 830, and a composite image analysis module 820.

The general image analysis module 830 takes the first depth image obtained from the depth image camera as its input and computes the user's posture analysis result 730 using the results learned by the general image learning module 350 described above, together with its learning algorithm.

The composite image generation module 810 functions similarly to the image synthesis module 313 of the composite image learning module 300. However, because the composite image generation module 810 uses the learning results and learning algorithm of the composite image learning module 300, it can achieve higher synthesis quality than the image synthesis module 313.

The composite image analysis module 820 takes the composite image 720 generated by the composite image generation module 810 as its input and outputs, as its analysis result, the user posture corresponding to the time point of that input. The composite image analysis module 820 analyzes the second depth image using the mapping results of the composite data mapping module 316 and the learning results of the composite data learning module 320, both belonging to the composite image learning module 300, to generate the user posture analysis result.

The final user posture analysis result is generated by combining the result from the general image analysis module 830 with the result that the composite image analysis module 820 produces from the synthesized depth image. For example, a depth sensor with an input rate of 30 Hz can be used to generate posture analysis results at 60 Hz or more.
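The rate doubling can be pictured as interleaving two timestamped result streams. The sketch below assumes that one composite image is generated midway between each pair of consecutive 30 Hz frames and that combining means merging by timestamp. The result format, the pose labels, and `combine_results` are hypothetical.

```python
from typing import List, Tuple

Result = Tuple[float, str]  # hypothetical format: (timestamp in s, pose label)


def combine_results(general: List[Result],
                    composite: List[Result]) -> List[Result]:
    """Merge both result streams into one sequence ordered by timestamp."""
    return sorted(general + composite, key=lambda result: result[0])


# Results for three camera frames captured at 30 Hz (t = 0, 1/30, 2/30 s).
general = [(i / 30, f"pose_frame_{i}") for i in range(3)]

# One composite result midway between each pair of consecutive frames.
composite = [((a[0] + b[0]) / 2, f"pose_between_{i}_{i + 1}")
             for i, (a, b) in enumerate(zip(general, general[1:]))]

combined = combine_results(general, composite)
```

The merged sequence has one result every 1/60 s, which corresponds to the 60 Hz output mentioned above. Generating more than one composite per frame pair would raise the rate further.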

Because the present invention can analyze the user's posture at shorter time intervals, it can more accurately analyze fast motions such as baseball or golf swings.

FIG. 9 is an exemplary diagram for explaining a system for learning and analyzing user motion by synthesizing continuous depth images according to an embodiment of the present invention.

FIG. 9 combines the contents of FIG. 8 described above with those of FIG. 3.

A system for learning and analyzing user motion by synthesizing continuous depth images according to the present invention includes: a composite image generation module that generates a second depth image by synthesizing first depth images obtained from a depth image camera; a general image analysis module that analyzes the first depth image and calculates first motion analysis data corresponding to the first depth image; a composite image analysis module that analyzes the second depth image and calculates second motion analysis data corresponding to the second depth image; and a posture analysis result generation module that generates a posture analysis result by combining the first motion analysis data and the second motion analysis data.

FIG. 10 is an exemplary diagram for explaining a method of learning and analyzing user motion by synthesizing continuous depth images according to the present invention.

The configuration of the present invention has been described in detail above with reference to the accompanying drawings, but this description is only illustrative, and those of ordinary skill in the art to which the present invention pertains can of course make various modifications and changes within the scope of its technical idea. Accordingly, the scope of protection of the present invention should not be limited to the embodiments described above but should be determined by the following claims.

100: Computer system
110: Processor
120: Memory
123: ROM
126: RAM
130: Data communication bus
140: Storage
150: User input device
160: User output device
170: Network interface
180: Network
300: Composite image learning module
310: Data synthesis module
313: Image synthesis module
316: Composite data mapping module
320: Composite data learning module
350: General image learning module
360: General data mapping module
370: General data learning module
610: Depth image frames obtained from a depth image camera
620: Posture analysis module
630: Posture analysis result
640: Motion capture data frames obtained from a motion capture camera
710: Depth image frames obtained from a depth image camera
720: Depth image frames including synthesized depth image frames
730: Posture analysis result
800: Posture analysis module
810: Composite image generation module
820: Composite image analysis module
830: General image analysis module

Claims

A system for learning and analyzing user motion by synthesizing continuous depth images, the system comprising:
a composite image generation module for generating a second depth image by synthesizing first depth images obtained from a depth image camera;
a general image analysis module for analyzing the first depth image and calculating first motion analysis data corresponding to the first depth image;
a composite image analysis module for analyzing the second depth image and calculating second motion analysis data corresponding to the second depth image; and
a posture analysis result generation module for generating a posture analysis result by combining the first motion analysis data and the second motion analysis data.