KR20160138729A

KR20160138729A - Feature extraction method for motion recognition in image and motion recognition method using skeleton information

Info

Publication number: KR20160138729A
Application number: KR1020150072935A
Authority: KR
Inventors: 강제원; 강민주; 김나영; 류수경; 이지은
Original assignee: 이화여자대학교 산학협력단
Priority date: 2015-05-26
Filing date: 2015-05-26
Publication date: 2016-12-06
Also published as: KR101711736B1

Abstract

A user motion recognizing method using skeleton information, by a motion recognizing apparatus, includes the steps of: performing a neural network learning by using a first feature point based on a plurality of first joint points among human joint points in a training image including a plurality of human training-movements; obtaining an input image including a human input-movement by using a camera; extracting a second feature point based on a plurality of second joint points, which have positions same as the first joint points, among the human joint points in the input image; and inputting the second feature point into a neural network formed by the neural network learning to determine a training-movement corresponding to the input movement.

Description

The present invention relates to a feature extraction method for recognizing motion in an image and to a user motion recognition method using skeleton information.

The technology described below relates to recognizing user motion in an image.

Motion recognition technology recognizes a user's body movements so that the user can interact with a computer. Human movement can thus serve as an interface input to a computing device.

Typical applications of motion recognition include smartphones based on gyro sensors, game consoles based on controller sensors, and smartphones based on camera sensors.

U.S. Patent Application Publication No. US2010-0199228

U.S. Patent Application Publication No. US2012-0157198

Microsoft's Kinect recognizes motion by extracting the skeleton information of a person in an image. However, because Kinect tracks human skeleton information from a distance, it is limited in how precisely it can recognize motion.

The technology described below aims to provide a motion recognition method that uses feature points, derived from the skeleton information of a person in an image, that allow motion to be recognized accurately.

A feature extraction method for motion recognition in an image includes the steps of: a motion recognition apparatus acquiring an image having depth information with a camera; the motion recognition apparatus extracting an object included in the image and extracting skeleton information of the object; the motion recognition apparatus determining coordinates of a plurality of points in the skeleton information; and the motion recognition apparatus determining, as feature points, an angle formed by the plurality of points and an angle formed between the plurality of points and a specific plane. The object may be a person, and the skeleton information may include a plurality of points corresponding to the person's joints and major bones.

A user motion recognition method using skeleton information includes the steps of: a motion recognition apparatus performing neural network learning using a first feature point based on a first plurality of joint points among the joint points of a person in a training image that includes a plurality of training motions of the person; the motion recognition apparatus acquiring, with a camera, an input image that includes an input motion of a person; the motion recognition apparatus extracting, from the input image, a second feature point based on a second plurality of joint points located at the same positions as the first plurality of joint points; and the motion recognition apparatus inputting the second feature point into the neural network formed through the neural network learning to determine the training motion corresponding to the input motion.

The feature point may be at least one of an angle between two straight lines formed by three of the plurality of points and an angle between a specific plane and a straight line formed by two of the three points.

The technology described below can recognize motion relatively easily by tracking a person's skeleton information in images, without additional equipment. It also uses feature points from the skeleton information that allow motion to be recognized clearly, and it distinguishes input motions against a reference prepared in advance through neural network machine learning.

FIG. 1 is an example of a block diagram showing the configuration of a motion recognition apparatus that recognizes motion based on skeleton information.
FIG. 2 is an example of human skeleton information.
FIG. 3 is an example showing skeleton information in a depth image that includes a person.
FIG. 4 is an example of feature points that use part of the skeleton information.
FIG. 5 is an example of other feature points that use part of the skeleton information.
FIG. 6 is an example of a flowchart of a feature extraction method for motion recognition in an image.
FIG. 7 is an example of a flowchart of a user motion recognition method using skeleton information.

The technology described below may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the technology to particular embodiments, and it should be understood to cover all modifications, equivalents, and substitutes falling within its spirit and technical scope.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms serve only to distinguish one component from another. For example, without departing from the scope of the technology described below, a first component may be termed a second component, and similarly a second component may be termed a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

As used in this specification, singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "include" indicate that the stated features, numbers, steps, operations, components, parts, or combinations thereof are present, and do not exclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before the drawings are described in detail, note that the components in this specification are divided merely according to the main function each one performs. That is, two or more of the components described below may be combined into one component, or one component may be divided into two or more components by more finely subdivided function. Each component described below may, in addition to its own main function, perform some or all of the functions of other components, and some of the main functions of a component may be performed exclusively by another component. The presence or absence of each component described in this specification should therefore be interpreted functionally, and for this reason the configuration of the user motion recognition method using skeleton information described below may differ from the corresponding drawings to the extent that the purpose of the technology can still be achieved.

In addition, in carrying out a method or operating method, the steps that make up the method may occur in an order different from the stated order unless the context clearly specifies a particular order. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in reverse order.

FIG. 1 is an example of a block diagram showing the configuration of a motion recognition apparatus 100 that recognizes motion based on skeleton information. In FIG. 1, a person is making a certain motion, and the motion recognition apparatus 100 acquires an image including the user through a first camera 111 and a second camera 112. The motion recognition apparatus 100 uses the two cameras to acquire an image with depth information. The two cameras may be RGB cameras or infrared cameras. Because the process of generating a depth image is well known, a detailed description is omitted.

FIG. 1 shows only the two cameras 111 and 112 as components of the motion recognition apparatus 100. As described later, the motion recognition apparatus 100 extracts a person (object) and extracts feature points based on the person's skeleton information. The motion recognition apparatus 100 therefore also includes devices such as a storage medium for storing acquired images and a central processing unit for image processing. The motion recognition apparatus 100 can be regarded as a kind of computer device. It may be a device such as a game console, a smart TV, a PC, a smart door lock, or a sign language translation device.

The motion recognition apparatus 100 acquires an image that includes a person making a certain motion and extracts the person from the acquired image using image processing techniques. Person extraction uses a technique that extracts a specific object from the image, excluding the background.

Depth information can be obtained by converting the disparity information between two images or videos, captured with two or more cameras modeled on human binocular vision, into three-dimensional information, or by measuring three-dimensional depth from a distance with an infrared sensor. Depth information is represented as a grayscale image with pixel values in the range [0, 255]. The closer a subject is to the camera's reference point, the brighter it appears; the farther away, the darker. In a depth image, a person stands in front of the background and therefore appears brighter than the background. Using this property, a person can be separated from the image by combining color information and depth information.
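The depth representation described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus's actual processing: it uses the standard stereo relation Z = f * B / d, and the function names, the working range `near_m`/`far_m`, and the fixed threshold are all hypothetical. It also segments on depth alone, whereas the text combines color and depth information.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres from stereo disparity, using Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def to_gray_depth_image(depth_m, near_m, far_m):
    """Map depths to gray levels in [0, 255]: near_m -> 255 (bright), far_m -> 0 (dark)."""
    span = far_m - near_m
    if span <= 0:
        raise ValueError("far_m must be greater than near_m")
    image = []
    for row in depth_m:
        out_row = []
        for z in row:
            z = min(max(z, near_m), far_m)  # clamp to the assumed working range
            out_row.append(round(255 * (far_m - z) / span))
        image.append(out_row)
    return image


def foreground_mask(gray, threshold=128):
    """Mark pixels at least as bright as the threshold (nearer objects) as foreground."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]


# Usage: two near pixels (1 m) and one far pixel (4 m).
gray = to_gray_depth_image([[1.0, 1.0, 4.0]], near_m=1.0, far_m=4.0)
mask = foreground_mask(gray)
```

A fixed threshold is the simplest possible choice; a practical system would likely adapt it to the scene.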

The motion recognition apparatus 100 extracts a person from the image and then uses additional image processing techniques to extract the person's skeleton information. Depth values obtained with an infrared camera are smooth within the foreground or within the background, and among nearby objects, an object with a human skeleton is recognized as a person. FIG. 2 is an example of human skeleton information.

Skeleton information includes the positions of a person's joints and of the major bones involved in body movement. For an arm, the skeleton information includes a hand point, a wrist point, an elbow point, and a shoulder point; for a leg, it includes a foot point, an ankle point, a knee point, and a hip point. The remaining skeleton information includes a head point, a shoulder center point, a spine point, and a hip center point. FIG. 2 shows 20 positions in total as skeleton information.

The motion recognition apparatus 100 can determine the position of each part included in the skeleton information as three-dimensional coordinates relative to a fixed reference point.

FIG. 3 is an example showing skeleton information in a depth image that includes a person.

FIG. 3(a) is an example in which the object corresponding to a person is segmented in a depth image showing the person and skeletonization is performed on the object. In FIG. 3(a), the straight lines drawn inside the person are the result of skeletonization.

Skeletonization is the process of producing, from a binary image, a set of lines and curves that summarize the size and shape of an object. Because a skeleton can be defined in various ways, a given object can have skeletons of different shapes. However, the images that the motion recognition apparatus 100 processes are images of people, and because human anatomy is similar from person to person, the skeletonization results are also similar. Skeletonization is performed using erosion operations: erosion masks with several different orientations are applied repeatedly to gradually pare away the object in the image until its underlying structure remains. Based on information about the structural characteristics of a person, the skeleton information described above can be determined from the length, position, and bending of the straight lines that result from skeletonization. In FIG. 3(a), skeleton information is marked as small dots on the straight lines, and the shoulder, elbow, and wrist points are marked separately.
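To make erosion-based skeletonization concrete, here is a minimal sketch. It substitutes a simpler, well-known technique for the one described: the morphological skeleton (Lantuejoul's formula) with a single 3x3 square structuring element, rather than a set of directional erosion masks. All function names are hypothetical, and the binary image is a plain list of lists of 0/1.

```python
def _blank(img):
    return [[0] * len(img[0]) for _ in img]


def erode(img):
    """Binary erosion with a 3x3 square; pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    out = _blank(img)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if all(img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)):
                out[r][c] = 1
    return out


def dilate(img):
    """Binary dilation with a 3x3 square, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = _blank(img)
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for rr in range(max(r - 1, 0), min(r + 2, h)):
                    for cc in range(max(c - 1, 0), min(c + 2, w)):
                        out[rr][cc] = 1
    return out


def skeletonize(img):
    """Union, over successive erosions, of the pixels that an opening would remove."""
    skeleton = _blank(img)
    current = [list(row) for row in img]
    while any(any(row) for row in current):
        eroded = erode(current)
        opened = dilate(eroded)  # opening = erosion followed by dilation
        for r, row in enumerate(current):
            for c, v in enumerate(row):
                if v and not opened[r][c]:
                    skeleton[r][c] = 1
        current = eroded  # pare the object down by one more layer
    return skeleton


# Usage: a 3x7 filled bar inside a 5x9 image reduces to its center line.
bar = [[1 if 1 <= r <= 3 and 1 <= c <= 7 else 0 for c in range(9)] for r in range(5)]
bar_skeleton = skeletonize(bar)
```

The morphological skeleton is not guaranteed to be connected, which is one reason thinning with directional masks, as the text describes, is often preferred for body shapes.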

FIG. 3(b) is an example in which only the positions of the skeleton information are marked with crosses in the depth image. The motion recognition apparatus 100 determines a person's skeleton information in the acquired image and can then track the position of each item of skeleton information as the person moves.

The motion recognition apparatus 100 extracts feature points based on the positions of the skeleton information. These feature points are the basis for recognizing a person's motion.

The motion recognition apparatus 100 can generate high-dimensional feature points by combining the 20 items of skeleton information described above. For example, given skeleton information R = {p1, p2, ..., pn}, feature points Z = {z1, z2, ..., zm} can be generated through combinations such as z1 = p1 + p2 and z2 = p1 * p3. In this way, a variety of feature points can be generated from multiple items of skeleton information.

The process of extracting several feature points is described with reference to FIGS. 4 and 5. FIG. 4 is an example of feature points that use part of the skeleton information. FIG. 4(a) shows an image of a person making a certain motion, and FIG. 4(b) shows an example of determining feature points using some of the skeleton information positions.

It is assumed that the motion recognition apparatus 100 has already determined three-dimensional coordinates for the skeleton information. In FIG. 4, the skeleton information used is the left shoulder point (P1), the left elbow point (P2), and the left wrist point (P3). Their three-dimensional coordinates are P1(x1, y1, z1), P2(x2, y2, z2), and P3(x3, y3, z3). The three feature points below can be obtained from them. The x, y, and z axes can be set according to a chosen reference.

(1) Angle between two straight lines (u)

P1 and P2 form one straight line (the first straight line), and P2 and P3 form another (the second straight line). The angle (u) between the first and second straight lines can therefore be used as a feature point. The angle (u) can be calculated using Equation 1.
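The image for Equation 1 is not reproduced in this text. As a hedged reconstruction from the definitions above, and not necessarily the patent's exact form, the angle can be taken from the dot product of the two segments that meet at P2. Treating u as the interior angle at P2, rather than its supplement, is an assumption:

```latex
\cos u =
\frac{(x_1 - x_2)(x_3 - x_2) + (y_1 - y_2)(y_3 - y_2) + (z_1 - z_2)(z_3 - z_2)}
     {\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}\;
      \sqrt{(x_3 - x_2)^2 + (y_3 - y_2)^2 + (z_3 - z_2)^2}}
```

For example, with P1 = (0, 1, 0), P2 = (0, 0, 0), and P3 = (1, 0, 0), the numerator is 0, so u = 90 degrees, which matches an arm bent at a right angle at the elbow.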

(2) Angle between one straight line and a first plane

The angle between the first straight line formed by P1 and P2 and the first plane can be used as a feature point. Various reference planes can serve as the first plane. For example, FIG. 4 uses a plane parallel to the z-axis as the first plane. The angle between the first straight line and the first plane can be calculated using Equation 2.

Alternatively, the angle between the second straight line formed by P2 and P3 and the first plane may be used as a feature point.

(3) Angle between one straight line and a second plane (θ)

The angle (θ) between the first straight line formed by P1 and P2 and the second plane can be used as a feature point. Various reference planes can serve as the second plane. For example, FIG. 4 uses a plane parallel to the y-axis as the second plane. The angle (θ) between the first straight line and the second plane can be calculated using Equation 3.
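The equation images for the line-to-plane angles are not reproduced in this text. A standard form, offered as a sketch rather than the patent's own equation, expresses the angle through the direction vector d of the first straight line and a normal vector n of the chosen reference plane. The symbols d and n are assumptions introduced here; the source does not specify which plane parallel to the axis is meant, so n has to be chosen accordingly:

```latex
\theta = \arcsin\!\left(
  \frac{\lvert \mathbf{d} \cdot \mathbf{n} \rvert}
       {\lVert \mathbf{d} \rVert \, \lVert \mathbf{n} \rVert}
\right),
\qquad
\mathbf{d} = P_2 - P_1 = (x_2 - x_1,\; y_2 - y_1,\; z_2 - z_1)
```

For example, with d = (1, 1, 0) and n = (1, 0, 0), the ratio is 1/√2, so θ = 45 degrees. The same form applies to the first plane with that plane's normal.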

FIG. 4 shows three feature points for a person's left arm. Three feature points can be extracted for the right arm in the same way.

FIG. 5 is an example of other feature points that use part of the skeleton information, here the skeleton information of a leg. FIG. 5(a) shows an image of a person making a certain motion, and FIG. 5(b) shows an example of determining feature points using some of the skeleton information positions.

In FIG. 5, the skeleton information used is the right hip point (Q1), the right knee point (Q2), and the right ankle point (Q3). Their three-dimensional coordinates are Q1(x1, y1, z1), Q2(x2, y2, z2), and Q3(x3, y3, z3).

In the same way as described for FIG. 4, the following can be determined as feature points: (1) the angle (u) between the first straight line formed by Q1 and Q2 and the second straight line formed by Q2 and Q3; (2) the angle between the first straight line formed by Q1 and Q2 and the first plane, a plane parallel to the z-axis; and (3) the angle (θ) between the first straight line formed by Q1 and Q2 and the second plane, a plane parallel to the y-axis.

Three feature points can likewise be extracted for the left leg in the same way.

As described with reference to FIGS. 4 and 5, if three feature points are extracted from part of the skeleton information of each of the left arm, right arm, left leg, and right leg, a total of 12 feature points is obtained.
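The 12 feature points can be assembled as in the following sketch. It is illustrative only: the joint names, the dictionary layout, and in particular the two plane normals are assumptions, because the text says only that the planes are parallel to the z-axis and to the y-axis. The angle u is taken as the interior angle at the middle joint.

```python
import math

# Assumed plane normals; the source does not pin down the exact planes.
FIRST_PLANE_NORMAL = (1.0, 0.0, 0.0)   # yz-plane, which is parallel to the z-axis
SECOND_PLANE_NORMAL = (0.0, 0.0, 1.0)  # xy-plane, which is parallel to the y-axis

# Hypothetical joint names; each triple is (P1, P2, P3) for one limb.
LIMBS = [
    ("shoulder_left", "elbow_left", "wrist_left"),
    ("shoulder_right", "elbow_right", "wrist_right"),
    ("hip_left", "knee_left", "ankle_left"),
    ("hip_right", "knee_right", "ankle_right"),
]


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cos_between(a, b):
    """Cosine of the angle between vectors a and b, clamped to [-1, 1]."""
    denom = math.sqrt(_dot(a, a) * _dot(b, b))
    if denom == 0.0:
        raise ValueError("coincident points give a zero-length segment")
    return max(-1.0, min(1.0, _dot(a, b) / denom))


def angle_between_lines(p1, p2, p3):
    """Angle u in radians at p2 between segments p2->p1 and p2->p3."""
    return math.acos(_cos_between(_sub(p1, p2), _sub(p3, p2)))


def line_plane_angle(p1, p2, normal):
    """Angle in radians between the line through p1 and p2 and a plane with the given normal."""
    return math.asin(abs(_cos_between(_sub(p2, p1), normal)))


def feature_vector(joints):
    """Return 12 angles: three per limb, in the order given by LIMBS."""
    features = []
    for name1, name2, name3 in LIMBS:
        p1, p2, p3 = joints[name1], joints[name2], joints[name3]
        features.append(angle_between_lines(p1, p2, p3))
        features.append(line_plane_angle(p1, p2, FIRST_PLANE_NORMAL))
        features.append(line_plane_angle(p1, p2, SECOND_PLANE_NORMAL))
    return features
```

Using angles rather than raw coordinates makes the features independent of where the person stands and, to a degree, of body size, which is presumably why the text favors them.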

FIG. 6 is an example of a flowchart of a feature extraction method 300 for motion recognition in an image. To summarize the feature extraction described above, the motion recognition apparatus first acquires an image having depth information with a camera (310). The apparatus extracts a specific object (a person) included in the image and extracts the skeleton information of the extracted object (320). The apparatus determines the three-dimensional positions of the plurality of points in the skeleton information that serve as the basis for feature points (330). Finally, the apparatus determines as a feature point at least one of an angle between straight lines formed by some or all of those points and an angle between such a straight line and a specific plane (340).

Feature points other than those described above may also be used. Other points in the skeleton information (the head point, shoulder center point, spine point, and so on) may be used. The skeleton information used for feature points may vary depending on which motions serve as the interface. The feature points may also differ depending on whether the motion involves the whole body or only a part, such as an arm.

The feature point (u) described above uses straight lines formed by adjacent items of skeleton information, but the straight lines need not connect adjacent items. Different straight lines or planes may also be used as the reference for determining an angle. A wide variety of feature point combinations is therefore possible.

FIG. 7 is an example of a flowchart of a user motion recognition method 400 using skeleton information.

For the motion recognition apparatus to recognize a user motion, it first needs reference data for user motions. Various methods can be used; the description here is based on neural network learning, a form of machine learning.

Machine learning can be divided into supervised learning and unsupervised learning according to the application and the form of the data. In unsupervised learning there is no input-output relationship to learn from, so the goal is to identify types of input patterns and the characteristics that inputs have in common. Supervised learning, by contrast, learns a function that maps inputs to outputs from prior information in which input-output pairs are already known. That is, the system is given an input vector x and its correct response y as training data, learns a function f, and then infers an output y' when new, previously unseen data x' arrives. Supervised learning is therefore used for user motion recognition.

Neural network learning, a type of machine learning, is a form of supervised learning that learns the correct answer Y for input data X. The motion recognition apparatus receives each of the motions Label 1 to Label N, corresponding to N motions, several times, takes the 12 feature points of each motion as input X, and trains on them to produce a neural network that can distinguish the motions Label 1 to Label N. The result of neural network learning is the node structure and the connection weights between nodes.
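The supervised training step can be illustrated with a deliberately small sketch. It is not the network the apparatus uses, whose architecture the text does not specify: this is a single-layer softmax network (input nodes connected directly to one output node per label) trained by batch gradient descent. The function names, hyperparameters, and toy data are all hypothetical, and the toy inputs have two features instead of twelve to keep the example short.

```python
import math


def softmax(scores):
    """Convert raw output-node scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Probability of each label for one feature vector x."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)


def train(samples, labels, n_classes, epochs=200, lr=0.5):
    """Learn connection weights and biases from labeled feature vectors."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    n = len(samples)
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(samples, labels):
            probs = predict_proba(weights, biases, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient
                grad_b[k] += err
                for j in range(dim):
                    grad_w[k][j] += err * x[j]
        for k in range(n_classes):
            biases[k] -= lr * grad_b[k] / n
            for j in range(dim):
                weights[k][j] -= lr * grad_w[k][j] / n
    return weights, biases


def classify(weights, biases, x):
    """Index of the most probable label for feature vector x."""
    probs = predict_proba(weights, biases, x)
    return max(range(len(probs)), key=probs.__getitem__)


# Usage with made-up angle features (radians) for two motions, labels 0 and 1.
train_x = [[1.5, 0.2], [1.4, 0.3], [1.6, 0.1], [0.2, 1.5], [0.3, 1.4], [0.1, 1.6]]
train_y = [0, 0, 0, 1, 1, 1]
weights, biases = train(train_x, train_y, n_classes=2)
```

After training, `classify(weights, biases, features)` plays the role of feeding an input motion's feature points to the network and reading the label at the output. A network with hidden layers would follow the same train-then-classify pattern.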

The motion recognition apparatus extracts feature points from training images that include a person's training motions. A motion used for learning is called a training motion, and an image containing a training motion is called a training image. As described above, the feature points are generated from a plurality of joint points in the skeleton information. The joint points used in the learning process are called the first plurality of joint points, and the feature points extracted from them are called first feature points. The first feature points may be the 12 feature points described above. The motion recognition apparatus performs neural network learning on the plurality of training motions based on the first feature points (410).

If neural network learning for the training motions is not complete (No at 420), the motion recognition apparatus repeats the neural network learning (410); if it is complete (Yes at 420), the apparatus waits for an image that requires motion recognition.

The motion recognition apparatus acquires, with two cameras, an input image that requires motion recognition (430). Two cameras are used in order to generate an image with depth information.

The motion recognition apparatus extracts the object (person) included in the input image and extracts skeleton information from the extracted person. It then determines a plurality of joint points on the person (440). The joint points determined in the input image are called the second plurality of joint points. The second plurality of joint points must be at the same (corresponding) positions as the first plurality of joint points used in the learning stage.

The motion recognition apparatus determines feature points using a plurality of points among the second plurality of joint points (450). The feature points for the input image are called the second feature points. The second feature points correspond to the first feature points: if the first feature points are the twelve feature points described with reference to FIGS. 4 and 5, the second feature points must be the same twelve feature points.
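A sketch of how twelve such feature points might be computed is given below. It assumes, based on the claims, three angles for each of four limbs (both arms and both legs): the angle at the middle joint between the two limb segments, the angle between the first segment and a first plane, and the angle between the second segment and a second plane. The joint names, the choice of measuring the joint angle at the middle joint, and the plane normals passed in are assumptions; the planes themselves are defined elsewhere in the description.

```python
import math

def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def angle_between(u, v):
    """Angle in degrees between two non-zero vectors."""
    c = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def line_plane_angle(direction, normal):
    """Angle in degrees between a line and a plane given by its normal."""
    s = abs(dot(direction, normal)) / (norm(direction) * norm(normal))
    return math.degrees(math.asin(min(1.0, s)))

def limb_features(p1, p2, p3, normal1, normal2):
    """Three angles for one limb, e.g. shoulder (p1), elbow (p2), wrist (p3)."""
    first = sub(p2, p1)    # first straight line
    second = sub(p3, p2)   # second straight line
    joint = angle_between(sub(p1, p2), sub(p3, p2))  # angle at the middle joint
    return [joint,
            line_plane_angle(first, normal1),
            line_plane_angle(second, normal2)]

# Hypothetical joint names: four limbs with three joint points each.
LIMBS = [
    ("l_shoulder", "l_elbow", "l_wrist"),
    ("r_shoulder", "r_elbow", "r_wrist"),
    ("l_hip", "l_knee", "l_ankle"),
    ("r_hip", "r_knee", "r_ankle"),
]

def extract_features(joints, normal1, normal2):
    """Return 12 angles (4 limbs x 3 angles) from a dict of 3D joint points."""
    features = []
    for a, b, c in LIMBS:
        features.extend(limb_features(joints[a], joints[b], joints[c],
                                      normal1, normal2))
    return features
```

Because the same function is applied to training images and input images, the first and second feature points stay in the same order and use the same joints, which is the correspondence the paragraph above requires.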

Finally, the motion recognition apparatus inputs the second feature points into the neural network and recognizes the resulting motion as the input motion. For example, if a neural network for N motions was created in the learning process, the apparatus classifies the input motion as one of the N motions.
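The final decision among N motions can be illustrated as picking the output node with the largest activation. This is a sketch under the assumption that the network has one output node per trained motion; `recognize` and the example label names are hypothetical.

```python
def recognize(outputs, labels):
    """Return the label whose output node has the highest activation."""
    if len(outputs) != len(labels):
        raise ValueError("one output node is expected per label")
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best]

# Example: three trained motions and the network's output activations.
motion = recognize([0.1, 0.7, 0.2], ["wave", "jump", "squat"])
```

Here `motion` is `"jump"`, since the second output node has the largest value.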

The motion recognition apparatus extracts the twelve feature points from the input image by the same procedure as in the learning process and inputs them into the neural network trained with the learned parameter set (for a neural network, the node structure and the connection weights between nodes). The final output nodes then identify the motion as one of Label 1 to Label N. While the neural network identifies the motion, the neural network information obtained by recognizing the new motion is fed back to 410 to update the neural network (470).
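The description does not specify how the feedback in step 470 updates the network. One possible reading, shown purely as an assumption, is a single gradient step on the newly recognized sample for a softmax output layer. The function names, the learning rate, and the use of a label assigned to the new motion are hypothetical.

```python
import math

def probabilities(weights, biases, x):
    """Softmax output of a single-layer network for feature vector x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [v / s for v in e]

def update_once(weights, biases, x, label, lr=0.1):
    """Apply one cross-entropy gradient step in place (assumed form of step 470)."""
    p = probabilities(weights, biases, x)
    for k in range(len(weights)):
        g = p[k] - (1.0 if k == label else 0.0)
        for j in range(len(x)):
            weights[k][j] -= lr * g * x[j]
        biases[k] -= lr * g
```

After such a step the network assigns a higher probability to `label` for the same input, so repeated observations of a motion gradually reinforce its parameters.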

The present embodiment and the accompanying drawings merely illustrate part of the technical idea included in the technology described above. Modifications and specific embodiments that a person skilled in the art can readily derive within the scope of the technical idea included in the specification and drawings of the technology described above are evidently included in its scope of rights.

100: motion recognition apparatus
111: first camera
112: second camera

Claims

A feature extraction method for motion recognition in an image, comprising:
acquiring, by a motion recognition apparatus, an image having depth information with a camera;
extracting an object included in the image and extracting skeleton information of the object;
determining, by the motion recognition apparatus, coordinates of a plurality of points in the skeleton information; and
determining, by the motion recognition apparatus, as feature points, an angle formed by the plurality of points and an angle formed by the plurality of points and a specific plane.

The method according to claim 1,
wherein the object is a person and the skeleton information includes a plurality of points corresponding to joints and major bones of the person.

The method according to claim 1,
wherein the feature points include at least one of an angle between two straight lines formed by three points among the plurality of points and an angle between a straight line formed by two of the three points and a specific plane.

The method according to claim 1,
wherein, for a shoulder point, an elbow point, and a wrist point of one arm among the plurality of points, the feature point is at least one of: an angle between a first straight line formed by the shoulder point and the elbow point and a second straight line formed by the elbow point and the wrist point; an angle formed by the first straight line and a first plane; and an angle formed by the second straight line and a second plane.

The method according to claim 1,
wherein, for a hip point, a knee point, and an ankle point of one leg among the plurality of points, the feature point is at least one of: an angle between a first straight line formed by the hip point and the knee point and a second straight line formed by the knee point and the ankle point; an angle formed by the first straight line and a first plane; and an angle formed by the second straight line and a second plane.

A motion recognition method using skeleton information, comprising:
performing, by a motion recognition apparatus, neural network learning using a first feature point based on a first plurality of joint points among human joint points in a training image including a plurality of human training motions;
acquiring, by the motion recognition apparatus, an input image including a human input motion with a camera;
extracting a second feature point based on a second plurality of joint points, at the same positions as the first plurality of joint points, among the human joint points in the input image; and
inputting the second feature point into a neural network formed through the neural network learning to determine the training motion corresponding to the input motion.

The method according to claim 6,
wherein the motion recognition apparatus extracts the person from each of the training image and the input image, skeletonizes the extracted person image, and determines, in the skeletonized object, a plurality of joint points corresponding to joint regions based on human skeleton information.

The method according to claim 6,
wherein the performing comprises:
determining, by the motion recognition apparatus, coordinates of the first plurality of joint points;
determining, by the motion recognition apparatus, as the first feature point, an angle formed by the first plurality of joint points and an angle formed by those points and a specific plane; and
performing neural network learning based on the first feature point to prepare a neural network,
wherein the performing is repeated for each of the plurality of training motions.

The method according to claim 6,
wherein the extracting comprises:
determining, by the motion recognition apparatus, coordinates of the second plurality of joint points; and
determining, by the motion recognition apparatus, as the second feature point, an angle formed by the second plurality of joint points and an angle formed by those points and a specific plane.

The method according to claim 6,
i) a shoulder point, an elbow point, and a wrist point of the left arm; ii) a shoulder point, an elbow point, and a wrist point of the right arm; iii) a hip point, a knee point, and an ankle point of the left leg; and iv) a hip point, a knee point, and an ankle point of the right leg.

The method according to claim 6,
wherein the feature points are at least one of: an angle between a first straight line formed by a shoulder point and an elbow point and a second straight line formed by the elbow point and a wrist point; an angle formed by the first straight line and a first plane; and an angle formed by the second straight line and a second plane.

The method according to claim 6,
wherein the feature points are at least one of: an angle between a first straight line formed by a hip point and a knee point and a second straight line formed by the knee point and an ankle point; an angle formed by the first straight line and a first plane; and an angle formed by the second straight line and a second plane.