KR102371127B1

KR102371127B1 - Gesture Recognition Method and Processing System using Skeleton Length Information

Info

Publication number: KR102371127B1
Application number: KR1020180165196A
Authority: KR
Inventors: 김성흠; 황영배; 박한무; 류한나
Original assignee: 한국전자기술연구원
Priority date: 2018-12-19
Filing date: 2018-12-19
Publication date: 2022-03-07
Also published as: KR20200076267A

Abstract

A method and system are provided for recognizing a gesture using inter-joint length and direction information, and for collecting, storing, and processing the results through a cloud system for use in various application fields. A gesture recognition method according to an embodiment of the present invention includes: extracting joint positions of a person from an input image; supplementing the extracted joint positions; a first calculation step of calculating lengths between the supplemented joint positions; a second calculation step of calculating directions between the supplemented joint positions; and recognizing a gesture using the length information and direction information of the skeleton.
Accordingly, because the relative relationships of a person's (user's) skeleton are learned and used as input data, a motion recognition model can be built without constraints on the shooting environment.

Description

Gesture Recognition Method and Processing System using Skeleton Length Information

The present invention relates to a gesture recognition method, and more particularly, to a gesture recognition method and processing system using skeleton length information.

In the prior art, a subject's motion could be recognized through motion capture equipment attached to the body and motion coordinate estimation software. Over the past ten years, however, the spread of 3D sensors (e.g. Kinect) has greatly improved the accuracy of markerless methods.

In addition, recent deep learning research continues to report results that acquire the human skeletal structure from 2D images alone and exploit patterns of skeletal movement. Accordingly, applications that recognize motion by analyzing a skeletal model of the human body in a single image are also expanding.

Despite this demand, recognizing motion from a skeletal model remains an open problem, and a way to overcome the still-low recognition rate is needed. For example, the positions of major skeleton points, the angles between bones, and the movement speed used in existing analyses are not invariant features for general motions, so recognition techniques based on them cannot classify motions accurately.

The present invention has been devised to solve the above problems, and an object of the present invention is to provide a method and system that recognize a gesture using inter-joint length and direction information, and that collect, store, and process the results through a cloud system for use in various application fields.

To achieve the above object, a gesture recognition method according to an embodiment of the present invention includes: extracting joint positions of a person from an input image; supplementing the extracted joint positions; a first calculation step of calculating lengths between the supplemented joint positions; a second calculation step of calculating directions between the supplemented joint positions; and recognizing a gesture using the length information and direction information of the skeleton.

The supplementing step may be performed when a joint is lost due to occlusion or when an incorrect joint position is extracted.

The supplementing step may supplement the joint positions by using a motion recognition model that uses the relative positions between joints.

The first calculation step may calculate Euclidean distance information based on the joint positions.

The first calculation step may generate final distance information by dividing the Euclidean distance information by the maximum Euclidean distance information.

The second calculation step may calculate direction information using the cross product of the joint positions.

The gesture recognition step may recognize a gesture using length information and direction information of the skeleton extracted from consecutive images.

According to another aspect of the present invention, there is provided a gesture recognition system comprising: a processor that extracts joint positions of a person from an input image, supplements the extracted joint positions, calculates lengths and directions between the supplemented joint positions, and recognizes a gesture using the length information and direction information of the skeleton; and an output unit that outputs the recognized gesture information.

As described above, according to embodiments of the present invention, because the relative relationships of a person's (user's) skeleton are learned and used as input data, a motion recognition model can be built without constraints on the shooting environment.

In addition, according to embodiments of the present invention, a person's (user's) motion can be analyzed at individual capture points rather than over every frame of a video, so the time required for data storage and for building the learning model can be minimized.

Further, according to embodiments of the present invention, the relationship between two or more consecutive frames can be used not only to recognize a specific motion but also to estimate an entire sequence of movements as one continuous motion.

Moreover, embodiments of the present invention can be used in various fields that require accurate gesture recognition, for example sign language recognition, dance lessons and auditions, and games with various gesture-based Human-Computer Interfaces (HCI).

Furthermore, according to embodiments of the present invention, the invention can be extended to a home network system in which social robots, IoT devices, and the like communicate with one another on the basis of a person's (user's) specific motions.

FIG. 1 is a diagram provided to explain a gesture recognition method using skeleton length information according to an embodiment of the present invention;
FIGS. 2 to 4 are diagrams illustrating gesture recognition results using skeleton length information;
FIG. 5 is a diagram showing the concept of transmitting and receiving the gesture recognition system to and from cloud storage in accordance with a business model; and
FIG. 6 is a block diagram of a gesture recognition system according to another embodiment of the present invention.

Hereinafter, the present invention will be described in more detail with reference to the drawings.

An embodiment of the present invention presents a gesture recognition method and processing system using skeleton length information.

In an embodiment of the present invention, estimating the length and directionality between the bones of each model parameter is the key element of gesture recognition. Because the motion recognition technique uses the relative relationships of a person's (user's) skeleton, motion can be recognized regardless of the user's physical build.

Furthermore, an embodiment of the present invention considers both a system that stores, processes, and learns skeletal models of motions and a system that transmits and receives them in accordance with the services and business models that use them.

FIG. 1 is a diagram provided to explain a gesture recognition method using skeleton length information according to an embodiment of the present invention.

A gesture recognition method is a technique that recognizes a specific motion by analyzing the skeleton of a person present in an input image. It consists of element technologies such as extracting the person's joint positions and extracting the distances and directionality between joints.

The random forest learning-based human motion estimation system presented in an embodiment of the present invention processes camera input images in the flow shown in FIG. 1. In an embodiment of the present invention, a feature vector representing a specific motion (movement) is extracted regardless of the separation distance between cameras and the person's physical build, and, with this vector as input, a learning evaluation model is built that evaluates the accuracy of random forest-based recognition of the person's (user's) motion.

To this end, a database for training the person's (user's) motion recognition model is first built. The data set applied to an embodiment of the present invention may use a database publicly available on the Internet (e.g. Human3.6M), and a training database may later be built for added motions by recording training videos in-house.

For the training data, input images of various detailed motions, such as everyday movements, exercise, and housework, containing the person's entire skeleton were obtained from one or more cameras. Each newly added image is classified either into an existing motion (set) or into a new motion (set) group.

A deep learning-based method (e.g. OpenPose) was used to extract the person's joint positions from the input image. From this, (x, y) coordinate information is extracted for the person's (user's) joints (head, shoulder center, left shoulder, right shoulder, and so on). Even when a joint point is lost to occlusion or incorrect joint coordinates are obtained, this is compensated for by a motion recognition model that uses the relative positions between joints.

From the 14 extracted joints { P(x,y) = [P1, P2, ..., P14] }, a Euclidean distance value and a direction value based on the cross product are computed with respect to each joint point.

For example, when 14 joint points are extracted, the pairwise values form a square matrix with 14x14 (=196) entries. Taking the diagonal (Dij, i=j) as the reference, the values in the upper triangle above it are extracted and expressed as a 1x91 matrix (14x13/2 = 91 entries).
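
As an illustration of the count above, the following minimal sketch enumerates the index pairs strictly above the diagonal and flattens a square matrix accordingly. The function names are hypothetical and the row-major ordering of the pairs is an assumption; the description does not specify an ordering.

```python
from itertools import combinations

NUM_JOINTS = 14  # joint count stated in the description


def upper_triangle_pairs(n):
    """Return index pairs (i, j) with i < j, i.e. entries strictly above the diagonal."""
    return list(combinations(range(n), 2))


def flatten_upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square matrix into a 1 x n(n-1)/2 list."""
    n = len(matrix)
    return [matrix[i][j] for i, j in upper_triangle_pairs(n)]


# 14 joints give 14 * 13 / 2 = 91 unique joint pairs.
assert len(upper_triangle_pairs(NUM_JOINTS)) == 91
```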

The maximum of the 91 re-expressed values is found, and all of the values are divided by that maximum. The distance weights between joints are thereby expressed as a normalized 1x91 matrix. When a pair includes a lost joint point, the distance between the two points is defined as 0, which minimizes its effect on the overall weights.
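
The normalization described here could be sketched as follows. This is a minimal illustration, not the patented implementation: representing a lost joint as `None`, the function name, and returning the unnormalized list when every distance is 0 are all assumptions.

```python
import math
from itertools import combinations


def normalized_distances(joints):
    """Pairwise Euclidean distances over joint pairs (i < j), divided by their maximum.

    joints: list of (x, y) tuples, or None for a joint lost to occlusion (assumed encoding).
    """
    dists = []
    for a, b in combinations(joints, 2):
        if a is None or b is None:
            dists.append(0.0)  # a pair that includes a lost joint gets distance 0
        else:
            dists.append(math.dist(a, b))
    max_d = max(dists, default=0.0)
    if max_d == 0.0:
        return dists  # degenerate case (assumption): nothing to normalize
    return [d / max_d for d in dists]
```

With 14 joints the result has 91 entries, each in the range 0 to 1, so the feature no longer depends on how large the person appears in the image.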

In addition, to determine the relative directionality between joints, the z component of the two joint points is assumed to be 0 and their cross product is computed.
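
The equation for this step is not reproduced in this text. As a hedged reconstruction from the standard definition of the cross product, with the two joint points written as P_i = (x_i, y_i, 0) and P_j = (x_j, y_j, 0) (the subscript notation is an assumption), only the z component can be non-zero:

```latex
P_i \times P_j = \bigl(0,\; 0,\; x_i y_j - y_i x_j\bigr)
```

The sign of x_i y_j - y_i x_j indicates on which side of P_i the point P_j lies as seen from the coordinate origin, which is what makes it usable as a direction value.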

From the above equation, 14x14 (=196) vector components can be obtained, of which only the Z component is used. As with the distance matrix, the upper-triangle components are extracted, and a matrix expressing their sign (direction) values is extracted. The element-wise product of the sign (direction) matrix and the Euclidean distance matrix yields a 1x91 feature vector. This vector is given as input to the motion recognition model, and the label (the user's movement) for a specific motion is predicted through the previously built learning evaluation model. FIGS. 2 to 4 illustrate gesture recognition results using skeleton length information.
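
A minimal sketch of the signed feature vector, under stated assumptions: lost joints are encoded as `None`, pairs that include a lost joint contribute 0, and the helper names are hypothetical. The classifier itself is outside the scope of the sketch.

```python
import math
from itertools import combinations


def cross_z(a, b):
    """z component of the cross product of (ax, ay, 0) and (bx, by, 0)."""
    return a[0] * b[1] - a[1] * b[0]


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def signed_distance_features(joints):
    """Element-wise product of the sign values and the max-normalized distances."""
    dists, signs = [], []
    for a, b in combinations(joints, 2):
        if a is None or b is None:
            dists.append(0.0)  # lost joint: distance defined as 0
            signs.append(0)
        else:
            dists.append(math.dist(a, b))
            signs.append(sign(cross_z(a, b)))
    max_d = max(dists, default=0.0) or 1.0  # avoid dividing by 0 when all pairs are lost
    return [s * d / max_d for s, d in zip(signs, dists)]
```

Note that a pair of joints lying on one line through the origin has a zero cross product, so its entry is 0 in this sketch even though its distance is not.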

When judging a continuous motion, the time taken to perform a specific motion once and the motion's range differ from person to person. Even when the same person repeats a specific motion, the speed and form of each repetition differ. In addition, there is no objective indicator that separates the start and end points of a motion.

A specific continuous motion therefore cannot be determined from a single input image (a single frame). For motion recognition over consecutive frames, the relationship among N consecutive input images is estimated in order to predict how joint movement changes over time.

In addition, the main joint points that determine a specific motion are located mostly at the ends of the skeleton. Motion vectors are therefore obtained for five joints (head, left hand tip, right hand tip, left ankle, right ankle) and used to predict continuous motion.

First, to estimate joint changes, the center coordinates of the skeleton must be brought into alignment. The joint information extracted by deep learning gives each joint's position within the full image.

Consequently, when a person moves (at constant velocity) while holding a specific pose, the positions of all joints shift, so all joint information is repositioned relative to the person's center of gravity. Only by determining the relationship between consecutive frames in this way can the change in joint movement for a specific motion be identified accurately.

Therefore, each time an image is input, the center of gravity is computed over the remaining joints only, excluding joints that fall in an occluded region. The difference between the center of gravity of the current frame and that of the previous frame is then computed, and all joint coordinates of the current frame are shifted by that difference.
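
The alignment step could be sketched as below. The sketch assumes that the center of gravity is the unweighted mean of the visible joints, that occluded joints are encoded as `None`, and that the shift is applied so the two centers coincide; the function names are hypothetical.

```python
def centroid(joints):
    """Mean (x, y) of the visible joints; None if every joint is occluded."""
    visible = [p for p in joints if p is not None]
    if not visible:
        return None
    n = len(visible)
    return (sum(x for x, _ in visible) / n, sum(y for _, y in visible) / n)


def align_to_previous(prev_joints, curr_joints):
    """Shift the current frame so that its centroid matches the previous frame's."""
    c_prev, c_curr = centroid(prev_joints), centroid(curr_joints)
    if c_prev is None or c_curr is None:
        return list(curr_joints)  # nothing to align against (assumption)
    dx, dy = c_prev[0] - c_curr[0], c_prev[1] - c_curr[1]
    return [None if p is None else (p[0] + dx, p[1] + dy) for p in curr_joints]
```

After this shift, a person who walks across the frame while holding a pose yields nearly identical joint coordinates from frame to frame, so the remaining differences reflect limb movement.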

Five feature joints are extracted from each of the shifted current joint information and the previous frame's joint information, and the cross product is computed joint by joint. The degree of movement can be judged from the magnitude of the Z component of the result.
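
A hedged sketch of the per-joint movement measure follows. The joint names, the dictionary layout, and assigning 0 to a joint missing in either frame are assumptions, and the coordinates are used exactly as given because the text does not state which origin the cross product is taken about.

```python
# Hypothetical names for the five end joints listed in the description.
END_JOINTS = ("head", "left_hand", "right_hand", "left_ankle", "right_ankle")


def motion_magnitudes(prev, curr):
    """|z| of the cross product between each end joint's previous and current position.

    prev, curr: dicts mapping joint name to (x, y), already centroid-aligned;
    a missing or None entry marks an occluded joint.
    """
    out = {}
    for name in END_JOINTS:
        p, c = prev.get(name), curr.get(name)
        if p is None or c is None:
            out[name] = 0.0  # assumption: no evidence of movement for a lost joint
        else:
            out[name] = abs(p[0] * c[1] - p[1] * c[0])
    return out
```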

Because the relative contribution of the five joints' movement differs from motion to motion, the motion vectors were normalized so that the extent to which a specific joint moves relative to the overall movement can be determined. For example, for a motion such as waving a hand from side to side, the direction of movement is unimportant, so when judging such a motion only the magnitude of movement is used to distinguish it.
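
The text does not say which normalization is used. One plausible reading, sketched here purely as an assumption, divides each joint's movement magnitude by the sum over the five joints, so that each value is that joint's share of the total movement.

```python
def normalize_motion(magnitudes):
    """Scale non-negative movement magnitudes so that they sum to 1.

    Sum-normalization is an assumption; the description only says 'normalization'.
    """
    total = sum(magnitudes)
    if total == 0:
        return [0.0] * len(magnitudes)  # no movement at all in this frame pair
    return [m / total for m in magnitudes]
```

For a hand-waving motion, for instance, most of the share would fall on one hand joint regardless of how fast or how widely the person waves.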

The final feature vector contains the relative distance and direction information between joints from the input image and the amount of change of specific joints from the previous frame. The random forest-based learning model is evaluated on the basis of this final feature vector.

The predicted result can be transmitted to a module with network communication, such as a social robot, and is expected to be used in various applications that require recognition of specific motions. FIG. 5 shows the concept of transmitting and receiving the gesture recognition system to and from cloud storage in accordance with a business model.

FIG. 6 is a block diagram of a gesture recognition system according to another embodiment of the present invention. As shown in FIG. 6, the gesture recognition system according to an embodiment of the present invention includes a communication unit 110, an output unit 120, a processor 130, and a storage unit 140.

The communication unit 110 is a communication means for connecting to external devices/systems and to a communication network. The input image on which gesture recognition is performed may be received through the communication unit 110 or may be stored in the storage unit 140.

The processor 130 recognizes the gesture of a person (user) appearing in the input image according to the gesture recognition method described above.

The output unit 120 outputs the gesture recognition result of the processor 130, and the storage unit 140 provides the storage space that the processor 130 needs in order to function and operate.

So far, preferred embodiments of the gesture recognition method and processing system using skeleton length information have been described in detail.

In an embodiment of the present invention, the lengths between the skeleton and each key point are calculated from a configuration of at least one camera, a gesture is recognized using the length information and direction information of the skeleton, and the gesture recognition method is extended to two or more input images.

Furthermore, an embodiment of the present invention can be linked with a cloud system that collects, stores, and processes input images, and can be used in various fields such as sign language recognition, dance lessons and auditions, and games with various gesture-based Human-Computer Interfaces (HCI).

Meanwhile, the technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiment. In addition, the technical ideas according to various embodiments of the present invention may be implemented in the form of computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data. For example, the computer-readable recording medium may be a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, or the like. In addition, the computer-readable code or program stored in the computer-readable recording medium may be transmitted through a network connecting computers.

In addition, although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the art to which the invention belongs without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or scope of the present invention.

110: communication unit
120: output unit
130: processor
140: storage unit

Claims

A gesture recognition method comprising:
extracting joint positions of a person from an input image;
supplementing the extracted joint positions;
a first calculation step of calculating lengths between the supplemented joint positions;
a second calculation step of calculating directions between the supplemented joint positions; and
recognizing a gesture using the length information and direction information of the skeleton,
wherein the supplementing step supplements the joint positions by using a motion recognition model that uses the relative positions between joints,
wherein the first calculation step calculates Euclidean distance information based on the joint positions and generates final length information by dividing the Euclidean distance information by the maximum Euclidean distance information to normalize it, and, when obtaining the maximum Euclidean distance information for normalization, treats the distance information to a joint lost due to occlusion as 0, and
wherein the second calculation step assumes that the z component of the joint positions is 0, obtains the cross product, and calculates the direction information from the z component only.

The method according to claim 1,
wherein the supplementing step is performed when a joint is lost due to occlusion or when an incorrect joint position is extracted.

delete

The method according to claim 1,
wherein the gesture recognition step recognizes a gesture using length information and direction information of the skeleton extracted from consecutive images.

A gesture recognition system comprising:
a processor that extracts joint positions of a person from an input image, supplements the extracted joint positions, calculates lengths and directions between the supplemented joint positions, and recognizes a gesture using the length information and direction information of the skeleton; and
an output unit that outputs the recognized gesture information,
wherein the processor supplements the joint positions by using a motion recognition model that uses the relative positions between joints,
calculates Euclidean distance information based on the joint positions and generates final length information by dividing the Euclidean distance information by the maximum Euclidean distance information to normalize it, and, when obtaining the maximum Euclidean distance information for normalization, treats the distance information to a joint lost due to occlusion as 0, and
assumes that the z component of the joint positions is 0, obtains the cross product, and calculates the direction information from the z component only.