KR101914717B1

KR101914717B1 - Human Action Recognition Using Reproducing Kernel Hilbert Space for Product Manifold of Symmetric Positive Definite Matrices

Info

Publication number: KR101914717B1
Application number: KR1020170126288A
Authority: KR
Inventors: 조완현; 나인섭; 김상균; 박순영
Original assignee: 전남대학교산학협력단
Priority date: 2017-09-28
Filing date: 2017-09-28
Publication date: 2018-11-02

Abstract

The present invention relates to a human behavior recognition method. More specifically, it relates to a method that recognizes human behavior by using features in a reproducing kernel Hilbert space (RKHS) for a product manifold of symmetric positive definite matrices together with a K-nearest neighbor algorithm. The method comprises the steps of: receiving a test image and extracting a histogram of oriented gradients feature vector of the test image (hereinafter, a first HOG feature vector) and a histogram of optical flow feature vector (hereinafter, a first HOF feature vector); calculating a covariance descriptor matrix of the first HOG feature vector (hereinafter, a first test sample) and a covariance descriptor matrix of the first HOF feature vector (hereinafter, a second test sample); mapping the first test sample, the second test sample, a first training sample, and a second training sample into the RKHS, and calculating the distance between the test samples (first and second) and the training samples (first and second); and recognizing, as the human behavior of the test image, the human behavior of the training image at the smallest distance found by using the K-nearest neighbor algorithm.

Description

[0001] A human behavior recognition method using a reproducing kernel Hilbert space for a product manifold of symmetric positive definite matrices

The present invention relates to a human behavior recognition method and, more specifically, to a method that maps samples into a reproducing kernel Hilbert space for a product manifold of symmetric positive definite matrices and recognizes human behavior by applying a nearest neighbor algorithm to the points in that space.

3D gesture recognition technology is being used and studied in a variety of fields, including game interfaces, motion capture systems, gesture recognition, surveillance cameras, and human-robot interfaces.

Microsoft's Xbox Kinect sensor, a 3D stereo vision system, extracts position and depth information through real-time 3D position tracking and is used in research on dynamic time warping (DTW)-based motion classifiers, haptic rendering, 3D human avatar generation, and autonomous navigation of wireless robots.

Structured light 3D scanner technology is regarded as the most ideal solution for three-dimensional scanning of objects, including in 3D computer-aided design systems, and is used for real-time human pose and gesture recognition for autonomous robots.

The 3D time-of-flight (TOF) sensor is a relatively new technology among depth sensing systems. It is a kind of light detection and ranging system in which light is transmitted from an emitter to an object, and the receiver computes the depth of the object by measuring the time it takes for light emitted from the camera to reach the object and return to the camera. The depth information obtained in this way is used for gesture recognition.
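
The depth calculation described above follows from the round-trip travel time of light. As a minimal worked example, with c the speed of light and Δt the measured round-trip time (both symbols are introduced here for illustration and do not appear in the original text):

```latex
d = \frac{c\,\Delta t}{2}, \qquad
\Delta t = 20\ \text{ns} \;\Rightarrow\;
d = \frac{(2.998\times 10^{8}\ \text{m/s})\,(20\times 10^{-9}\ \text{s})}{2} \approx 3.0\ \text{m}
```

The factor of 2 accounts for the light travelling to the object and back.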

The Human-Robot Interaction research team at the Electronics and Telecommunications Research Institute (ETRI) developed a system that combines a high-definition spatial-projection 3D display with high-precision finger spatial-touch technology so that users can interact with 3D virtual objects projected in space by touching them, and also developed gesture recognition technology using a TOF camera and a neural network.

Professor Hyeran Byun's research team at Yonsei University developed a method for recognizing arm gestures from joint information obtained with a Kinect sensor and applied it to a shooting game. Their method uses a two-layer model, with a hidden Markov model (HMM) in layer 1 and a conditional random field (CRF) model in layer 2.

Professor Hanseok Ko's research team at Korea University proposed a gesture recognition system based on depth-based hand tracking. The proposal consists of a hand tracking algorithm and a gesture recognition algorithm: fingertip points are detected using weighted depth probabilities and used as input to the mean shift algorithm to track the hand region, and gestures are then recognized with an HMM.

Professor Oksam Chae's research team at Kyung Hee University proposed an algorithm for detecting and tracking the hand region using depth and color information. The algorithm detects the hand region from depth and color information and tracks it using the generalized Hough transform.

'Sixth Sense', presented by the MIT Media Lab in February 2009, is a gesture-based interface technology that recognizes the movement of fingers marked with wearable markers and interprets user commands from gestures.

As described above, human behavior recognition is being studied in many different ways, and there is a demand for more accurate methods of recognizing human behavior.

[1] J. K. Aggarwal and M. S. Ryoo, "Human Activity Analysis: A Review," ACM Computing Surveys, Vol. 43, No. 3, Apr. 2011.

[2] Neural network: S. Kallio, J. Kela, and J. Mantyjarvi, "Online gesture recognition system for mobile interaction," International Conference on Systems, Man and Cybernetics, pp. 2070-2076, 2003.

[3] Dynamic time warping: C. Waithayanon and C. Aporntewan, "A Motion Classifier for Microsoft Kinect," International Conference on Computer Sciences and Convergence Information Technology, pp. 727-731, 2011.

[4] Hidden Markov model: S. Mitra and T. Acharya, "Gesture Recognition: A Survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 37, No. 3, pp. 311-324, 2007.

[5] Intelligent technology based on image recognition has a share of only 4%, but rapid growth of 30% per year is forecast, and more than 90% of vendors are expected to require intelligent solutions in the future (J.P. Freeman, 2008).

The present invention has been devised to meet the above demand, and its object is to provide a human behavior recognition method that achieves more accurate recognition by using features in a reproducing kernel Hilbert space for a product manifold of symmetric positive definite matrices together with a nearest neighbor algorithm.

To achieve the above object, the present invention provides a human behavior recognition method comprising the steps of: receiving a test image containing the human behavior to be recognized and extracting a histogram of oriented gradients feature vector of the test image (hereinafter, 'first HOG feature vector') and a histogram of optical flow feature vector (hereinafter, 'first HOF feature vector'); calculating the covariance descriptor matrix of the first HOG feature vector (hereinafter, 'first test sample') and the covariance descriptor matrix of the first HOF feature vector (hereinafter, 'second test sample'); using a reproducing kernel to map into a reproducing kernel Hilbert space (RKHS) the first test sample and the second test sample together with a first training sample, which is the covariance matrix of the HOG feature vectors of training images of different human behaviors, and a second training sample, which is the covariance matrix of the HOF feature vectors of those training images, and calculating the distance between the test samples (first and second) and the training samples (first and second); and recognizing, as the human behavior of the test image, the human behavior of the training image at the smallest distance found by using the K-nearest neighbor algorithm.

In a preferred embodiment, the first test sample and the second test sample are calculated by Equation 1 below.

[Equation 1]

Here, C_x1 is the first test sample, x1_i is the first HOG feature vector, μ_x1 is the mean vector of the first HOG feature vectors, C_y1 is the second test sample, y1_i is the first HOF feature vector, and μ_y1 is the mean vector of the first HOF feature vectors.

In a preferred embodiment, the mapping into the reproducing kernel Hilbert space is performed by calculating a reproducing kernel as in Equation 2 below, where the reproducing kernel is calculated as the product of the log-Euclidean Gaussian kernel function value of the first test sample and the first training sample and the log-Euclidean Gaussian kernel function value of the second test sample and the second training sample.

[Equation 2]

Here, k is the reproducing kernel, k₁ and k₂ are the log-Euclidean Gaussian kernel functions, C_x2 is the first training sample, and C_y2 is the second training sample.

In a preferred embodiment, the log-Euclidean Gaussian kernel functions are calculated as in Equation 3 below.

[Equation 3]

Here, k₁ is the log-Euclidean Gaussian kernel function for calculating the distance between the first test sample and the first training sample, k₂ is the log-Euclidean Gaussian kernel function for calculating the distance between the second test sample and the second training sample, γ is a constant, and D_x and D_y are log-Euclidean Gaussian distance functions.

In a preferred embodiment, the log-Euclidean Gaussian distance functions are calculated as in Equation 4 below.

[Equation 4]

Here, D_x is the log-Euclidean Gaussian distance function for calculating the distance between the first test sample and the first training sample, D_y is the log-Euclidean Gaussian distance function for calculating the distance between the second test sample and the second training sample, and ∥·∥_F is the Frobenius norm.

In a preferred embodiment, the distance in the reproducing kernel Hilbert space (RKHS) between the test samples (first and second) and the training samples (first and second) is calculated by Equation 5 below.

[Equation 5]

Here, d_Mx×My denotes the distance in the reproducing kernel Hilbert space.
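
The bodies of Equations 1 to 5 are not reproduced in this text. Based on the symbol definitions above, they can plausibly be written as follows. This is a reconstruction under assumptions, not the original formulas: the 1/(n-1) normalization, the exp(-γD²) form of the Gaussian kernel, and the kernel-induced form of the RKHS distance are taken from standard usage, and the symbols n (number of feature vectors), p = (C_x1, C_y1), and q = (C_x2, C_y2) are introduced here for brevity.

```latex
% (1) covariance descriptors of the HOG and HOF feature vectors
C_{x1} = \frac{1}{n-1}\sum_{i=1}^{n}(x1_i-\mu_{x1})(x1_i-\mu_{x1})^{\top}, \qquad
C_{y1} = \frac{1}{n-1}\sum_{i=1}^{n}(y1_i-\mu_{y1})(y1_i-\mu_{y1})^{\top}

% (2) product kernel on the product manifold
k(p, q) = k_1(C_{x1}, C_{x2})\; k_2(C_{y1}, C_{y2})

% (3) log-Euclidean Gaussian kernels
k_1(C_{x1}, C_{x2}) = \exp\!\big(-\gamma\, D_x^2(C_{x1}, C_{x2})\big), \qquad
k_2(C_{y1}, C_{y2}) = \exp\!\big(-\gamma\, D_y^2(C_{y1}, C_{y2})\big)

% (4) log-Euclidean distances
D_x(C_{x1}, C_{x2}) = \lVert \log C_{x1} - \log C_{x2} \rVert_F, \qquad
D_y(C_{y1}, C_{y2}) = \lVert \log C_{y1} - \log C_{y2} \rVert_F

% (5) distance induced by k in the RKHS
d_{M_x \times M_y}(p, q) = \sqrt{k(p,p) - 2\,k(p,q) + k(q,q)}
```

With these forms, k(p,p) = k(q,q) = 1, so the distance reduces to sqrt(2 - 2k(p,q)) and the nearest training sample is simply the one with the largest kernel value.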

The present invention further provides a computer program stored in a recording medium for causing a computer to perform the human behavior recognition method.

The present invention has the following advantageous effect.

According to the human behavior recognition method of the present invention, histogram of oriented gradients features and histogram of optical flow features are extracted as symmetric positive definite matrices, distances in the reproducing kernel Hilbert space for the product manifold of those matrices are calculated, and human behavior is then recognized with a nearest neighbor algorithm, which allows more accurate human behavior recognition.

FIG. 1 is a flowchart of a human behavior recognition method according to an embodiment of the present invention;
FIG. 2 is a view showing the KTH data set used for performance evaluation of the human behavior recognition method according to an embodiment of the present invention;
FIG. 3 is a diagram showing the action recognition classification rates obtained with the human behavior recognition method according to an embodiment of the present invention.

The terms used in the present invention are, as far as possible, general terms in wide current use. In certain cases, however, a term has been chosen by the applicant, and its meaning should then be understood from how it is described or used in the detailed description rather than from the name of the term alone.

Hereinafter, the technical configuration of the present invention is described in detail with reference to the preferred embodiments shown in the accompanying drawings.

However, the present invention is not limited to the embodiments described herein and may be embodied in other forms. Like reference numerals designate like elements throughout the specification.

FIG. 1 is a flowchart of a human behavior recognition method according to an embodiment of the present invention.

Referring to FIG. 1, the human behavior recognition method according to an embodiment of the present invention classifies and recognizes what kind of behavior a human in an image is performing.

The human behavior recognition method according to an embodiment of the present invention is in practice performed by a computer, which stores a computer program for performing the method.

The computer is a computing device in the broad sense, including not only a general personal computer but also embedded systems and smart devices.

The computer program may be provided stored on a separate recording medium, and the recording medium may be one specially designed and constructed for the present invention or one known and available to those of ordinary skill in the computer software field.

For example, the recording medium may be a hardware device specially configured to store and execute program instructions, alone or in combination, such as magnetic media (hard disks, floppy disks, and magnetic tape), optical recording media (CDs and DVDs), magneto-optical recording media capable of both magnetic and optical recording, ROM, RAM, and flash memory.

The computer program may consist of program instructions, local data files, local data structures, and the like, alone or in combination, and may be written not only in machine code such as that produced by a compiler but also in high-level language code that can be executed by a computer using an interpreter or the like.

The steps of the human behavior recognition method according to an embodiment of the present invention are described in detail below.

First, a test video containing the human behavior to be recognized is received, and a histogram of oriented gradients (HOG) feature vector (hereinafter, 'first HOG feature vector') and a histogram of optical flow (HOF) feature vector (hereinafter, 'first HOF feature vector') are extracted from the test video (S1000).

The histogram of oriented gradients is obtained by dividing the target region of an image into cells of a fixed size, computing for each cell a histogram of the orientations of its edge pixels (pixels whose gradient magnitude is at or above a given value), and concatenating the histogram bin values into a single vector. The histogram of optical flow is a vector calculated from changes in the brightness values of the edge pixels.
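
The per-cell orientation histogram described above can be sketched roughly as follows. This is an illustrative sketch only, assuming NumPy, unsigned orientations in [0, π), 8 bins, and 8×8 cells; the function names `cell_orientation_histogram` and `hog_vector` and the threshold value are hypothetical and not taken from the original.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=8, mag_threshold=1e-3):
    """Count edge pixels of one cell by gradient orientation (assumed settings)."""
    cell = np.asarray(cell, dtype=float)
    gy, gx = np.gradient(cell)                 # derivatives along rows and columns
    mag = np.hypot(gx, gy)                     # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)    # unsigned orientation in [0, pi)
    edge = mag >= mag_threshold                # keep only "edge" pixels
    hist, _ = np.histogram(ang[edge], bins=n_bins, range=(0.0, np.pi))
    return hist

def hog_vector(image, cell_size=8, n_bins=8):
    """Concatenate the per-cell histograms of an image into one vector."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    hists = [
        cell_orientation_histogram(image[r:r + cell_size, c:c + cell_size], n_bins)
        for r in range(0, h - cell_size + 1, cell_size)
        for c in range(0, w - cell_size + 1, cell_size)
    ]
    return np.concatenate(hists)
```

For a 16×16 image split into four 8×8 cells with 8 bins each, `hog_vector` returns a 32-element vector. Production HOG implementations usually add magnitude weighting and block normalization, which are omitted here.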

That is, the human behavior recognition method according to an embodiment of the present invention uses both the histogram of oriented gradients and the histogram of optical flow of the image as image features.

Next, the covariance descriptor matrices of the first HOG feature vector and of the first HOF feature vector are each calculated (S2000).

The covariance descriptor matrix of the first HOG feature vector is referred to as the first test sample, and the covariance descriptor matrix of the first HOF feature vector is referred to as the second test sample.

The first test sample and the second test sample are calculated by Equation 1 below.

[Equation 1]

Here, C_x1 is the first test sample, x1_i are the first HOG feature vectors, μ_x1 is the mean vector of the first HOG feature vectors, C_y1 is the second test sample, y1_i are the first HOF feature vectors, and μ_y1 is the mean vector of the first HOF feature vectors.
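
A covariance descriptor of this kind can be sketched as below, assuming NumPy, one feature vector per row, and the unbiased 1/(n-1) normalization (the text does not state which normalization is used). The small ridge term `eps` is an added assumption that keeps the matrix strictly positive definite when there are few frames; it is not part of the original description.

```python
import numpy as np

def covariance_descriptor(features, eps=1e-6):
    """Covariance matrix of an (n, d) array of feature vectors.

    The 1/(n-1) factor and the eps ridge are assumptions for this sketch.
    """
    X = np.asarray(features, dtype=float)
    n, d = X.shape
    mu = X.mean(axis=0)                  # mean feature vector
    centered = X - mu
    C = centered.T @ centered / (n - 1)  # d x d sample covariance
    return C + eps * np.eye(d)           # ridge keeps C strictly positive definite
```

The same function would be applied once to the HOG vectors and once to the HOF vectors of a video, giving the two matrices that together describe it.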

Next, using a reproducing kernel, the first and second test samples and the first and second training samples are mapped into a reproducing kernel Hilbert space (RKHS) (S3000).

Here, the first training sample and the second training sample are samples generated from training images, which are the reference images against which the test image is compared.

That is, the training images are standard images containing different human behaviors.

The first training sample is the covariance matrix of the HOG feature vectors of a training image, and the second training sample is the covariance matrix of the HOF feature vectors of that training image.

The first and second training samples can be calculated in advance and stored in a database, and can alternatively be calculated in real time together with the first and second test samples.

The first and second test samples and the first and second training samples are symmetric positive definite (SPD) matrices.

The reproducing kernel is defined on the product manifold as the product of the kernel functions k₁ and k₂, as shown in Equation 2 below.

[Equation 2]

The log-Euclidean Gaussian kernel functions are given by Equation 3 below.

[Equation 3]

Here, k₁ is the log-Euclidean Gaussian kernel function for calculating the distance between the first test sample and the first training sample, k₂ is the log-Euclidean Gaussian kernel function for calculating the distance between the second test sample and the second training sample, γ is a constant, and D_x and D_y are log-Euclidean Gaussian distance functions.
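
The relationship between the two kernels and their product can be sketched as follows, taking the distances D_x and D_y as already computed. The sketch assumes the usual Gaussian form exp(-γD²), which the text does not spell out; the function names are hypothetical.

```python
import math

def log_euclidean_gaussian_kernel(distance, gamma):
    """Gaussian kernel value for a given log-Euclidean distance (assumed form)."""
    return math.exp(-gamma * distance ** 2)

def product_kernel(d_x, d_y, gamma):
    """Product of the HOG-side kernel k1 and the HOF-side kernel k2."""
    return (log_euclidean_gaussian_kernel(d_x, gamma)
            * log_euclidean_gaussian_kernel(d_y, gamma))
```

Under this assumed form, the product equals exp(-γ(D_x² + D_y²)), so the combined kernel behaves like a single Gaussian kernel on the summed squared distances of the two feature types.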

The log-Euclidean Gaussian distance functions are given by Equation 4 below.

[Equation 4]
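
A log-Euclidean distance between two SPD matrices, the Frobenius norm of the difference of their matrix logarithms, can be sketched as below. The sketch assumes NumPy and computes the matrix logarithm through an eigendecomposition, which is valid for symmetric positive definite inputs; the function names are hypothetical.

```python
import numpy as np

def spd_log(C):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(C)          # eigenvalues w > 0, orthonormal eigenvectors V
    return (V * np.log(w)) @ V.T      # V diag(log w) V^T

def log_euclidean_distance(A, B):
    """Frobenius norm of log(A) - log(B)."""
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")
```

For example, for A = diag(1, e) and B = diag(e², e) the logarithms are diag(0, 1) and diag(2, 1), so the distance is 2.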

Next, the distance between the test samples (first and second) and the training samples (first and second) is calculated in the reproducing kernel Hilbert space (S4000).

The distance in the reproducing kernel Hilbert space between the test samples (first and second) and the training samples (first and second) can be calculated by Equation 5 below.

[Equation 5]
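
A distance in an RKHS can be computed from kernel values alone. The sketch below uses the standard kernel-induced distance, sqrt(k(p,p) - 2k(p,q) + k(q,q)); whether Equation 5 has exactly this form is an assumption, and the function name is hypothetical.

```python
import math

def rkhs_distance(k_pp, k_pq, k_qq):
    """Distance between two mapped points, given their three kernel values."""
    squared = k_pp - 2.0 * k_pq + k_qq
    return math.sqrt(max(squared, 0.0))   # clamp tiny negative rounding errors
```

For a Gaussian-type kernel the self-similarities k(p,p) and k(q,q) are both 1, so the call reduces to `rkhs_distance(1.0, k_pq, 1.0)`, and a larger kernel value always means a smaller distance.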

In this way, the distances between one test sample and a plurality of training samples are calculated, and, using the K-nearest neighbor (K-NN) algorithm, the behavior of the training sample at the smallest calculated distance is assigned to the test sample as its recognized behavior (S5000).
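
The classification step can be sketched as a plain nearest neighbor vote over precomputed distances. This is a minimal sketch using only the Python standard library; the function name `knn_classify` and the example labels are illustrative, and k = 1 corresponds to picking the single closest training sample as described above.

```python
from collections import Counter

def knn_classify(distances, labels, k=1):
    """Return the majority label among the k training samples with smallest distance."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Distances from one test sample to four training samples (made-up values).
example = knn_classify([0.9, 0.2, 0.5, 0.3],
                       ["walking", "boxing", "walking", "boxing"], k=1)
```

Here the closest training sample (distance 0.2) is labeled "boxing", so the test sample is assigned that behavior.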

That is, the human behavior recognition method of the present invention uses the HOG feature vector and the HOF feature vector together and recognizes behavior from the Euclidean distance in the reproducing kernel Hilbert space, which has the advantage of greatly increasing the recognition rate.

FIG. 2 is a view showing the KTH data set used for performance evaluation of the human behavior recognition method according to an embodiment of the present invention, and FIG. 3 is a diagram showing the action recognition classification rates obtained with the method.

To verify the performance of the human behavior recognition method of the present invention, an experiment was conducted on the KTH data set, which is used as a standard benchmark database in the field of human behavior recognition.

This data set consists of video clips of 25 people performing six human actions: boxing, hand clapping, hand waving, walking, jogging, and running.

From each frame, 64-dimensional HOG and HOF features were extracted as feature vectors, and symmetric positive definite covariance matrices of size 64×64 were calculated.
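
To show how the pieces fit together at the dimensions quoted above, the following sketch runs the whole chain on synthetic data. Everything specific in it is an assumption for illustration: the random features stand in for real HOG/HOF vectors, the class names, `gamma`, the ridge `eps`, the 200-frame clip length, and the Gaussian and kernel-induced forms of the kernel and distance are not taken from the source, and the result says nothing about accuracy on KTH.

```python
import numpy as np

def cov_descriptor(frames, eps=1e-6):
    """(n_frames, 64) features -> 64x64 SPD covariance matrix (ridge is assumed)."""
    X = np.asarray(frames, dtype=float)
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (len(X) - 1) + eps * np.eye(X.shape[1])

def spd_log(C):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def product_kernel(p, q, gamma):
    """Product of two Gaussian kernels on precomputed matrix logarithms."""
    dx2 = np.sum((p[0] - q[0]) ** 2)   # squared log-Euclidean distance, HOG side
    dy2 = np.sum((p[1] - q[1]) ** 2)   # squared log-Euclidean distance, HOF side
    return np.exp(-gamma * (dx2 + dy2))

def rkhs_distance(p, q, gamma):
    """Kernel-induced distance; k(p, p) = k(q, q) = 1 for this kernel."""
    return np.sqrt(max(2.0 - 2.0 * product_kernel(p, q, gamma), 0.0))

rng = np.random.default_rng(0)

def synthetic_clip(scale, n_frames=200, dim=64):
    """Stand-in for one video: random 'HOG' and 'HOF' features per frame."""
    hog = rng.normal(scale=scale, size=(n_frames, dim))
    hof = rng.normal(scale=scale, size=(n_frames, dim))
    return spd_log(cov_descriptor(hog)), spd_log(cov_descriptor(hof))

train = [("class_a", synthetic_clip(1.0)), ("class_b", synthetic_clip(3.0))]
test_clip = synthetic_clip(1.0)        # drawn like class_a
gamma = 1e-3
dists = [rkhs_distance(test_clip, sample, gamma) for _, sample in train]
predicted = train[int(np.argmin(dists))][0]
```

The matrix logarithms are computed once per clip and reused for every comparison, which avoids repeating the eigendecomposition for each test/training pair. Using more frames than feature dimensions (here 200 versus 64) keeps the covariance matrices full rank even without the ridge term.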

As shown in FIG. 3, the classification rates for boxing, hand clapping, hand waving, and walking were very high at 100%, 97%, 92%, and 97%, respectively, whereas the classification rates for jogging and running were relatively low because the two actions are very similar.

Nevertheless, the average classification rate was high at 87.91%, confirming that the method is well suited to human behavior recognition.
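
As a rough consistency check, and assuming the 87.91% figure is the unweighted mean of the six per-class rates (the source does not state how the average was formed), the combined rate of the two classes not quoted in the text can be derived from the four that are. Here r_jog and r_run are symbols introduced for the jogging and running rates:

```latex
r_{\text{jog}} + r_{\text{run}}
  = 6 \times 87.91 - (100 + 97 + 92 + 97)
  = 527.46 - 386
  = 141.46
\quad\Rightarrow\quad
\frac{r_{\text{jog}} + r_{\text{run}}}{2} \approx 70.7\%
```

Under that assumption, jogging and running average roughly 71%; their individual values appear only in FIG. 3 and cannot be recovered from the text.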

As described above, the present invention has been shown and described with reference to preferred embodiments, but it is not limited to those embodiments, and various changes and modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the spirit of the invention.

Claims

1. A human behavior recognition method comprising the steps of:
receiving a test image containing the human behavior to be recognized, and extracting a histogram of oriented gradients feature vector of the test image (hereinafter, 'first HOG feature vector') and a histogram of optical flow feature vector (hereinafter, 'first HOF feature vector');
calculating the covariance descriptor matrix of the first HOG feature vector (hereinafter, 'first test sample') and the covariance descriptor matrix of the first HOF feature vector (hereinafter, 'second test sample');
using a reproducing kernel to map into a reproducing kernel Hilbert space (RKHS) the first test sample and the second test sample together with a first training sample, which is the covariance matrix of the HOG feature vectors of training images of different human behaviors, and a second training sample, which is the covariance matrix of the HOF feature vectors of those training images, and calculating the distance between the test samples (first and second) and the training samples (first and second); and
recognizing, as the human behavior of the test image, the human behavior of the training image at the smallest distance found by using the K-nearest neighbor algorithm.

2. The method according to claim 1,
wherein the first test sample and the second test sample are calculated by Equation 1 below.
[Equation 1]

Here, C_x1 is the first test sample, x1_i is the first HOG feature vector, μ_x1 is the mean vector of the first HOG feature vectors, C_y1 is the second test sample, y1_i is the first HOF feature vector, and μ_y1 is the mean vector of the first HOF feature vectors.

3. The method according to claim 1,
wherein the mapping into the reproducing kernel Hilbert space is performed by calculating a reproducing kernel as in Equation 2 below, and the reproducing kernel is calculated as the product of the log-Euclidean Gaussian kernel function value of the first test sample and the first training sample and the log-Euclidean Gaussian kernel function value of the second test sample and the second training sample.
[Equation 2]

Here, k is the reproducing kernel, k₁ and k₂ are the log-Euclidean Gaussian kernel functions, C_x2 is the first training sample, and C_y2 is the second training sample.

4. The method according to claim 3,
wherein the log-Euclidean Gaussian kernel functions are calculated as in Equation 3 below.
[Equation 3]

Here, k₁ is the log-Euclidean Gaussian kernel function for calculating the distance between the first test sample and the first training sample, k₂ is the log-Euclidean Gaussian kernel function for calculating the distance between the second test sample and the second training sample, γ is a constant, and D_x and D_y are log-Euclidean Gaussian distance functions.

5. The method according to claim 3,
wherein the log-Euclidean Gaussian distance functions are calculated as in Equation 4 below.
[Equation 4]

Here, D_x is the log-Euclidean Gaussian distance function for calculating the distance between the first test sample and the first training sample, D_y is the log-Euclidean Gaussian distance function for calculating the distance between the second test sample and the second training sample, and ∥·∥_F is the Frobenius norm.

6. The method according to claim 5,
wherein the distance in the reproducing kernel Hilbert space (RKHS) between the test samples (first and second) and the training samples (first and second) is calculated by Equation 5 below.
[Equation 5]

Here, d_Mx×My denotes the distance in the reproducing kernel Hilbert space.

7. A computer program stored in a recording medium for causing a computer to perform the human behavior recognition method according to any one of claims 1 to 6.