KR20160124948A - Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification - Google Patents

Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification Download PDF

Info

Publication number
KR20160124948A
Authority
KR
South Korea
Prior art keywords
vector
video
tensor
feature
calculating
Prior art date
Application number
KR1020150055044A
Other languages
Korean (ko)
Other versions
KR101713189B1 (en)
Inventor
김진영
뷔넉남
민소희
김정기
Original Assignee
전남대학교산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 전남대학교산학협력단 filed Critical 전남대학교산학협력단
Priority to KR1020150055044A priority Critical patent/KR101713189B1/en
Publication of KR20160124948A publication Critical patent/KR20160124948A/en
Application granted granted Critical
Publication of KR101713189B1 publication Critical patent/KR101713189B1/en

Links

Images

Classifications

    • G06K9/00744
    • G06K9/00751
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method of extracting feature information for classifying object behavior in video, comprising the steps of: calculating a gradient vector and an optical flow vector at selected key points in the video frames of a video to be processed; obtaining a tensor product of the gradient vector and the optical flow vector; calculating a tensor divergence of the tensor product so as to lower its dimension; and determining the calculated tensor divergence as a feature vector for motion classification. According to this feature information extraction method and feature extractor for video object behavior classification, fusing the HOG and HOF features into a single feature reflects changes in both space and time without increasing the dimension of the feature vector, so the performance of behavior classification in video can be improved without increasing the computational complexity of the classifier.

Description

TECHNICAL FIELD The present invention relates to a method and an apparatus for extracting HOG/HOF-based feature information for classification of video object behavior.

The present invention relates to an HOG/HOF-based feature information extraction method and feature extractor for video object behavior classification, and more particularly, to a feature information extraction method and feature extractor that improve the performance of classifying the motion of an object in video.

A variety of feature extraction methods have been proposed for classifying the behavior of an object such as a human action in a video.

In the case of human behavior, there are various behaviors such as walking, running, jogging, and stretching out an arm, and techniques have been developed to recognize these behaviors.

In particular, excellent features for expressing object behavior have recently been developed: HOG (histogram of oriented gradients), based on the spatial gradient of the image; HOF (histogram of optical flow), based on the temporal optical flow of the image; MBH (motion boundary histogram), which uses optical flow derivatives and is robust to camera motion; and SIFT (scale-invariant feature transform).

MBH is obtained separately for the x-axis and the y-axis, giving what are called MBHx and MBHy.

A classifier must be used to classify the behavior of objects from the extracted features. Common classifiers include the HMM (hidden Markov model), the SVM (support vector machine), the Fisher-vector-based GMM (Gaussian mixture model), and the histogram-feature-based BOF (bag of features).

For methods of extracting object behavior features, see H. Wang, A. Kläser, C. Schmid, and C.-L. Liu, "Dense Trajectories and Motion Boundary Descriptors for Action Recognition," International Journal of Computer Vision, vol. 103, no. 1, pp. 60-79, May 2013.

On the other hand, there is a demand for a feature extraction method in video that can improve the classification performance while reducing the amount of computation of the classifier applied for classifying the behavior of the object.

The present invention has been made to solve the above-mentioned problems, and it is an object of the present invention to provide a feature information extraction method and feature extractor for video object behavior classification that combine the gradient information and the optical flow information of an image into feature information for classifying the behavior of an object in video, while reducing the amount of calculation.

According to an aspect of the present invention, a feature information extraction method for video object behavior classification comprises: calculating a gradient vector and an optical flow vector at selected key points in the video frames of a video to be processed; obtaining a tensor product of the two vectors; and calculating the tensor divergence of the tensor product and determining it as a feature vector for motion classification.

According to the feature information extraction method and feature extractor for video object behavior classification of the present invention, fusing the HOG and HOF features into a single feature reflects changes in both space and time without increasing the dimension of the feature vector, so the performance of behavior classification in video can be improved without increasing the amount of calculation of the classifier used as the recognizer.

FIG. 1 is a flowchart illustrating a feature information extraction process for video object behavior classification according to the present invention;
FIG. 2 is a flowchart showing the feature information extraction process of FIG. 1 in detail;
FIG. 3 is a diagram explaining an example of applying the feature information extracted according to the present invention to behavior recognition;
FIGS. 4 and 5 are diagrams illustrating the process of calculating a histogram from the feature information extracted according to the present invention; and
FIG. 6 is a block diagram illustrating a feature information extractor for video object behavior classification according to the present invention.

Hereinafter, a feature information extracting method and a feature extractor for classifying a video object behavior according to a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart showing a feature information extraction process for classifying a video object behavior according to the present invention.

First, key points are selected in the image frames of an input video to be processed (step 10), and a gradient vector (g) and an optical flow vector (f) are calculated at each selected key point (step 20).

Next, the tensor product of the calculated gradient vector (g) and optical flow vector (f) is computed (step 30), and a tensor divergence is calculated for the tensor product so as to lower its dimension (step 40).

Then, the calculated tensor divergence is determined as the feature vector for motion classification, and the feature vector data is provided to a BOF or SVM classifier, which classifies the behavior (step 50).

Hereinafter, this processing is described in more detail with reference to FIGS. 2 to 5.

First, when key points, which are the principal pixel points in a video frame, are selected, let the feature of each key point consist of a gradient vector $\mathbf{g}$ and an optical flow vector $\mathbf{f}$, written component-wise as in equation (1):

$$\mathbf{g} = (g_x, g_y)^T, \qquad \mathbf{f} = (f_x, f_y)^T \qquad (1)$$
The tensor product of the gradient vector g and the optical flow vector f can be expressed by the following equation (2).

$$\mathbf{T} = \mathbf{g} \otimes \mathbf{f} \qquad (2)$$

where $\otimes$ denotes the tensor product. Equation (2) expands to equation (3):

$$\mathbf{T} = g_x f_x\,\mathbf{e}_x\mathbf{e}_x + g_x f_y\,\mathbf{e}_x\mathbf{e}_y + g_y f_x\,\mathbf{e}_y\mathbf{e}_x + g_y f_y\,\mathbf{e}_y\mathbf{e}_y = \begin{pmatrix} g_x f_x & g_x f_y \\ g_y f_x & g_y f_y \end{pmatrix} \qquad (3)$$

where $\mathbf{e}_x$ is the unit vector of the $x$ (horizontal) axis and $\mathbf{e}_y$ is the unit vector of the $y$ (vertical) axis. As a result, the tensor product of the two vectors is a 2x2 matrix.

The gradient vector (g) and the optical flow vector (f) are thus expanded into a matrix, and each entry of the matrix expresses the correlation between one element of each of the two vectors.
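For a concrete worked example (with illustrative numbers, not taken from the patent): if $\mathbf{g} = (1, 2)^T$ and $\mathbf{f} = (3, 4)^T$, then $\mathbf{g} \otimes \mathbf{f} = \begin{pmatrix} 3 & 4 \\ 6 & 8 \end{pmatrix}$.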

In this patent, tensor divergence is used to convert a feature extended by a tensor product into a low-dimensional vector.

That is, writing the tensor product as the 2x2 tensor $\mathbf{T} = \mathbf{g} \otimes \mathbf{f}$ with components $T_{ij}$, the tensor divergence is defined by equation (4):

$$\operatorname{div}\mathbf{T} = \nabla \cdot \mathbf{T}, \qquad (\operatorname{div}\mathbf{T})_i = \frac{\partial T_{ix}}{\partial x} + \frac{\partial T_{iy}}{\partial y} \qquad (4)$$

Therefore, the gradient-flow tensor divergence (TDGF), which is the feature information generated in step 40, is defined by equation (5):

$$\mathbf{d} = \operatorname{div}(\mathbf{g} \otimes \mathbf{f}) \qquad (5)$$

The TDGF $\mathbf{d}$ is a two-dimensional vector that lowers the dimension of the feature and compactly encodes both the gradient vector (g) information and the optical flow vector (f) information.
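For illustration, equations (2) through (5) can be evaluated densely with numpy. This is a minimal sketch, assuming per-pixel gradient and flow fields and finite-difference derivatives; the function and variable names are illustrative, not taken from the patent. The feature at a key point can then be sampled from the returned field.

```python
import numpy as np

def tdgf(g, f):
    """Gradient-flow tensor divergence (TDGF), Eqs. (2)-(5).

    g, f: arrays of shape (H, W, 2) holding per-pixel gradient and
    optical-flow vectors. Returns an (H, W, 2) array: the divergence
    of the rank-2 tensor field T = g (outer) f, a 2-D vector per pixel.
    """
    # Eq. (2)/(3): per-pixel tensor (outer) product, T[..., i, j] = g_i * f_j.
    T = g[..., :, None] * f[..., None, :]          # shape (H, W, 2, 2)

    # Eq. (4)/(5): (div T)_i = dT_ix/dx + dT_iy/dy. np.gradient's axis 0
    # runs along rows (y) and axis 1 along columns (x).
    dT_dy, dT_dx = np.gradient(T, axis=(0, 1))
    return dT_dx[..., :, 0] + dT_dy[..., :, 1]     # shape (H, W, 2)
```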

The feature information thus extracted can be used directly in the subsequent learning and recognition steps, or converted into a histogram-type feature and applied to conventional SVM or BOF recognition methods.

An example of a key point selection process is described below.

Let the video be denoted $I_t(x, y)$, where $t$ is the image frame index and $(x, y)$ are the pixel coordinates of image $I_t$.

Various methods known in the art may be used to select the keypoints to be traced for the gradient and optical flow for such image frames.

Hereinafter, a method using the eigenvalues of the second-gradient matrix is described as an example of selecting key points.

First, key points are obtained from each input image of the video to be processed by computing the second gradient at every pixel using equation (6) below, calculating the eigenvalues of the second-gradient matrix, and then determining as key points the pixels whose calculated eigenvalues are greater than a set threshold.

$$\mathbf{G}(x, y) = \begin{pmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{pmatrix} \qquad (6)$$

Here $I_x$ and $I_y$ denote the gradients in the $x$-axis and $y$-axis directions, and $I_{xx}$, $I_{xy}$, and $I_{yy}$ are the second gradients. For real digital images these values are obtained by filtering with derivative kernels along the $x$-axis and $y$-axis directions.

Given the matrix $\mathbf{G}$ at each pixel, eigen-decomposition yields two eigenvalues, and the pixels whose two eigenvalues both exceed a specific threshold $\tau$ are defined as key points. The specific threshold $\tau$ is determined empirically. In the above procedure a gradient vector is also obtained, and this gradient vector $\mathbf{g}$ is retained for each key point.
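A minimal sketch of this selection step follows, assuming Sobel derivative kernels (the patent does not fix the kernels) and a closed-form eigenvalue computation for the symmetric 2x2 matrix of equation (6); the function name and threshold handling are illustrative.

```python
import cv2
import numpy as np

def select_key_points(frame_gray, tau):
    """Pick pixels whose second-gradient matrix (Eq. (6)) has both
    eigenvalue magnitudes above the empirical threshold tau."""
    I = frame_gray.astype(np.float32)
    Ix = cv2.Sobel(I, cv2.CV_32F, 1, 0, ksize=3)    # first gradients
    Iy = cv2.Sobel(I, cv2.CV_32F, 0, 1, ksize=3)
    Ixx = cv2.Sobel(Ix, cv2.CV_32F, 1, 0, ksize=3)  # second gradients
    Iyy = cv2.Sobel(Iy, cv2.CV_32F, 0, 1, ksize=3)
    Ixy = cv2.Sobel(Ix, cv2.CV_32F, 0, 1, ksize=3)

    # Closed-form eigenvalues of [[Ixx, Ixy], [Ixy, Iyy]] at every pixel:
    # lambda = tr/2 +/- sqrt(tr^2/4 - det).
    tr = Ixx + Iyy
    det = Ixx * Iyy - Ixy ** 2
    disc = np.sqrt(np.maximum(tr ** 2 / 4.0 - det, 0.0))
    lam_min = np.minimum(np.abs(tr / 2.0 + disc), np.abs(tr / 2.0 - disc))

    ys, xs = np.nonzero(lam_min > tau)
    return np.stack([xs, ys], axis=1)               # (x, y) key points
```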

Next, the key point $P_t = (x_t, y_t)$ at time $t$ in frame $I_t$ is tracked into the next frame $I_{t+1}$, as expressed by equation (7) below.

$$P_{t+1} = (x_{t+1}, y_{t+1}) = (x_t, y_t) + (M * \omega)\big|_{(x_t, y_t)} \qquad (7)$$

Here $P_t$ is the key point position at time $t$, $M$ is a median filter kernel, and $\omega$ is the optical flow field. The optical flow vector is then expressed by equation (8) below.

$$\mathbf{f} = P_{t+1} - P_t \qquad (8)$$

Methods of obtaining optical flow are widely known, and a detailed description thereof is omitted.
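A sketch of this tracking step under stated assumptions: OpenCV's Farneback routine stands in for the unspecified dense-flow method, and a 3x3 median filter stands in for the kernel M of equation (7).

```python
import cv2
import numpy as np
from scipy.ndimage import median_filter

def track_key_points(prev_gray, next_gray, points):
    """Advance integer key-point positions from frame t to t+1 using a
    median-filtered dense flow field, in the spirit of Eqs. (7)-(8)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # The median-filter kernel M of Eq. (7), applied per flow component.
    flow[..., 0] = median_filter(flow[..., 0], size=3)
    flow[..., 1] = median_filter(flow[..., 1], size=3)

    f = flow[points[:, 1], points[:, 0]]   # flow vectors f at the key points
    return points + f, f                   # P_{t+1}, and Eq. (8)'s f
```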

The gradient vector (g) and the optical flow vector (f) are defined over both time and space, since they are computed on all frames and at all key points.

In other words, they are written $\mathbf{g}_{t,k}$ and $\mathbf{f}_{t,k}$, where $t$ is the frame index and $k$ is the index of the key point.

Next, the tensor product of the gradient vector and the optical flow vector can be rewritten as Equation (9) below, reflecting the frame index and the keypoint index as described above.

$$\mathbf{T}_{t,k} = \mathbf{g}_{t,k} \otimes \mathbf{f}_{t,k} \qquad (9)$$

In addition, the feature vector to which the tensor divergence is applied, $\mathbf{d}_{t,k}$, is obtained by equation (10):

$$\mathbf{d}_{t,k} = \operatorname{div}(\mathbf{g}_{t,k} \otimes \mathbf{f}_{t,k}) \qquad (10)$$

Once the feature vectors for behavior classification are obtained as described above, behavior recognition can be performed according to general learning and recognition procedures.

In the present invention, to perform behavior recognition from the TDGF feature $\mathbf{d}_{t,k}$, a method using BOF and an SVM (support vector machine) is described with reference to FIGS. 3 to 5.

Referring first to FIG. 3, the method of action classification using the TDGF feature $\mathbf{d}_{t,k}$ is outlined.

As shown in FIG. 3, a descriptor is calculated for each TDGF feature $\mathbf{d}_{t,k}$; call it $c_{t,k}$. Since the TDGF feature is a vector in a two-dimensional space, it is quantized. A vector quantizer can be used for this: $K_1$ representative vectors are obtained through k-means clustering from a pool of TDGF feature vectors; call this codebook $\{\mathbf{u}_1, \ldots, \mathbf{u}_{K_1}\}$. Each feature $\mathbf{d}_{t,k}$ is then assigned to the representative vector having the smallest distance to it, where $j^*$ denotes the index of that nearest representative vector.

The vector coding of $\mathbf{d}_{t,k}$ is thus the index $c_{t,k} = j^*$.
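As a sketch of these codebook and coding steps (assuming scipy's k-means routine; the function names, the '++' initialization, and the fixed seed are illustrative choices, not from the patent):

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def build_codebook(tdgf_pool, k):
    """k-means codebook over a pool of 2-D TDGF vectors."""
    codebook, _ = kmeans2(tdgf_pool.astype(np.float64), k,
                          minit='++', seed=0)
    return codebook

def encode(tdgf_vectors, codebook):
    """Code each TDGF vector by the index of its nearest codeword."""
    codes, _ = vq(tdgf_vectors.astype(np.float64), codebook)
    return codes
```

The same pair of helpers can be reused for the second-stage codebook over the subblock codes described next.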

Next, a space-time block is taken around each key point and, as shown in FIG. 4, divided into subblocks. The codes $c$ are then calculated for each pixel in a subblock and concatenated; call the concatenated code vector $\mathbf{s}$, whose dimension equals the number of pixels in the subblock.

Next, as shown in FIGS. 4 and 5, the subblock codes $\mathbf{s}$ are calculated for all key points and subblocks within a duration of $T$ frames, and a histogram is used to summarize them into a feature vector $\mathbf{h}$. First, $K_2$ representative vectors for $\mathbf{s}$ are obtained from the feature vector pool by k-means clustering; call this codebook $\{\mathbf{v}_1, \ldots, \mathbf{v}_{K_2}\}$. Then, for all key points and subblocks in the input frames, the index of the nearest representative vector is obtained, and from this information a single histogram vector $\mathbf{h}$ is built: for each $\mathbf{s}$, the bin of the nearest codeword $\mathbf{v}_j$ is incremented by 1. A $K_2$-dimensional histogram $\mathbf{h}$ is thus obtained for the $T$ frames. Of course, normalization is performed on the histogram thus obtained. Then, for a single action video clip, the histograms can be collected into one feature vector; call it $\mathbf{z}$. The feature vectors $\mathbf{z}_i$ together with their action labels $y_i$, where the total number of actions is $A$ and the number of sample video clips per action is $N$, form the training set. Finally, classification of $\mathbf{z}$ is performed using a support vector machine (SVM) employing a non-linear kernel.
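The histogram construction and the final non-linear-kernel SVM can be sketched as follows; scikit-learn's SVC with an RBF kernel is one possible choice of non-linear kernel, and the commented training loop is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def clip_histogram(codes, k):
    """Normalized bag-of-features histogram over a clip's codeword indices."""
    h = np.bincount(codes, minlength=k).astype(np.float64)
    return h / max(h.sum(), 1.0)

# Hypothetical training loop: one histogram z_i and action label y_i per clip.
# Z = np.stack([clip_histogram(c, K) for c in codes_per_clip])
# clf = SVC(kernel='rbf').fit(Z, y)   # non-linear-kernel SVM, as in the text
```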

The feature information extractor of the present invention, to which this extraction method is applied, is shown in FIG. 6.

As shown in FIG. 6, the feature information extractor 100 includes an input unit 110, a keypoint extraction unit 120, a feature information extracting unit 130, and a classifier 140.

The input unit 110 receives the image data to be processed.

The keypoint extraction unit 120 extracts keypoints from the image data input to the input unit 110 in the manner described above.

The feature information extracting unit 130 obtains the tensor product of the gradient vector (g) and the optical flow vector (f) at each key point extracted by the keypoint extraction unit 120, calculates the tensor divergence of the tensor product so as to lower its dimension, determines the calculated tensor divergence as the feature vector for motion classification, and provides it to the classifier 140.

The classifier 140 classifies the motion of the object from the feature vector information received from the feature information extractor 130.
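Tying FIG. 6 together, a hypothetical end-to-end driver might look as follows, reusing the illustrative helpers sketched earlier (tdgf, select_key_points, encode, clip_histogram); it is a sketch under the same assumptions, not the patent's implementation.

```python
import cv2
import numpy as np

def classify_clip(frames, tau, codebook, clf):
    """End-to-end sketch of FIGS. 1 and 6: key points -> TDGF features
    -> codeword histogram -> SVM, reusing the helpers sketched above."""
    all_codes = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        I = prev.astype(np.float32)
        g = np.stack([cv2.Sobel(I, cv2.CV_32F, 1, 0, ksize=3),   # g_x
                      cv2.Sobel(I, cv2.CV_32F, 0, 1, ksize=3)],  # g_y
                     axis=-1)
        f = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                         0.5, 3, 15, 3, 5, 1.2, 0)
        d = tdgf(g, f)                         # (H, W, 2) TDGF field
        pts = select_key_points(prev, tau)     # key points of frame t
        if len(pts):
            all_codes.append(encode(d[pts[:, 1], pts[:, 0]], codebook))
    codes = np.concatenate(all_codes) if all_codes else np.empty(0, int)
    h = clip_histogram(codes, len(codebook))   # normalized BOF histogram
    return clf.predict(h[None, :])[0]
```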

To verify this extraction method, recognition experiments were performed on the KTH database, comparing HOG, HOF, MBHx, MBHy, and the proposed TDGF feature. As shown in Table 1 below, the best performance was obtained with the feature of the proposed technique.

Table 1. Recognition accuracy (%) on the KTH dataset

     HOG     HOF     MBHx    MBHy    TDGF (proposed)
KTH  93.98   94.44   95.83   94.44   96.30

For reference, sample images of the KTH data set used in the recognition experiments are shown in the accompanying figure.

As described above, according to the feature information extraction method and feature extractor for video object behavior classification of the present invention, integrating the HOG and HOF features into a single feature reflects changes in both space and time, so the performance of in-video motion classification can be improved without increasing the amount of calculation of the recognizer or classifier.

110: Input unit 120: Keypoint extraction unit
130: Feature information extracting unit 140: Classifier

Claims (4)

A method for extracting feature information for object behavior classification from video, comprising the steps of:
(a) calculating a gradient vector and an optical flow vector at selected key points in the video frames of a video to be processed;
(b) obtaining a tensor product of the gradient vector and the optical flow vector; and
(c) calculating a tensor divergence of the tensor product so as to reduce the dimension of the tensor product, and determining the calculated tensor divergence as a feature vector for motion classification.
The method of claim 1, wherein each key point is obtained by computing a second gradient for each pixel of an input image of the video to be processed, calculating the eigenvalues of the second gradient, and determining as key points the pixels whose calculated eigenvalues are greater than a set threshold.
The method of claim 1, wherein the feature vector data for motion classification is provided to a BoF or SVM classifier, which classifies the behavior.
A feature information extractor for video object behavior classification, comprising:
an input unit for receiving image data;
a key point extracting unit for extracting key points from the image data input to the input unit;
a feature information extracting unit for obtaining a tensor product of the gradient vector and the optical flow vector at each key point extracted by the key point extracting unit, calculating a tensor divergence of the tensor product so as to lower its dimension, and determining the calculated tensor divergence as a feature vector for motion classification; and
a classifier for classifying the behavior of the object from the feature vectors provided by the feature information extracting unit.
KR1020150055044A 2015-04-20 2015-04-20 Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification KR101713189B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150055044A KR101713189B1 (en) 2015-04-20 2015-04-20 Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150055044A KR101713189B1 (en) 2015-04-20 2015-04-20 Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification

Publications (2)

Publication Number Publication Date
KR20160124948A true KR20160124948A (en) 2016-10-31
KR101713189B1 KR101713189B1 (en) 2017-03-08

Family

ID=57445804

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150055044A KR101713189B1 (en) 2015-04-20 2015-04-20 Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification

Country Status (1)

Country Link
KR (1) KR101713189B1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102054211B1 (en) 2017-12-11 2019-12-10 경희대학교 산학협력단 Method and system for video retrieval based on image queries


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008310796A (en) * 2007-06-15 2008-12-25 Mitsubishi Electric Research Laboratories Inc Computer implemented method for constructing classifier from training data detecting moving object in test data using classifier
KR20100077307A (en) * 2008-12-29 2010-07-08 포항공과대학교 산학협력단 Image texture filtering method, storage medium of storing program for executing the same and apparatus performing the same
JP2014072620A (en) * 2012-09-28 2014-04-21 Nikon Corp Image processing program, image processing method, image processing apparatus, and imaging apparatus

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852331A (en) * 2019-10-25 2020-02-28 中电科大数据研究院有限公司 Image description generation method combined with BERT model
CN110852331B (en) * 2019-10-25 2023-09-08 中电科大数据研究院有限公司 Image description generation method combined with BERT model
CN112101091A (en) * 2020-07-30 2020-12-18 咪咕文化科技有限公司 Video classification method, electronic device and storage medium
CN112101091B (en) * 2020-07-30 2024-05-07 咪咕文化科技有限公司 Video classification method, electronic device and storage medium
WO2022227292A1 (en) * 2021-04-29 2022-11-03 苏州大学 Action recognition method
CN117671357A (en) * 2023-12-01 2024-03-08 广东技术师范大学 Pyramid algorithm-based prostate cancer ultrasonic video classification method and system

Also Published As

Publication number Publication date
KR101713189B1 (en) 2017-03-08

Similar Documents

Publication Publication Date Title
CN106897675B (en) Face living body detection method combining binocular vision depth characteristic and apparent characteristic
KR101713189B1 (en) Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification
Karaman et al. Fast saliency based pooling of fisher encoded dense trajectories
CN107563345B (en) Human body behavior analysis method based on space-time significance region detection
Sun et al. Combining feature-level and decision-level fusion in a hierarchical classifier for emotion recognition in the wild
CN108062543A (en) A kind of face recognition method and device
CN106980825B (en) Human face posture classification method based on normalized pixel difference features
CN104036296B (en) A kind of expression of image and processing method and processing device
CN111178195A (en) Facial expression recognition method and device and computer readable storage medium
JP2010108494A (en) Method and system for determining characteristic of face within image
CN104794446B (en) Human motion recognition method and system based on synthesis description
Xue et al. Automatic 4D facial expression recognition using DCT features
HN et al. Human Facial Expression Recognition from static images using shape and appearance feature
De Souza et al. Detection of violent events in video sequences based on census transform histogram
Sarhan et al. HLR-net: a hybrid lip-reading model based on deep convolutional neural networks
CN108062559A (en) A kind of image classification method based on multiple receptive field, system and device
CN111428590A (en) Video clustering segmentation method and system
CN104504161A (en) Image retrieval method based on robot vision platform
Jiang et al. An isolated sign language recognition system using RGB-D sensor with sparse coding
KR20210011707A (en) A CNN-based Scene classifier with attention model for scene recognition in video
Lan et al. The best of both worlds: Combining data-independent and data-driven approaches for action recognition
Zhang et al. Recognizing human action and identity based on affine-SIFT
Li et al. Facial expression recognition using facial-component-based bag of words and PHOG descriptors
Shukla et al. Deep Learning Model to Identify Hide Images using CNN Algorithm
Al-agha et al. Geometric-based feature extraction and classification for emotion expressions of 3D video film

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right