KR20160124948A - Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification - Google Patents
- Publication number
- KR20160124948A KR1020150055044A KR20150055044A
- Authority
- KR
- South Korea
- Prior art keywords
- vector
- video
- tensor
- feature
- calculating
- Prior art date
Classifications
- G06K9/00744
- G06K9/00751
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a method of extracting feature information for classifying object behavior in video, comprising the steps of: calculating a gradient vector and an optical flow vector at selected key points in the video frames of a video to be processed; obtaining the tensor product of the gradient vector and the optical flow vector; calculating the tensor divergence of the tensor product so as to lower its dimension; and determining the calculated tensor divergence as a feature vector for motion classification. According to this feature information extraction method and feature extractor for video object behavior classification, the features of HOG and HOF are fused into a single feature that reflects changes in both space and time without increasing the dimension of the feature vector, so the performance of behavior classification in video can be improved without increasing the computational complexity of the classifier.
Description
The present invention relates to a HOG/HOF-based feature information extraction method and feature extractor for video object behavior classification, and more particularly, to a feature information extraction method and feature extractor that improve the performance of classifying the motion of an object in video.
A variety of feature extraction methods have been proposed for classifying the behavior of an object, such as a human action, in a video.
In the case of human behavior, there are various actions such as walking, running, jogging, and extending an arm, and techniques have been developed to recognize these behaviors.
In particular, effective features for expressing object behavior have recently been developed: the histogram of oriented gradients (HOG), based on the spatial gradient of the image; the histogram of optical flow (HOF), based on the temporal optical flow of the image; the motion boundary histogram (MBH), which uses derivatives of the optical flow and is robust to camera motion; and the scale invariant feature transform (SIFT).
MBH is obtained for the x-axis and the y-axis separately, yielding the variants called MBHx and MBHy.
A classifier must be applied to the extracted features in order to classify the behavior of objects. Commonly used classifiers include the hidden Markov model (HMM), the support vector machine (SVM), the Gaussian mixture model (GMM) with Fisher vectors, and the histogram-based bag of features (BoF).
For methods of extracting object behavior features, see the recent article by H. Wang, A. Kläser, C. Schmid, and C.-L. Liu, "Dense Trajectories and Motion Boundary Descriptors for Action Recognition," International Journal of Computer Vision, vol. 103, no. 1, pp. 60-79, May 2013.
On the other hand, there is a demand for a feature extraction method for video that improves classification performance while reducing the amount of computation required by the classifier used to classify the behavior of objects.
The present invention has been made to solve the above-mentioned problems, and it is an object of the present invention to provide a feature information extraction method and a feature extractor for video object behavior classification that combine the gradient information and the optical flow information of an image into a single piece of feature information while reducing the amount of calculation.
According to an aspect of the present invention, there is provided a feature information extraction method for video object behavior classification, comprising the steps of: calculating a gradient vector and an optical flow vector at selected key points in the video frames of a video to be processed; obtaining the tensor product of the gradient vector and the optical flow vector; calculating the tensor divergence of the tensor product so as to lower its dimension; and determining the calculated tensor divergence as a feature vector for motion classification.
According to the feature information extraction method and feature extractor for video object behavior classification of the present invention, the features of HOG and HOF are fused into one feature that reflects changes in both space and time without increasing the dimension of the feature vector, which provides the advantage that the performance of behavior classification in video can be improved without increasing the amount of calculation of the classifier used as the recognizer.
FIG. 1 is a flowchart illustrating a feature information extraction process for video object behavior classification according to the present invention;
FIG. 2 is a flowchart showing the feature information extraction process of FIG. 1 in detail;
FIG. 3 is a diagram explaining an example of behavior recognition performed on the feature information extracted according to the present invention;
FIGS. 4 and 5 are diagrams illustrating the process of calculating a histogram from the feature information extracted according to the present invention; and
FIG. 6 is a block diagram illustrating a feature information extractor for video object behavior classification according to the present invention.
Hereinafter, a feature information extracting method and a feature extractor for classifying a video object behavior according to a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a flowchart showing a feature information extraction process for classifying a video object behavior according to the present invention.
First, key points are selected in the image frames of the input video to be processed (step 10), and a gradient vector g and an optical flow vector f are calculated at each selected key point (step 20).
Next, the tensor product of the calculated gradient vector g and optical flow vector f is computed (step 30), and the tensor divergence of the tensor product is calculated so as to lower its dimension (step 40).
Then, the calculated tensor divergence is determined as the feature vector for motion classification, and the feature vector data thus determined is provided to a BoF or SVM classifier to classify the behavior (step 50). The overall flow is sketched below.
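For illustration only, the following is a minimal Python sketch of this overall flow, steps 10 to 50. The helper names (select_keypoints, dense_gradient, dense_optical_flow, tdgf_divergence) are hypothetical, not from the patent, and are sketched in the later snippets.

```python
import numpy as np

def extract_tdgf_features(frames):
    """Sketch of FIG. 1: key points -> (g, f) -> tensor product -> divergence."""
    clip_features = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        kps = select_keypoints(prev)           # step 10, Eq. (6)
        G = dense_gradient(prev)               # step 20: g at every pixel
        F = dense_optical_flow(prev, curr)     # step 20: f at every pixel
        D = tdgf_divergence(G, F)              # steps 30-40, Eqs. (2)-(5)
        # One 2-D TDGF vector per key point (step 40 output).
        clip_features.extend(D[y, x] for (x, y) in kps)
    return np.array(clip_features)             # handed to a BoF/SVM classifier (step 50)
```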
Hereinafter, this processing will be described in more detail with reference to FIGS. 2 to 5.
First, when the key points, which are the principal pixel positions in the video frames, have been selected, the feature at each key point is defined as in Equation (1):

$$v = (g, f) \tag{1}$$

where $g$ is the gradient vector and $f$ is the optical flow vector at the key point.
The tensor product of the gradient vector $g$ and the optical flow vector $f$ is expressed by Equation (2):

$$T = g \otimes f \tag{2}$$

where $\otimes$ denotes the tensor product. Expanding Equation (2) gives Equation (3):

$$g \otimes f = (g_x e_x + g_y e_y) \otimes (f_x e_x + f_y e_y) = \begin{pmatrix} g_x f_x & g_x f_y \\ g_y f_x & g_y f_y \end{pmatrix} \tag{3}$$

where $e_x$ is the unit vector of the $x$-axis (horizontal axis) and $e_y$ is the unit vector of the $y$-axis (vertical axis). As a result, the tensor product of the two vectors is expressed as a 2×2 matrix. The gradient vector $g$ and the optical flow vector $f$ are thus expanded into a matrix, and each entry of the matrix expresses a correlation between elements of the two vectors.
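As a concrete illustration (not part of the patent text), the 2×2 tensor product of Equation (3) is simply the outer product of the two vectors; a minimal NumPy check with assumed example values:

```python
import numpy as np

g = np.array([0.8, -0.2])   # example gradient vector (g_x, g_y); values assumed
f = np.array([1.5,  0.4])   # example optical flow vector (f_x, f_y); values assumed

T = np.outer(g, f)          # [[g_x*f_x, g_x*f_y], [g_y*f_x, g_y*f_y]]
print(T)                    # the 2x2 matrix of Eq. (3)
```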
In this patent, the tensor divergence is used to convert the feature expanded by the tensor product into a low-dimensional vector.
That is, for the second-order tensor $T = g \otimes f$, the tensor divergence is defined by Equations (4) and (5):

$$\operatorname{div}(T) = \nabla \cdot T = \left( \frac{\partial T_{xx}}{\partial x} + \frac{\partial T_{yx}}{\partial y},\ \frac{\partial T_{xy}}{\partial x} + \frac{\partial T_{yy}}{\partial y} \right) \tag{4}$$

$$\operatorname{div}(g \otimes f) = \left( \frac{\partial (g_x f_x)}{\partial x} + \frac{\partial (g_y f_x)}{\partial y},\ \frac{\partial (g_x f_y)}{\partial x} + \frac{\partial (g_y f_y)}{\partial y} \right) \tag{5}$$
Therefore, the gradient-flow tensor divergence (TDGF), the feature information generated through Equations (4) and (5), is a two-dimensional vector that lowers the dimension of the feature and compactly encodes both the gradient vector $g$ information and the optical flow vector $f$ information. The feature information thus extracted can be used as-is in the subsequent learning and recognition steps, or it can be converted into a histogram-type feature and applied to a conventional SVM or BoF recognition method.
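A minimal sketch of Equations (4) and (5) over dense per-pixel fields, assuming central finite differences (np.gradient) as the spatial derivative; the function name and array layout are illustrative, not from the patent:

```python
import numpy as np

def tdgf_divergence(G, F):
    """TDGF of Eqs. (4)-(5). G and F are H x W x 2 fields holding (g_x, g_y)
    and (f_x, f_y) at every pixel; returns an H x W x 2 divergence field."""
    # Per-pixel tensor product: T[h, w, i, j] = G[h, w, i] * F[h, w, j]
    T = np.einsum('hwi,hwj->hwij', G, F)
    # Central differences: axis 1 runs along x (columns), axis 0 along y (rows).
    d0 = np.gradient(T[..., 0, 0], axis=1) + np.gradient(T[..., 1, 0], axis=0)
    d1 = np.gradient(T[..., 0, 1], axis=1) + np.gradient(T[..., 1, 1], axis=0)
    return np.stack([d0, d1], axis=-1)   # one 2-D TDGF vector per pixel
```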
An example of a key point selection process is described below.
Let the input video be denoted $I_t$, where $t$ is the image frame index and $I_t$ is an image. Various methods known in the art may be used to select the key points at which the gradient and optical flow are tracked in these image frames.
Hereinafter, a method using the eigenvalues of the second-gradient matrix will be described as an example of key point selection.
First, for each pixel of the input image of the video to be processed, the second-gradient matrix of Equation (6) below is computed and its eigenvalues are calculated, and each pixel whose eigenvalues are greater than a set threshold value is determined to be a key point:

$$H = \begin{pmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{pmatrix} \tag{6}$$

where $I_x$ and $I_y$ denote the gradients in the $x$- and $y$-axis directions and $I_{xx}$, $I_{xy}$, and $I_{yy}$ are the second gradients. For real digital images these values are obtained through filtering with derivative kernels in the $x$- and $y$-axis directions. Eigen-decomposition is performed on the matrix obtained for each pixel, and the pixels whose two eigenvalues both exceed a specific threshold value $\tau$ are defined as key points; the threshold $\tau$ is determined empirically. Through this procedure the gradient vector $g = (I_x, I_y)$ is also obtained at each key point.

Next, the position $P_t = (x_t, y_t)$ of a feature point at time $t$ is tracked into the next frame $t+1$ as expressed by Equation (7) below:

$$P_{t+1} = (x_{t+1}, y_{t+1}) = (x_t, y_t) + (M * \omega_t)\big|_{(\bar{x}_t, \bar{y}_t)} \tag{7}$$

where $P_t$ is the key point position at time $t$, $M$ is a median filter kernel, and $\omega_t$ is the optical flow field. The optical flow vector at the key point is then expressed by Equation (8) below:

$$f = (x_{t+1} - x_t,\ y_{t+1} - y_t) \tag{8}$$
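For illustration, a minimal OpenCV/NumPy sketch of this key point selection and of the dense gradient and flow fields used above; the Sobel kernels, the Farneback flow method, and the threshold value are assumptions standing in for the unspecified kernels, flow method, and empirical threshold of the text:

```python
import cv2
import numpy as np

def select_keypoints(gray, tau=0.01):
    """Eq. (6): threshold both eigenvalues of the per-pixel second-gradient
    matrix [[Ixx, Ixy], [Ixy, Iyy]]; tau is the empirical threshold (assumed)."""
    g = gray.astype(np.float32) / 255.0
    Ixx = cv2.Sobel(g, cv2.CV_32F, 2, 0, ksize=3)
    Iyy = cv2.Sobel(g, cv2.CV_32F, 0, 2, ksize=3)
    Ixy = cv2.Sobel(g, cv2.CV_32F, 1, 1, ksize=3)
    # Closed-form eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]]:
    mean = (Ixx + Iyy) / 2.0
    root = np.sqrt(((Ixx - Iyy) / 2.0) ** 2 + Ixy ** 2)
    lam_big, lam_small = mean + root, mean - root
    ys, xs = np.where((lam_big > tau) & (lam_small > tau))
    return list(zip(xs, ys))

def dense_gradient(gray):
    """Per-pixel gradient vector g = (I_x, I_y)."""
    g = gray.astype(np.float32) / 255.0
    Ix = cv2.Sobel(g, cv2.CV_32F, 1, 0, ksize=3)
    Iy = cv2.Sobel(g, cv2.CV_32F, 0, 1, ksize=3)
    return np.stack([Ix, Iy], axis=-1)

def dense_optical_flow(prev_gray, next_gray):
    """Per-pixel flow vector f = (f_x, f_y), cf. Eq. (8); Farneback flow is
    used here as a stand-in for the patent's unspecified flow method."""
    return cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```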
Methods of obtaining the optical flow are widely known, so a detailed description thereof is omitted here.
The gradient vector $g$ and the optical flow vector $f$ are defined over both time and space, since they are defined on every frame and at every key point.
In other words, they may be written as $g_{t,k}$ and $f_{t,k}$, where $t$ is the frame index and $k$ is the index of the key point. The tensor product of the gradient vector and the optical flow vector can then be rewritten as Equation (9) below, reflecting the frame index and the key point index as described above:

$$T_{t,k} = g_{t,k} \otimes f_{t,k} \tag{9}$$

In addition, the feature vector to which the tensor divergence is applied, $d_{t,k}$, is obtained by Equation (10) below:

$$d_{t,k} = \operatorname{div}(g_{t,k} \otimes f_{t,k}) \tag{10}$$
Once the feature vectors for behavior classification are obtained as described above, behavior recognition can be performed according to general learning and recognition procedures.
In the present invention, to support behavior recognition, a method using BoF and a support vector machine (SVM) will be described with reference to FIGS. 3 to 5.

Referring first to FIG. 3, a descriptor is calculated for each TDGF feature and used for action classification. Since the TDGF feature is a vector in two-dimensional space, it is first quantized. For this purpose a vector quantizer is used: a set of representative vectors is obtained through k-means clustering from a pool of TDGF feature vectors, and each TDGF feature is then coded with the index of the representative vector having the smallest distance to it.

Next, for each key point, a block is taken around it in space-time and, as shown in FIG. 4, divided into subblocks. For each pixel in a subblock the coded TDGF values are calculated, and the subblock results are concatenated, which increases the dimension of the descriptor accordingly.

Next, as shown in FIGS. 4 and 5, the descriptors are calculated over a frame interval for all key points and subblocks and summarized with a histogram. First, a set of representative vectors is again obtained from the descriptor pool by k-means clustering. Then every descriptor from all key points and subblocks in the input frames is assigned to its nearest representative vector, and this information is aggregated into a single histogram vector for the frame interval. Of course, normalization is performed on the histogram thus obtained. In this way one feature vector can be obtained for a single action video clip. The feature vectors, each labeled with the action class of its sample video clip, form the training data. Finally, classification is performed using a support vector machine (SVM) employing a non-linear kernel, as sketched below.
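A minimal sketch of this BoF/SVM stage using scikit-learn, simplified to one histogram per clip over the raw 2-D TDGF vectors (no subblock concatenation); the codebook size, the RBF kernel, and the train_clips/train_labels inputs are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def bof_histograms(clips, kmeans):
    """Code each clip's TDGF vectors against the k-means codebook and
    build one normalized histogram per clip (cf. FIGS. 3-5)."""
    hists = []
    for feats in clips:                       # feats: (n_i, 2) TDGF vectors
        words = kmeans.predict(feats)         # index of nearest representative
        h, _ = np.histogram(words, bins=np.arange(kmeans.n_clusters + 1))
        hists.append(h / max(h.sum(), 1))     # histogram normalization
    return np.array(hists)

# train_clips: list of (n_i, 2) arrays, one per action video clip (assumed given)
pool = np.vstack(train_clips)                 # pooled TDGF feature vectors
kmeans = KMeans(n_clusters=256, n_init=10).fit(pool)   # codebook size assumed

X_train = bof_histograms(train_clips, kmeans)
clf = SVC(kernel='rbf', C=10.0)               # non-linear kernel SVM
clf.fit(X_train, train_labels)                # train_labels: one action per clip
```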
The feature information extractor of the present invention to which this extraction method is applied is shown in FIG. 6. Referring to FIG. 6, the feature information extractor includes an input unit 110 to which the image data of the video to be processed is input, a key point extraction unit 120 that extracts key points from the image data input to the input unit 110, and a feature information extraction unit 130 that calculates the gradient vector and the optical flow vector at the key points extracted by the key point extraction unit 120, obtains the tensor product of the two vectors, and calculates the tensor divergence of the tensor product to lower its dimension, thereby extracting the feature information.
In order to verify this extraction method, recognition experiments were performed on the KTH database, comparing the proposed feature with HOG, HOF, MBHx, and MBHy.
As shown in Table 1 below, the best performance was obtained when the feature of the proposed technique was used.
For reference, the recognition experiments used sample images from the KTH data set.
As described above, according to the feature information extraction method and feature extractor for video object behavior classification of the present invention, the features of HOG and HOF are integrated into one feature that reflects changes in both space and time, so the performance of in-video motion classification can be improved without increasing the amount of calculation of the recognizer or classifier.
110: Input unit 120: Keypoint extraction unit
130: Feature information extracting unit 140:
Claims (4)
1. A feature information extraction method for video object behavior classification, comprising the steps of:
(a) calculating a gradient vector and an optical flow vector at selected key points in the video frames of a video to be processed;
(b) obtaining the tensor product of the gradient vector and the optical flow vector; and
(c) calculating the tensor divergence of the tensor product to reduce its dimension, and determining the calculated tensor divergence as a feature vector for motion classification.
A feature information extractor for video object behavior classification, comprising: an input unit to which the image data of a video to be processed is input; a key point extraction unit for extracting key points from the image data input to the input unit; and a feature information extraction unit for calculating the gradient vector and the optical flow vector at the key points extracted by the key point extraction unit, obtaining the tensor product of the gradient vector and the optical flow vector, and calculating the tensor divergence of the tensor product to lower its dimension, thereby extracting the feature information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150055044A KR101713189B1 (en) | 2015-04-20 | 2015-04-20 | Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150055044A KR101713189B1 (en) | 2015-04-20 | 2015-04-20 | Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20160124948A (en) | 2016-10-31 |
KR101713189B1 (en) | 2017-03-08 |
Family
ID=57445804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150055044A KR101713189B1 (en) | 2015-04-20 | 2015-04-20 | Tensor Divergence Feature Extraction System based on HoG and HOF for video obejct action classification |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101713189B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102054211B1 (en) | 2017-12-11 | 2019-12-10 | 경희대학교 산학협력단 | Method and system for video retrieval based on image queries |
2015
- 2015-04-20 KR KR1020150055044A patent/KR101713189B1/en active IP Right Grant
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008310796A (en) * | 2007-06-15 | 2008-12-25 | Mitsubishi Electric Research Laboratories Inc | Computer implemented method for constructing classifier from training data detecting moving object in test data using classifier |
KR20100077307A (en) * | 2008-12-29 | 2010-07-08 | 포항공과대학교 산학협력단 | Image texture filtering method, storage medium of storing program for executing the same and apparatus performing the same |
JP2014072620A (en) * | 2012-09-28 | 2014-04-21 | Nikon Corp | Image processing program, image processing method, image processing apparatus, and imaging apparatus |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110852331A (en) * | 2019-10-25 | 2020-02-28 | 中电科大数据研究院有限公司 | Image description generation method combined with BERT model |
CN110852331B (en) * | 2019-10-25 | 2023-09-08 | 中电科大数据研究院有限公司 | Image description generation method combined with BERT model |
CN112101091A (en) * | 2020-07-30 | 2020-12-18 | 咪咕文化科技有限公司 | Video classification method, electronic device and storage medium |
CN112101091B (en) * | 2020-07-30 | 2024-05-07 | 咪咕文化科技有限公司 | Video classification method, electronic device and storage medium |
WO2022227292A1 (en) * | 2021-04-29 | 2022-11-03 | 苏州大学 | Action recognition method |
CN117671357A (en) * | 2023-12-01 | 2024-03-08 | 广东技术师范大学 | Pyramid algorithm-based prostate cancer ultrasonic video classification method and system |
Also Published As
Publication number | Publication date |
---|---|
KR101713189B1 (en) | 2017-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106897675B (en) | Face living body detection method combining binocular vision depth characteristic and apparent characteristic | |
KR101713189B1 (en) | Tensor Divergence Feature Extraction System based on HoG and HOF for video object action classification | |
Karaman et al. | Fast saliency based pooling of fisher encoded dense trajectories | |
CN107563345B (en) | Human body behavior analysis method based on space-time significance region detection | |
Sun et al. | Combining feature-level and decision-level fusion in a hierarchical classifier for emotion recognition in the wild | |
CN108062543A (en) | A kind of face recognition method and device | |
CN106980825B (en) | Human face posture classification method based on normalized pixel difference features | |
CN104036296B (en) | A kind of expression of image and processing method and processing device | |
CN111178195A (en) | Facial expression recognition method and device and computer readable storage medium | |
JP2010108494A (en) | Method and system for determining characteristic of face within image | |
CN104794446B (en) | Human motion recognition method and system based on synthesis description | |
Xue et al. | Automatic 4D facial expression recognition using DCT features | |
HN et al. | Human Facial Expression Recognition from static images using shape and appearance feature | |
De Souza et al. | Detection of violent events in video sequences based on census transform histogram | |
Sarhan et al. | HLR-net: a hybrid lip-reading model based on deep convolutional neural networks | |
CN108062559A (en) | A kind of image classification method based on multiple receptive field, system and device | |
CN111428590A (en) | Video clustering segmentation method and system | |
CN104504161A (en) | Image retrieval method based on robot vision platform | |
Jiang et al. | An isolated sign language recognition system using RGB-D sensor with sparse coding | |
KR20210011707A (en) | A CNN-based Scene classifier with attention model for scene recognition in video | |
Lan et al. | The best of both worlds: Combining data-independent and data-driven approaches for action recognition | |
Zhang et al. | Recognizing human action and identity based on affine-SIFT | |
Li et al. | Facial expression recognition using facial-component-based bag of words and PHOG descriptors | |
Shukla et al. | Deep Learning Model to Identify Hide Images using CNN Algorithm | |
Al-agha et al. | Geometric-based feature extraction and classification for emotion expressions of 3D video film |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right |