CN107194365B - Behavior identification method and system based on middle layer characteristics - Google Patents


Info

Publication number
CN107194365B
CN107194365B CN201710416188.1A
Authority
CN
China
Prior art keywords
component
candidate
component detector
detector
image sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710416188.1A
Other languages
Chinese (zh)
Other versions
CN107194365A (en)
Inventor
桑农
张士伟
高常鑫
李乐仁瀚
邵远杰
王金
况小琴
何翼
皮智雄
宾言锐
都文鹏
舒娟
吴建雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201710416188.1A
Publication of CN107194365A
Application granted
Publication of CN107194365B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior recognition method and system based on middle-layer features. The method is realized as follows: a candidate component detector set is obtained from a sample image sequence; the B% of component detectors with the weakest discriminative power are removed from the candidate component detector set to obtain a new candidate component detector set; the component detectors in the new candidate component detector set are sorted in descending order of weight, and the top-ranked P component detectors are selected as the middle-layer feature extractor of behavior class A; the middle-layer feature extractor of every behavior category is acquired in the same way, and the extractors are combined into a bag of words; the bag of words is used to extract sample middle-layer features from the sample image sequence, and a classifier is trained with these features to obtain a behavior recognition classifier; a test image sequence input into the behavior recognition classifier yields its behavior category. The method has strong recognition capability, high recognition accuracy and strong practicability, and preserves the correlation among components.

Description

Behavior identification method and system based on middle layer characteristics
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a behavior identification method and system based on middle-layer features.
Background
Behavior recognition is a core technology in application fields such as video security monitoring, human-computer interaction, and video retrieval and analysis, and is attracting increasing attention from both industry and academia. However, because behaviors in video are subject to significant disturbances such as motion blur, scale variation, low resolution, background noise, camera motion, and viewpoint variation, behavior analysis remains very challenging.
Existing methods mainly follow two lines. The first uses low-level spatio-temporal local features, such as spatio-temporal interest points, gradient-based features, and trajectory features. Typically, a large number of local descriptors are extracted from a video training set, a "bag of words" is then constructed, and finally a global representation of a behavior is built with a coding technique such as bag of words (BoW) or Fisher vectors (FV). The second line uses high-level template-based features: several pose or viewpoint modes of a specific behavior class are selected manually or by weakly supervised methods, combined into a behavior extractor, and used to extract a high-level representation of the behavior. Both approaches have drawbacks. The first, although robust to intra-class variation, produces a representation too low-level to express the discriminability of motion patterns at higher levels. The second, in contrast, is very good at extracting high-level representations but is considerably more sensitive to intra-class variation. To strike a balance between the two, many researchers have proposed representations based on discriminative behavior components: typically, component detectors are trained from components, and the detectors are used to extract mid-level features from the video.
Existing discriminative component mining techniques either require manual intervention, which demands enormous manpower and material resources when processing large numbers of video samples and thus hardly reaches a practical level, or rely on heuristic rules defined in advance, which lose the correlation between components, so that a minimal component set cannot attain maximal discriminative power.
Therefore, existing behavior recognition suffers from the technical problems of weak recognition capability, heavy consumption of manpower and material resources, impracticality, and loss of correlation between components.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the present invention provides a behavior recognition method and system based on middle-layer features, thereby solving the technical problems of weak recognition capability, heavy consumption of manpower and material resources, impracticality, and loss of correlation between components in existing behavior recognition.
To achieve the above object, according to an aspect of the present invention, there is provided a behavior recognition method based on a middle layer feature, including:
(1) extracting a spatiotemporal component set D of the class A behavior category and a spatiotemporal component set N of other behavior categories except the class A from the sample image sequence, and training a component detector by using the spatiotemporal component set D and the spatiotemporal component set N to obtain a candidate component detector set;
(2) combining the middle-layer features of the sample image sequence selected by each component detector in the candidate component detector set to obtain candidate feature vectors, and training a selector by using the candidate feature vectors to obtain weight vectors of the selector;
(3) measuring the discriminative power of each component detector in the candidate component detector set by using the weight vector of the selector, and removing the B% of component detectors with the weakest discriminative power from the candidate component detector set to obtain a new candidate component detector set;
(4) sorting the component detectors in descending order of weight according to the weight of each component detector in the new candidate component detector set, and selecting the top-ranked P component detectors as the middle-layer feature extractor of the class-A behavior category;
(5) acquiring the middle-layer feature extractor of each behavior category among the behavior categories, combining the middle-layer feature extractors into a bag of words, extracting sample middle-layer features of the sample image sequence by using the bag of words, and training a classifier by using the sample middle-layer features to obtain a behavior recognition classifier;
(6) inputting the test image sequence into the behavior recognition classifier to obtain the behavior category of the test image sequence.
Further, the training of the component detector is specifically implemented as follows: the positive samples are spatio-temporal components in the spatio-temporal component set D and the negative samples are spatio-temporal components in the spatio-temporal component set N; for each spatio-temporal component in the set D, the component detector is trained with that one positive sample and a plurality of negative samples.
Further, the candidate feature vector f_c is:

f_c = [f(d_1, υ), f(d_2, υ), …, f(d_m, υ)],

where d_i is the i-th component detector, f(d_i, υ) denotes the middle-layer feature extracted by detector d_i from an image υ in the sample image sequence using a max-pooling quantization function, 1 ≤ i ≤ m, and m denotes the number of component detectors in the candidate component detector set D_c.
Further, the selector is the linear model

Φ_c(f_c) = w·f_c + b,

and its weight vector is obtained by solving

w = argmin_{w,b} (1/2)·‖w‖² + C·Σ_{n=1}^{N} ℓ(y_n, w·x_n + b),

where Φ_c(f_c) denotes the selector, w is the weight vector of the selector, b is the bias of the selector, ℓ is the loss function, C is a penalty factor, y_n denotes the class label of the n-th image in the sample image sequence, x_n denotes the middle-layer feature of the n-th image, and N denotes the total number of images in the sample image sequence.
Further, the specific implementation of step (3) is as follows:

The discriminative power of each component detector in the candidate component detector set is measured with the weight vector of the selector, and recursive removal is applied to the candidate middle-layer feature matrix F_c formed by stacking the candidate feature vectors of all sample images. When k = 1, F_0 = F_c is initialized; when k > 1, the k-th recursion can be represented as:

w_k = SVM(F_{k−1}, y),
S_k = Ψ(w_k, τ),
F_k = F_{k−1}[S_k],

where S_k = [s_1, s_2, …, s_m], s_i ∈ {0, 1}, denotes the component selection flag bits: if s_i = 1 the i-th component detector is selected, and if s_i = 0 it is not selected; y denotes the class label vector of the sample image sequence; w_k denotes the weight vector after the k-th recursion; F_{k−1} and F_k denote the candidate middle-layer feature matrices after the (k−1)-th and k-th recursions; and Ψ(w_k, τ) denotes removing, according to the weight vector w_k after the k-th recursion, detectors at the removal rate τ = B%. After H recursions in total, the new candidate component detector set is obtained.
According to another aspect of the present invention, there is provided a behavior recognition system based on a middle layer feature, including:
the acquisition candidate component detector set module is used for extracting a space-time component set D of the class A behavior category and a space-time component set N of other behavior categories except the class A from the sample image sequence, and training a component detector by using the space-time component set D and the space-time component set N to obtain a candidate component detector set;
the training selector module is used for combining the middle-layer features of the sample image sequence selected by each component detector in the candidate component detector set to obtain candidate feature vectors, and training the selector by using the candidate feature vectors to obtain the weight vectors of the selector;
a removed component detector module, configured to measure the discriminative power of each component detector in the candidate component detector set using the weight vector of the selector, and remove B% of the component detectors in the candidate component detector set that have weak discriminative power to obtain a new candidate component detector set;
the middle-layer feature extractor module is used for sorting the component detectors from big to small according to the weight of each component detector in the new candidate component detector set, and selecting P component detectors which are sorted in the front as the middle-layer feature extractor of the A-type behavior category;
the training classifier module is used for acquiring the middle-layer feature extractors of each behavior category in the behavior categories, combining the middle-layer feature extractors into word bags, extracting the middle-layer features of the samples of the sample image sequence by using the word bags, and training the classifier by using the middle-layer features of the samples to obtain a behavior recognition classifier;
and the behavior recognition module is used for inputting the test image sequence into the behavior recognition classifier to obtain the behavior category of the test image sequence.
Further, the training of the component detector is specifically implemented as follows: the positive samples are spatio-temporal components in the spatio-temporal component set D and the negative samples are spatio-temporal components in the spatio-temporal component set N; for each spatio-temporal component in the set D, the component detector is trained with that one positive sample and a plurality of negative samples.
Further, the candidate feature vector f_c is:

f_c = [f(d_1, υ), f(d_2, υ), …, f(d_m, υ)],

where d_i is the i-th component detector, f(d_i, υ) denotes the middle-layer feature extracted by detector d_i from an image υ in the sample image sequence using a max-pooling quantization function, 1 ≤ i ≤ m, and m denotes the number of component detectors in the candidate component detector set D_c.
Further, the selector is the linear model

Φ_c(f_c) = w·f_c + b,

and its weight vector is obtained by solving

w = argmin_{w,b} (1/2)·‖w‖² + C·Σ_{n=1}^{N} ℓ(y_n, w·x_n + b),

where Φ_c(f_c) denotes the selector, w is the weight vector of the selector, b is the bias of the selector, ℓ is the loss function, C is a penalty factor, y_n denotes the class label of the n-th image in the sample image sequence, x_n denotes the middle-layer feature of the n-th image, and N denotes the total number of images in the sample image sequence.
Further, the specific implementation of the component detector removal module is as follows:

The discriminative power of each component detector in the candidate component detector set is measured with the weight vector of the selector, and recursive removal is applied to the candidate middle-layer feature matrix F_c. When k = 1, F_0 = F_c is initialized; when k > 1, the k-th recursion can be represented as:

w_k = SVM(F_{k−1}, y),
S_k = Ψ(w_k, τ),
F_k = F_{k−1}[S_k],

where S_k = [s_1, s_2, …, s_m], s_i ∈ {0, 1}, denotes the component selection flag bits: if s_i = 1 the i-th component detector is selected, and if s_i = 0 it is not selected; y denotes the class label vector of the sample image sequence; w_k denotes the weight vector after the k-th recursion; F_{k−1} and F_k denote the candidate middle-layer feature matrices after the (k−1)-th and k-th recursions; and Ψ(w_k, τ) denotes removing, according to the weight vector w_k after the k-th recursion, detectors at the removal rate τ = B%. After H recursions in total, the new candidate component detector set is obtained.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) The invention combines the middle-layer features of the sample image sequence selected by each component detector in the candidate component detector set into candidate feature vectors, and trains the selector with the candidate feature vectors to obtain the weight vector of the selector; the correlation among the component detectors is thus considered comprehensively, which ensures that the candidate component detector set has stronger discriminative power as a whole.
(2) The method measures the discriminative power of each component detector in the candidate component detector set with the weight vector of the selector and removes the B% of component detectors with the weakest discriminative power to obtain a new candidate component detector set; behavior components that are clearly non-discriminative are removed, and component selection within the new candidate component detector set gives the method stronger generalization capability.
(3) The invention obtains a candidate component detector set from the sample image sequence; removes the B% of component detectors with the weakest discriminative power to obtain a new candidate component detector set; acquires the middle-layer feature extractors from the new candidate component detector set; builds a bag of words, extracts sample middle-layer features of the sample image sequence with the bag of words, and trains a classifier with these features to obtain a behavior recognition classifier that can then be applied to classify behaviors. The method has strong recognition capability, high recognition accuracy and strong practicability, and preserves the correlation among components. The invention can mine a minimal set of behavior component detectors in a weakly supervised manner, handles illumination change, motion blur, camera motion and viewpoint change well, and more easily meets the requirements of practical applications.
(4) Preferably, considering the complexity of the candidate component detector set, recursive removal can iteratively eliminate the component detectors that are clearly non-discriminative to obtain a new candidate component detector set; component detector selection within this new set gives the invention stronger generalization capability.
Drawings
Fig. 1 is a flowchart of a behavior recognition method based on middle-layer features according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, a behavior recognition method based on middle layer features includes:
(1) extracting a spatiotemporal component set D of the class A behavior category and a spatiotemporal component set N of other behavior categories except the class A from the sample image sequence, and training a component detector by using the spatiotemporal component set D and the spatiotemporal component set N to obtain a candidate component detector set;
(2) combining the middle-layer features of the sample image sequence selected by each component detector in the candidate component detector set to obtain candidate feature vectors, and training a selector by using the candidate feature vectors to obtain weight vectors of the selector;
(3) measuring the discriminative power of each component detector in the candidate component detector set by using the weight vector of the selector, and removing the B% of component detectors with the weakest discriminative power from the candidate component detector set to obtain a new candidate component detector set;
(4) sorting the component detectors in descending order of weight according to the weight of each component detector in the new candidate component detector set, and selecting the top-ranked P component detectors as the middle-layer feature extractor of the class-A behavior category;
(5) acquiring the middle-layer feature extractor of each behavior category among the behavior categories, combining the middle-layer feature extractors into a bag of words, extracting sample middle-layer features of the sample image sequence by using the bag of words, and training a classifier by using the sample middle-layer features to obtain a behavior recognition classifier;
(6) inputting the test image sequence into the behavior recognition classifier to obtain the behavior category of the test image sequence.
Further, step (1) further comprises:
(1-1) a spatio-temporal component set is extracted from the sample image sequence using dense sampling and multi-scale sampling; smooth and static spatio-temporal components are first discarded, and for each remaining spatio-temporal component, a histogram of optical flow (HOF) descriptor, a whitened histogram of oriented gradients (HOG) descriptor, and a motion boundary histogram (MBH) descriptor are extracted;
(1-2) the spatio-temporal component set D is extracted from the given behavior class A and the spatio-temporal component set N is extracted from the behavior classes other than class A; a cross-validation clustering strategy is applied to the sets D and N to construct a candidate component detector set D_c with expressive power;
(1-3) the positive samples are spatio-temporal components in the set D and the negative samples are spatio-temporal components in the set N; for each spatio-temporal component in the set D, a component detector is trained with that one positive sample and a plurality of negative samples, and all trained component detectors constitute the candidate component detector set D_c.
Preferably, the component detector is a linear discriminant analysis (LDA) classifier or a support vector machine (SVM).
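As an illustration of this exemplar-style training, the following is a minimal sketch assuming the descriptors of the sets D and N have already been computed; the function names, the use of scikit-learn's LinearSVC, and the default value of C are illustrative assumptions rather than part of the patent:

```python
# Hypothetical sketch: each spatio-temporal component in D becomes the
# single positive example of its own linear SVM detector, trained
# against many negative components drawn from N (step (1-3) above).
import numpy as np
from sklearn.svm import LinearSVC

def train_candidate_detectors(D_feats, N_feats, C=1.0):
    """D_feats: (n_pos, d) descriptors of the class-A component set D.
    N_feats: (n_neg, d) descriptors of the other-class component set N.
    Returns one (weight vector, bias) detector per positive component."""
    y = np.concatenate([[1.0], -np.ones(len(N_feats))])
    detectors = []
    for pos in D_feats:
        X = np.vstack([pos[None, :], N_feats])  # 1 positive + many negatives
        svm = LinearSVC(C=C).fit(X, y)
        detectors.append((svm.coef_.ravel(), float(svm.intercept_[0])))
    return detectors
```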
Preferably, dense sampling and multi-scale sampling extract the spatio-temporal component set from the sample image sequence at scales 2^i × 2^j, i, j ∈ {0, 1, 2}; at each scale the spatio-temporal component size is 80 × 80 × 20 and the corresponding sampling interval is set to 20 × 20 × 10. This produces a large number of uninformative spatio-temporal components and reduces efficiency, so spatio-temporal blocks without motion information need to be removed in advance. For each spatio-temporal component p, its average optical flow strength f_p and gradient magnitude g_p are first computed, and only the spatio-temporal components with f_p > t_f and g_p > t_p are kept, where t_f = 0.6 × f_max, t_p = 0.7 × g_max, and f_max and g_max are the maximum optical flow value and maximum gradient value over the spatio-temporal components. The number of cluster centers in cross-validation clustering is set to K = S/10, where S is the total number of component detectors participating in each round of clustering; clusters with more than 3 members are retained in each round, cross-detection is performed at the same time, and the top 5 detected component detectors form a new center.
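A sketch of the motion-based filtering described above, assuming the average optical flow strength f_p and gradient magnitude g_p of every sampled component are already available (the function and variable names are illustrative):

```python
# Keep a spatio-temporal component p only if f_p > t_f and g_p > t_p,
# with t_f = 0.6 * f_max and t_p = 0.7 * g_max as stated in the text.
import numpy as np

def filter_components(flow_strengths, grad_strengths):
    f = np.asarray(flow_strengths)   # f_p for every sampled component
    g = np.asarray(grad_strengths)   # g_p for every sampled component
    t_f, t_p = 0.6 * f.max(), 0.7 * g.max()
    keep = (f > t_f) & (g > t_p)
    return np.flatnonzero(keep)      # indices of informative components
```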
Further, step (2) further comprises:
(2-1) the middle-layer features of the sample image sequence selected by each component detector in the candidate component detector set are combined, giving the candidate feature vector f_c of an image υ in the sample image sequence:

f_c = [f(d_1, υ), f(d_2, υ), …, f(d_m, υ)],

where f(d_i, υ) denotes the middle-layer feature extracted by the i-th component detector d_i from the image υ using a max-pooling quantization function, 1 ≤ i ≤ m, and m denotes the number of component detectors in the candidate component detector set D_c;
(2-2) the selector

Φ_c(f_c) = w·f_c + b

is trained with the candidate feature vectors, and the weight vector of the selector is obtained by solving

w = argmin_{w,b} (1/2)·‖w‖² + C·Σ_{n=1}^{N} ℓ(y_n, w·x_n + b),

where w is the weight vector of the selector, b is the bias of the selector, ℓ is the loss function, C is a penalty factor, y_n denotes the class label of the n-th image in the sample image sequence, x_n denotes the middle-layer feature of the n-th image in the sample image sequence, and N denotes the total number of images in the sample image sequence;
the weight w_i of component detector d_i is the component of w corresponding to d_i, i.e. w = [w_1, w_2, …, w_m]; the weight vector of the selector is thus composed of the weights of all component detectors in the candidate component detector set.
Preferably, C is 1.
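A minimal sketch of the selector training under the stated setting (C = 1). Treating each detector's max-pooled response over an image sequence as one coordinate of f_c, and reading each detector's weight directly off the learned SVM weight vector, are simplifying assumptions for illustration:

```python
# Sketch: f_c concatenates the max-pooled response of every candidate
# detector; a linear SVM over these vectors yields the selector weights.
import numpy as np
from sklearn.svm import LinearSVC

def detector_responses(windows, detectors):
    """windows: (n_windows, d) descriptors of one image sequence.
    Returns f_c, the max-pooled score of each detector (length m)."""
    return np.array([np.max(windows @ w + b) for (w, b) in detectors])

def train_selector(sequences, labels, detectors, C=1.0):
    """labels: +1 for class-A sequences, -1 otherwise."""
    F = np.vstack([detector_responses(s, detectors) for s in sequences])
    sel = LinearSVC(C=C).fit(F, labels)
    w = sel.coef_.ravel()            # w_i scores component detector d_i
    return F, w
```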
Further, the specific implementation of step (3) is as follows:

The discriminative power of each component detector in the candidate component detector set is measured with the weight vector of the selector, and the B% of component detectors with the weakest discriminative power are removed, with B = 3. The removal is performed recursively on the candidate middle-layer feature matrix F_c formed by stacking the candidate feature vectors of all sample images. When k = 1, F_0 = F_c is initialized; when k > 1, the k-th recursion can be represented as:

w_k = SVM(F_{k−1}, y),
S_k = Ψ(w_k, τ),
F_k = F_{k−1}[S_k],

where S_k = [s_1, s_2, …, s_m], s_i ∈ {0, 1}, denotes the component selection flag bits: if s_i = 1 the i-th component detector is selected, and if s_i = 0 it is not selected; y denotes the class label vector of all sample image sequences; w_k denotes the weight vector after the k-th recursion; F_{k−1} and F_k denote the candidate middle-layer feature matrices after the (k−1)-th and k-th recursions. The component detector removal rate is set to τ = 0.03, i.e., each recursion discards the 3% of component detectors with the lowest discriminative power in the candidate component detector set; Ψ(w_k, τ) denotes removing detectors at the removal rate τ according to the weight vector w_k after the k-th recursion. After H recursions in total, the new candidate component detector set D_c[S_H] is obtained.
Preferably, H = 3.
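A sketch of this recursive elimination with the stated values τ = 0.03 and H = 3; ranking detectors by the magnitude of their selector weights is an assumption about how discriminative power is scored here:

```python
# Recursive removal: retrain the selector SVM, drop the tau fraction of
# detectors with the smallest weights, repeat H times
# (w_k = SVM(F_{k-1}, y), F_k = F_{k-1}[S_k]).
import numpy as np
from sklearn.svm import LinearSVC

def recursive_removal(F, y, tau=0.03, H=3, C=1.0):
    """F: (N, m) candidate middle-layer feature matrix; y: class labels.
    Returns the indices of the surviving detectors (the mask S_H)."""
    keep = np.arange(F.shape[1])
    for _ in range(H):
        w = LinearSVC(C=C).fit(F[:, keep], y).coef_.ravel()
        n_drop = max(1, int(round(tau * len(keep))))
        weakest_first = np.argsort(np.abs(w))  # assumed discriminability score
        keep = keep[np.sort(weakest_first[n_drop:])]
    return keep
```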
Preferably, P is 300.
Preferably, the classifier is an SVM classifier.
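Putting steps (4) to (6) together, a hedged end-to-end sketch with P = 300: the top-weighted P detectors of every class form the bag of words, each sequence is encoded by its max-pooled responses to the whole bag, and an SVM classifier performs the final recognition (all names are illustrative):

```python
# Bag-of-detectors encoding and final behavior recognition classifier.
import numpy as np
from sklearn.svm import LinearSVC

def build_bag(per_class_detectors, per_class_weights, P=300):
    """Keep the P highest-weight detectors of each behavior class."""
    bag = []
    for dets, w in zip(per_class_detectors, per_class_weights):
        top = np.argsort(w)[::-1][:P]
        bag.extend(dets[i] for i in top)
    return bag

def encode(windows, bag):
    """Middle-layer feature of one sequence: max-pooled bag responses."""
    return np.array([np.max(windows @ w + b) for (w, b) in bag])

def train_recognizer(sequences, labels, bag, C=1.0):
    X = np.vstack([encode(s, bag) for s in sequences])
    return LinearSVC(C=C).fit(X, labels)  # one-vs-rest multi-class SVM
```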
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (4)

1. A behavior recognition method based on middle layer features is characterized by comprising the following steps:
(1) extracting a spatiotemporal component set D of the class A behavior category and a spatiotemporal component set N of other behavior categories except the class A from the sample image sequence, and training a component detector by using the spatiotemporal component set D and the spatiotemporal component set N to obtain a candidate component detector set;
(2) combining the middle-layer features of the sample image sequence selected by each component detector in the candidate component detector set to obtain candidate feature vectors, and training a selector by using the candidate feature vectors to obtain weight vectors of the selector;
(3) measuring the discriminative power of each component detector in the candidate component detector set by using the weight vector of the selector, and removing the B% of component detectors with the weakest discriminative power from the candidate component detector set to obtain a new candidate component detector set;
(4) sorting the component detectors in descending order of weight according to the weight of each component detector in the new candidate component detector set, and selecting the top-ranked P component detectors as the middle-layer feature extractor of the class-A behavior category;
(5) acquiring the middle-layer feature extractor of each behavior category among the behavior categories, combining the middle-layer feature extractors into a bag of words, extracting sample middle-layer features of the sample image sequence by using the bag of words, and training a classifier by using the sample middle-layer features to obtain a behavior recognition classifier;
(6) inputting the test image sequence into a behavior recognition classifier to obtain the behavior category of the test image sequence;
the candidate feature vector fcComprises the following steps:
Figure FDA0002195611230000011
wherein d isiIn order to be the i-th component detector,
Figure FDA0002195611230000012
for the 1 st component detector d1The middle layer features extracted by using the maximum pooling quantization function in the images upsilon in the sample image sequence,
Figure FDA0002195611230000013
for the 2 nd component detector d2The middle layer features extracted by using the maximum pooling quantization function in the images upsilon in the sample image sequence,
Figure FDA0002195611230000021
for the m-th component detector dmThe middle-layer features are extracted by utilizing a maximum pooling quantization function in an image upsilon in a sample image sequence, i is more than or equal to 1 and less than or equal to m, and m represents a candidate component detector set DcThe number of middle component detectors;
the weight vector of the selector is obtained from the selector

Φ_c(f_c) = w·f_c + b

by solving

w = argmin_{w,b} (1/2)·‖w‖² + C·Σ_{n=1}^{N} ℓ(y_n, w·x_n + b),

where Φ_c(f_c) denotes the selector, w is the weight vector of the selector, b is the bias of the selector, ℓ is the loss function, C is a penalty factor, y_n denotes the class label of the n-th image in the sample image sequence, x_n denotes the middle-layer feature of the n-th image in the sample image sequence, and N denotes the total number of images in the sample image sequence;
the specific implementation manner of the step (3) is as follows:
using the weight vector of the selector to measure the discriminative power of each component detector in the candidate set of component detectors, and using recursive removal, candidate mid-level feature matrices
Figure FDA0002195611230000024
When k is 1, F is initialized0=FcWhen k > 1, the k-th recursion can be represented as follows:
wk=SVM(Fk-1,y),
Figure FDA0002195611230000025
Fk=Fk-1[Sk]
wherein S isk=[s1,s2…sm],siE {0,1}, indicates the component select flag bit, if si1, the ith component detector is selected, if si0, then the ith component detector is not selected, y represents the class label vector of the sample image sequence, wkRepresents the weight vector after the k-th recursion, Fk-1Represents the candidate mid-level feature matrix after the k-1 recursion, FkRepresenting the candidate mid-level feature matrix after the k-th recursion,
Figure FDA0002195611230000026
representing the vector w according to the weights after the k-th recursionkAnd removing at a rate of τ ═ B%, for a total of H recursions, to obtain a new candidate part detector set.
2. The behavior recognition method based on middle-layer features as claimed in claim 1, wherein the training of the component detector is implemented as follows: the positive samples are spatio-temporal components in the spatio-temporal component set D and the negative samples are spatio-temporal components in the spatio-temporal component set N; for each spatio-temporal component in the set D, the component detector is trained with that one positive sample and a plurality of negative samples.
3. A behavior recognition system based on middle-layer features, comprising:
a candidate component detector set acquisition module, configured to extract a spatio-temporal component set D of behavior class A and a spatio-temporal component set N of the behavior classes other than class A from the sample image sequence, and to train component detectors with the sets D and N to obtain a candidate component detector set;
a selector training module, configured to combine the middle-layer features of the sample image sequence selected by each component detector in the candidate component detector set into candidate feature vectors, and to train a selector with the candidate feature vectors to obtain the weight vector of the selector;
a component detector removal module, configured to measure the discriminative power of each component detector in the candidate component detector set with the weight vector of the selector, and to remove the B% of component detectors with the weakest discriminative power to obtain a new candidate component detector set;
a middle-layer feature extractor module, configured to sort the component detectors in descending order of weight in the new candidate component detector set and to select the top-ranked P component detectors as the middle-layer feature extractor of the class-A behavior category;
a classifier training module, configured to acquire the middle-layer feature extractor of each behavior category, combine the extractors into a bag of words, extract sample middle-layer features of the sample image sequence with the bag of words, and train a classifier with the sample middle-layer features to obtain a behavior recognition classifier;
and a behavior recognition module, configured to input the test image sequence into the behavior recognition classifier to obtain the behavior category of the test image sequence;
the candidate feature vector fcComprises the following steps:
Figure FDA0002195611230000031
wherein d isiIn order to be the i-th component detector,
Figure FDA0002195611230000032
for the 1 st component detector d1The middle layer features extracted by using the maximum pooling quantization function in the images upsilon in the sample image sequence,for the 2 nd component detector d2The middle layer features extracted by using the maximum pooling quantization function in the images upsilon in the sample image sequence,
Figure FDA0002195611230000041
for the m-th component detector dmThe middle-layer features are extracted by utilizing a maximum pooling quantization function in an image upsilon in a sample image sequence, i is more than or equal to 1 and less than or equal to m, and m represents a candidate component detector set DcThe number of middle component detectors;
the weight vector of the selector is obtained from the selector

Φ_c(f_c) = w·f_c + b

by solving

w = argmin_{w,b} (1/2)·‖w‖² + C·Σ_{n=1}^{N} ℓ(y_n, w·x_n + b),

where Φ_c(f_c) denotes the selector, w is the weight vector of the selector, b is the bias of the selector, ℓ is the loss function, C is a penalty factor, y_n denotes the class label of the n-th image in the sample image sequence, x_n denotes the middle-layer feature of the n-th image in the sample image sequence, and N denotes the total number of images in the sample image sequence;
the specific implementation manner of the component detector removal module is as follows:

the discriminative power of each component detector in the candidate component detector set is measured with the weight vector of the selector, and recursive removal is applied to the candidate middle-layer feature matrix F_c; when k = 1, F_0 = F_c is initialized, and when k > 1, the k-th recursion can be represented as:

w_k = SVM(F_{k−1}, y),
S_k = Ψ(w_k, τ),
F_k = F_{k−1}[S_k],

where S_k = [s_1, s_2, …, s_m], s_i ∈ {0, 1}, denotes the component selection flag bits: if s_i = 1 the i-th component detector is selected, and if s_i = 0 it is not selected; y denotes the class label vector of the sample image sequence; w_k denotes the weight vector after the k-th recursion; F_{k−1} and F_k denote the candidate middle-layer feature matrices after the (k−1)-th and k-th recursions; and Ψ(w_k, τ) denotes removing, according to the weight vector w_k after the k-th recursion, detectors at the removal rate τ = B%; after H recursions in total, the new candidate component detector set is obtained.
4. The behavior recognition system based on middle-layer features as claimed in claim 3, wherein the training of the component detector is implemented as follows: the positive samples are spatio-temporal components in the spatio-temporal component set D and the negative samples are spatio-temporal components in the spatio-temporal component set N; for each spatio-temporal component in the set D, the component detector is trained with that one positive sample and a plurality of negative samples.
CN201710416188.1A 2017-06-06 2017-06-06 Behavior identification method and system based on middle layer characteristics Expired - Fee Related CN107194365B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710416188.1A CN107194365B (en) 2017-06-06 2017-06-06 Behavior identification method and system based on middle layer characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710416188.1A CN107194365B (en) 2017-06-06 2017-06-06 Behavior identification method and system based on middle layer characteristics

Publications (2)

Publication Number Publication Date
CN107194365A CN107194365A (en) 2017-09-22
CN107194365B true CN107194365B (en) 2020-01-03

Family

ID=59877941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710416188.1A Expired - Fee Related CN107194365B (en) 2017-06-06 2017-06-06 Behavior identification method and system based on middle layer characteristics

Country Status (1)

Country Link
CN (1) CN107194365B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268834B (en) * 2017-12-25 2021-09-28 西安电子科技大学 Behavior identification method based on behavior component space-time relation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103500340A (en) * 2013-09-13 2014-01-08 南京邮电大学 Human body behavior identification method based on thematic knowledge transfer
CN106022300A (en) * 2016-06-02 2016-10-12 中国科学院信息工程研究所 Traffic sign identifying method and traffic sign identifying system based on cascading deep learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103500340A (en) * 2013-09-13 2014-01-08 南京邮电大学 Human body behavior identification method based on thematic knowledge transfer
CN106022300A (en) * 2016-06-02 2016-10-12 中国科学院信息工程研究所 Traffic sign identifying method and traffic sign identifying system based on cascading deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Action Recognition by Learning Mid-level Motion Features; Alireza Fathi et al.; 2008 IEEE Conference on Computer Vision and Pattern Recognition; 20080805; full text *
Boosted Exemplar Learning for Action Recognition and Annotation; Tianzhu Zhang et al.; IEEE Transactions on Circuits and Systems for Video Technology; 20110731; Vol. 21, No. 7, pp. 853-866 *
Learning Mid-Level Features For Recognition; Y-Lan Boureau et al.; 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition; 20100805; pp. 2559-2566 *
Research on behavior recognition methods based on feature representation (基于特征表示的行为识别方法研究); Chen Feifei (陈飞飞); China Doctoral Dissertations Full-text Database, Information Science and Technology; 20160715 (No. 7); pp. 40-50 *

Also Published As

Publication number Publication date
CN107194365A (en) 2017-09-22

Similar Documents

Publication Publication Date Title
Mao et al. What can help pedestrian detection?
CN106127197B (en) Image saliency target detection method and device based on saliency label sorting
CN110738160A (en) human face quality evaluation method combining with human face detection
CN107622280B (en) Modularized processing mode image saliency detection method based on scene classification
Jiang et al. Social behavioral phenotyping of Drosophila with a 2D–3D hybrid CNN framework
CN110599463A (en) Tongue image detection and positioning algorithm based on lightweight cascade neural network
CN106203374A (en) A kind of characteristic recognition method based on compressed sensing and system thereof
CN111814690A (en) Target re-identification method and device and computer readable storage medium
Sun et al. Brushstroke based sparse hybrid convolutional neural networks for author classification of Chinese ink-wash paintings
Awang et al. Vehicle counting system based on vehicle type classification using deep learning method
Chen et al. Multi-modality gesture detection and recognition with un-supervision, randomization and discrimination
Naseer et al. Pixels to precision: features fusion and random forests over labelled-based segmentation
Rodrigues et al. Evaluation of Transfer Learning Scenarios in Plankton Image Classification.
CN108596244A (en) A kind of high spectrum image label noise detecting method based on spectrum angle density peaks
CN107194365B (en) Behavior identification method and system based on middle layer characteristics
Moller et al. Active learning for the classification of species in underwater images from a fixed observatory
CN109492702A (en) Pedestrian based on sorting measure function recognition methods, system, device again
CN106934339B (en) Target tracking and tracking target identification feature extraction method and device
CN110689066B (en) Training method combining face recognition data equalization and enhancement
Dong et al. Scene-oriented hierarchical classification of blurry and noisy images
Boudhane et al. Optical fish classification using statistics of parts
Abdulmunem et al. 3D GLOH features for human action recognition
CN113903004A (en) Scene recognition method based on middle-layer convolutional neural network multi-dimensional features
CN114463574A (en) Scene classification method and device for remote sensing image
CN110555342B (en) Image identification method and device and image equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200103

Termination date: 20210606