CN105095863A - Human behavior recognition method based on similarity-weighted semi-supervised dictionary learning - Google Patents


Info

Publication number: CN105095863A (application CN201510414039.2A; granted as CN105095863B)
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Inventors: 张向荣, 焦李成, 孙志豪, 马文萍, 侯彪, 白静, 马晶晶, 冯婕
Assignee (original and current): Xidian University (the listed assignees may be inaccurate)
Application filed by Xidian University; priority to CN201510414039.2A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 40/23: Recognition of whole body movements, e.g. for sport training


Abstract

The invention discloses a human behavior recognition method based on similarity-weighted semi-supervised dictionary learning, which addresses the low recognition rates of existing supervised methods. The method: (1) divides the input data set into test samples and training samples; (2) performs local feature detection on all samples and randomly samples the local features of the labelled samples to obtain an initial dictionary; (3) starting from the initial dictionary, learns a dictionary with a semi-supervised method; (4) applies group sparse coding to all samples with the learned dictionary to obtain a coding matrix for each sample; (5) vectorises each coding matrix to obtain the final representation; and (6) classifies the test samples using these representations and a sparse-representation classification method, completing human behavior recognition on the test samples. The approach strengthens the discriminative power of dictionary learning and improves the human behavior recognition rate; it can be used for target detection in video.

Description

Human behavior recognition method based on similarity-weighted semi-supervised dictionary learning
Technical field
The invention belongs to the field of pattern recognition, and in particular relates to a method for recognising the behavior of target persons in video; it can be used for target detection in video.
Background
Human behavior recognition means identifying the behavior of a target in a video sequence in preparation for subsequent processing. It comprises detecting the relevant visual information of the target in the video sequence, expressing it in a suitable form, and finally interpreting this information so that the behavior of people can be learned and recognised.
In recent years, unsupervised and supervised dictionary learning have been applied successfully to image classification and activity recognition. In human behavior recognition, the two differ in whether the labels of the video sequences are used: unsupervised dictionary learning does not use the label information of the videos, while supervised dictionary learning does. Recognition and other follow-up work are then carried out with the learned dictionary. Supervised dictionary learning divides into the following steps:
Step 1, obtain local features: a local feature detector, such as the Harris3D, Hessian, or Cuboid detector, automatically detects regions of interest in the video, which are then described with a corresponding descriptor;
Step 2, obtain the initial dictionary: all the local feature descriptors of the videos are clustered with K-means; the resulting cluster centres are the so-called visual keywords, and the number of cluster centres, also called the bag-of-words size, can be set manually in advance;
Step 3, obtain the dictionary: the objective function is solved, generally by two repeated steps, solving for the coding coefficients and updating the dictionary alternately, until the stopping condition is reached.
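The codebook step above (Step 2) can be sketched in code. Below is a minimal NumPy K-means for clustering pooled local descriptors into visual keywords; the feature dimension, cluster count, and toy data are illustrative only, not values from the patent:

```python
import numpy as np

def kmeans_codebook(features, m, iters=20, seed=0):
    """Cluster pooled local descriptors into m visual keywords.

    features: (n, d) array of local feature descriptors from all videos.
    Returns a (d, m) codebook whose columns are the cluster centres.
    """
    rng = np.random.default_rng(seed)
    centres = features[rng.choice(len(features), m, replace=False)].copy()
    for _ in range(iters):
        # assign every descriptor to its nearest centre
        d2 = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # move each centre to the mean of its members (keep it if empty)
        for k in range(m):
            members = features[labels == k]
            if len(members):
                centres[k] = members.mean(axis=0)
    return centres.T

rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(c, 0.05, (60, 8)) for c in (0.0, 1.0, 2.0)])
D0 = kmeans_codebook(feats, m=3)
print(D0.shape)  # (8, 3)
```

In practice the bag-of-words size m is set in advance, as the text notes, and the clustering is run over the descriptors of all training videos.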
It can be seen that, compared with unsupervised dictionary learning, supervised dictionary learning uses the label information of the video sequences, and the various supervised dictionary learning methods differ precisely in how they use that information. But because obtaining labels for targets costs considerable manpower and resources, videos in real life are often unlabelled, and supervised dictionary learning methods do not consider the information of unlabelled samples.
In 2014, Y. Sun et al. built on group sparsity by introducing a weighted group sparse constraint whose purpose is to make the dictionary atoms participating in the coding of a video come from the same class as far as possible, thereby proposing a more discriminative supervised dictionary learning method. The method makes full use of the information of labelled samples but does not use the information of unlabelled ones; see Sun Y, Liu Q, Tang J, et al. Learning discriminative dictionary for group sparse representation [J]. IEEE Transactions on Image Processing, 2014, 23(9): 3816-3828.
Although this method can learn a more discriminative dictionary and improve recognition accuracy, its shortcoming is also obvious: it considers only labelled samples, ignores the information of unlabelled samples, and so does not make full use of the data. In practice labelled samples are often very hard to obtain, while unlabelled samples are easy to collect and exist in large numbers; how to fully extract and exploit the information of large numbers of unlabelled samples has therefore become a key issue in this field.
Summary of the invention
The object of the invention is to propose a human behavior recognition method based on semi-supervised dictionary learning with similarity weights, which improves human behavior recognition accuracy by extracting the information of unlabelled videos.
The technical idea of the invention is: introduce unlabelled videos, learn a more discriminative dictionary and thereby obtain the coding of each video, and apply this to human behavior recognition. The implementation steps are as follows:
(1) Input a video data set containing c behavior classes, comprising a training data set and a test data set; the training data set consists of n_l videos with class labels and n_u unlabelled videos, the test data set of n_t test videos; each video contains exactly one behavior and serves as one sample;
(2) Extract the local features of each video: apply the spatio-temporal Harris corner detection method to detect local feature regions in each video, extract histogram-of-gradients and histogram-of-optical-flow features at the detected regions, and concatenate the two features to obtain the local features of the behavior in each video;
(3) From the training set, obtain the initial dictionary D^(0) ∈ R^{d×m} by randomly sampling the local features of each class of video samples, where d is the dimension of a sample's local features and m is the number of dictionary atoms:
3a) Let X_i denote the local features of the i-th class of video samples in the training set, where n_i is the number of labelled samples of class i, i = 1, 2, ..., c, and c is the number of classes of video samples;
3b) Randomly sample the local features X_i of the i-th class to obtain the class-i initial sub-dictionary D_i^(0); concatenating all initial class sub-dictionaries gives the initial dictionary D^(0) = [D_1^(0), ..., D_c^(0)], where d is the dimension of a local feature, b is the number of atoms of each class sub-dictionary, and m = c*b is the number of atoms of the initial dictionary.
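Step (3) amounts to sampling b feature columns per class and concatenating the class blocks. A small NumPy sketch, with illustrative sizes (d, b, and the class count are not the patent's values):

```python
import numpy as np

def init_dictionary(class_features, b, seed=0):
    """Build D^(0) by randomly sampling b local features from each class.

    class_features: list of (d, n_i) arrays, one per class, holding the local
    features of that class's labelled videos. Returns D^(0) of shape (d, c*b):
    the c class sub-dictionaries D_i^(0) laid side by side.
    """
    rng = np.random.default_rng(seed)
    blocks = []
    for X_i in class_features:
        cols = rng.choice(X_i.shape[1], size=b, replace=False)
        blocks.append(X_i[:, cols])          # class sub-dictionary D_i^(0)
    return np.hstack(blocks)

d, b = 16, 5
feats = [np.random.rand(d, 40) for _ in range(3)]  # 3 classes of pooled features
D0 = init_dictionary(feats, b)
print(D0.shape)  # (16, 15)
```

Keeping the class blocks contiguous matters later: the group sparse constraint and the similarity constraint both operate on these per-class blocks of atoms.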
(4) Construct the coding weight matrix A^(t) ∈ R^{m×n}, where n = n_l + n_u is the number of all training samples, t = 0, 1, ..., T_max, T_max is the maximum number of iterations, and each column of the weight matrix is the weight vector of the corresponding sample;
(5) Using the dictionary D^(t) obtained at the t-th iteration, encode the local features of the l-th video sample by optimising the objective below, obtaining the coding matrix B_l^(t) of the l-th video sample at iteration t:

    min_{B_l^(t)} (1/2)||Y_l - D^(t)B_l^(t)||_F^2 + λ1||B_l^(t)||_{1,1} + λ2||diag(A_·l^(t))B_l^(t)||_{2,1}

where Y_l is the local feature matrix of the l-th video sample, l = 1, 2, ..., n; A_·l^(t) is the l-th column of the weight matrix; ||·||_F is the Frobenius norm; ||·||_{1,1} is the 1,1 matrix norm, i.e. the sum over the rows p of the coding matrix of their 1-norms ||·||_1; ||·||_{2,1} is the 2,1 matrix norm. The first term is the reconstruction error of the coding of the video sample; the second is the sparsity constraint on the coding matrix; the third is the group sparse constraint, which pushes the dictionary atoms participating in the coding to come from the same class sub-dictionary. λ1 is the parameter of the sparsity constraint and λ2 the parameter of the group sparse constraint;
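To make the roles of the three terms concrete, the objective of step (5) can be evaluated directly with NumPy; this is only the objective value, not the solver, and all sizes are illustrative:

```python
import numpy as np

def coding_objective(Y, D, B, a_l, lam1, lam2):
    """(1/2)||Y - D B||_F^2 + lam1*||B||_{1,1} + lam2*||diag(a_l) B||_{2,1}.

    a_l is the sample's column of the weight matrix A: rows of B belonging to
    atoms with a large weight contribute heavily to the 2,1 term, so the
    optimiser drives those rows to zero and keeps the coding within one class
    sub-dictionary.
    """
    recon = 0.5 * np.linalg.norm(Y - D @ B, 'fro') ** 2
    l11 = lam1 * np.abs(B).sum()                                 # sum of row 1-norms
    l21 = lam2 * np.linalg.norm(a_l[:, None] * B, axis=1).sum()  # sum of row 2-norms
    return recon + l11 + l21

# tiny check: identity dictionary, perfect reconstruction
Y = np.eye(2); D = np.eye(2); B = np.eye(2)
print(coding_objective(Y, D, B, np.array([1.0, 2.0]), 1.0, 1.0))  # 5.0
```

In the toy call the reconstruction term is 0, the 1,1 term is 2, and the weighted 2,1 term is 1 + 2 = 3, illustrating how a larger weight on an atom's row raises its penalty.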
(6) Update the dictionary by optimising the objective below, obtaining the dictionary D^(t+1) of iteration t+1:

    min_{D^(t+1)} Σ_{l=1}^{n} (1/2)||Y_l - D^(t+1)B_l^(t)||_F^2 + λ3 Σ_{i<j≤c} ||(D_i^(t+1))^T D_j^(t+1)||_F^2

where the second term is the similarity constraint between class sub-dictionaries, introduced to increase their mutual discriminability; (·)^T is the transpose; λ3 is the parameter of the similarity constraint;
(7) Repeat steps (4)-(6) until the objective converges or the maximum number of iterations is reached, obtaining the final dictionary D;
(8) With the final dictionary D, obtain the coding matrix B_g of each video sample by optimising:

    min_{B_g} (1/2)||Y_g - D·B_g||_F^2 + γ||B_g||_{2,1},  g = 1, 2, ..., h,

where ||·||_F is the Frobenius norm and ||·||_{2,1} the 2,1 norm; the first term is the reconstruction error of the coding of the video sample; ||B_g||_{2,1} is the group sparse constraint on the coding matrix B_g; h = n_l + n_u + n_t is the number of all video samples; γ is the parameter of the group sparse constraint;
(9) For the local features of all video samples, apply the max-pooling algorithm to the coding matrix B_g obtained in step (8), expressing each video sample as an m-dimensional coding vector z_g:

    z_g = [ẑ_1, ẑ_2, ..., ẑ_k, ..., ẑ_m]^T,  k = 1, 2, ..., m,

where ẑ_k = max(|B_g|k1|, |B_g|k2|, ..., |B_g|kq|, ..., |B_g|kK|), g = 1, 2, ..., h, q = 1, 2, ..., K; B_g|kq is the entry in row k, column q of the coding matrix B_g of the g-th video sample, and K is the number of local features of that video;
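Step (9)'s max pooling is a one-liner over the rows of the coding matrix; a sketch with a hand-made 2x3 coding matrix:

```python
import numpy as np

def max_pool(B):
    """Collapse an m x K coding matrix into an m-vector: z_k = max_q |B[k, q]|."""
    return np.abs(B).max(axis=1)

B = np.array([[0.1, -0.9, 0.2],
              [0.0,  0.3, -0.4]])
print(max_pool(B))  # [0.9 0.4]
```

Pooling over the local features makes the final representation independent of K, the per-video feature count, so videos of different lengths all map to m-dimensional vectors.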
(10) Form the sparse-representation classification dictionary D̂ = [D̂_1, ..., D̂_c] ∈ R^{m×n_l} from the coding vectors of all labelled training samples, where D̂_i consists of the coding vectors of the training samples whose class label is i, i = 1, 2, ..., c, c is the total number of classes, n_l is the number of labelled training samples, and the number of labelled samples of class i determines the number of columns of D̂_i;
(11) With the classification dictionary D̂, sparsely encode the coding vector ŷ of each test sample obtained in step (9), obtaining the coding coefficients β of the test sample on the classification dictionary by:

    min_β { ||ŷ - D̂β||_2^2 + η||β||_1 },

where ||·||_2 is the vector 2-norm, ||·||_1 the vector 1-norm, and η is the parameter balancing the fitting error against the sparsity of the coding; η ranges over 0 to 1;
(12) With the coding coefficients β, compute in turn the residual of each test sample on every class sub-dictionary D̂_i:

    r_i(ŷ) = ||ŷ - D̂_i β_i||_2^2 / ||β_i||_2,  i = 1, ..., c,

where β_i is the coding coefficients of the current test sample on the i-th class sub-dictionary;
(13) Classify each test sample by the size of these residuals: find the class sub-dictionary producing the smallest residual and take its class label i as the label of the current test sample, completing the classification of all test samples in turn.
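Steps (12)-(13) together form the classic sparse-representation classification rule; a small NumPy sketch (the two one-atom sub-dictionaries are toy data, and the zero-coefficient guard is an added safety measure not spelled out in the patent):

```python
import numpy as np

def src_classify(y, sub_dicts, beta):
    """Assign y to the class sub-dictionary with the smallest residual
    r_i = ||y - D_i beta_i||_2^2 / ||beta_i||_2.

    sub_dicts: list of per-class dictionaries D_i (same row dimension as y);
    beta: the full sparse code over [D_1 ... D_c], concatenated in class order.
    """
    residuals, start = [], 0
    for D_i in sub_dicts:
        k = D_i.shape[1]
        b_i = beta[start:start + k]
        start += k
        n = np.linalg.norm(b_i)
        # a class whose coefficients are all zero cannot explain y at all
        r = np.inf if n == 0 else np.linalg.norm(y - D_i @ b_i) ** 2 / n
        residuals.append(r)
    return int(np.argmin(residuals)), residuals

D1 = np.array([[1.0], [0.0]])
D2 = np.array([[0.0], [1.0]])
label, res = src_classify(np.array([0.0, 1.0]), [D1, D2], np.array([0.0, 1.0]))
print(label)  # 1
```

Dividing by ||β_i||_2 favours classes whose coefficients carry most of the code's energy, not just classes that happen to reconstruct y moderately well.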
Compared with the prior art, the present invention has the following advantages:
1. The semi-supervised dictionary learning used by the invention, unlike supervised and unsupervised dictionary learning methods, takes full account of the large numbers of unlabelled samples that exist. Its advantage over both is most pronounced when labelled samples are few, which better matches real applications.
2. The invention uses the k-nearest-neighbour method to obtain the weight vectors of the unlabelled samples; the weight vectors introduce the local spatial information of the features and enhance the discriminability with which the final dictionary represents the video samples.
Brief description of the drawings
Fig. 1 is a schematic diagram of the implementation of the invention;
Fig. 2 shows sample frames taken from the Weizmann data set used in the experiments of the invention;
Fig. 3 shows sample frames taken from the KTH data set used in the experiments of the invention;
Fig. 4 is the classification confusion matrix of the invention on the Weizmann data set;
Fig. 5 is the classification confusion matrix of the invention on the KTH data set.
Embodiment
With reference to Fig. 1, the invention comprises three main parts: dictionary learning, video representation, and video classification. The implementation steps of these three parts are described in turn below:
Part 1: Dictionary learning
Step 1: divide all video samples into training samples and test samples.
1a) Input all video samples of the human behavior recognition data set and their true labels i, where i ∈ {1, 2, ..., c} is the class label of a video sample, c is the total number of class labels, and h is the number of all video samples. Following the split suggested by the authors of the data set, choose n video samples as training samples; the remaining h - n video samples of the data set are the test samples;
1b) According to the true labels of the training samples, choose w video samples of each true label i as samples of known label, i.e. labelled samples; the remaining training video samples are treated as samples of unknown label, i.e. unlabelled samples. The number of labelled samples is then w*c and the number of unlabelled samples n - w*c.
Step 2: input all training samples, the test samples, and the true labels i of the labelled training samples, and obtain the local features of each video sample.
Each video sample contains exactly one human behavior. The spatio-temporal Harris corner detection method is used to detect local feature regions of the behavior in the video; the histogram-of-gradients and histogram-of-optical-flow features of the behavior are extracted at the detected regions and concatenated, giving the local feature set of a video sample:

    X_a^i = [x_1, x_2, ..., x_q, ..., x_{b_a^i}] ∈ R^{d×b_a^i},

where X_a^i is the local feature set of the a-th labelled video sample of class i in the training set, a = 1, 2, ..., n_i; n_i is the number of labelled samples of class i; x_q is the q-th local feature of this video sample; b_a^i is the number of local features of this sample; and d is the dimension of a local feature.
Step 3: build the initial dictionary D^(0) from the local features of all labelled video samples in the training set.
3a) Let the local feature set of the i-th class of training video samples be X_i;
3b) Randomly sample the local feature set X_i of the i-th class to obtain the class-i initial sub-dictionary D_i^(0); concatenating all initial class sub-dictionaries gives the initial dictionary D^(0) = [D_1^(0), ..., D_c^(0)] ∈ R^{d×m}, where i = 1, 2, ..., c, d is the dimension of a local feature, b is the number of atoms of each class sub-dictionary, and m = c*b is the number of atoms of the initial dictionary.
Step 4: construct the weight matrix A^(t) of the t-th iteration.
4a) For each labelled sample in the training set, obtain its weight vector as follows:
4a1) Compute the p-th element A_pl^(t) of this video sample's weight vector (the defining formula is rendered as an image in the original), where p = 1, 2, ..., m, l = 1, 2, ..., n;
4a2) Computing every element of the weight vector gives the weight vector A_·l^(t) of this video sample.
4b) Compute the weight vector of each unlabelled video sample in the training set:
4b1) With the k-nearest-neighbour method, find for each local feature of this video sample its k nearest dictionary atoms in the iteration-t dictionary D^(t), and record them in the sample's neighbour matrix L ∈ R^{m×K}, whose entry L_ps (row p, column s) indicates whether atom p is among the k neighbours of the s-th local feature, where p = 1, 2, ..., m, s = 1, 2, ..., K, and K is the number of local features in this video;
4b2) Computing every element L_ps gives the neighbour matrix L of this video sample;
4b3) Sum each row of L to obtain a column vector, denoted LL;
4b4) From LL, compute the p-th element of this video sample's weight vector (the defining formula is rendered as an image in the original), where p = 1, 2, ..., m, δ is a scale parameter, LL_p is the p-th element of LL, and max(LL) is the largest element of LL;
4b5) Computing every element of the weight vector gives the weight vector of this video sample.
4c) Computing the weight vector corresponding to each column of A^(t) ∈ R^{m×n} for every training sample gives the weight matrix A^(t) of all training samples, where n is the number of all training samples and t = 0, 1, ..., T_max, T_max being the maximum number of iterations; each column of the weight matrix is the weight vector of the corresponding training sample.
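Steps 4b1)-4b5) can be sketched with NumPy. Because the two weight formulas appear only as images in this text, the exponential mapping from neighbour counts to weights below is an assumed stand-in: it uses the scale parameter δ and gives frequently neighbouring atoms small weights, matching the role the weights play in the coding objective:

```python
import numpy as np

def unlabeled_weights(Y, D, k, delta):
    """Weight vector for an unlabelled video via k-nearest-neighbour counts.

    Y: (d, K) local features of the video; D: (d, m) current dictionary.
    """
    d2 = ((D[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)   # (m, K) squared distances
    L = np.zeros_like(d2)
    nearest = np.argsort(d2, axis=0)[:k]                      # k nearest atoms per feature
    np.put_along_axis(L, nearest, 1.0, axis=0)                # neighbour indicator matrix
    LL = L.sum(axis=1)                                        # per-atom neighbour counts
    # assumed weight form: frequent neighbours -> small weight (the patent's
    # formula is an image in this text; this exponential is a stand-in)
    return np.exp(-LL / (delta * max(LL.max(), 1.0)))

D = np.array([[0.0, 10.0], [0.0, 10.0], [0.0, 10.0]])  # two atoms, d = 3
Y = np.zeros((3, 4))                                   # four features near atom 0
w = unlabeled_weights(Y, D, k=1, delta=1.0)
print(w[0] < w[1])  # True
```

Atoms that the video's features cluster around receive small weights, so the group sparse penalty leaves their rows free to participate in the coding.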
Step 5: encode each training sample with the dictionary D^(t) obtained at the t-th iteration.
5a) For the l-th video sample Y_l in the training set, the objective for its iteration-t coding matrix B_l^(t) is formula <1>:

    min_{B_l^(t)} (1/2)||Y_l - D^(t)B_l^(t)||_F^2 + λ1||B_l^(t)||_{1,1} + λ2||diag(A_·l^(t))B_l^(t)||_{2,1},   <1>

where l = 1, 2, ..., n; A_·l^(t) is the l-th column of the weight matrix; ||·||_F is the Frobenius norm, ||·||_{1,1} the 1,1 norm, and ||·||_{2,1} the 2,1 norm. The first term is the reconstruction error of the coding of the video sample, the second the sparsity constraint on the coding matrix, and the third the group sparse constraint, which pushes the dictionary atoms participating in the coding to come from the same class sub-dictionary; λ1 is the parameter of the sparsity constraint and λ2 that of the group sparse constraint.
5b) Optimise formula <1> to obtain the iteration-t coding matrix B_l^(t) of this video sample:
5b1) Differentiate formula <1> with respect to the entry B_l|rq^(t) in row r, column q of the coding matrix of the l-th video sample, obtaining formula <2>:

    ∂f/∂B_l|rq^(t) = Σ_{j≠r} B_l|jq^(t-1) (d_j^(t)·d_r^(t)) - Y_·q^l·d_r^(t) + ||d_r^(t)||_2^2 B_l|rq^(t) + λ1 ∂||B_l|rq^(t)||_1/∂B_l|rq^(t) + λ2 A_rl^(t) B_l|rq^(t) / ||B_l|r·^(t)||_2,   <2>

where f = (1/2)||Y_l - D^(t)B_l^(t)||_F^2 + λ1||B_l^(t)||_{1,1} + λ2||diag(A_·l^(t))B_l^(t)||_{2,1}; ||·||_2 is the vector 2-norm and ||·||_2^2 its square; d_j^(t)·d_r^(t) is the inner product of two vectors; B_l|rq^(t) is the entry in row r, column q of the iteration-t coding matrix of the l-th video sample and B_l|r·^(t) its r-th row; q indexes the q-th local feature of the video sample; d_r^(t) is the r-th column of the dictionary D^(t), r = 1, 2, ..., m;
5b2) Setting formula <2> to zero gives formula <3>:

    B_l|rq^(t) = (1 - λ2 A_rl^(t)/||B_l|r·^(t)||_2) (v'_q - λ1) / ||d_r^(t)||_2^2  if v'_q > λ1, and 0 if v'_q < λ1,   <3>

where v'_q = max(v_q, 0) and v_q = Y_·q^l·d_r^(t) - Σ_{j≠r} B_l|jq^(t-1) (d_j^(t)·d_r^(t));
5b3) Computing every entry of the iteration-t coding matrix gives the coding matrix B_l^(t) of this video sample.
Step 6: update the dictionary, obtaining the dictionary of each iteration.
6a) The objective for the iteration-(t+1) dictionary D^(t+1) is formula <4>:

    min_{D^(t+1)} Σ_{l=1}^{n} (1/2)||Y_l - D^(t+1)B_l^(t)||_F^2 + λ3 Σ_{i<j≤c} ||(D_i^(t+1))^T D_j^(t+1)||_F^2,   <4>

where the second term is the similarity constraint between class sub-dictionaries, introduced to increase their mutual discriminability; (·)^T is the transpose; D_i^(t+1) is the class-i sub-dictionary at iteration t+1; λ3 is the parameter of the similarity constraint;
6b) Differentiating formula <4> with respect to the r-th atom d_r^(t+1) of the iteration-(t+1) dictionary and setting the result to zero gives formula <5>:

    d_r^(t+1) = (v(r, r)·I + λ3 M·M^T)^{-1} u(:, r),   <5>

where r ∈ {1, 2, ..., m} and i ∈ {1, 2, ..., c}; the partial dictionary M is formed from D^(t) by removing the class-i sub-dictionary D_i^(t) to which atom d_r belongs; (·)^T is the transpose and (·)^{-1} the matrix inverse; u(:, r) = vv(:, r) - D^(t)·v(:, r) + v(r, r)·d_r^(t), with v = Σ_l B_l^(t)·(B_l^(t))^T and vv = Σ_l Y_l·(B_l^(t))^T;
6c) Computing every atom of the iteration-(t+1) dictionary gives the dictionary D^(t+1).
Step 7: repeat steps 4-6 until the objective converges or the maximum number of iterations is reached, obtaining the final dictionary D.
Part 2: Video coding
Step 8: with the final dictionary D, obtain the coding matrix B_g of each video sample by optimising:

    min_{B_g} (1/2)||Y_g - D·B_g||_F^2 + γ||B_g||_{2,1},  g = 1, 2, ..., h,

where ||·||_F is the Frobenius norm and ||·||_{2,1} the 2,1 norm; the first term is the reconstruction error of the coding of the video sample; ||B_g||_{2,1} is the group sparse constraint on the coding matrix B_g; γ is the parameter of the group sparse constraint.
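The patent does not spell out the solver used for step 8's convex objective; proximal gradient (ISTA) with the row-wise group-shrinkage operator is a standard stand-in and can serve as a sketch (all sizes and data are illustrative):

```python
import numpy as np

def rowshrink(B, tau):
    """Proximal operator of tau*||.||_{2,1}: shrink each row's 2-norm by tau."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12)) * B

def group_sparse_code(Y, D, gamma, iters=200):
    """min_B 0.5*||Y - D B||_F^2 + gamma*||B||_{2,1} by proximal gradient."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1/L, L = squared largest singular value
    B = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(iters):
        B = rowshrink(B - step * (D.T @ (D @ B - Y)), gamma * step)
    return B

def objective(Y, D, B, gamma):
    return 0.5 * np.linalg.norm(Y - D @ B, 'fro') ** 2 \
        + gamma * np.linalg.norm(B, axis=1).sum()

rng = np.random.default_rng(0)
D = rng.normal(size=(8, 12)); D /= np.linalg.norm(D, axis=0)  # unit-norm atoms
Y = rng.normal(size=(8, 5))
B = group_sparse_code(Y, D, gamma=0.5)
print(objective(Y, D, B, 0.5) <= objective(Y, D, np.zeros_like(B), 0.5))  # True
```

The row-wise shrinkage zeroes whole rows of B at once, which is exactly the behaviour the 2,1 norm is chosen for: a dictionary atom is either used across the video's features or dropped entirely.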
Step 9: vectorise each coding matrix to obtain the final coding vector of each sample.
9a) Apply the max-pooling algorithm to the coding matrix B_g of each video sample obtained in step 8, taking the maximum absolute value of each row:

    ẑ_k = max(|B_g|k1|, |B_g|k2|, ..., |B_g|ki|, ..., |B_g|kK|),

where g = 1, 2, ..., h; k = 1, 2, ..., m; B_g|ki is the entry in row k, column i of the coding matrix B_g of the g-th video sample; K is the number of local features of that video;
9b) Stacking the row maxima into a column vector z_g = [ẑ_1, ..., ẑ_k, ..., ẑ_m]^T, k = 1, 2, ..., m, expresses each video sample as an m-dimensional coding vector.
Part 3: Video classification
Step 10: build the classification dictionary D̂ from the training samples.
With n_l = w*c labelled samples in the training set, the coding vectors of all labelled training samples form the classification dictionary D̂ = [D̂_1, ..., D̂_c], where D̂_i is the class-i classification sub-dictionary, i = 1, 2, ..., c; m is the number of dictionary atoms and c the total number of dictionary classes.
Step 11: with the classification dictionary D̂, sparsely encode in turn the coding vector ŷ of each test sample obtained in step 9, obtaining the coding coefficients β of the test sample on the classification dictionary:

    min_β { ||ŷ - D̂β||_2^2 + η||β||_1 },

where ||·||_2 is the vector 2-norm, ||·||_1 the vector 1-norm, and η is the parameter balancing the fitting error against the sparsity of the coding; η ranges over 0 to 1.
Step 12: with the coding coefficients, compute in turn the residual of each test sample on each class sub-dictionary:

    r_i(ŷ) = ||ŷ - D̂_i β_i||_2^2 / ||β_i||_2,  i = 1, ..., c,

where β_i is the coding coefficients of the current test sample on the i-th class sub-dictionary D̂_i.
Step 13: classify each test sample according to its residuals on the class sub-dictionaries.
From the residuals r_i(ŷ) of the test sample on the class sub-dictionaries, find the class sub-dictionary producing the smallest residual and take its class label i, i ∈ {1, 2, ..., c}, as the class label of the test sample.
The effect of the invention is further illustrated by the following simulation experiments.
1. Simulation conditions
The experiments were run in MATLAB 7.14 on a Windows 7 platform with an AMD A6-6310 CPU (1.80 GHz) and 4 GB of memory. The method of the invention was tested on the Weizmann and KTH data sets and compared with the supervised dictionary learning method of Y. Sun, Q. Liu, J. Tang, D. Tao, "Learning Discriminative Dictionary for Group Sparse Representation", IEEE Transactions on Image Processing. The two data sets:
The Weizmann data set contains 93 videos of 9 different people, each demonstrating 10 behaviors, i.e. c = 10; sample frames from the data set are shown in Fig. 2. The actions are walk, run, jump, side, bend, wave-one, wave-two, pjump, jack, and skip. Because one person demonstrates each of walk, run, and skip twice, one video is removed from each of these three behaviors, and the remaining 90 videos are used in the experiments. The behaviors of 5 people are selected as training samples, n = 50; the remaining videos serve as test samples, h - n = 40.
The KTH data set contains 600 videos; sample frames are shown in Fig. 3. It was recorded by 25 people under 4 different scenarios and comprises 6 behaviors, i.e. c = 6: walk, jog, run, box, hand-wave, and hand-clap. The background is fixed, and only a small fraction of the videos show slight changes of viewpoint. Following the authors' suggestion, the behaviors of 8 people (persons 11-18) are chosen as training samples, n = 192, and the behaviors of 9 people (persons 2, 3, 5-10, and 22) as test samples, h - n = 216.
2. Simulation content and results
Simulation 1: recognition tests with the method of the invention on the Weizmann data set.
As the number w of labelled samples per class in the training set varies, the Weizmann data set is recognised with the method of the invention and with the existing supervised method; the results are given in Table 1.
Table 1. Comparison of the classification results of the invention and the existing supervised method on the Weizmann data set
As Table 1 shows, the recognition performance of the invention is better overall than the existing supervised method. During dictionary learning the existing supervised method introduces only the reconstruction error and label information of the labelled samples; the method of the invention not only introduces the reconstruction error of the labelled samples but also adds the sparsity constraint and the class sub-dictionary similarity constraint, while at the same time introducing the information of the unlabelled samples, which raises the recognition accuracy on the test samples. The experimental results show that the method learns a more discriminative dictionary, can therefore represent human behaviors effectively, and on the basis of this effective representation achieves good human behavior recognition.
For w = 4, the confusion matrix of the classification results of the method on the Weizmann data set is shown in Fig. 4. As can be seen, the method achieves good recognition rates for all human behaviors in the Weizmann data set.
Simulation 2: as the number w of labelled samples per class in the training set varies, the KTH data set is recognised with the method of the invention and with the existing supervised method; the results are given in Table 2.
Table 2. Comparison of the classification results of the invention and the existing supervised method on the KTH data set
As Table 2 shows, the recognition accuracy of the invention on the KTH data set is better than the existing supervised method, improving accuracy by nearly 1%. This further demonstrates that the dictionary learning method used in the invention effectively ensures correct recognition of the test samples.
For w = 8, the confusion matrix of the classification results of the method on the KTH data set is shown in Fig. 5. As can be seen, the method achieves good recognition rates for most human behaviors in the KTH data set; only the rate for run is not very high, because the run and jog behaviors are rather similar. Because a semi-supervised dictionary learning method is used to learn the dictionary, more discriminative sample information is introduced, and the local features of the videos are encoded, so the final video representation is more discriminative, ensuring a high recognition capability for human behavior.

Claims (2)

1. A human behavior recognition method based on similarity-weighted semi-supervised dictionary learning, comprising the following steps:
(1) input a video data set containing c classes of behavior, comprising a training data set and a test data set, the training data set consisting of n_l video samples with class labels and n_u unlabeled video samples, and the test data set consisting of n_t test video samples, each video containing exactly one behavior and serving as one sample;
(2) extract the local features of each video: detect local feature regions in each video with the spatio-temporal Harris corner detection method, extract the histogram-of-gradients and histogram-of-optical-flow features of the video at the detected regions, and concatenate the two kinds of features to obtain the local features of the behavior in each video;
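The concatenation in step (2) is straightforward; a minimal NumPy sketch follows (the descriptor dimensions 72 and 90 are illustrative assumptions, not values from the claim):

```python
import numpy as np

def concat_hog_hof(hog_feats, hof_feats):
    """Concatenate per-region HOG and HOF descriptors along the feature
    axis, giving one local descriptor per detected spatio-temporal region."""
    hog_feats = np.asarray(hog_feats, dtype=float)  # shape (K, d_hog)
    hof_feats = np.asarray(hof_feats, dtype=float)  # shape (K, d_hof)
    assert hog_feats.shape[0] == hof_feats.shape[0], "one HOG and one HOF per region"
    return np.hstack([hog_feats, hof_feats])        # shape (K, d_hog + d_hof)

# toy example: 5 detected regions with 72-d HOG and 90-d HOF descriptors
feats = concat_hog_hof(np.zeros((5, 72)), np.ones((5, 90)))
print(feats.shape)  # (5, 162)
```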
(3) obtain the initial dictionary D^(0) ∈ R^(d×m) by randomly sampling the local features of each class of video samples in the training set, where d is the dimension of a local feature and m is the number of dictionary atoms:
3a) let X_i denote the local features of the i-th class of video samples in the training set, where n_i is the number of labeled samples of the i-th class, i = 1, 2, ..., c, and c is the number of classes of video samples;
3b) randomly sample the local features X_i of the i-th class of video samples to obtain the initial class dictionary D_i^(0) of the i-th class; concatenate all the initial class dictionaries to obtain the initial dictionary D^(0) = [D_1^(0), D_2^(0), ..., D_c^(0)], where d is the dimension of a local feature, b is the number of atoms of each initial class dictionary, and m = c*b is the number of atoms of the initial dictionary;
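Steps 3a)-3b) can be sketched in NumPy as follows (class counts and dimensions are toy values for illustration):

```python
import numpy as np

def init_dictionary(class_features, b, seed=0):
    """Build D^(0) in R^{d x (c*b)} by randomly sampling b local features
    (columns) from each class and concatenating the per-class
    sub-dictionaries, as in steps 3a)-3b)."""
    rng = np.random.default_rng(seed)
    class_dicts = []
    for X_i in class_features:                      # X_i has shape (d, n_i)
        idx = rng.choice(X_i.shape[1], size=b, replace=False)
        class_dicts.append(X_i[:, idx])             # D_i^(0), shape (d, b)
    return np.hstack(class_dicts)                   # D^(0), shape (d, c*b)

# toy data: c=3 classes, d=10, 40 local features per class
X = [np.random.randn(10, 40) for _ in range(3)]
D0 = init_dictionary(X, b=5)
print(D0.shape)  # (10, 15)
```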
(4) construct the coding weight matrix A^(t) ∈ R^(m×n), where n = n_l + n_u is the number of all training samples, t = 0, 1, ..., T_max, T_max is the maximum number of iterations, and each column of the weight matrix is the weight vector of the corresponding sample;
(5) using the dictionary D^(t) obtained at the t-th iteration, encode the local features of the l-th video sample by optimizing the objective function below, obtaining the coding matrix B_l^(t) of the l-th video sample at the t-th iteration:
$$\min_{B_l^{(t)}}\; \frac{1}{2}\big\|Y_l - D^{(t)}B_l^{(t)}\big\|_F^2 + \lambda_1\big\|B_l^{(t)}\big\|_{1,1} + \lambda_2\big\|\mathrm{diag}\big(A_{\cdot l}^{(t)}\big)B_l^{(t)}\big\|_{2,1}$$
wherein Y_l denotes the local features of the l-th video sample, l = 1, 2, ..., n; A_{·l}^(t) is the l-th column of the weight matrix A^(t); ||·||_F denotes the Frobenius norm; ||·||_{1,1} denotes the 1,1-norm of a matrix, i.e. the sum of the 1-norms of its rows; ||·||_1 denotes the 1-norm of a vector; ||·||_{2,1} denotes the 2,1-norm of a matrix; the first term above is the reconstruction error of the video sample's coding, the second term is a sparsity constraint on the coding matrix, and the third term is a group-sparsity constraint, which forces the dictionary atoms participating in the coding to come from the class dictionary of a single class; λ_1 is the sparsity-constraint parameter and λ_2 the group-sparsity-constraint parameter;
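For clarity, the value of the step-(5) objective can be sketched as follows, taking the 1,1-norm as the entrywise absolute sum and the 2,1-norm as the sum of row 2-norms, consistent with the claim's description:

```python
import numpy as np

def coding_objective(Y, D, B, a, lam1, lam2):
    """Value of the step-(5) objective for one video sample:
    0.5*||Y - D@B||_F^2 + lam1*||B||_{1,1} + lam2*||diag(a)@B||_{2,1},
    with a = A^(t)_{.l}, ||.||_{1,1} the entrywise absolute sum and
    ||.||_{2,1} the sum of the 2-norms of the rows."""
    recon = 0.5 * np.linalg.norm(Y - D @ B, 'fro') ** 2
    l11 = np.abs(B).sum()
    l21 = np.linalg.norm(np.diag(a) @ B, axis=1).sum()
    return recon + lam1 * l11 + lam2 * l21

# toy check: perfect reconstruction leaves only the two penalty terms
val = coding_objective(np.eye(2), np.eye(2), np.eye(2), np.array([1.0, 2.0]), 1.0, 1.0)
print(val)  # 0 + 2 + (1 + 2) = 5.0
```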
(6) update the dictionary by optimizing the objective function below, obtaining the dictionary D^(t+1) of the (t+1)-th iteration:
$$\min_{D^{(t+1)}}\; \sum_{l=1}^{n}\frac{1}{2}\big\|Y_l - D^{(t+1)}B_l^{(t)}\big\|_F^2 + \lambda_3\sum_{j=1}^{c}\sum_{i<j}\big\|\big(D_i^{(t+1)}\big)^T D_j^{(t+1)}\big\|_F^2$$
wherein the second term is the similarity constraint on the class dictionaries, introduced to increase the discrimination between them; (·)^T denotes matrix transposition; λ_3 is the similarity-constraint parameter;
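The cross-class similarity penalty in step (6) can be sketched as:

```python
import numpy as np

def incoherence_penalty(class_dicts, lam3):
    """Step-(6) similarity term: lam3 * sum_{i<j} ||D_i^T @ D_j||_F^2.
    Driving the cross-class Gram blocks toward zero makes the class
    sub-dictionaries mutually incoherent, i.e. more discriminative."""
    total = 0.0
    for i in range(len(class_dicts)):
        for j in range(i + 1, len(class_dicts)):
            total += np.linalg.norm(class_dicts[i].T @ class_dicts[j], 'fro') ** 2
    return lam3 * total

# orthogonal class dictionaries incur no penalty
D1 = np.array([[1.0], [0.0]])
D2 = np.array([[0.0], [1.0]])
print(incoherence_penalty([D1, D2], 1.0))  # 0.0
```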
(7) repeat steps (4)-(6) until the objective function converges or the maximum number of iterations is reached, obtaining the final dictionary D;
(8) using the final dictionary D, obtain the coding matrix B_g of each video sample by optimizing the following objective function:
$$\min_{B_g}\; \frac{1}{2}\big\|Y_g - DB_g\big\|_F^2 + \gamma\big\|B_g\big\|_{2,1}, \quad g = 1, 2, \ldots, h,$$
wherein ||·||_F denotes the Frobenius norm and ||·||_{2,1} the 2,1-norm; the first term above is the reconstruction error of the video sample's coding, and ||B_g||_{2,1} is a group-sparsity constraint on the coding matrix B_g; h = n_l + n_u + n_t is the number of all video samples; γ is the group-sparsity-constraint parameter;
(9) for the local features of all video samples, apply the max-pooling algorithm to the coding matrices B_g obtained in step (8), expressing each video sample as an m-dimensional coding vector z_g:
$$z_g = \big[\hat z_1, \hat z_2, \ldots, \hat z_k, \ldots, \hat z_m\big]^T, \quad \hat z_k = \max_{q}\big|(B_g)_{kq}\big|, \quad k = 1, 2, \ldots, m$$
wherein g = 1, 2, ..., h; q = 1, 2, ..., K; (B_g)_{kq} denotes the element in row k, column q of the coding matrix B_g of the g-th video sample; K denotes the number of local features of the video;
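A sketch of the step-(9) max pooling; pooling on coefficient magnitudes is an assumption here, since the claim's formula for ẑ_k is not reproduced in the extracted text:

```python
import numpy as np

def max_pool(B):
    """Step-(9) max pooling: collapse the (m x K) coding matrix of one
    video into an m-dimensional vector by keeping, for each dictionary
    atom (row), the largest coefficient magnitude over the K local
    features."""
    return np.max(np.abs(B), axis=1)

B = np.array([[0.2, -0.9, 0.1],
              [0.0,  0.3, 0.5]])
print(max_pool(B))  # [0.9 0.5]
```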
(10) form the sparse-representation classification dictionary D̂ from the coding vectors of all labeled training samples, where the sub-dictionary D̂_i consists of the coding vectors of all training samples with class label i; i = 1, 2, ..., c is the class label and c the total number of classes; n_l is the total number of labeled training samples, and n_i denotes the number of labeled samples of the i-th class;
(11) according to the classification dictionary D̂, sparsely encode the coding vector ŷ of each test sample obtained in step (9), obtaining the coding coefficients β of the test sample on the classification dictionary by the following formula:
$$\min_{\beta}\,\big\{\big\|\hat y - \hat D\beta\big\|_2^2 + \eta\big\|\beta\big\|_1\big\},$$
wherein ||·||_2 denotes the 2-norm of a vector and ||·||_1 the 1-norm of a vector; η is a parameter balancing the fitting error against the sparsity of the coding, with values in the range 0 to 1;
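Step (11) is a standard l1-regularized least-squares (lasso) problem; one common solver is ISTA (iterative soft-thresholding). A sketch follows — ISTA is an illustrative choice, since the claim does not prescribe a particular optimizer:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_lasso(y, D, eta, n_iter=500):
    """Solve min_beta ||y - D@beta||_2^2 + eta*||beta||_1 by ISTA."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)  # 1/Lipschitz constant of the gradient
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ beta - y)
        beta = soft_threshold(beta - step * grad, step * eta)
    return beta

# toy check with an orthonormal dictionary: the minimizer is a soft-threshold of y
beta = ista_lasso(np.array([3.0, 0.1]), np.eye(2), eta=1.0)
print(beta)  # [2.5 0. ]
```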
(12) using the coding coefficients β, compute in turn the residual of each test sample on each class sub-dictionary D̂_i:
$$r_i(\hat y) = \big\|\hat y - \hat D_i\beta_i\big\|_2^2 \,/\, \big\|\beta_i\big\|_2, \quad i = 1, \ldots, c$$
wherein β_i is the coding coefficient vector of the current test sample on the i-th class sub-dictionary D̂_i;
(13) classify the test sample ŷ according to the magnitudes of the residuals r_i(ŷ), i = 1, ..., c: find the class sub-dictionary producing the minimum residual and take its class label i as the label of the current test sample, thereby completing the classification of all test samples in turn.
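Steps (12)-(13) can be sketched as:

```python
import numpy as np

def classify_by_residual(y_hat, class_dicts, betas):
    """Steps (12)-(13): compute r_i = ||y_hat - D_i@beta_i||_2^2 / ||beta_i||_2
    for every class and return the label (1..c) with the minimum residual."""
    residuals = [np.linalg.norm(y_hat - D_i @ b_i) ** 2 / np.linalg.norm(b_i)
                 for D_i, b_i in zip(class_dicts, betas)]
    return int(np.argmin(residuals)) + 1  # class labels are 1-based

# toy example: y_hat is reconstructed exactly by class 1's sub-dictionary
y_hat = np.array([1.0, 0.0])
dicts = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
betas = [np.array([1.0]), np.array([1.0])]
print(classify_by_residual(y_hat, dicts, betas))  # 1
```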
2. The human behavior recognition method based on similarity-weighted semi-supervised dictionary learning according to claim 1, wherein the weight matrix A^(t) ∈ R^(m×n) described in step (4) is constructed as follows:
4a) compute the weight vector of each labeled video sample in the training set:
wherein the formula is evaluated for the l-th video sample, which is a labeled sample, and involves the p-th element of the coding vector of this sample; p = 1, 2, ..., m; b denotes the number of atoms of each class dictionary; l = 1, 2, ..., n; i ∈ {1, 2, ..., c};
4b) compute the weight vector of each unlabeled video sample in the training set:
4b1) find, by the k-nearest-neighbor method, the k nearest dictionary atoms in the t-th-iteration dictionary D^(t) for each local feature of this video sample, and form the neighbor matrix L ∈ R^(m×K) of this video sample, whose element L_ps in row p, column s is:
wherein p = 1, 2, ..., m; s = 1, 2, ..., K; K denotes the number of local features in this video;
4b2) sum each row of the neighbor matrix L, obtaining a column vector LL ∈ R^m;
4b3) from the column vector LL, obtain by the following formula the p-th element of the weight vector of the l-th video sample, which is an unlabeled sample:
wherein p = 1, 2, ..., m; δ is a scale parameter; LL_p denotes the p-th element of the column vector LL; max(LL) denotes the largest element of the column vector LL;
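A sketch of steps 4b1)-4b3) for one unlabeled video. The 0/1 indicator form of the neighbor matrix and the exponential mapping in the last line are assumptions made for illustration, since the claim's formulas are not reproduced in the extracted text:

```python
import numpy as np

def unlabeled_weight_vector(features, D, k, delta):
    """Illustrative reconstruction of steps 4b1)-4b3).
    features: (d, K) local features of one unlabeled video;
    D: (d, m) dictionary of the t-th iteration.
    ASSUMED: L_ps = 1 iff atom p is among the k nearest atoms of
    feature s, and the final mapping is exponential in LL_p."""
    d2 = ((features[:, None, :] - D[:, :, None]) ** 2).sum(axis=0)  # (m, K) squared distances
    L = np.zeros_like(d2)
    nn = np.argsort(d2, axis=0)[:k, :]           # 4b1) k nearest atoms per local feature
    L[nn, np.arange(d2.shape[1])] = 1.0          # assumed indicator neighbor matrix
    LL = L.sum(axis=1)                           # 4b2) row sums -> LL in R^m
    return np.exp(-(np.max(LL) - LL) / delta)    # 4b3) assumed mapping with scale delta

# toy example: 2 atoms, 2 local features both closest to atom 0
D = np.eye(2)
F = np.array([[1.0, 1.0],
              [0.0, 0.0]])
w = unlabeled_weight_vector(F, D, k=1, delta=1.0)
print(w)  # atom 0 gets weight 1.0; atom 1 gets exp(-2)
```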
4c) compute the weight vector of the training sample corresponding to each column of the weight matrix A^(t), thereby obtaining the weight matrix A^(t).
CN201510414039.2A 2015-07-14 2015-07-14 Human behavior recognition method based on similarity-weighted semi-supervised dictionary learning Active CN105095863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510414039.2A CN105095863B (en) 2015-07-14 2015-07-14 Human behavior recognition method based on similarity-weighted semi-supervised dictionary learning

Publications (2)

Publication Number Publication Date
CN105095863A true CN105095863A (en) 2015-11-25
CN105095863B CN105095863B (en) 2018-05-25

Family

ID=54576252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510414039.2A Active CN105095863B (en) 2015-07-14 2015-07-14 Human behavior recognition method based on similarity-weighted semi-supervised dictionary learning

Country Status (1)

Country Link
CN (1) CN105095863B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605952A (en) * 2013-10-27 2014-02-26 西安电子科技大学 Human-behavior identification method based on Laplacian-regularization group sparse
WO2014056819A1 (en) * 2012-10-12 2014-04-17 Commissariat A L'energie Atomique Et Aux Energies Alternatives Method of classifying a multimodal object
CN104392251A (en) * 2014-11-28 2015-03-04 西安电子科技大学 Hyperspectral image classification method based on semi-supervised dictionary learning
US9292797B2 (en) * 2012-12-14 2016-03-22 International Business Machines Corporation Semi-supervised data integration model for named entity classification

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105827250A (en) * 2016-03-16 2016-08-03 江苏大学 Electric-energy quality data compression and reconstruction method based on self-adaptive dictionary learning
CN105938544A (en) * 2016-04-05 2016-09-14 大连理工大学 Behavior identification method based on integrated linear classifier and analytic dictionary
CN105938544B (en) * 2016-04-05 2020-05-19 大连理工大学 Behavior recognition method based on comprehensive linear classifier and analytic dictionary
CN106056135A (en) * 2016-05-20 2016-10-26 北京九艺同兴科技有限公司 Human body motion classification method based on compression perception
CN106960225B (en) * 2017-03-31 2020-01-31 哈尔滨理工大学 sparse image classification method based on low-rank supervision
CN106960225A (en) * 2017-03-31 2017-07-18 哈尔滨理工大学 A kind of sparse image classification method supervised based on low-rank
CN107229944A (en) * 2017-05-04 2017-10-03 青岛科技大学 Semi-supervised active identification method based on cognitive information particle
CN107229944B (en) * 2017-05-04 2021-05-07 青岛科技大学 Semi-supervised active identification method based on cognitive information particles
CN107832772A (en) * 2017-09-20 2018-03-23 深圳大学 A kind of image-recognizing method and device based on semi-supervised dictionary learning
CN107862302A (en) * 2017-11-29 2018-03-30 合肥赑歌数据科技有限公司 A kind of human motion detecting system and method based on semi-supervised learning
CN108133232A (en) * 2017-12-15 2018-06-08 南京航空航天大学 A kind of Radar High Range Resolution target identification method based on statistics dictionary learning
CN110580488A (en) * 2018-06-08 2019-12-17 中南大学 Multi-working-condition industrial monitoring method, device, equipment and medium based on dictionary learning
CN110580488B (en) * 2018-06-08 2022-04-01 中南大学 Multi-working-condition industrial monitoring method, device, equipment and medium based on dictionary learning
CN109034200A (en) * 2018-06-22 2018-12-18 广东工业大学 A kind of learning method indicated based on joint sparse with multiple view dictionary learning
CN109376802A (en) * 2018-12-12 2019-02-22 浙江工业大学 A kind of gastroscope organ classes method dictionary-based learning
CN109376802B (en) * 2018-12-12 2021-08-03 浙江工业大学 Gastroscope organ classification method based on dictionary learning
CN110472576A (en) * 2019-08-15 2019-11-19 西安邮电大学 A kind of method and device for realizing mobile human body Activity recognition
CN111414827A (en) * 2020-03-13 2020-07-14 四川长虹电器股份有限公司 Depth image human body detection method and system based on sparse coding features
CN111414827B (en) * 2020-03-13 2022-02-08 四川长虹电器股份有限公司 Depth image human body detection method and system based on sparse coding features

Also Published As

Publication number Publication date
CN105095863B (en) 2018-05-25

Similar Documents

Publication Publication Date Title
CN105095863A (en) Similarity-weight-semi-supervised-dictionary-learning-based human behavior identification method
CN109492099B (en) Cross-domain text emotion classification method based on domain impedance self-adaption
CN110135459B (en) Zero sample classification method based on double-triple depth measurement learning network
CN104966105A (en) Robust machine error retrieving method and system
CN110717431A (en) Fine-grained visual question and answer method combined with multi-view attention mechanism
CN110866542B (en) Depth representation learning method based on feature controllable fusion
CN105913025A (en) Deep learning face identification method based on multiple-characteristic fusion
CN112149421A (en) Software programming field entity identification method based on BERT embedding
CN102422324B (en) Age estimation device and method
CN109190472B (en) Pedestrian attribute identification method based on image and attribute combined guidance
CN111931061B (en) Label mapping method and device, computer equipment and storage medium
CN105205501A (en) Multi-classifier combined weak annotation image object detection method
CN112732921B (en) False user comment detection method and system
CN105334504A (en) Radar target identification method based on large-boundary nonlinear discrimination projection model
CN104298977A (en) Low-order representing human body behavior identification method based on irrelevance constraint
CN106203483A (en) A kind of zero sample image sorting technique of multi-modal mapping method of being correlated with based on semanteme
CN106934055B (en) Semi-supervised webpage automatic classification method based on insufficient modal information
CN111369535B (en) Cell detection method
CN103745233B (en) The hyperspectral image classification method migrated based on spatial information
CN104750875A (en) Machine error data classification method and system
CN109492230A (en) A method of insurance contract key message is extracted based on textview field convolutional neural networks interested
CN104778482A (en) Hyperspectral image classifying method based on tensor semi-supervised scale cutting dimension reduction
CN116415581A (en) Teaching data analysis system based on intelligent education
CN104616005A (en) Domain-self-adaptive facial expression analysis method
CN103942214B (en) Natural image classification method and device on basis of multi-modal matrix filling

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant