Infrared action recognition method based on the adaptive fusion of hand-crafted features and deep learning features
Technical field
The invention belongs to the technical fields of image processing and computer vision, and relates to an infrared action recognition method based on the adaptive fusion of hand-crafted features and deep learning features.
Background art
In recent years, action recognition in images and video has become a vital task in the field of computer vision, and is of great significance for work such as video surveillance, video information retrieval, and human-computer interaction. As various recognition algorithms continually refresh the accuracy records on public datasets, the task of action recognition in video has made great progress. However, most current datasets are based on visible-light video, and relatively little work addresses action recognition in infrared video.
Mainstream action recognition algorithms mainly involve two kinds of descriptors: hand-crafted feature descriptors and descriptors obtained by deep learning.
Hand-crafted features are typically local descriptors, such as spatio-temporal interest points (Spatial-Temporal Interest Point, STIP), histograms of oriented gradients (Histogram of Oriented Gradient, HOG), histograms of optical flow (Histogram of Optical Flow, HOF), and dense trajectory features (Dense Trajectory, DT). They classify and identify different actions based on image texture, visual shape, and motion information between frames. Because the dense trajectory feature combines rich sub-descriptors such as HOG, HOF, and MBH (Motion Boundary Histogram), it has become the hand-crafted feature with the highest recognition accuracy at present. With the growth of computing power, CNN features extracted by convolutional neural networks have become a popular research direction for action recognition in images and video in recent years. Convolutional neural networks mine image information deeply and effectively extract discriminative information. Among the models proposed so far, such as 3D convolutional neural networks, deep convolutional neural networks, and two-stream convolutional neural networks, the best-performing is a two-stream convolutional neural network composed of a temporal information channel and a spatial information channel, which has achieved good experimental results on several challenging datasets.
However, relatively little current research addresses action recognition in infrared video. In video surveillance, visible-light monitoring loses its value under low-visibility weather such as rain or fog, or at night. Infrared video action recognition therefore has very important practical value, and an effective infrared video action recognition algorithm is urgently needed.
Summary of the invention
In view of this, an object of the invention is to provide an infrared action recognition method based on the adaptive fusion of hand-crafted features and deep learning features. The method makes full use of the characteristics and advantages of infrared images, improves the hand-crafted feature, and, through two classifier networks and a weight learning function, fuses the probability outputs of the hand-crafted feature classifier network and the deep learning feature classifier network with learned weights, effectively improving the accuracy of action recognition in infrared video.
To achieve the above object, the present invention provides the following technical solution:
An infrared action recognition method based on the adaptive fusion of hand-crafted features and deep learning features, comprising the following steps:
S1: extracting an improved dense trajectory feature from the original video by a hand-crafted feature module;
S2: encoding the hand-crafted feature extracted in step S1;
S3: extracting optical flow information from the original video image sequence by a CNN feature module using a variational optical flow algorithm, obtaining the corresponding optical flow image sequence as the input of a convolutional neural network;
S4: extracting CNN features from the optical flow image sequence obtained in step S3 using the convolutional neural network;
S5: dividing the dataset into a training set and a test set; for the training set, learning weights through a weight optimization network, using the learned weights to fuse the probability outputs of the CNN feature classifier network and the hand-crafted feature classifier network, determining the optimal weight by comparing recognition results, and applying it to the classification of the test set.
Further, in step S1, extracting the improved hand-crafted feature from the original video specifically includes: first densely sampling interest points in each frame of the infrared image, then tracking the interest points between consecutive frames, adding a grayscale weight for the corresponding image location to the resulting trajectory descriptors, and taking the weighted trajectory descriptors as the improved dense trajectory feature.
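The grayscale weighting of step S1 can be sketched on toy data as follows; the descriptor layout and the normalization by the maximum gray value are illustrative assumptions, not the exact scheme of the invention.

```python
# Sketch only: grayscale weighting of a trajectory descriptor (step S1).
# The descriptor layout and normalization here are assumptions for illustration.

def weight_trajectory(descriptor, gray_values, max_gray=255.0):
    """Scale a trajectory descriptor by the mean infrared intensity along the track.

    Brighter (hotter) regions get a weight near 1 and dark background a weight
    near 0, so foreground motion dominates the encoded feature.
    """
    weight = sum(gray_values) / (len(gray_values) * max_gray)
    return [weight * x for x in descriptor]

# Toy trajectory: a 4-dim descriptor tracked over 3 frames with strong infrared response.
desc = [0.2, 0.4, 0.1, 0.3]
weighted = weight_trajectory(desc, gray_values=[204, 229, 178])
```

In this sketch a trajectory crossing a hot target keeps most of its descriptor magnitude, while a background trajectory with low gray values is suppressed.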
Further, in step S2, the improved dense trajectory feature extracted is encoded in one of the following ways:
1): Bag-of-Words (BOW)
This feature encoding method includes two steps:
11) generating a visual dictionary: extracting features from the training set and clustering the extracted features with a clustering algorithm; each cluster center can be regarded as a visual word in the dictionary, and all visual words form a visual dictionary;
12) representing an image with the words in the dictionary: each feature in the image is mapped to a word of the visual dictionary, then the number of occurrences of each visual word in the image is counted, and the image can be represented as a histogram vector of fixed dimension;
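Step 12) can be sketched as follows; for brevity the dictionary is given directly rather than learned by clustering as in step 11).

```python
# Minimal Bag-of-Words encoding sketch (step 12): map each feature to its
# nearest visual word and count occurrences. The toy dictionary is fixed, not
# learned by clustering, so the example stays self-contained.

def bow_encode(features, dictionary):
    hist = [0] * len(dictionary)
    for f in features:
        # nearest visual word by squared Euclidean distance
        dists = [sum((a - b) ** 2 for a, b in zip(f, w)) for w in dictionary]
        hist[dists.index(min(dists))] += 1
    return hist

# Two-word dictionary in 2-D; three toy features.
dictionary = [(0.0, 0.0), (1.0, 1.0)]
features = [(0.1, 0.2), (0.9, 1.1), (0.8, 0.7)]
histogram = bow_encode(features, dictionary)  # → [1, 2]
```

The resulting histogram has a fixed dimension equal to the dictionary size, regardless of how many features the image contains.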
2): Fisher Vector (FV)
Fisher Vector encoding is based on the Fisher kernel principle. A Gaussian mixture model (GMM) is first trained on the training samples by maximum likelihood estimation; the GMM is then used to model the raw features (e.g. Dense-Traj) extracted from a sample, and the generated model parameters are used to encode the sample's raw features into a Fisher vector that is convenient for learning and measurement;
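A minimal first-order sketch of this encoding, assuming a fixed one-dimensional two-component diagonal GMM; a full Fisher Vector also includes second-order (variance) terms and learns the GMM by maximum likelihood, both omitted here for brevity.

```python
# Simplified first-order Fisher Vector sketch with a fixed 1-D two-component GMM.
import math

def fv_encode(xs, means, sigmas, weights):
    """Per component: posterior-weighted average of the normalized residual."""
    K, N = len(means), len(xs)
    fv = [0.0] * K
    for x in xs:
        # posterior (soft assignment) of x to each Gaussian; the 1/sqrt(2*pi)
        # constant cancels in the normalization, so it is dropped
        p = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
             for w, m, s in zip(weights, means, sigmas)]
        z = sum(p)
        for k in range(K):
            fv[k] += (p[k] / z) * (x - means[k]) / sigmas[k]
    return [v / (N * math.sqrt(w)) for v, w in zip(fv, weights)]

# Features near the first mean pull its component positive; the feature just
# below the second mean pulls that component negative.
code = fv_encode([0.1, 0.2, 1.9], means=[0.0, 2.0], sigmas=[1.0, 1.0],
                 weights=[0.5, 0.5])
```

Unlike a BOW histogram, the Fisher vector records not only which Gaussian a feature falls near but also on which side and how far, which is what makes it "convenient for learning and measurement".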
3): Vector of Locally Aggregated Descriptors (VLAD)
VLAD encoding first clusters the features extracted from the training set to obtain a codebook, then computes the residual between each raw feature and its assigned codebook word, accumulates these residuals per word, and finally concatenates the accumulated residuals of all words to form a new vector representing the image.
Further, in step S3, extracting optical flow information with a variational optical flow algorithm to obtain the corresponding optical flow image sequence specifically includes:
S31: proposing an improved energy functional under the brightness constancy assumption, the gradient constancy assumption, and a continuous spatio-temporal smoothness constraint;
S32: deriving the corresponding Euler-Lagrange equations from the energy functional, then solving for the optical flow vectors with the Gauss-Seidel or SOR method to obtain the corresponding optical flow images.
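The solution step S32 can be illustrated with one Horn-Schunck-style fixed-point update for the flow (u, v) at a single pixel; the invention's improved functional additionally includes gradient constancy and spatio-temporal smoothness terms, so the classical brightness-constancy form below is a simplification.

```python
# One Gauss-Seidel-style iteration of the Euler-Lagrange equations for the
# classical Horn-Schunck model (simplified stand-in for the improved functional):
#   (alpha^2 + Ix^2 + Iy^2) * (u - u_avg) = -Ix * (Ix*u_avg + Iy*v_avg + It)

def hs_update(Ix, Iy, It, u_avg, v_avg, alpha=1.0):
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    residual = Ix * u_avg + Iy * v_avg + It  # brightness-constancy violation
    u = u_avg - Ix * residual / denom
    v = v_avg - Iy * residual / denom
    return u, v

# Spatial gradients and temporal difference at one pixel; zero neighbor flow.
# Brightness moved right (It < 0 along a positive x-gradient), so u > 0.
u, v = hs_update(Ix=1.0, Iy=0.0, It=-0.5, u_avg=0.0, v_avg=0.0)
# residual = -0.5, denom = 2.0 → u = 0.25, v = 0.0
```

Sweeping this update over all pixels (Gauss-Seidel), or with an over-relaxation factor (SOR), iteratively drives the flow field toward the minimizer of the energy functional.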
Further, in step S4, the convolutional neural network extracts, from the obtained optical flow image sequence, the output of a fully connected layer as the CNN feature; the CNN adopts a multi-layer structure in which convolutional layers and pooling layers alternate, and the network output layer is fully connected in the feedforward manner.
Further, in step S5, dividing the dataset into a training set and a test set, learning the optimal weight from the training set through the adaptive fusion module, and applying the optimal weight to the classification of the test set specifically includes:
S51: learning the optimal weight from the training set:
The adaptive fusion module includes two classifier networks, each comprising a fully connected layer and a softmax classifier layer, and a single-node logistic function; the inputs of the two classifier networks are the hand-crafted features and the CNN features of the training set, respectively. After the two kinds of features are fed into their corresponding networks, the respective probability outputs P1 and P2 are obtained, while the single-node logistic function computes the corresponding weight Q; the final probability outputs of the two classifier networks are fused with the specified weight, and the error backpropagation algorithm is used to evaluate the recognition error, update the gradients, and determine and output the optimal weight;
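The weighted fusion in S51 can be sketched on toy probabilities as follows; the single scalar input z to the logistic node and the complementary (1 − Q) weighting of the second network are assumptions about the exact form of the invention.

```python
# Sketch of the S51 fusion: a logistic (sigmoid) node produces the weight Q,
# and the two classifiers' probability outputs are combined as Q*P1 + (1-Q)*P2.
import math

def fuse(p1, p2, z):
    q = 1.0 / (1.0 + math.exp(-z))  # single-node logistic function
    return [q * a + (1.0 - q) * b for a, b in zip(p1, p2)]

# Class probabilities from the hand-crafted-feature and CNN-feature classifiers.
p1 = [0.7, 0.2, 0.1]
p2 = [0.4, 0.5, 0.1]
fused = fuse(p1, p2, z=0.0)  # q = 0.5: equal trust in both classifiers
```

Because both inputs are probability distributions and the weights sum to one, the fused output is again a valid distribution over the action classes.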
S52: applying the optimal weight to the classification of the test set:
The hand-crafted features and CNN features extracted from the test set are fed into the corresponding classifier networks to obtain the corresponding probability outputs; the probability outputs of the two classifier networks are fused with the optimal weight obtained in S51, yielding the recognition results for the test set.
The beneficial effects of the present invention are: the described method highlights the advantages of infrared video over visible-light video, combines traditional hand-crafted features with the rapidly developing deep learning features in the action recognition task, innovates the feature fusion scheme, and improves the reliability of action recognition in infrared video, which is of great significance for subsequent video analysis work.
Description of the drawings
To make the purpose, technical solution, and beneficial effects of the present invention clearer, the invention provides the following drawings for explanation:
Fig. 1 is a schematic diagram of the two-stream infrared action recognition algorithm of the invention;
Fig. 2 is a schematic diagram of the improved dense trajectory feature extraction;
Fig. 3 is a schematic diagram of the optical flow convolutional neural network feature extraction;
Fig. 4 is a schematic diagram of the adaptive feature fusion model.
Detailed description of the invention
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
In the present invention, the hand-crafted feature module extracts the improved dense trajectory feature from the original video and encodes the extracted feature. The improved dense trajectory feature adds grayscale weight information to the descriptor of the original dense trajectory; it mainly embodies the spatio-temporal information of the video image sequence and highlights the foreground motion information of the image sequence. The CNN feature module extracts optical flow information from the original infrared video image sequence with a variational optical flow algorithm to form an optical flow image sequence, takes the extracted optical flow images as the input of a convolutional neural network, and takes the features of the fully connected layer of the convolutional neural network as the video's CNN feature descriptor. The adaptive fusion module first divides the dataset into a training set and a test set; for the training set, it learns weights through a weight learning network, uses the learned weights to fuse the probability outputs of the CNN feature classifier network and the hand-crafted feature classifier network, determines the optimal weight by comparing recognition results, and applies the optimal weight to the classification of the test set.
Fig. 1 is a schematic diagram of the two-stream infrared action recognition algorithm of the invention. As shown, the method specifically includes the following steps:
Step S1: extracting the improved dense trajectory feature from the original video; the specific flow is shown in Fig. 2:
S101: densely sampling interest points in each frame of the original infrared video;
S102: tracking the interest points between consecutive frames; pixels whose displacement between two consecutive frames is either negligibly small or excessively large are rejected, finally yielding interest point trajectories that can characterize the action;
S103: adding a grayscale weight for the corresponding image location to the resulting trajectory descriptors, and taking the weighted trajectory descriptors as the improved dense trajectory feature.
The original video is an infrared dataset with a resolution of 293 × 256, containing 12 action classes with 50 video samples per class. The dense trajectory feature is the hand-crafted feature with higher accuracy in current image classification and recognition tasks, but the datasets mostly used for testing are all visible-light datasets. Because an infrared image reflects the infrared radiation of each target in the scene, the stronger the heat radiation, the larger the gray value, producing obvious contrast between target and background. Based on this characteristic of infrared images, we add a grayscale weight on the basis of the original dense trajectory feature, so that trajectories with high gray values receive larger weights and trajectories with lower gray values receive smaller weights, improving the original dense trajectory feature and highlighting the advantage of infrared images for action recognition.
Step S2: performing Fisher Vector encoding on the extracted hand-crafted features.
Step S3: extracting optical flow information from the original video image sequence with a variational optical flow algorithm to form an optical flow image sequence. The data term of this algorithm's energy functional consists of the brightness constancy assumption and the gradient constancy assumption, with a discontinuity-preserving spatio-temporal smoothness constraint added; the algorithm has good continuity and rotational invariance, computes quickly, and achieves high precision.
Step S4: preprocessing the images of the optical flow image sequence obtained in step S3; the image size is adjusted to a resolution of 227 × 227 and used as the input of the convolutional neural network. This convolutional neural network consists of five convolutional layers and three fully connected layers; the features of the second fully connected layer are finally taken as the feature representation. The flow is shown in Fig. 3.
A convolutional neural network (Convolutional Neural Network, CNN) is a kind of multi-layer feedforward network: convolutional layers and pooling layers alternate as the intermediate layers, the network output is fully connected in the feedforward manner, and the dimension of the output layer equals the number of classes in the classification task. A convolutional neural network can automatically learn relevant features directly from the raw input data, eliminating the feature design process required by general algorithms, and can learn and discover more effective features.
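The alternating convolution / pooling structure just described can be sketched in one dimension for brevity; real CNNs use 2-D convolutions, many channels, and learned kernels.

```python
# Toy sketch of one convolution layer followed by one max-pooling layer, in 1-D.

def conv1d(signal, kernel):
    n = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(n))
            for i in range(len(signal) - n + 1)]

def max_pool(signal, size=2):
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]

x = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
feature_map = conv1d(x, kernel=[1.0, -1.0])  # simple edge (difference) detector
pooled = max_pool(feature_map)               # downsampling (pooling) layer
```

Stacking such pairs, then flattening into fully connected layers, yields the five-convolution, three-fully-connected architecture used in step S4.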
Step S5: learning the optimal weight from the training set with the adaptive fusion module and applying it to the classification of the test set. The schematic diagram of the adaptive fusion model is shown in Fig. 4:
S501: dividing the dataset into a training set and a test set;
S502: learning the optimal weight from the training set samples: the adaptive fusion model includes two classifier networks, each comprising a fully connected layer and a softmax classifier layer, and a single-node logistic function; the inputs of the two classifier networks are the hand-crafted features and the CNN features extracted from the training set, respectively. After the two kinds of features are fed into their corresponding networks, the respective probability outputs P1 and P2 are obtained, while the single-node logistic function computes the corresponding weight Q; the final probability outputs of the two classifier networks are fused with the specified weight, and the error backpropagation algorithm is used to evaluate the recognition error, update the gradients, and determine and output the optimal weight;
S503: applying the optimal weight to the classification of the test set: the hand-crafted features and CNN features extracted from the test set samples are fed into the corresponding classifier networks to obtain the corresponding probability outputs; the probability outputs of the two classifier networks are fused with the optimal weight, yielding the recognition results for the test set.
The main purpose of the error backpropagation (Error Backpropagation, BP) algorithm is to propagate the output error backwards and distribute it to all units of each layer, thereby obtaining the error signal of each layer's units and then revising the weights of each unit. The learning process of the BP algorithm consists of two phases: the forward propagation of the signal and the back propagation of the error. During forward propagation, an input sample enters at the input layer, is processed layer by layer through the hidden layers, and is passed to the output layer. If the actual output of the output layer does not match the desired output, the process enters the error back-propagation phase. Error back-propagation passes the output error backwards, layer by layer, through the hidden layers to the input layer, distributing the error to all units of each layer; the error signal of each layer's units obtained in this way serves as the basis for correcting the unit weights. In this module, this repeated cycle of weight adjustment is what finally yields the optimal weight.
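The weight-learning cycle described above can be sketched as gradient descent on the logistic node's single parameter z, so that the fused output favors whichever classifier is more accurate on the training data; the squared-error loss against one-hot labels and the learning rate are assumptions for illustration, not the exact training objective of the invention.

```python
# Sketch: learn the fusion weight Q = sigmoid(z) by gradient descent on a
# squared error between fused probabilities and one-hot labels.
import math

def train_fusion_weight(p1s, p2s, labels, steps=200, lr=1.0):
    z = 0.0
    for _ in range(steps):
        q = 1.0 / (1.0 + math.exp(-z))
        grad = 0.0
        for p1, p2, y in zip(p1s, p2s, labels):
            for c in range(len(p1)):
                fused = q * p1[c] + (1 - q) * p2[c]
                target = 1.0 if c == y else 0.0
                # chain rule: d(loss)/dz = d(loss)/d(fused) * d(fused)/dq * dq/dz
                grad += 2 * (fused - target) * (p1[c] - p2[c]) * q * (1 - q)
        z -= lr * grad  # error back-propagated to the single weight node
    return 1.0 / (1.0 + math.exp(-z))

# Classifier 1 is right on both training samples, classifier 2 on neither,
# so the learned weight Q should move well above 0.5 toward classifier 1.
p1s = [[0.8, 0.2], [0.3, 0.7]]
p2s = [[0.4, 0.6], [0.6, 0.4]]
q = train_fusion_weight(p1s, p2s, labels=[0, 1])
```

In the full model the same back-propagated error also updates the fully connected layers of the two classifier networks; here only the scalar fusion weight is trained, which is the part that produces the "optimal weight" exported to step S503.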
Finally, it should be noted that the above preferred embodiments are intended only to illustrate the technical solution of the present invention and not to restrict it. Although the present invention has been described in detail through the above preferred embodiments, those skilled in the art will appreciate that various changes in form and detail may be made without departing from the scope defined by the claims of the present invention.