Pedestrian re-identification method and system based on a view-adaptive subspace learning algorithm
Technical field
The present invention relates to a method in the field of computer vision, and in particular to a pedestrian re-identification method and system based on a view-adaptive subspace learning algorithm.
Background art
With the continuing development of information technology, intelligent processing terminals have become widespread, and multimedia data have become increasingly easy to acquire. Faced with massive amounts of multimedia data, how to analyze them intelligently and put them to use in the service of society has become an important research topic in computer vision. Object detection, object tracking and object recognition have all advanced greatly, and the detection, tracking and recognition of pedestrians in particular have attracted many researchers because of their practical value. In fields such as security and home elderly care, the concern is often the long-term locking and tracking of a specific pedestrian, which involves several techniques such as pedestrian detection and pedestrian tracking. When a pedestrian disappears from one camera and later reappears under another, the system should be able to recognize that pedestrian and continue tracking; this is the task of pedestrian re-identification. The goal of pedestrian re-identification is to associate targets detected in two non-overlapping cameras so as to achieve relay tracking across cameras. However, because cameras differ in configuration, placement and scene, pedestrian images from different cameras exhibit varying degrees of color change and geometric change; in addition, in complex surveillance scenes pedestrians occlude one another to varying degrees, which makes re-identification across cameras even harder. Current pedestrian re-identification mainly addresses matching between still images and does not yet exploit video information. Mainstream algorithms fall into two broad classes: appearance-feature matching algorithms based on low-level feature extraction, and feature matching algorithms based on metric learning. The first class aims to extract pedestrian features that are more robust and more discriminative, so as to improve the accuracy of appearance matching. The second class aims to learn a more suitable feature space, so as to reduce the feature differences of the same pedestrian caused by changes in pose, viewing angle and so on. Methods of the first class need no training samples and are therefore easy to deploy, but they require elaborate feature design, and because the factors that change a pedestrian's appearance in real conditions are highly complex, it is difficult to find universally discriminative features. The present invention belongs to the second class: its goal is to use training data to obtain a better feature subspace in which features of the same pedestrian lie closer together and features of different pedestrians lie farther apart.
A survey of the literature shows that existing metric learning algorithms are mainly variants of the Mahalanobis distance: the goal is to learn a feature transformation matrix such that the transformed features better fit the ideal feature distribution (features of the same pedestrian closer together, features of different pedestrians farther apart). In "PCCA: A New Approach for Distance Learning from Sparse Pairwise Constraints" (International Conference on Computer Vision and Pattern Recognition, 2012), Alexis Mignon et al. propose learning a low-dimensional subspace from training data, in which the labeled training sample pairs satisfy the ideal feature distribution (the distance between a feature sample pair of the same pedestrian is less than a threshold, and the distance between a feature sample pair of different pedestrians is greater than that threshold). This method suits high-dimensional feature spaces and achieves good results even when training samples are few. In "Person Re-identification by Probabilistic Relative Distance Comparison" (International Conference on Computer Vision and Pattern Recognition, 2011), Wei-Shi Zheng et al. propose a metric learning algorithm with triplet input, whose goal is to maximize the probability that the distance between a feature sample pair of the same pedestrian is smaller than the distance between a feature sample pair of different pedestrians. However, this method places more constraints on the input data (triplets) and is slow when the input features are high-dimensional.
Chinese patent document CN103500345A, published 2014.01.08, discloses a pedestrian re-identification algorithm based on metric learning. That invention performs pedestrian re-verification with a newly designed smooth-regularized distance metric model, fully accounts for the covariance-matrix bias problem in the model, and has the advantage of not requiring a complicated iterative optimization process. However, it does not consider that pedestrian images under different camera views are subject to different illumination, viewing angles and the like, so the metric it obtains is not optimal.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a pedestrian re-identification method and system based on a view-adaptive subspace learning algorithm that fully exploits the influence of different cameras on pedestrian features and learns a corresponding transformation specifically for each camera, so that the influence of the camera on pedestrian appearance features is minimized. The feature matching stage of re-identification can then focus solely on differences in pedestrian appearance, which greatly improves re-identification accuracy.
According to one aspect of the present invention, a pedestrian re-identification method based on a view-adaptive subspace learning algorithm is provided. The method takes as input images either rectangular images containing only a single pedestrian or target bounding boxes cropped from the original video frames using tracking results, extracts feature vectors from the input images, and divides the data set into a training data set and a test data set. A transformation matrix is learned on the training data set with the view-adaptive subspace learning algorithm, and on the test data set the learned transformation matrix is used for distance calculation and pedestrian re-identification.
The method comprises the following steps:
Step 1): extract features from the input images with a feature extraction algorithm to obtain a set of feature vectors, and divide this set into a training data set and a test data set;
Step 2): on the training data set, learn a view-adaptive subspace transformation matrix for each camera, where the mapping matrices are learned by optimizing a loss function;
Step 3): on the test data set, first map the feature vectors of all test images into the corresponding subspaces to obtain the mapped feature vectors, and perform pedestrian re-identification on that basis.
Further, in step 2), the loss function is given by formula (1):
where: L_A, L_B are the mapping matrices to be learned; L_A compensates for the change that camera A introduces into the pedestrian appearance feature vectors under its lens, and L_B compensates for the change that camera B introduces; all training samples occur in pairs, the feature vectors under camera A being {x_i}, i = 1, 2, ..., N_train, and those under camera B being {y_i}, i = 1, 2, ..., N_train; features at corresponding positions in the two sets belong to the same pedestrian under different cameras, i.e. x_i and y_i correspond to the same pedestrian; |S| and |D| denote the numbers of positive sample pairs (feature pairs of the same pedestrian) and negative sample pairs, respectively; λ, μ_A, μ_B are parameters that regulate the importance of the individual terms of the loss function; ||·||_F is the Frobenius norm of the corresponding matrix;
The loss function in formula (1) gives good results when the illumination and viewing-angle variation between cameras is not especially complex, but it can apply only a linear transformation to the feature vectors under each camera. To adapt better to complex real scenes, a nonlinear transformation is introduced through a kernel function, which gives the model more flexibility and allows the pedestrian's own appearance features to be recovered more faithfully. The kernel function is introduced as follows:
The distance between feature vectors in the kernel space is calculated by the following formula:
where: φ(x_i), φ(y_j) are the feature vectors in the kernel space, and the two matrices in the formula are the mapping matrices of the corresponding kernel space;
On the basis of formula (2), the loss function in formula (1) is generalized to the kernel space. Because the dimension of the kernel space is very high, the kernel-space mapping matrices cannot be learned directly, so transformation matrices Q_A, Q_B are introduced to represent them; the specific relationship is as follows:
where: φ(A), φ(B) are the matrices composed of the kernel-space feature vectors;
The loss function in the kernel space is therefore as follows:
where: the loss is the kernel-space loss function with Q_A, Q_B as parameters; K_A = φ(A)^T φ(A), K_B = φ(B)^T φ(B); tr(·) denotes the matrix trace; the superscript T denotes matrix transposition; and X denotes the square matrix whose diagonal entries are zero and whose remaining entries are all one. It can be shown that formula (4) is a convex function of Q_A, Q_B, so simple gradient descent converges to the optimal solution. Formula (4) is optimized by gradient descent as follows:
First, the derivatives with respect to Q_A and Q_B are taken, giving the following result:
where: l is the loss function, and K_A, K_B, Q_A, Q_B, X are as in formula (4);
On this basis Q_A, Q_B are updated iteratively according to the rule:
where: l is the loss function; η_A, η_B are the step sizes of the iterative update, obtained by cross-validation; t is the iteration number.
Further, in step 3), pedestrian re-identification means computing the distances between the feature vector of any image under camera A in the test data set and the feature vectors of all images under camera B, sorting them in ascending order of distance, and regarding the top-ranked image as the same pedestrian matched across the two cameras;
Specifically, pedestrian re-identification comprises the following steps:
3.1) in the test data set, compute the distances between the image feature of the first person under camera A and the features of all pedestrians under camera B, obtaining the first row M_1 of the distance matrix M;
3.2) repeat step 3.1) until every pedestrian under camera A has been compared by feature distance with the pedestrians under camera B, obtaining the remaining rows M_2, M_3, ... of the distance matrix, where M_{i,j} denotes the feature distance between the i-th pedestrian in A and the j-th pedestrian in B;
Each row of M is sorted in ascending order; the image in B whose distance is ranked i-th is the i-th best match for the image in A corresponding to that row, and the image ranked first is the best match.
More preferably, the distance is calculated as shown in formula (7):
where: φ(A_test), φ(A_train) are the sets of kernel-space feature vectors of the test set and the training set under camera A; correspondingly, φ(B_test), φ(B_train) are the sets of kernel-space feature vectors of the test set and the training set under camera B; Q_A, Q_B are the mapping matrices learned in step 2); e_i, e_j denote the column vectors whose i-th and j-th element, respectively, is one and whose remaining elements are all zero.
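The symbol definitions for formula (7) suggest how a cross-camera distance matrix could be computed from kernel matrices. The following is a minimal sketch, not the patented formula itself: it assumes that the kernel-space mapping for each camera factors as Q times the transposed training features, so that a test sample is mapped by multiplying Q with its column of kernel values against the training set. The RBF kernel, the `gamma` value and all function names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(train, other, gamma):
    """K[m, n] = exp(-gamma * ||train[m] - other[n]||^2); rows index training samples."""
    sq = (np.sum(train ** 2, axis=1)[:, None]
          + np.sum(other ** 2, axis=1)[None, :]
          - 2.0 * train @ other.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def cross_view_distances(q_a, q_b, k_a, k_b):
    """M[i, j] = ||Q_A K_A e_i - Q_B K_B e_j||^2 (assumed form, see lead-in)."""
    za = q_a @ k_a                            # column i: mapped test sample i under camera A
    zb = q_b @ k_b                            # column j: mapped test sample j under camera B
    diff = za[:, :, None] - zb[:, None, :]    # shape (r, n_test_A, n_test_B)
    return np.sum(diff ** 2, axis=0)

# Sanity check on synthetic data: identical cameras and identical mappings
# must give zero distance between a sample and itself.
rng = np.random.default_rng(0)
a_train = rng.normal(size=(6, 4))
a_test = rng.normal(size=(3, 4))
b_train, b_test = a_train.copy(), a_test.copy()
q_a = q_b = np.eye(6)                         # identity initialization

k_a = rbf_kernel(a_train, a_test, gamma=0.5)  # shape (n_train, n_test)
k_b = rbf_kernel(b_train, b_test, gamma=0.5)
M = cross_view_distances(q_a, q_b, k_a, k_b)
print(M.shape)
```

Selecting column i of a kernel matrix plays the role of the unit vector e_i in the definitions above; the broadcasting step simply evaluates all (i, j) pairs at once.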
According to another aspect of the present invention, a pedestrian re-identification system based on a view-adaptive subspace learning algorithm is provided. The system comprises, connected in sequence, a feature extraction module, a subspace mapping matrix learning module (the view-adaptive subspace learning module) and a pedestrian re-identification module, wherein:
The feature extraction module takes original pedestrian images as input and extracts features from each input pedestrian image to obtain a d-dimensional feature vector; from all pedestrians, a certain number of pedestrian images are selected at random as the training data set, and their features serve as the input of the subspace mapping matrix learning module; the remaining pedestrian images form the test data set;
The subspace mapping matrix learning module takes as input the training data set output by the feature extraction module and adaptively learns an optimal mapping matrix for each camera, so that the transformed feature vectors fit the ideal feature distribution as closely as possible, i.e. feature vectors of the same pedestrian are close together and feature vectors of different pedestrians are far apart; the module outputs the learned transformation matrices Q_A, Q_B;
The re-identification module operates on the test data set: it uses the transformation matrices Q_A, Q_B obtained by the subspace mapping matrix learning module to calculate distances between test images, and outputs the pedestrian under camera B most similar to a given pedestrian under camera A as the re-identification result.
Compared with the prior art, the present invention has the following beneficial effects:
Traditional metric learning usually applies the same transformation to image features from different cameras so that the transformed feature space fits the ideal feature distribution as closely as possible (that is, features of the same pedestrian are close together and features of different pedestrians are far apart). However, pedestrian images under different camera views are subject to different illumination and viewing angles, and a single transformation matrix shared by all cameras cannot capture the characteristics of each camera, so the learned transformation space is not optimal. The present invention therefore proposes a re-identification method using a view-adaptive subspace learning algorithm: building on traditional metric learning, it further takes into account that different cameras have different characteristics and compensates for them with different transformations (linear or nonlinear). In this way the invention can learn the optimal mapping for each pair of cameras more flexibly, so that the transformed features under different cameras come closer to the ideal feature distribution. Experimental results on the pedestrian re-identification task confirm the effectiveness of the proposed method.
Brief description of the drawings
Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a flow chart of one embodiment of the invention;
Fig. 2 is a flow chart of the view-adaptive subspace learning algorithm in one embodiment of the invention;
Fig. 3 is a schematic diagram illustrating, for one embodiment of the invention, why the view-adaptive subspace learning algorithm is superior to traditional metric learning algorithms;
Fig. 4 shows several groups of pedestrian images to be matched, selected at random from a public person re-identification data set used in one embodiment of the invention;
Fig. 5 shows visualized recognition results of the method of one embodiment of the invention; the first column is the image to be matched, and the other columns are the ten top-ranked matching images obtained by feature matching with the features extracted according to the invention, the second column being the best match given by the method of the invention;
Fig. 6 compares the accuracy of the proposed subspace learning algorithm with that of other methods when applied to person re-identification;
Fig. 7 is a structural diagram of the system in one embodiment of the invention.
Specific embodiment
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to understand the invention further but do not limit it in any way. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the invention.
As shown in Fig. 1, a pedestrian re-identification method based on a view-adaptive subspace learning algorithm takes as input images either rectangular images containing only a single pedestrian or target bounding boxes cropped from the original video frames using tracking results, extracts feature vectors from the input images, and divides the data set into a training data set and a test data set. A transformation matrix is learned on the training data set with the view-adaptive subspace learning algorithm, and on the test data set the learned transformation matrix is used for distance calculation and pedestrian re-identification. Specifically, the method comprises the following steps:
Step 1): extract features from every image in the data set to obtain d-dimensional feature vectors; a randomly selected part of the feature vectors serves as the training data set and the rest as the test data set;
This step can be implemented with methods described in the prior art, for example the feature extraction method in "Large Scale Metric Learning from Equivalence Constraints" (Koestinger, M., Hirzer, M., Wohlhart, P., Roth, P. M., & Bischof, H., Computer Vision and Pattern Recognition, 2012);
In the present embodiment, this method is implemented as follows:
Given an input image, it is first resized to 128 × 48, and a 16 × 8 window is then slid over the image with a stride of 8 × 8, which divides the complete 128 × 48 image into 90 regions of size 16 × 8 (with overlap between regions);
In each region, Lab and HSV histograms and LBP texture features are extracted: for Lab and HSV a 24-dimensional histogram is extracted from each channel, and the LBP feature is a 59-dimensional uniform LBP histogram. Each region thus yields a 203-dimensional feature vector;
The feature vectors of all regions are concatenated in order to obtain the complete feature vector of the image, whose final dimension is 18270.
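The region count and feature dimension stated above can be checked with a short calculation. This sketch assumes that "128 × 48", "16 × 8" and "8 × 8" are all given as height × width, which is the reading that reproduces the 90 regions; the variable names are illustrative.

```python
# Assumption: all sizes are (height, width).
IMG_H, IMG_W = 128, 48      # resized input image
WIN_H, WIN_W = 16, 8        # sliding window
STEP_H, STEP_W = 8, 8       # stride

def region_origins():
    """Top-left corner (row, col) of every sliding-window region."""
    rows = range(0, IMG_H - WIN_H + 1, STEP_H)
    cols = range(0, IMG_W - WIN_W + 1, STEP_W)
    return [(r, c) for r in rows for c in cols]

# Per-region descriptor: 3 Lab channels and 3 HSV channels with 24 bins each,
# plus one 59-bin uniform LBP histogram.
PATCH_DIM = 3 * 24 + 3 * 24 + 59

regions = region_origins()
total_dim = len(regions) * PATCH_DIM
print(len(regions), PATCH_DIM, total_dim)   # 90 203 18270
```

Under this reading the windows overlap only vertically (a 16-pixel window moved by 8 pixels), while horizontally the 8-pixel windows tile the image without overlap.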
In the present embodiment, to reduce information redundancy and speed up computation, the feature vectors are further reduced in dimension with the PCA algorithm; the dimension after reduction is 34.
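The PCA step can be illustrated with a minimal NumPy sketch. The text does not say how PCA is computed in the embodiment, so the SVD-based implementation, the function name `pca_reduce` and the synthetic data sizes here are assumptions; only the target dimension of 34 comes from the description.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-vector features onto their top principal components."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components

rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 500))    # stand-in for 200 images with 500-dim features
reduced, mean, components = pca_reduce(demo, 34)
print(reduced.shape)                  # (200, 34)
```

Returning `mean` and `components` allows the same projection, `(x - mean) @ components.T`, to be applied to features that were not used to fit the PCA.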
Step 2): on the training data set, learn the view-adaptive subspace transformation matrices;
"View-adaptive" means that a specific transformation matrix is learned for each camera so that the transformed image features remain consistent across cameras, which improves re-identification more flexibly. Because real scenes are complex, a kernel function is introduced to model nonlinear transformations and so better overcome the influence that camera-induced changes have on pedestrian appearance features. Fig. 3 illustrates the advantage of using different transformation matrices for different cameras and of introducing a kernel function to model nonlinear transformations.
Fig. 2 shows the flow chart of the subspace learning algorithm proposed in one embodiment of the invention. The learning process is as follows (for parameters not specifically explained below, refer to the Summary of the invention):
2.1) For a given data set (for example VIPER; Fig. 4 shows some sample pictures from this data set), divide the data into two groups, each containing pictures of all pedestrians. VIPER has 632 pedestrian pairs in total, so the first group contains one image of each of the 632 pairs and the second group contains the other image, with each pedestrian in the same position in both groups. From the grouped data set, part of the pedestrians are selected as the training data set (for example, all pictures of 316 randomly selected pedestrian pairs in VIPER are used as training samples), and the rest serve as the test data set (step 2 uses only the features of the training samples);
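The grouping and random split in step 2.1) can be sketched as follows. The function name `split_pairs`, the seed and the small demo sizes are illustrative assumptions; the point of the sketch is that one set of indices is applied to both camera groups so that paired samples stay aligned.

```python
import numpy as np

def split_pairs(n_pairs, n_train, seed=0):
    """Randomly choose n_train pedestrian indices for training; the rest are for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_pairs)
    return np.sort(order[:n_train]), np.sort(order[n_train:])

# Demo with 10 pedestrians seen by two cameras (row i in both arrays is pedestrian i).
rng = np.random.default_rng(1)
feats_a = rng.normal(size=(10, 34))   # features under camera A
feats_b = rng.normal(size=(10, 34))   # features under camera B

train_idx, test_idx = split_pairs(n_pairs=10, n_train=5)
a_train, b_train = feats_a[train_idx], feats_b[train_idx]
a_test, b_test = feats_a[test_idx], feats_b[test_idx]
print(a_train.shape, a_test.shape)
```

Because the same index arrays select rows from both groups, the i-th training (or test) sample under camera A and the i-th under camera B still belong to the same pedestrian.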
2.2) Since the dimension of the kernel space may be infinite, the kernel-space mapping matrices cannot be optimized directly; they are therefore expressed through Q_A, Q_B as in formula (3), and Q_A, Q_B are optimized instead. Initialize Q_A, Q_B as identity matrices and set the loss threshold for the convergence test to ε = 10^-5;
2.3) Compute the loss function according to formula (4);
2.4) Compute the gradients with respect to Q_A, Q_B according to formula (5);
2.5) Update Q_A, Q_B according to formula (6);
2.6) With the updated Q_A, Q_B, compute the loss function l according to formula (4); if Δl > ε, go to step 2.4); otherwise the algorithm is judged to have converged and the corresponding Q_A, Q_B are output.
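The control flow of steps 2.2) to 2.6) can be sketched as a generic gradient descent loop. The loss and gradient are passed in as callables because formulas (4) and (5) are not reproduced here; the toy quadratic loss used for the demonstration is purely illustrative and is not the loss of the invention. The function name `descend` and the `max_iter` safeguard are also assumptions.

```python
import numpy as np

def descend(loss, grad, q_a, q_b, eta_a, eta_b, eps=1e-5, max_iter=10000):
    """Gradient descent on (Q_A, Q_B), stopping when the loss changes by at most eps."""
    prev = loss(q_a, q_b)                 # step 2.3: initial loss
    cur = prev
    for _ in range(max_iter):
        g_a, g_b = grad(q_a, q_b)         # step 2.4: gradients
        q_a = q_a - eta_a * g_a           # step 2.5: update with step sizes eta_A, eta_B
        q_b = q_b - eta_b * g_b
        cur = loss(q_a, q_b)              # step 2.6: loss after the update
        if abs(prev - cur) <= eps:        # converged: change in loss not above eps
            break
        prev = cur
    return q_a, q_b, cur

# Toy convex problem standing in for formula (4): pull Q_A, Q_B toward fixed targets.
rng = np.random.default_rng(0)
t_a, t_b = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
toy_loss = lambda qa, qb: np.sum((qa - t_a) ** 2) + np.sum((qb - t_b) ** 2)
toy_grad = lambda qa, qb: (2 * (qa - t_a), 2 * (qb - t_b))

q_a, q_b, final = descend(toy_loss, toy_grad, np.eye(3), np.eye(3), 0.1, 0.1)
print(final)
```

The identity initialization and the threshold of 10^-5 follow step 2.2); in practice the step sizes would be chosen by cross-validation as stated in the Summary.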
Step 3): using the Q_A, Q_B learned in step 2), perform pedestrian re-identification on the test data set. The specific implementation is as follows:
3.1) In the test data set, compute according to formula (7) the distances between the image feature of the first person under camera A and the features of all pedestrians under camera B, obtaining the first row M_1 of the distance matrix M. Taking the VIPER data set as an example, the test set has 316 pedestrians, so M_1 contains 316 distances.
3.2) Repeat step 3.1) until every pedestrian under camera A has been compared by feature distance with the pedestrians under camera B, obtaining rows M_2, M_3, ..., M_316 and finally a matrix of size 316 × 316, where M_{i,j} denotes the feature distance between the i-th pedestrian in A and the j-th pedestrian in B;
Each row of M is sorted in ascending order; the image in B whose distance is ranked i-th is the i-th best match given by the method for the image in A corresponding to that row, and the image ranked first is the best match.
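The row-wise sorting of the distance matrix can be sketched in a few lines. The 3 × 3 matrix below is made-up illustrative data, not an experimental result. The cumulative top-k accuracy is included as one common way to summarize such rankings; it is an assumption here rather than something the description specifies, and it presumes that pedestrian i under camera A and pedestrian i under camera B are the same person, as in the paired layout of the data.

```python
import numpy as np

def rank_matches(dist):
    """Row i lists camera-B indices from best to worst match for A's i-th pedestrian."""
    return np.argsort(dist, axis=1, kind="stable")

def cumulative_accuracy(dist):
    """Fraction of queries whose true match (same index in B) lies within the top k."""
    order = rank_matches(dist)
    n = dist.shape[0]
    # Position of the true match j == i within each sorted row.
    true_rank = np.argmax(order == np.arange(n)[:, None], axis=1)
    return np.array([(true_rank < k).mean() for k in range(1, n + 1)])

M = np.array([[0.2, 0.9, 0.5],
              [0.7, 0.6, 0.1],
              [0.8, 0.3, 0.4]])

order = rank_matches(M)
best = order[:, 0]                 # best match in B for each pedestrian in A
curve = cumulative_accuracy(M)
print(best)                        # [0 2 1]
```

Here only the first query is matched correctly at rank 1, while all three true matches fall within the top two, so the curve rises from 1/3 to 1.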
As shown in Fig. 7, based on the above method the present invention also provides a pedestrian re-identification system based on the view-adaptive subspace learning algorithm. The system comprises a feature extraction module, a subspace mapping matrix learning module (the view-adaptive subspace learning module) and a pedestrian re-identification module, wherein:
The feature extraction module takes original pedestrian images as input and extracts features from each input pedestrian image to obtain a d-dimensional feature vector; from all pedestrians, a certain number of pedestrian images are selected at random as the training data set, and their features serve as the input of the subspace mapping matrix learning module; the remaining pedestrian images form the test data set;
The subspace mapping matrix learning module takes as input the training data set output by the feature extraction module; it adaptively learns an optimal mapping matrix for each camera and applies the resulting Q_A, Q_B to transform the features in the training data set, so that the transformed feature vectors fit the ideal feature distribution as closely as possible (feature vectors of the same pedestrian are closer together, and feature vectors of different pedestrians are farther apart);
The feature extraction module represents each input pedestrian image as a d-dimensional feature vector;
The re-identification module operates on the test data set: it uses the learned transformation matrices Q_A, Q_B to map the features of the test data set, calculates distances between the mapped features according to formula (7), and outputs the pedestrian under camera B most similar to a given pedestrian under camera A as the re-identification result.
In the present embodiment, for a given pedestrian in camera A, the pedestrians in camera B are sorted in ascending order of distance, and the top-ranked pedestrian in B is output as the match for the pedestrian in camera A.
The techniques used by the above modules correspond to the respective parts of the method described above and are not repeated here.
Fig. 5 shows the ten top-ranked matching images obtained in one embodiment: the first column is the image to be matched, and the following columns are the matches ranked first to tenth by the present embodiment; the image outlined by a dashed box is the true match. It can be seen that the method proposed in the present embodiment identifies and matches the same pedestrian well.
Fig. 6 compares the re-identification accuracy of the embodiment with that of non-adaptive subspace learning methods (ILIDS data set), in which: SDALF extracts color, texture and other features on the basis of symmetry and fuses the features for person re-identification; SVMML combines metric learning with a locally adaptive distance threshold, overcoming the low recognition rate caused by a single threshold; KISSME proposes a fast metric learning method from the viewpoint of statistical inference that requires no iterative optimization; KLFDA is an improved classification method based on the principle of minimizing within-class covariance and maximizing between-class covariance; PCCA learns a low-dimensional subspace from training data in which the labeled training sample pairs satisfy the ideal feature distribution; PRDC learns a better metric that maximizes the probability that the distance between a feature sample pair of the same pedestrian is smaller than the distance between a feature sample pair of different pedestrians. "Our Linear Kernel" and "Our RBF Kernel" are the accuracy results of the present embodiment (both the linear kernel and the nonlinear RBF kernel were tested). It can be seen that the present embodiment is comparable to the other methods in recognition accuracy, and that its accuracy converges to 1 faster.
Specific embodiments of the invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention.