CN108564116A - Intelligent component analysis method for camera scene images - Google Patents

Intelligent component analysis method for camera scene images

Info

Publication number
CN108564116A
CN108564116A (application number CN201810281581.9A)
Authority
CN
China
Prior art keywords
feature
component
convolutional layer
analysis method
intelligent analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810281581.9A
Other languages
Chinese (zh)
Inventor
闫潇宁 (Yan Xiaoning)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen City Soft Wisdom Technology Co Ltd
Original Assignee
Shenzhen City Soft Wisdom Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen City Soft Wisdom Technology Co Ltd
Priority to CN201810281581.9A
Publication of CN108564116A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods


Abstract

The present invention discloses an intelligent component analysis method for camera scene images, including: Process 1, feeding image data into an AlexNet deep neural network model for feature dissociation to obtain features of a preset dimension; Process 2, receiving the preset-dimension features output by Process 1 and performing backward feature elimination; Process 3, receiving the output data of Process 2 and applying high-correlation matrix filtering to it; Process 4, receiving the data of the previous step and partitioning all the data using a random projection forest so that the number of points searched and computed each time is reduced to an acceptable range, then building multiple random projection trees to form the random projection forest and taking the combined result of the forest as the final result. Through the above scheme, while keeping the final data scale at essentially the same level as existing methods, the present invention preserves exact equivalence for later comparison and search, which neither PCA principal component analysis nor other methods can achieve.

Description

An intelligent component analysis method for camera scene images
Technical field
The present invention relates to the field of image processing, and specifically to an intelligent analysis method for image components.
Background technology
The components of an image generally refer to its components of interest: a combination of data whose total size is much smaller than the original image, yet which can substitute for the original image in content search, similarity comparison and similar applications.
Given that more and more industries need to run multiple types of big-data analysis on image libraries at the scale of hundreds of millions of images (such as content-similarity comparison and interest-content search), traditional lookup methods increasingly fail to meet demand, whether in precision, speed or storage overhead. In addition, research and applications in many fields generally require extensive observation of multiple variables that reflect the things under study, collecting large amounts of data and finding patterns in them for analysis. Large multivariate samples undoubtedly provide rich information for research and applications, but they also increase the workload of data acquisition to a certain extent. More importantly, in most cases there may be correlations among many of the variables, which increases the complexity of the analysis and brings inconvenience to it. Analysing each index separately yields isolated rather than comprehensive results, while blindly reducing the indices loses much information and easily produces wrong conclusions. A reasonable method is therefore needed that reduces the number of indices to be analysed while minimising the loss of the information contained in the original indices, so that the collected data can be analysed comprehensively. Since there are certain correlations among the variables, it is possible for a smaller number of comprehensive indices to synthesise the various kinds of information present in the individual variables. Principal component analysis (PCA) belongs to this class of dimensionality-reduction methods.
PCA is a statistical method. Through an orthogonal transformation it converts a set of possibly correlated variables into a set of linearly uncorrelated variables; the transformed variables are the principal components. As a dimensionality-reduction technique, PCA uses an orthogonal transformation to convert an original random vector with correlated components into a new random vector with uncorrelated components. Algebraically, this diagonalises the covariance matrix of the original random vector; geometrically, it transforms the original coordinate system into a new orthogonal coordinate system pointing along the p orthogonal directions in which the sample points spread most. The multidimensional variable system is thereby reduced in dimension, converting it with comparatively high precision into a low-dimensional variable system, which an appropriately constructed cost function can further convert into a one-dimensional system. Existing PCA, however, has the following shortcomings: 1. the process takes extremely long; running PCA on 50,000 arrays of 10,000 dimensions takes 20 minutes; 2. the loss of the data's principal components and the storage space cannot be balanced; the higher the compression, the greater the loss.
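For concreteness, the orthogonal transformation underlying PCA can be sketched in a few lines via the singular value decomposition. This is a generic illustration of the background technique, not part of the claimed method; the array sizes are arbitrary.

```python
import numpy as np

def pca_reduce(X, k):
    # Centre the data and project onto the top-k right singular vectors,
    # i.e. the k orthogonal directions along which the points spread most.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Z = pca_reduce(X, 3)
print(Z.shape)  # (100, 3); the three components are mutually uncorrelated
```

Because the projection directions diagonalise the covariance matrix, the resulting columns of Z are linearly uncorrelated, matching the algebraic description above.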
Summary of the invention
Therefore, in view of the above problems, the present invention proposes an intelligent component analysis method for camera scene images that substantially reduces the size of the raw dataset while preserving the key properties of the original data to the greatest extent, so that the new data remains searchable and comparable with unchanged precision relative to the original data, while analysis time is greatly reduced and storage footprint and data loss are lowered.
To this end, the technical solution adopted by the present invention is an intelligent component analysis method for camera scene images, including:
Process 1: feed the image data into an AlexNet deep neural network model for feature dissociation to obtain features of a preset dimension. Specifically, the AlexNet deep neural network model comprises five convolutional layers and three fully connected layers arranged in sequence, denoted the first convolutional layer, second convolutional layer, third convolutional layer, fourth convolutional layer, fifth convolutional layer, first fully connected layer, second fully connected layer and third fully connected layer. The preset-dimension features are those obtained at the first fully connected layer (the sixth layer of the AlexNet model), and 4096-dimensional features are preferably selected: relative to other dimensions, 4096 dimensions average up to 99.2% precision in backward feature elimination, and this best balance between extraction precision and subsequent big-data comparison operations is one reason the present invention adopts AlexNet. The second fully connected layer preferably outputs 256-dimensional features. Assuming there are N pictures in total, Process 1 generates a 4096×N matrix, namely the most original high-dimensional vector matrix;
The present application also improves the AlexNet deep neural network model specifically for camera scene images, making it different from the existing model structure. Specifically, in the above AlexNet deep neural network model, the convolution kernels (filters) of the first, second, third, fourth and fifth convolutional layers are lowered from the original 5×5 to 3×3.
Preferably, in the first and fifth convolutional layers, a 1×1 convolution kernel is added before the 3×3 kernel. In the second, third and fourth convolutional layers, a 1×1 convolution kernel is added both before and after the 3×3 kernel. Taking an input feature map of 28×28×192 as an example, let the number of 3×3 convolution channels be N2 and the number of 5×5 convolution channels be N3. If the second, third and fourth convolutional layers all used 5×5 kernels, the convolution kernel parameter count would be 5×5×192×N3 + 5×5×192×N3 + 5×5×192×N3; with 3×3 kernels in those layers it is 3×3×192×N2 + 3×3×192×N2 + 3×3×192×N2. The present application further adds 1×1 kernels before and after each 3×3 kernel (the 1×1 kernels before and after a 3×3 kernel are denoted the first 1×1 kernel and the second 1×1 kernel respectively); with the channel count of the first 1×1 kernel denoted N4 and that of the second denoted N5, the convolution kernel parameter count becomes (1×1×192×N4 + 3×3×N4×N2 + 1×1×N2×N5) + (1×1×192×N4 + 3×3×N4×N2 + 1×1×N2×N5) + (1×1×192×N4 + 3×3×N4×N2 + 1×1×N2×N5). In an actual design the channel counts of the above kernels are set as needed, for example 3×3 channels N2 = 128, 5×5 channels N3 = 32, first 1×1 kernel channels N4 = 96 and second 1×1 kernel channels N5 = 16. It can thus be seen that under the above scheme the input and output channel counts of the convolutional layers are all reduced, and the convolution kernel parameter count is also greatly reduced.
Through the above differentiated designs for different convolutional layers, the present application not only performs dimensionality reduction but also dimensionality raising, so that the input and output channel counts of the convolutional layers are all reduced and the number of parameters is reduced further. For video images whose targets of interest are relatively small, this captures more target-feature detail while lowering computational overhead (to 36% of the original kernels). Meanwhile the second fully connected layer is lowered from the 4096-dimensional output of the first fully connected layer to 256 dimensions, which raises feature-extraction speed, the operation speed of subsequent steps and storage-space utilisation 16-fold while mean precision drops only 5.7%, thereby greatly reducing analysis time, storage footprint and data loss.
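The parameter-count formulas above can be checked with straightforward arithmetic. The sketch below plugs the example channel settings from the text (N2 = 128, N3 = 32, N4 = 96, N5 = 16) into the per-layer formulas for the 28×28×192 example; the figures are illustrative only.

```python
# Per-layer convolution parameter counts from the formulas in the text,
# using the example channel settings (illustrative figures only).
C_in, N2, N3, N4, N5 = 192, 128, 32, 96, 16

plain_5x5 = 5 * 5 * C_in * N3            # one layer with the original 5x5 kernel
plain_3x3 = 3 * 3 * C_in * N2            # one layer after lowering to 3x3
bottleneck = (1 * 1 * C_in * N4          # first 1x1 kernel
              + 3 * 3 * N4 * N2          # 3x3 kernel in between
              + 1 * 1 * N2 * N5)         # second 1x1 kernel

print(plain_5x5, plain_3x3, bottleneck)  # 153600 221184 131072
```

The 1×1 kernels shrink the channel count seen by the 3×3 kernel, which is why the bottleneck variant needs fewer parameters than a plain 3×3 layer at the same output width.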
Process 2: receive the preset-dimension features output by Process 1 and perform backward feature elimination.
In this method, all classification algorithms are first trained with n features. In each dimensionality-reduction operation, the classifier is trained n times using n−1 features, yielding n new classifiers. The (n−1)-dimensional feature set used by the new classifier whose misclassification rate changes least is taken as the feature set after reduction. Iterating this process continually yields the reduced result; the k-th iteration produces (n−k)-dimensional feature classifiers. By choosing a maximum tolerable error rate (preferably 10%), the minimum number of features the chosen classifier needs in order to reach a specified classification performance can be obtained.
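A minimal sketch of the backward-elimination loop just described; since the text does not fix a particular classifier, a simple nearest-centroid classifier and toy data are assumed here for illustration.

```python
import numpy as np

def error_rate(X, y, feats):
    # Error of a nearest-centroid classifier restricted to a feature subset
    # (a stand-in for the unspecified classifiers in the text).
    Xs = X[:, feats]
    cents = {c: Xs[y == c].mean(axis=0) for c in np.unique(y)}
    pred = np.array([min(cents, key=lambda c: np.linalg.norm(row - cents[c]))
                     for row in Xs])
    return float(np.mean(pred != y))

def backward_eliminate(X, y, max_error=0.10):
    feats = list(range(X.shape[1]))
    while len(feats) > 1:
        # One classifier per candidate removal; keep the removal whose
        # error is smallest, as long as it stays within the tolerance.
        err, gone = min((error_rate(X, y, [f for f in feats if f != g]), g)
                        for g in feats)
        if err > max_error:
            break
        feats.remove(gone)
    return feats

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 6))
X[:, 0] += 5 * y                 # only feature 0 separates the classes
print(backward_eliminate(X, y))  # expected: [0]
```

The 10% tolerance plays the role of the maximum tolerable error rate mentioned above: elimination stops as soon as removing any remaining feature would push the error past it.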
Process 3: receive the output data of Process 2 and apply high-correlation matrix filtering to it.
High-correlation filtering holds that when two columns of data show similar trends, the information they carry is also similar, so a machine-learning model can make do with one column from such a pair. The similarity between numerical columns is expressed by computing their correlation coefficient, while the correlation of nominal (categorical) columns can be expressed by computing the Pearson chi-squared value. Of two columns whose correlation coefficient exceeds some threshold, only one is retained.
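The column-filtering rule for numerical columns can be sketched as follows; the 0.95 threshold and the toy data are illustrative assumptions, since the text leaves the threshold open here.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    # Of any two columns whose Pearson correlation exceeds the threshold,
    # keep only the earlier one.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], keep

rng = np.random.default_rng(2)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), b])  # col 1 nearly duplicates col 0
Xr, kept = drop_correlated(X)
print(kept)  # [0, 2]
```

Column 1 tracks column 0 almost exactly, so their correlation exceeds the threshold and only the first of the pair survives, exactly as the rule above prescribes.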
The scheme of the present application is particularly suited to camera scenes, because their elevation angle and picture content are relatively homogeneous. From the viewpoint of classification, intra-class distances are small and inter-class distances are large, so step 1 can be tuned so as neither to miss captures nor to capture falsely, while in steps 2 and 3 the large inter-class gaps keep the loss of target features at a low level.
Processes 2 and 3 play a vital role: together they remove repeated and worthless information from the image, saving space while retaining the main content components to the greatest extent and preserving functional equivalence to the original image.
The essential difference between Processes 2 and 3 and PCA dimensionality reduction is this: PCA suits image-vector reduction of every type, and can even be used for vector reduction of other kinds of data, whereas this method targets road-monitoring camera scenarios only. Diversity there declines sharply, so impurity vectors are more easily rejected in batches by Processes 2 and 3 together (neither alone suffices: Process 2 alone would cause over-rejection, Process 3 alone would increase redundancy). At the same time, unlike traditional correlation filtering methods, step 3 guarantees in this application scenario that, with the lambda value set to 2.5, components over-rejected by Process 2 are automatically and most reasonably compensated during filtering.
Process 4: receive the data of the previous step and partition all the data using a random projection forest, so that the number of points searched and computed each time is reduced to an acceptable range; then build multiple random projection trees to form the random projection forest, and take the combined result of the forest as the final result. Process 4's random partitioning of the data processed by Processes 2 and 3 constitutes the feasible final vector-reduction step under camera application scenarios.
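The dot-product partitioning rule at the heart of Process 4 can be sketched as follows; the random direction and the toy points are illustrative.

```python
import numpy as np

def split_by_random_vector(points, rng):
    # The line through the origin perpendicular to a random vector splits
    # the points: positive dot products go left, the rest go right.
    v = rng.normal(size=points.shape[1])
    dots = points @ v
    return points[dots > 0], points[dots <= 0]

rng = np.random.default_rng(3)
pts = rng.normal(size=(1000, 4))
left, right = split_by_random_vector(pts, rng)
print(len(left), len(right))  # the two parts partition all 1000 points
```

Applied recursively, this single split is what reduces each search to the points in one small cell rather than the whole dataset.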
Since Processes 1 and 4 involve ultra-large-scale matrix operations, Processes 1 and 4 are placed on a large-scale machine-learning server with multiple GPUs, while the tasks of Processes 2 and 3 are distributed in parallel across the edge computing nodes (at most 200).
Through the above scheme the present invention, while keeping the final data scale at essentially the same level as existing methods, preserves exact equivalence for later comparison and search, which neither PCA principal component analysis nor other methods can achieve. Although a method such as PCA can greatly reduce the scale of the final component data relative to the original data, searching or comparing PCA's final data substantially lowers precision. Compared with PCA, this method has no such drawback: while maintaining precision equivalent to the original data, it shares the advantage of PCA's final data in favouring high-speed search.
Description of the drawings
Fig. 1 is a schematic diagram of feature dissociation performed on image data in the present invention;
Fig. 2 is a schematic diagram of backward feature elimination in the present invention;
Fig. 3 is a schematic diagram of high-correlation filtering in the present invention;
Fig. 4 is an intermediate-state diagram of random vector partitioning in the random forest of the present invention;
Fig. 5 is a schematic diagram of the result after successive random vector partitions in the random forest of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which identical or similar labels throughout denote identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary, intended only to explain the present invention, and are not to be construed as limiting it.
Specifically, an intelligent component analysis method for camera scene images of the present invention includes:
Process 1: as shown in Fig. 1, feed the image data into the AlexNet deep neural network for feature dissociation; after the 4096-dimensional features are obtained at the sixth layer (the first fully connected layer), a 4096×N matrix is generated (assuming N pictures in total). In Fig. 1, Max Pooling denotes max pooling, dense denotes a fully connected layer, and Stride denotes the step size.
The present application also improves the AlexNet deep neural network model specifically for camera scene images, making it different from the existing model structure. Specifically, in the above AlexNet deep neural network model, the convolution kernels (filters) of the first, second, third, fourth and fifth convolutional layers are lowered from the original 5×5 to 3×3.
Preferably, in the first and fifth convolutional layers, a 1×1 convolution kernel is added before the 3×3 kernel. In the second, third and fourth convolutional layers, a 1×1 convolution kernel is added both before and after the 3×3 kernel. Taking an input feature map of 28×28×192 as an example, let the number of 3×3 convolution channels be N2 and the number of 5×5 convolution channels be N3. If the second, third and fourth convolutional layers all used 5×5 kernels, the convolution kernel parameter count would be 5×5×192×N3 + 5×5×192×N3 + 5×5×192×N3; with 3×3 kernels in those layers it is 3×3×192×N2 + 3×3×192×N2 + 3×3×192×N2. The present application further adds 1×1 kernels before and after each 3×3 kernel (the 1×1 kernels before and after a 3×3 kernel are denoted the first 1×1 kernel and the second 1×1 kernel respectively); with the channel count of the first 1×1 kernel denoted N4 and that of the second denoted N5, the convolution kernel parameter count becomes (1×1×192×N4 + 3×3×N4×N2 + 1×1×N2×N5) + (1×1×192×N4 + 3×3×N4×N2 + 1×1×N2×N5) + (1×1×192×N4 + 3×3×N4×N2 + 1×1×N2×N5). In an actual design the channel counts of the above kernels are set as needed, for example 3×3 channels N2 = 128, 5×5 channels N3 = 32, first 1×1 kernel channels N4 = 96 and second 1×1 kernel channels N5 = 16. It can thus be seen that under the above scheme the input and output channel counts of the convolutional layers are all reduced, and the convolution kernel parameter count is also greatly reduced.
Through the above differentiated designs for different convolutional layers, the present application not only performs dimensionality reduction but also dimensionality raising, so that the input and output channel counts of the convolutional layers are all reduced and the number of parameters is reduced further. For video images whose targets of interest are relatively small, this captures more target-feature detail while lowering computational overhead (to 36% of the original kernels). Meanwhile the output of the second fully connected layer is lowered from 4096 to 256 dimensions, raising feature-extraction speed, the operation speed of subsequent steps and storage-space utilisation 16-fold while mean precision drops only 5.7%;
Process 2: as shown in Fig. 2, receive the data of the previous step and perform backward feature elimination. In Fig. 2, 'Backward Feature Elimination Start (1:1)' marks the start of 1:1 backward feature elimination, 'Partitioning' denotes partitioning, 'Replace this Node by the appropriate predictor' means substituting the available suitable classifier at this node (note: a decision-tree predictor is used here), 'Missing Value' denotes a discarded value, and 'Backward Feature Elimination End' marks the end of backward feature elimination;
In this method, all classification algorithms are first trained with n features. In each dimensionality-reduction operation, the classifier is trained n times using n−1 features, yielding n new classifiers. The (n−1)-dimensional feature set used by the new classifier whose misclassification rate changes least is taken as the feature set after reduction. Iterating this process continually yields the reduced result; the k-th iteration produces (n−k)-dimensional feature classifiers. By choosing a maximum tolerable error rate, the minimum number of features the chosen classifier needs in order to reach a specified classification performance can be obtained.
The specific algorithm is described below:
Input: training sample set {(x_i, v_i)}, i = 1, …, N, with v_i ∈ {1, 2, …, l}, where l is the number of classes
Output: feature ranking set R
1) Initialise the original feature set S = {1, 2, …, D} and the feature ranking set R = [];
2) Generate the l(l−1)/2 training sets: pairwise combinations of different classes are found in the training samples {(x_i, v_i)} to obtain the final training sets:
X_j =
{(x_i, y_i)}, i = 1, …, N_1 + N_{j+1}, for j = 1, 2, …, l−1: y_i = 1 when v_i = 1, y_i = −1 when v_i = j+1
{(x_i, y_i)}, i = 1, …, N_2 + N_{j−l+3}, for j = l, …, 2l−3: y_i = 1 when v_i = 2, y_i = −1 when v_i = j−l+3
…………
{(x_i, y_i)}, i = 1, …, N_{l−1} + N_l, for j = l(l−1)/2: y_i = 1 when v_i = l−1, y_i = −1 when v_i = l
3) Loop through the following process until S = []:
Take the l(l−1)/2 training subsamples X_j (j = 1, 2, …, l(l−1)/2);
Train an SVM on each X_j, obtaining the weight vectors ω_j;
Compute the ranking criterion scores c_k = Σ_j ω²_jk (k = 1, 2, …, |S|);
Find the feature with the minimum ranking criterion score, p = argmin_k c_k;
Update the feature ranking set R = [p, R];
Remove this feature from S: S = S \ {p}.
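The ranking loop of step 3 can be sketched as follows. To keep the sketch self-contained, a least-squares linear classifier stands in for the SVMs and the pairwise class decomposition of step 2 is collapsed to the two-class case; both are assumptions of this illustration.

```python
import numpy as np

def rfe_ranking(X, y):
    # Ranking criterion c_k = sum_j w_jk^2 from the text; a single
    # least-squares linear classifier replaces the SVMs (an assumption).
    S = list(range(X.shape[1]))
    R = []
    yy = np.where(y == y.max(), 1.0, -1.0)   # two-class labels +1 / -1
    while S:
        w, *_ = np.linalg.lstsq(X[:, S], yy, rcond=None)
        p = S[int(np.argmin(w ** 2))]        # feature with the lowest score
        R.insert(0, p)                        # update R = [p, R]
        S.remove(p)
    return R  # last-removed (most informative) feature ends up first

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
print(rfe_ranking(X, y))  # feature 0 should come first
```

Because the only informative feature keeps the largest squared weight, it survives every elimination round and is prepended last, landing at the head of R.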
Process 3: receive the output data of Process 2 and apply high-correlation matrix filtering to it.
High-correlation filtering holds that when two columns of data show similar trends, the information they carry is also similar, so a machine-learning model can make do with one column from such a pair. The similarity between numerical columns is expressed by computing their correlation coefficient, while the correlation of nominal (categorical) columns can be expressed by computing the Pearson chi-squared value. Of two columns whose correlation coefficient exceeds some threshold, only one is retained. The algorithm is illustrated in Fig. 3.
The specific algorithm is described below:
The correlation of two signals f and g is defined as:

(f ⊗ g)(τ) = ∫ f*(t) g(t + τ) dt

where f* denotes the complex conjugate of f and ⊗ denotes the correlation operation. The intuitive interpretation of correlation is a measure of how similar two functions are at a given moment;

The simplest way to apply a correlation filter to tracking is this: the more similar two signals are, the higher their correlation value. Tracking, then, is finding the item whose response to the tracked target is largest. The filter here is therefore called the Minimum Output Sum of Squared Error filter (MOSSE). Following this reasoning, a filter is to be found whose response on the target is maximal, giving the following formula:

g = f ⊗ h

where g denotes the response output, f denotes the input image and h denotes the filter template;

If the response output is to be obtained by comparison, only the filter template h needs to be determined. Evaluating the above formula requires a convolution, which is very expensive to compute on a computer, so a fast Fourier transform (FFT) is applied to it; after the FFT the convolution becomes a pointwise product, greatly reducing the amount of computation. The formula becomes the following form:

G = F ⊙ H*

Then, H* = G / F, where the capital letters denote the Fourier transforms of the corresponding signals;

Considering the influence of factors such as the changing appearance of the target during actual tracking, m images of the target need to be considered simultaneously as the reference to improve the robustness of the filter template, with the formula as follows:

min over H* of Σ_{i=1..m} |F_i ⊙ H* − G_i|²

Because all the operations in this formula are element-wise, it suffices to make each element of H minimal separately (w and v being the indices of the elements in H), so the formula can be converted into the following form:

min over H*_{wv} of Σ_{i=1..m} |F_{i,wv} H*_{wv} − G_{i,wv}|²

Taking the partial derivative of this formula with respect to H*_{wv} and setting it to 0 gives the minimum (note that differentiation in the complex field differs from differentiation in the real field). Solving yields the value of each element of H:

H*_{wv} = Σ_{i=1..m} G_{i,wv} F*_{i,wv} / Σ_{i=1..m} F_{i,wv} F*_{i,wv}

so that H as a whole is finally obtained as:

H* = ( Σ_{i=1..m} G_i ⊙ F_i* ) / ( Σ_{i=1..m} F_i ⊙ F_i* )

This is the model formula of the filter;
The filter H and the terms F_i and G_i used in tracking are obtained by the following procedure: random affine transformations are applied to the tracking box (ground truth) to obtain a series of training samples f_i, while g_i is generated by a Gaussian function whose peak lies at the centre of f_i. Once this series of training samples and results is available, the value of the filter h can be computed. Here f, g and h all have the same size.
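The closed-form filter above can be sketched directly in the Fourier domain; the Gaussian response map, the image size and the small regularising term eps are illustrative choices, not specified in the text.

```python
import numpy as np

def mosse_filter(samples, responses, eps=1e-5):
    # H* = (sum_i G_i * conj(F_i)) / (sum_i F_i * conj(F_i)), with a small
    # eps in the denominator for numerical stability (an added assumption).
    num = np.zeros(samples[0].shape, dtype=complex)
    den = np.zeros(samples[0].shape, dtype=complex)
    for f, g in zip(samples, responses):
        F, G = np.fft.fft2(f), np.fft.fft2(g)
        num += G * np.conj(F)
        den += F * np.conj(F)
    return num / (den + eps)

def respond(H_conj, f):
    # Response of a new frame: G = F * H*, back-transformed to the spatial domain.
    return np.real(np.fft.ifft2(np.fft.fft2(f) * H_conj))

rng = np.random.default_rng(6)
f = rng.normal(size=(32, 32))
yy, xx = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
g = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / 8.0)  # desired peak at (16, 16)
H = mosse_filter([f], [g])
r = respond(H, f)
print(np.unravel_index(np.argmax(r), r.shape))  # (16, 16)
```

Training on the sample itself reproduces the Gaussian response almost exactly, so the response peak falls at the centre prescribed by g, mirroring the ground-truth-centred training described above.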
The scheme of the present application is particularly suited to camera scenes, because their elevation angle and picture content are relatively homogeneous. From the viewpoint of classification, intra-class distances are small and inter-class distances are large, so step 1 can be tuned so as neither to miss captures nor to capture falsely, while in steps 2 and 3 the large inter-class gaps keep the loss of target features at a low level.
The essential difference between Processes 2 and 3 and PCA dimensionality reduction is this: PCA suits image-vector reduction of every type, and can even be used for vector reduction of other kinds of data, whereas this method targets road-monitoring camera scenarios only. Diversity there declines sharply, so impurity vectors are more easily rejected in batches by Processes 2 and 3 together (neither alone suffices: Process 2 alone would cause over-rejection, Process 3 alone would increase redundancy). At the same time, unlike traditional correlation filtering methods, step 3 guarantees in this application scenario that, with the lambda value set to 2.5, components over-rejected by Process 2 are automatically and most reasonably compensated during filtering.
Processes 2 and 3 play a vital role: together they remove repeated and worthless information from the image, saving space while retaining the main content components to the greatest extent and preserving functional equivalence to the original image.
Process 4 receives previous step data, and linear search finds the time overhead of KNN (K closest target base) too Greatly, it and needs to read all data in memory, it is impractical.Therefore, using accidental projection forest, to all Data are divided, and will be searched for every time and are reduced to an acceptable range with the number of point calculated, then establish it is multiple with Machine Tree-projection constitutes accidental projection forest, using the synthesis result of forest as final result.Fig. 4 is that random vector divides successively Result afterwards.
4 receive process 2 of process and process 3 treated data carry out random division, are to carry out in camera applied field Feasible final step vector reduces process under scape.
It is as follows:A vector from origin is randomly selected, with this vertical straight line of vector by plane Interior point has been divided into two parts, will belong to this two-part point and be respectively divided to left subtree and right subtree.In mathematical computations, It is by calculating the step for dot product of each point and vertical vector is completed, point of the dot product more than zero is divided into left subtree, point The minus point of product is divided into right subtree.Paying attention to a bit, straight line not with the arrow is the foundation for dividing left and right subtree in figure, Vector with the arrow is for calculating dot product.In this way, original point just divides for left and right two parts, such as Fig. 4.
At this point the number of points in each part of the division is still relatively large, so the division continues: another vector is chosen at random, and the straight line perpendicular to it divides all the points again, as in Fig. 5.
Note that this division is carried out on the basis of the previous one, so the points in the figure have now been divided into four parts, corresponding to a tree of depth 2 with four leaf nodes. Division continues in the same way until the number of points at each leaf node falls to a sufficiently small number. Note that the resulting tree is not a complete tree.
To run a nearest-neighbour computation for a new point with this tree, the leaf node the point belongs to is first located by computing its dot product with each division vector in turn; the nearest-neighbour algorithm is then run only against the points in that leaf node. This is the computation of a single random projection tree; by the same method, multiple random projection trees are built to form a random forest, and the combined result of the forest is taken as the final result.
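The tree-and-forest procedure described above can be sketched in a few lines of Python. This is an illustrative approximation, not the patent's implementation; the class and function names, the leaf size, and the number of trees are all assumptions:

```python
import numpy as np

class RPTree:
    """One random projection tree: points are split by the sign of their
    dot product with a randomly drawn direction, recursively."""

    def __init__(self, points, leaf_size=10, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng(0)
        self.points = points
        self.root = self._build(np.arange(len(points)), leaf_size)

    def _build(self, idx, leaf_size):
        if len(idx) <= leaf_size:
            return {"leaf": idx}                        # few enough points: stop
        d = self.rng.normal(size=self.points.shape[1])  # random vector from the origin
        side = self.points[idx] @ d > 0                 # dot-product sign picks a side
        if side.all() or (~side).all():                 # degenerate split: make a leaf
            return {"leaf": idx}
        return {"dir": d,
                "left": self._build(idx[side], leaf_size),
                "right": self._build(idx[~side], leaf_size)}

    def candidates(self, q):
        """Route a query to its leaf with the same dot-product tests."""
        node = self.root
        while "leaf" not in node:
            node = node["left"] if q @ node["dir"] > 0 else node["right"]
        return node["leaf"]

def ann_query(forest, points, q, k=1):
    """Union the leaf candidates of every tree, then rank them exactly."""
    cand = np.unique(np.concatenate([t.candidates(q) for t in forest]))
    dists = np.linalg.norm(points[cand] - q, axis=1)
    return cand[np.argsort(dists)[:k]]
```

Each tree routes a query by the sign of the same dot products used during construction, so a query always reaches the leaf that held its training-time neighbours; combining several trees compensates for points near a split boundary that a single tree separates from their true neighbours.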
Since processes 1 and 4 involve ultra-large matrix operations, they are placed on a large-scale machine-learning server with multiple GPUs, while the tasks of processes 2 and 3 are distributed in parallel over the edge computing nodes (up to 200 of them). The devices involved in the computation can be realised with an existing video acquisition system, so that existing computing resources are used to the greatest extent.
While keeping the final data scale at essentially the same level as existing methods, this method also provides equivalent accuracy for later comparison and search, which PCA principal component analysis and other methods cannot match. Although PCA can greatly reduce the scale of the final component data, searching or comparing PCA output against the original data substantially reduces precision. Compared with PCA, this method has no such disadvantage: it keeps precision equivalent to the original data while sharing with PCA the advantage that the final data is well suited to high-speed search.
Although the present invention has been particularly shown and described in conjunction with preferred embodiments, those skilled in the art should understand that various changes may be made to it in form and detail without departing from the spirit and scope of the invention as defined by the appended claims, and all such changes fall within the protection scope of the present invention.

Claims (8)

1. An intelligent component analysis method for camera scene images, comprising:
Process 1: inputting image data into an AlexNet deep neural network model for feature dissociation to obtain features of a preset dimensionality; specifically, the AlexNet deep neural network model comprises five convolutional layers and three fully connected layers arranged in sequence, denoted respectively the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer, the first fully connected layer, the second fully connected layer and the third fully connected layer; the features of the preset dimensionality are those obtained at the first fully connected layer;
Process 2: receiving the preset-dimensionality features output by process 1 and performing recursive feature elimination on them;
Process 3: receiving the output data of process 2 and applying high-correlation matrix filtering to it;
Process 4: receiving the data of the previous step, partitioning all the data with a random projection forest so that the number of points to be searched and compared in each query is reduced to an acceptable range, then building multiple random projection trees to form the random projection forest, and taking the combined result of the forest as the final result.
2. The intelligent component analysis method for camera scene images according to claim 1, characterised in that: in the AlexNet deep neural network model, the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer and the fifth convolutional layer all use 3x3 convolution kernels.
3. The intelligent component analysis method for camera scene images according to claim 2, characterised in that: in the first convolutional layer and the fifth convolutional layer, a 1x1 convolution kernel is added before the 3x3 convolution kernel.
4. The intelligent component analysis method for camera scene images according to claim 3, characterised in that: in the second convolutional layer, the third convolutional layer and the fourth convolutional layer, a 1x1 convolution kernel is added both before and after the 3x3 convolution kernel.
5. The intelligent component analysis method for camera scene images according to any one of claims 1-4, characterised in that: the features of the preset dimensionality are the 4096-dimensional features obtained at the sixth layer of the network, i.e. the first fully connected layer.
6. The intelligent component analysis method for camera scene images according to claim 5, characterised in that: the second fully connected layer outputs 256-dimensional features.
7. The intelligent component analysis method for camera scene images according to any one of claims 1-4, characterised in that: in the recursive feature elimination of process 2, all classifiers are first trained with n features; in each dimensionality-reduction operation the classifiers are trained n times with n-1 features each, giving n new classifiers; the n-1 dimensional feature set used by the new classifier whose error rate changes least is taken as the feature set after dimensionality reduction; this process is iterated continually to obtain the reduced result, the k-th iteration producing classifiers over n-k dimensional features; by selecting the classifier with the largest tolerable error rate, the minimum number of features needed to reach the specified classification performance is obtained; the method specifically comprises the following steps:
Input: training sample set {(x_i, v_i)}, i = 1, ..., N, with v_i ∈ {1, 2, ..., l}, where l is the number of classes;
Output: feature ranking set R;
1) Initialise the original feature set S = {1, 2, ..., D} and the feature ranking set R = [];
2) Generate the l(l-1)/2 pairwise training sets:
every pairwise combination of distinct classes is formed from the training samples {(x_i, v_i)} to obtain the final training sets:
X_j = {(x_i, y_i)}, i = 1, ..., N_1 + N_{j+1}, for j = 1, 2, ..., l-1: y_i = 1 when v_i = 1 and y_i = -1 when v_i = j + 1;
X_j = {(x_i, y_i)}, i = 1, ..., N_2 + N_{j-l+3}, for j = l, ..., 2l-3: y_i = 1 when v_i = 2 and y_i = -1 when v_i = j - l + 3;
…………
X_j = {(x_i, y_i)}, i = 1, ..., N_{l-1} + N_l, for j = l(l-1)/2: y_i = 1 when v_i = l-1 and y_i = -1 when v_i = l;
3) Repeat the following loop until S = []:
take the l(l-1)/2 training subsamples X_j (j = 1, 2, ..., l(l-1)/2) restricted to the surviving features in S;
train an SVM with each X_j, obtaining the weight vectors ω_j (j = 1, 2, ..., l(l-1)/2);
compute the ranking criterion score c_k = Σ_j ω_jk², k = 1, 2, ..., |S|;
find the feature with the smallest ranking criterion score, p = argmin_k c_k;
update the feature ranking set R = [p, R];
remove the feature from S: S = S \ {p}.
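The elimination loop of steps 1)-3) can be sketched as follows for the binary case. This is a hedged illustration, not the claimed implementation: a subgradient-descent linear classifier stands in for a full SVM solver, and all names and hyperparameters are assumptions; the multi-class version of the claim trains one SVM per class pair and sums the scores c_k over all l(l-1)/2 classifiers.

```python
import numpy as np

def train_linear_svm(X, y, epochs=300, lr=0.1, lam=0.01):
    """Minimal linear SVM trained by subgradient descent on the hinge loss
    (a stand-in for a full SVM solver)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        viol = y * (X @ w) < 1                 # samples violating the margin
        grad = lam * w
        if viol.any():
            grad = grad - (X[viol] * y[viol, None]).mean(axis=0)
        w -= lr * grad
    return w

def svm_rfe(X, y):
    """SVM-RFE, binary case: retrain on the surviving features, score each
    feature k by c_k = w_k**2, drop the lowest-scoring one and prepend it
    to R, so R ends up ordered most-important-first."""
    S = list(range(X.shape[1]))                # surviving feature indices
    R = []                                     # feature ranking set
    while S:
        w = train_linear_svm(X[:, S], y)
        k = int(np.argmin(w ** 2))             # least useful surviving feature
        R.insert(0, S.pop(k))
    return R
```

Because the least useful feature is removed and prepended at every pass, the head of R names the features that survived longest, matching the ranking-set update R = [p, R] of step 3).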
8. The intelligent component analysis method for camera scene images according to any one of claims 1-4, characterised in that process 3 specifically comprises the following steps:
Denote the correlation of two signals f and g as
(f ⊗ g)(τ) = ∫ f*(t) g(t + τ) dt
where f* denotes the complex conjugate of f and ⊗ the correlation operation; intuitively, the correlation measures how similar the two functions are at a given shift;
A filter is sought whose response on the target is maximal, i.e. a template h satisfying
g = f ⊗ h
where g denotes the response output, f the input image and h the filtering template;
Applying the fast Fourier transform (FFT) to the above turns the correlation into an element-wise product, greatly reducing the computation, and the formula becomes
F(g) = F(f) ⊙ F(h)*
Writing G = F(g), F = F(f) and H = F(h), this takes the form
G = F ⊙ H*
and therefore H* = G / F, the division being element-wise;
Considering the influence of factors such as the changing appearance of the target during actual tracking, m images of the target must be considered simultaneously as reference to improve the robustness of the filter template; the formula is
min_{H*} Σ_{i=1}^{m} |F_i ⊙ H* − G_i|²
Since the products are element-wise, this can be converted into an independent minimisation for each element of H*; taking the partial derivative and setting it to 0 gives the minimum, i.e.
Σ_{i=1}^{m} (F_i ⊙ H* − G_i) ⊙ F_i* = 0
from which
H* = (Σ_{i=1}^{m} G_i ⊙ F_i*) / (Σ_{i=1}^{m} F_i ⊙ F_i*)
The above formula is the model formula of the filter;
A random affine transformation is applied to the tracking box to obtain a series of training samples f_i, while each g_i is generated by a Gaussian function whose peak is located at the centre of f_i;
After this series of training samples and desired responses is obtained, the value of the filter h can be computed.
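The closed-form filter of claim 8 can be sketched with numpy FFTs. This is an illustrative sketch, not the claimed implementation: a small regulariser `lam` is added to the denominator (following the lambda value of 2.5 quoted in the description), and the Gaussian width and array sizes are assumptions.

```python
import numpy as np

def mosse_filter(frames, sigma=2.0, lam=2.5):
    """Closed-form correlation filter template:
    H* = (sum_i G_i . conj(F_i)) / (sum_i F_i . conj(F_i) + lam)."""
    h, w = frames[0].shape
    yy, xx = np.mgrid[0:h, 0:w]
    g = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * sigma ** 2))
    G = np.fft.fft2(g)                        # desired response: Gaussian peak at centre
    num = np.zeros((h, w), dtype=complex)
    den = np.zeros((h, w), dtype=complex)
    for f in frames:
        F = np.fft.fft2(f)
        num += G * np.conj(F)                 # sum_i G_i . F_i*
        den += F * np.conj(F)                 # sum_i F_i . F_i*
    return num / (den + lam)                  # H*, in the Fourier domain

def respond(H_conj, frame):
    """Correlate a frame with the template; the peak marks the target."""
    return np.real(np.fft.ifft2(np.fft.fft2(frame) * H_conj))
```

The regulariser keeps the element-wise division stable at frequencies where the energy sum_i F_i ⊙ F_i* is small, which is the compensation role the description attributes to the lambda value.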
CN201810281581.9A 2018-04-02 2018-04-02 A kind of ingredient intelligent analysis method of camera scene image Pending CN108564116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810281581.9A CN108564116A (en) 2018-04-02 2018-04-02 A kind of ingredient intelligent analysis method of camera scene image


Publications (1)

Publication Number Publication Date
CN108564116A true CN108564116A (en) 2018-09-21

Family

ID=63533725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810281581.9A Pending CN108564116A (en) 2018-04-02 2018-04-02 A kind of ingredient intelligent analysis method of camera scene image

Country Status (1)

Country Link
CN (1) CN108564116A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968620A (en) * 2012-11-16 2013-03-13 华中科技大学 Scene recognition method based on layered Gaussian hybrid model
US20130272548A1 (en) * 2012-04-13 2013-10-17 Qualcomm Incorporated Object recognition using multi-modal matching scheme
CN104732209A (en) * 2015-03-17 2015-06-24 深圳先进技术研究院 Indoor scene recognition method and device
CN105138972A (en) * 2015-08-11 2015-12-09 北京天诚盛业科技有限公司 Face authentication method and device
CN107480702A (en) * 2017-07-20 2017-12-15 东北大学 Towards the feature selecting and Feature fusion of the identification of HCC pathological images
CN107547154A (en) * 2016-06-23 2018-01-05 华为技术有限公司 A kind of method and device for establishing video traffic prediction model


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
AUTOCYZ: "Correlation filter tracking (MOSSE)", https://blog.csdn.net/autocyz/article/details/48136473 *
JING ZHANG et al.: "A Hybrid Feature Selection Approach by Correlation-Based Filters and SVM-RFE", 2014 22nd International Conference on Pattern Recognition *
LITTLELY_LL: "Feature selection: support vector machine recursive feature elimination (SVM-RFE)", https://blog.csdn.net/littlely_ll/article/details/72139195 *
小小将: "ResNet, AlexNet, VGG, Inception: understanding the various CNN architectures", https://zhuanlan.zhihu.com/p/32116277 *
小白在成长: "Random projection forest, an approximate nearest neighbour method (ANN)", https://blog.csdn.net/u010381985/article/details/55211391 *
默一鸣: "Seven dimensionality reduction methods", https://blog.csdn.net/yimingsilence/article/details/53007828 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583576A (en) * 2018-12-17 2019-04-05 上海联影智能医疗科技有限公司 A kind of medical image processing devices and method
CN109583576B (en) * 2018-12-17 2020-11-06 上海联影智能医疗科技有限公司 Medical image processing device and method
US11341734B2 (en) 2018-12-17 2022-05-24 Shanghai United Imaging Intelligence Co., Ltd. Systems and methods for image segmentation
US11836925B2 (en) 2018-12-17 2023-12-05 Shanghai United Imaging Intelligence Co., Ltd. Systems and methods for image segmentation
CN110866551A (en) * 2019-11-06 2020-03-06 中法渤海地质服务有限公司 Drilling data dimension reduction method based on high correlation filtering algorithm and PCA algorithm
CN110866551B (en) * 2019-11-06 2022-11-15 中法渤海地质服务有限公司 Drilling data dimension reduction method based on high correlation filtering algorithm and PCA algorithm
CN113255448A (en) * 2021-04-23 2021-08-13 长江勘测规划设计研究有限责任公司 Method for recognizing front dam surface vortex based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180921