CN107491761A

CN107491761A - Target tracking method based on deep learning features and point-to-set distance metric learning

Info

Publication number: CN107491761A
Application number: CN201710730930.6A
Authority: CN
Inventors: 张盛平; 刘鑫丽; 齐元凯; 张维刚
Original assignee: Harbin Institute of Technology Weihai
Current assignee: Harbin Institute of Technology Weihai
Priority date: 2017-08-23
Filing date: 2017-08-23
Publication date: 2017-12-19
Anticipated expiration: 2037-08-23
Also published as: CN107491761B

Abstract

The invention discloses a target tracking method based on deep learning features and point-to-set distance metric learning, comprising the following steps: randomly selecting a number of target samples and background samples in the initial frame of tracking; extracting target sample features from the target samples and background sample features from the background samples; clustering the extracted target sample features into several target template sets and the extracted background sample features into several background template sets; learning a projection matrix by reducing the distance between samples of the same class and increasing the distance between samples of different classes; collecting target candidates in each subsequent frame according to a Gaussian distribution; extracting the features of the target candidates and projecting the target template sets, the background template sets and the target candidates into a common subspace using the projection matrix; and computing the distance from each target candidate to every target template set, taking the sum of these distances as the score of that candidate, the final tracking result being the average of the several target candidates with the smallest distance sums.

Description

Target tracking method based on deep learning features and point-to-set distance metric learning

Technical field

The present invention relates to the technical field of image processing and pattern recognition, and in particular to a target tracking method based on deep learning features and point-to-set distance metric learning.

Background art

Target tracking is an important research direction in computer vision, with wide application in video surveillance, virtual reality, human-computer interaction, autonomous driving and other fields. At present, discriminative tracking methods achieve good tracking results. Most discriminative trackers treat target tracking as a classification task: target samples and background samples are selected in the first frame to train an SVM classifier; in each subsequent frame a number of target candidates are collected and the classifier labels each candidate as target or background; the candidate with the highest target confidence is taken as the tracking result. However, an SVM classifier makes its decision using only a small number of support vectors (a few training samples selected to define the classification boundary), and in most cases the samples are not linearly separable, so the contribution of the remaining samples to the classification is ignored.

Summary of the invention

The object of the invention is to provide a target tracking method based on deep learning features and point-to-set distance metric learning, which improves discriminative power at the representation level by extracting features with a deep convolutional neural network, and makes full use of all training samples in the classification process through point-to-set distance metric learning.

To achieve the above object, the present invention adopts the following technical solution:

A target tracking method based on deep learning features and point-to-set distance metric learning, comprising the following steps:

randomly selecting a number of target samples and background samples in the initial frame of tracking;

extracting target sample features from the target samples and background sample features from the background samples;

clustering the extracted target sample features into several target template sets, and clustering the extracted background sample features into several background template sets;

learning a projection matrix by reducing the distance between samples of the same class and increasing the distance between samples of different classes;

collecting target candidates in each subsequent frame according to a Gaussian distribution;

extracting the features of the target candidates, and projecting the target template sets, the background template sets and the target candidates into a common subspace using the projection matrix;

computing the distance from each target candidate to every target template set and taking the sum of these distances as the score of that candidate, the final tracking result being the average of the several target candidates with the smallest distance sums.
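The scoring rule in the last step can be sketched as follows. This is only an illustration under stated assumptions: the text does not spell out how the point-to-set distance is evaluated in the common subspace, so `point_to_set_distance` below uses the Euclidean distance to the set mean as a stand-in, and the names `score_and_track` and `top_k` are hypothetical. Features are assumed to be already projected.

```python
import math

def point_to_set_distance(x, template_set):
    # Stand-in (assumption): Euclidean distance from x to the mean of the set.
    # The patent's learned point-to-set metric would be used here instead.
    dim = len(x)
    mean = [sum(t[d] for t in template_set) / len(template_set) for d in range(dim)]
    return math.sqrt(sum((x[d] - mean[d]) ** 2 for d in range(dim)))

def score_and_track(candidate_boxes, candidate_feats, target_sets, top_k=5):
    """Score each candidate by its summed distance to all target template
    sets, then average the boxes of the top_k candidates with the smallest sums."""
    scores = [sum(point_to_set_distance(f, s) for s in target_sets)
              for f in candidate_feats]
    order = sorted(range(len(scores)), key=lambda i: scores[i])[:top_k]
    best = [candidate_boxes[i] for i in order]
    # Component-wise average of the selected boxes.
    return tuple(sum(b[d] for b in best) / len(best) for d in range(len(best[0])))
```

Averaging several low-distance candidates rather than taking the single best one smooths the estimate; the number of candidates averaged is left open in the text, so `top_k=5` is an arbitrary choice.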

Further, randomly selecting a number of target samples and background samples in the initial frame of tracking comprises:

sampling in the initial frame at a ratio of target samples to background samples of 1:10, wherein the intersection-over-union (IoU) of each target sample with the specified tracking region is greater than 0.7, and the IoU of each background sample with the specified tracking region is less than 0.5.

Further, extracting target sample features from the target samples and background sample features from the background samples comprises:

using the deep convolutional neural network MDNet to extract target sample features from the target samples and background sample features from the background samples.

Further, clustering the extracted target sample features into several target template sets and the extracted background sample features into several background template sets comprises:

clustering the extracted target sample features into several target sample sets using K-means and assigning a class label to each target sample set; clustering the extracted background sample features into several background sample sets using K-means and assigning a class label to each background sample set.

Further, learning the projection matrix by reducing the distance between samples of the same class and increasing the distance between samples of different classes comprises:

defining an objective function to be optimized, whose optimization goal comprises three items: after projection, a sample and a sample set of the same class are close to each other, while a sample and a sample set of different classes are far apart; after projection, samples of the same class are close to each other, while samples of different classes are far apart; and after projection, every dimension has the same importance;

solving for the projection matrix on the Euclidean space and the projection matrix on the manifold using an alternating iterative optimization method.

The effects described in this summary are only those of the embodiments, not all the effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:

The invention provides a target tracking method based on deep learning features and point-to-set distance metric learning, which remedies the insufficient discriminative power of traditional hand-crafted features and overcomes the under-utilization of training samples during classification in traditional SVM-based discriminative trackers. Through point-to-set distance metric learning, the invention can effectively compute the distance from each target candidate to all target samples, so that every target sample contributes to the classification of the candidates, yielding better classification results.

Brief description of the drawings

Fig. 1 is a flow chart of the target tracking method of the present invention based on deep learning features and point-to-set distance metric learning.

Detailed description of the embodiments

As shown in Fig. 1, a target tracking method based on deep learning features and point-to-set distance metric learning comprises the following steps:

S1, randomly selecting a number of target samples and background samples in the initial frame of tracking;

S2, extracting target sample features from the target samples and background sample features from the background samples;

S3, clustering the extracted target sample features into several target template sets, and clustering the extracted background sample features into several background template sets;

S4, learning a projection matrix by reducing the distance between samples of the same class and increasing the distance between samples of different classes;

S5, collecting target candidates in each subsequent frame according to a Gaussian distribution, the mean of which is the target position in the previous frame and the variance of which is 1;

S6, extracting the features of the target candidates, and projecting the target template sets, the background template sets and the target candidates into a common subspace using the projection matrix;

S7, computing the distance from each target candidate to every target template set and taking the sum of these distances as the score of that candidate, the final tracking result being the average of the several target candidates with the smallest distance sums.
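Step S5 can be illustrated with a short sketch. It assumes boxes are stored as (x, y, w, h) tuples and that only the position is perturbed while width and height are carried over from the previous frame; neither detail is stated in the text. The function name `sample_candidates` and the default of 256 candidates are hypothetical.

```python
import random

def sample_candidates(prev_box, n=256, variance=1.0, seed=None):
    """Draw n candidate boxes whose positions follow a Gaussian centred on
    the previous target position with the given variance (1 in step S5)."""
    rng = random.Random(seed)
    x, y, w, h = prev_box
    std = variance ** 0.5  # random.gauss takes a standard deviation
    # Assumption: width and height are kept fixed; only (x, y) is sampled.
    return [(rng.gauss(x, std), rng.gauss(y, std), w, h) for _ in range(n)]
```

Each candidate would then go through feature extraction and projection (S6) before being scored (S7).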

In step S1, sampling is performed in the initial frame at a ratio of target samples to background samples of 1:10, with more than 100 positive samples. Here, 500 target samples and 5000 background samples are collected at random, where each target sample has an intersection-over-union (IoU) greater than 0.7 with the specified tracking target region and each background sample has an IoU less than 0.5 with the specified tracking region. The IoU of two image regions is the number of pixels in their intersection divided by the number of pixels in their union.
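The IoU criterion of step S1 can be sketched as below, assuming axis-aligned boxes given as (x, y, w, h); the box format and the function names are illustrative choices, not taken from the patent. Boxes whose IoU falls between the two thresholds are used as neither target nor background samples.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def label_sample(box, target_box, pos_thr=0.7, neg_thr=0.5):
    """Return +1 for a target sample, -1 for a background sample,
    and 0 for a box that satisfies neither threshold and is discarded."""
    overlap = iou(box, target_box)
    if overlap > pos_thr:
        return 1
    if overlap < neg_thr:
        return -1
    return 0
```

In use, random boxes would be drawn around the annotated region and kept until 500 boxes labelled +1 and 5000 labelled -1 have been collected.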

In step S2, the deep convolutional neural network MDNet is used to extract target sample features from the target samples and background sample features from the background samples: each sample is resized to 107x107 and 128 is subtracted from the pixel values of each channel to form the input to MDNet, and the output of the third convolutional layer of MDNet is taken as the feature of the sample.
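The input preparation described for step S2 can be sketched in a few lines. The sketch assumes an image patch stored as nested lists of shape height x width x channels and uses nearest-neighbour resizing, since the text does not name an interpolation method; the MDNet forward pass itself is not shown. The function name `preprocess` is hypothetical.

```python
def preprocess(patch, size=107, offset=128):
    """Resize an H x W x C patch (nested lists) to size x size by
    nearest-neighbour lookup and subtract offset from every channel value."""
    height, width = len(patch), len(patch[0])
    out = []
    for r in range(size):
        src_r = min(height - 1, r * height // size)
        row = []
        for c in range(size):
            src_c = min(width - 1, c * width // size)
            # Shift pixel values so that 128 maps to 0.
            row.append([v - offset for v in patch[src_r][src_c]])
        out.append(row)
    return out
```

The resulting 107x107 array would be fed to the network, and the third convolutional layer's output taken as the sample's feature.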

In step S3, the extracted target sample features are clustered into 7 target sample sets using K-means (more than 5 target sample sets are preferred, in order to fully capture the diversity of the target's appearance), and each target sample set is assigned a class label, for example +1 to +7. The extracted background sample features are clustered into 20 background sample sets using K-means (more than 10 background sample sets are preferred, in order to fully capture the diversity of the background), and each background sample set is assigned a class label, for example -1 to -20.
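A minimal sketch of step S3 follows, using a plain Lloyd-style K-means written with the standard library only. The initialisation (random choice of k samples), the iteration cap and the helper names are assumptions made for illustration; any K-means implementation could be substituted. Features are assumed to be tuples of floats.

```python
import random

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: random initial centres
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def build_template_sets(features, k, sign):
    """Group features into k template sets labelled sign*1 .. sign*k
    (sign=+1 for target sets, sign=-1 for background sets)."""
    sets = {}
    for feat, a in zip(features, kmeans(features, k)):
        sets.setdefault(sign * (a + 1), []).append(feat)
    return sets
```

With the values given in the text, one would call `build_template_sets(target_features, 7, +1)` and `build_template_sets(background_features, 20, -1)`.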

In step S4, learning the projection matrix by reducing the distance between samples of the same class and increasing the distance between samples of different classes comprises:

Define the objective function to be optimized [formula missing in source]. The first term [formula missing in source] preserves the distance between a point x_i in Euclidean space and a point S_j on the manifold: after projection, a single sample x_i and a sample set S_j of the same class are close to each other, while a single sample x_i and a sample set S_j of different classes are far apart. Here x_i denotes any target sample or background sample, S_j denotes any target sample set or background sample set, and f(·) and [symbol missing in source] denote the mappings to be learned. If x_i and S_j belong to the same class (both target or both background), 1(i, j) = 1; otherwise 1(i, j) = -1.

The second term (G_e+G_r) preserves the distances between sample points in Euclidean space and the distances between sample points on the manifold: after projection, single samples of the same class are close to each other and single samples of different classes are far apart; likewise, after projection, sample sets of the same class are close to each other and sample sets of different classes are far apart. Here d(v_i, v_j) = exp(‖v_i - v_j‖²/σ²), where v stands for x or S.

The third term [formula missing in source] is a regularization constraint: every dimension after projection has the same importance.

The projection matrix on the Euclidean space and the projection matrix on the manifold are solved using an alternating iterative optimization method. Specifically, let [formula missing in source],

K_x(x_i, x_j) = <f_x(x_i), f_x(x_j)>, where W_x and W_s are the projection matrices to be solved. According to [formula missing in source] and [formula missing in source], where [formula missing in source], L_x = B_x - Q_x and L_s = B_s - Q_s. Let v stand for x or S. If samples i and j belong to the same class and are k₁-nearest neighbors (here k₁ = 1, although other values may be used), then Q_v(i, j) = d(v_i, v_j); if samples i and j belong to different classes and are k₂-nearest neighbors (here k₂ = 5, although other values may be used), then Q_v(i, j) = -d(v_i, v_j); in all other cases Q_v(i, j) = 0. The final values of W_x and W_s are obtained by repeating the iterative update several times (for example, 10 times).
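The construction of Q_v described above can be sketched as follows. Several points are assumptions rather than statements from the text: classes are encoded as +1 (target) and -1 (background), samples are plain coordinate tuples, and "are k-nearest neighbors" is read as "either sample is among the other's k nearest samples of the relevant class". The function names are hypothetical. The kernel d is coded exactly as printed, so with large distances it can overflow unless σ is chosen large enough; the text does not say how σ is set.

```python
import math

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def d(a, b, sigma=1.0):
    # Kernel as written in the text: exp(||a - b||^2 / sigma^2).
    return math.exp(sq_dist(a, b) / sigma ** 2)

def nearest(i, pool, vs, k):
    """Indices of the k samples in pool closest to sample i."""
    return set(sorted(pool, key=lambda j: sq_dist(vs[i], vs[j]))[:k])

def build_q(vs, labels, k1=1, k2=5, sigma=1.0):
    n = len(vs)
    same_nn, diff_nn = [], []
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        diff = [j for j in range(n) if labels[j] != labels[i]]
        same_nn.append(nearest(i, same, vs, k1))
        diff_nn.append(nearest(i, diff, vs, k2))
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if labels[i] == labels[j]:
                # Same class and k1-neighbours: positive weight.
                if j in same_nn[i] or i in same_nn[j]:
                    q[i][j] = d(vs[i], vs[j], sigma)
            elif j in diff_nn[i] or i in diff_nn[j]:
                # Different class and k2-neighbours: negative weight.
                q[i][j] = -d(vs[i], vs[j], sigma)
    return q
```

The either-direction neighbour test makes Q_v symmetric, which is convenient if it is later combined with B_v into L_v = B_v - Q_v.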

Although specific embodiments of the invention have been described above with reference to the accompanying drawing, they do not limit the scope of protection of the invention. Those of ordinary skill in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the invention without creative effort still fall within the scope of protection of the invention.

Claims

1. A target tracking method based on deep learning features and point-to-set distance metric learning, characterized by comprising the following steps:

clustering the extracted target sample features into several target template sets, and clustering the extracted background sample features into several background template sets;

extracting the features of the target candidates, and projecting the target template sets, the background template sets and the target candidates into a common subspace using the projection matrix;

computing the distance from each target candidate to every target template set and taking the sum of these distances as the score of that candidate, the final tracking result being the average of the several target candidates with the smallest distance sums.

2. The target tracking method based on deep learning features and point-to-set distance metric learning according to claim 1, characterized in that randomly selecting a number of target samples and background samples in the initial frame of tracking comprises:

sampling in the initial frame at a ratio of target samples to background samples of 1:10, wherein the intersection-over-union (IoU) of each target sample with the specified tracking region is greater than 0.7, and the IoU of each background sample with the specified tracking region is less than 0.5.

3. The target tracking method based on deep learning features and point-to-set distance metric learning according to claim 1, characterized in that extracting target sample features from the target samples and background sample features from the background samples comprises:

using the deep convolutional neural network MDNet to extract target sample features from the target samples and background sample features from the background samples.

4. The target tracking method based on deep learning features and point-to-set distance metric learning according to claim 1, characterized in that clustering the extracted target sample features into several target template sets and the extracted background sample features into several background template sets comprises:

clustering the extracted target sample features into several target sample sets using K-means and assigning a class label to each target sample set; clustering the extracted background sample features into several background sample sets using K-means and assigning a class label to each background sample set.

5. The target tracking method based on deep learning features and point-to-set distance metric learning according to claim 1, characterized in that learning the projection matrix by reducing the distance between samples of the same class and increasing the distance between samples of different classes comprises:

defining an objective function to be optimized, whose optimization goal comprises three items: after projection, a sample and a sample set of the same class are close to each other, while a sample and a sample set of different classes are far apart; after projection, samples of the same class are close to each other, while samples of different classes are far apart; and after projection, every dimension has the same importance;