CN106815576B - Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine


Info

Publication number
CN106815576B
CN106815576B
Authority
CN
China
Prior art keywords
target
tracked
semi
learning machine
extreme learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710047829.0A
Other languages
Chinese (zh)
Other versions
CN106815576A (en)
Inventor
年睿
邱书琦
常瑞杰
肖玫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN201710047829.0A priority Critical patent/CN106815576B/en
Publication of CN106815576A publication Critical patent/CN106815576A/en
Application granted granted Critical
Publication of CN106815576B publication Critical patent/CN106815576B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 20/46: Scenes; Scene-specific elements in video content; Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F 18/214: Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/217: Pattern recognition; Validation; Performance evaluation; Active pattern learning techniques
    • G06F 18/24: Pattern recognition; Classification techniques
    • G06V 10/25: Image preprocessing; Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method based on a continuous space-time confidence map and a semi-supervised extreme learning machine. Video image frames are continuous in time: the position of the target to be tracked does not change abruptly between adjacent frames. Video image frames are also continuous in space: a specific relation exists between the target and its surrounding background, and when the appearance of the target changes greatly, this relation helps to distinguish the target to be tracked from the background region. To address deformation and occlusion, the invention fully considers the information provided by the real target and fully mines the distribution similarity between labeled and unlabeled samples to improve tracking accuracy; it proposes a semi-supervised tracking method based on an extreme learning machine that exploits this distribution similarity, and combines the two methods in a coupled tracking framework.

Description

Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine
Technical Field
The invention relates to a target tracking method based on a continuous space-time confidence map and a semi-supervised extreme learning machine, belonging to the technical field of intelligent information processing and target tracking.
Background
Object tracking is an indispensable part of most vision systems. In specific applications such as video surveillance, automatic, fast and highly robust target tracking is of particular concern. It has broad application prospects in video surveillance, traffic detection, intelligent robots, underwater target detection and tracking, and related areas.
Target tracking is an extremely important part of the computer vision field. A moving-object tracking algorithm analyzes the information of each frame in the video image sequence to be tracked, mines the data in the video, learns the target behavior, and after a series of processing steps obtains and marks the position of the tracked target in each video image. Problems such as occlusion and deformation between objects, background complexity, illumination changes, and poor real-time performance and robustness remain to be solved in the tracking process. Classical tracking methods such as Meanshift and particle filtering depend on how rich the target information contained in the video is; in a practical video image sequence the information provided by the target is quite limited, so the target cannot be tracked stably, and under deformation and occlusion in the scene these classical algorithms tend to fail.
The main problems in the prior art are therefore: (1) poor real-time performance and robustness during tracking of the video scene, insufficient spatio-temporal position information of the target, and inconspicuous target features; (2) when an occluder appears or the target to be tracked deforms in the scene, and especially when the target is completely occluded or deforms severely, the tracked target may be lost.
Disclosure of Invention
The invention aims to provide a target tracking method based on a continuous space-time confidence map and a semi-supervised extreme learning machine, so as to make up for the defects of the prior art.
Video image frames are continuous in time; this temporal continuity means that the target to be tracked does not change greatly between adjacent frames and that its position does not jump abruptly. Video image frames are also continuous in space; this spatial continuity means that a specific relation exists between the target and its surrounding background, and when the appearance of the target changes greatly this relation helps to distinguish the target to be tracked from the background region. A tracking method based on continuous spatio-temporal confidence map learning is therefore proposed to overcome poor real-time performance and robustness, insufficient spatio-temporal position information of the target, and inconspicuous target features. To address deformation and occlusion, the information provided by the real target is fully considered and the distribution similarity between labeled and unlabeled samples is fully mined to improve tracking accuracy; a semi-supervised tracking method based on an extreme learning machine is proposed to mine this distribution similarity, and the two methods are combined in a coupled tracking framework to achieve tracking with good real-time performance and high robustness.
In order to achieve the purpose, the specific technical scheme adopted by the invention is realized by the following steps:
the first step, acquire an n-frame video A = {I_1, …, I_i, …, I_n} of the target to be tracked in the specific monitored scene to be tracked, where I_i denotes the i-th frame of the video image sequence to be tracked; preprocess the video sequence to be tracked with image filtering/denoising and contrast enhancement to reduce noise and highlight the region of interest to be tracked;
step two, in the t-th frame I_t of the video image sequence to be tracked, select the target O to be tracked with a rectangular window and determine its center position o*, where O denotes that a new target is present in the scene and o denotes the position of the new target; define the two-dimensional confidence map model C_t(o) of the target to be tracked. Enlarge the target region to be tracked to twice its size to form a local background region; within this local background region, extract the intensity-position feature w(k) = (I(k), k) at each coordinate position k to form the intensity-position feature set, where I(k) denotes the image intensity at coordinate position k and the coordinates k are taken from the neighborhood of o*. Establish the prior model P(w(k)|O) of the target to be tracked for the t-th frame and compute the spatio-temporal model of the t-th frame;
Step three, overlapping and sampling the area where the central position of the target to be tracked is located to obtain N1Each region block image as a positive sample and N2Taking the image of each region block as a negative sample, and extracting positive and negative sample data characteristics xjThe class label of the positive exemplar is 1, the class label of the negative exemplar is 0, yj∈ {1,0}, establishing labeled sample set
Figure GDA0002470717950000027
Figure GDA0002470717950000028
Training sample set X ═ Xs,Xu}={(xj,yj)},j=1,...,N1+N2
Step four, training a semi-supervised extreme learning machine network model by using the training sample set X obtained in the step three;
step five, in I_{t+1}, update the model using the t-th-frame spatio-temporal model obtained in step two and compute the spatio-temporal model of the (t+1)-th frame; convolve I_{t+1} with the (t+1)-th-frame spatio-temporal model to obtain the spatio-temporal confidence map C_{t+1}(o) of the new target, and maximize C_{t+1}(o) to determine the target position o in the (t+1)-th frame;
step six, judging whether the target is shielded, if not, entering the step five, otherwise, entering the step seven;
step seven, in I_{t+1}, take o* obtained in I_t as the target position; in the region around the target position o*, perform overlapping sampling according to the size of the target-region rectangular window to obtain N region-block images as candidate targets, extract the candidate-target features, and establish the test sample set X_{t+1} of target image blocks to be tracked; input the test sample set into the semi-supervised extreme learning machine network trained in step four to obtain the test output T for frame t+1, and take the position of the maximum classification response of the online semi-supervised extreme learning machine as the target position o in the (t+1)-th frame;
step eight, judge the maximum classification response against the update threshold of the online semi-supervised extreme learning machine network model; if the online semi-supervised extreme learning machine model does not need to be updated, return to step five, otherwise proceed to step nine;
step nine, take the labeled data set X_s obtained in step three and, as the unlabeled data set, the test sample set obtained in step seven, i.e. X_u = X_{t+1}; then return to step four and retrain the semi-supervised extreme learning machine network model;
and repeating the steps circularly until the tracking is completed on the whole video sequence.
Further, the third step is specifically as follows: in the region around the center position o* of the target to be tracked, perform overlapping sampling according to the size of the target-region rectangular window, and let d_j denote the Euclidean distance from the j-th sampling point to the target center. When d_j is within the positive-sample radius r1, sampling yields N1 region-block images as positive samples; when d_j lies between r2 and r3, sampling yields N2 region-block images as negative samples, where r1, r2 and r3 are the sampling radii. Extract the positive and negative sample features x_j and establish the training sample set of target image blocks to be tracked; the (N1 + N2) target image blocks form the training sample set X = {(x_j, y_j)}, j = 1, ..., N1 + N2, where the class label of a positive sample is 1 and that of a negative sample is 0, y_j ∈ {1, 0}. Shuffle the order of the samples in the training sample set, take the first fraction of the samples as the labeled sample set X_s and the remaining samples as the unlabeled sample set X_u, with X = {X_s, X_u}.
The fourth step is specifically as follows: set the input weights and hidden-layer biases randomly, and let (a, b) denote the input weight a and threshold b of the hidden-layer nodes. The training samples consist of the labeled data set {X_s, Y_s} and the unlabeled data set X_u, where X_s and X_u are input samples and Y_s are the output samples corresponding to X_s. The mapping function of the hidden layer is g(x), which may take the form g(x) = 1/(1 + e^(-x)); the output weights are denoted β; h(x_i) = [G(a_1, b_1, x_i), ..., G(a_m, b_m, x_i)] denotes the hidden-layer output of the i-th sample, the number of hidden-layer nodes is m, and e_i denotes the learning error (residual) of the i-th input node.

The objective function of the semi-supervised extreme learning machine is

  min over β: (1/2)||β||^2 + (1/2) Σ_{i=1..s} C_i ||e_i||^2 + (λ/2) Tr(F^T L F)
  subject to h(x_i)β = y_i − e_i, i = 1, ..., s, and f_i = h(x_i)β, i = 1, ..., s+u,

where C_i denotes the penalty parameter, λ denotes the balance parameter, L is the graph Laplacian computed from the labeled and unlabeled data, F is the output matrix of the network, and Tr denotes the trace operation.

In matrix form, the semi-supervised extreme learning machine objective function is

  L_SSELM = (1/2)||β||^2 + (1/2)||C^(1/2)(Ỹ − Hβ)||^2 + (λ/2) Tr(β^T H^T L H β),

where Ỹ is the augmented output matrix whose first s rows equal Y_s and whose last u rows are zero, C is a diagonal matrix whose first s diagonal elements are C_i and whose remaining elements are zero, and H is the hidden-layer output matrix of the network.

Taking the partial derivative of the above expression with respect to β gives

  ∂L_SSELM/∂β = β + H^T C (Hβ − Ỹ) + λ H^T L H β.

Setting the partial derivative to zero, the output weights β are obtained as follows:

when the labeled data are more numerous than the hidden-layer nodes,

  β = (I + H^T C H + λ H^T L H)^(−1) H^T C Ỹ;

when the labeled data are fewer than the hidden-layer nodes,

  β = H^T (I + C H H^T + λ L H H^T)^(−1) C Ỹ,

where H^T is the transpose of the matrix H.
In the sixth step, whether the target is occluded is judged by comparing the result of the confidence map, i.e. its maximum value, with the occlusion threshold th1: if the maximum confidence value falls below th1, the target is occluded. th1 denotes the critical occlusion value and may be changed according to different scenes; when the algorithm is applied to a different scene, th1 is adjusted manually, and it normally fluctuates within a certain range. When the target is occluded, the maximum confidence value decreases rapidly, and the value after this rapid decrease is taken as th1, which is used to judge whether the target is occluded.
In the eighth step, whether the online semi-supervised extreme learning machine network model needs to be updated is judged by comparing the maximum classification response T_max with the update threshold th2: if T_max > th2, the online semi-supervised extreme learning machine network model does not need to be updated; th2 is thus the criterion for deciding whether the network model is updated.
The invention has the following beneficial effects. The invention combines a tracking method based on continuous spatio-temporal confidence map learning with a tracking method based on a semi-supervised extreme learning machine, and solves the problems of poor real-time performance and robustness, insufficient spatio-temporal position information of the target, inconspicuous target features, and loss of the tracked target caused by deformation and occlusion. By computing the continuous spatio-temporal confidence map and comparing it with the occlusion threshold, a method for judging whether the target has entered an occluded region is obtained, which effectively solves the occlusion-judgment problem. By computing the maximum response value output by the semi-supervised extreme learning machine network and comparing it with the model-update threshold, a method for judging whether the network model needs to be updated is obtained, which effectively solves the problem of poor generalization of the network model. The invention greatly improves tracking accuracy and realizes a tracking process with good real-time performance and high robustness.
Drawings
FIG. 1 is a schematic diagram of an overall tracking process according to the present invention.
FIG. 2 is a region marker map of an object to be tracked in an embodiment.
FIG. 3 is a block diagram of a method for object tracking based on continuous spatiotemporal confidence maps in an exemplary embodiment.
Fig. 4 is a basic framework diagram of a semi-supervised extreme learning machine network.
FIG. 5 is a block diagram of a method for target tracking based on semi-supervised extreme learning in an exemplary embodiment.
Fig. 6 is an example of tracking effect under occlusion in the specific embodiment, where (a) in fig. 6 is a video frame with an object of interest to be tracked, and (b) in fig. 6, (c) in fig. 6, (d) in fig. 6, (e) in fig. 6, and (f) in fig. 6 are video frames for tracking the object of interest after the frame (a) in fig. 6, respectively.
Detailed Description
In order to make the objects, embodiments and advantages of the present invention clearer, the present invention is further illustrated by the following specific examples in conjunction with the accompanying drawings.
The specific flow chart of the invention is shown in fig. 1.
In this embodiment, a classical corridor surveillance video from the CAVIAR dataset (384 × 288 pixels, 25 frames per second) is adopted as the video to be tracked.
Step one: preprocess the video sequence to be tracked with image filtering/denoising and contrast enhancement to reduce noise and highlight the region of interest to be tracked. The specific steps are as follows:

Step 1-1: define the classical corridor surveillance video CAVIAR as A and split it into frames, obtaining a 200-frame video image sequence to be tracked, A = {I_1, ..., I_i, ..., I_200}, where I_i denotes the i-th frame of the corridor surveillance video to be tracked;

Step 1-2: apply filtering/denoising and contrast-enhancement preprocessing to the 200-frame video image sequence.
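Purely as an illustration of steps 1-1 and 1-2, the preprocessing could be sketched as follows in Python with OpenCV; the choice of Gaussian filtering and histogram equalization, and all function and parameter names, are assumptions of this sketch, since the description only requires filtering/denoising and contrast enhancement in general terms.

```python
import cv2
import numpy as np

def preprocess_sequence(video_path, n_frames=200):
    """Split the video into frames, denoise each frame and enhance its contrast.

    Gaussian filtering and histogram equalization are placeholder choices; the
    patent only requires some form of denoising and contrast enhancement.
    """
    cap = cv2.VideoCapture(video_path)
    frames = []
    for _ in range(n_frames):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        denoised = cv2.GaussianBlur(gray, (5, 5), 1.0)   # filtering / denoising
        enhanced = cv2.equalizeHist(denoised)            # contrast enhancement
        frames.append(enhanced.astype(np.float64) / 255.0)
    cap.release()
    return frames
```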
Step two: in the first frame I_{t=1} of the video image sequence to be tracked, select the target O to be tracked and determine its center position o*, where O denotes that a new target is present in the scene and o denotes the position of the new target; define the two-dimensional confidence map model C_t(o) of the target to be tracked; establish the prior model P(w(k)|O) of the target to be tracked for the t-th frame and compute the spatio-temporal model of the t-th frame, as shown in FIG. 3. The specific steps are as follows:
Step 2-1: in I_{t=1}, the user selects the target of interest O to be tracked through a rectangular window; the width of the target-region rectangular window is w and its height is h, and o denotes the position of the new target. The target region is enlarged to twice its size to form a local background region, as shown in FIG. 2. Within the local background region, the intensity-position feature w(k) = (I(k), k) at coordinate position k is extracted to form the intensity-position feature set, where I(k) denotes the image intensity at coordinate position k and the coordinates k are taken from the neighborhood of o*;
Step 2-2: the tracking problem is converted into the problem of computing a confidence map of the position of the target to be tracked:

  C_t(o) = P(o|O) = Σ_{w(k)} P(o|w(k), O) · P(w(k)|O),

where C_t(o) denotes the confidence map model of the t-th frame and relates the new target position o to the old target position o*; the closer the new target position is to the old target position, the larger the confidence value. P(o|w(k), O) denotes the spatio-temporal model and describes the relative position and direction between the new target o and the coordinate point k of the local background region. P(w(k)|O) denotes the prior model, which describes the appearance around the old target position and models the low-level contour information of the target O to be tracked according to the intensity and relative position of the coordinate point k in the local background region;
Step 2-3: compute the confidence map C_{t=1}(o) of frame I_{t=1} and obtain its maximum confidence value;
Step 2-4, calculating a prior model of the t-1 frame
Figure GDA00024707179500000610
Wherein
Figure GDA00024707179500000611
Is a scale parameter;
Step 2-5: from the confidence map model C_t(o) and the prior model P(w(k)|O) of frame t = 1, compute the spatio-temporal model h_t of the target of interest for frame t = 1 by deconvolution in the Fourier domain,

  h_t = F^(-1)( F(C_t(o)) / F(P(w(k)|O)) ),

where F denotes the fast Fourier transform and F^(-1) denotes the inverse fast Fourier transform.
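Since the closed-form expressions appear in the original only as figure references, the following sketch assumes the standard dense spatio-temporal context (STC) style formulation described above: a confidence map that peaks at the old target position, a prior given by the intensity weighted by a window around o*, and a context model learned by deconvolution in the Fourier domain. The parameter names alpha, beta and sigma, and the function name, are assumptions of this sketch.

```python
import numpy as np

def learn_context_model(frame, center, win_size, alpha=2.25, beta=1.0, sigma=None):
    """Learn a spatial context model h for one frame (STC-style sketch).

    frame    : 2-D array of preprocessed image intensities
    center   : (row, col) of the target center o*
    win_size : (height, width) of the local background region (twice the target size)
    """
    hh, hw = win_size[0] // 2, win_size[1] // 2
    r0, c0 = center
    patch = frame[r0 - hh: r0 + hh, c0 - hw: c0 + hw]        # local background region
    rows, cols = np.mgrid[-hh:hh, -hw:hw]
    dist2 = rows ** 2 + cols ** 2
    if sigma is None:
        sigma = 0.5 * (hh + hw)
    # confidence map: larger the closer a position is to the old target position o*
    conf = np.exp(-(np.sqrt(dist2) / alpha) ** beta)
    # prior model: image intensity weighted by a window centered on o*
    prior = patch * np.exp(-dist2 / (2.0 * sigma ** 2))
    # context model learned by deconvolution in the Fourier domain
    H = np.fft.fft2(conf) / (np.fft.fft2(prior) + 1e-8)
    return np.real(np.fft.ifft2(H))
```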
Step three: perform overlapping sampling in the region around the center position of the target to be tracked to obtain N1 region-block images as positive samples and N2 region-block images as negative samples; extract the positive and negative sample features x_j, establish the labeled sample set X_s and the unlabeled sample set X_u, and form the training sample set X = {(x_j, y_j)}, j = 1, ..., N1 + N2, as shown in FIG. 5. The specific steps are as follows:
Step 3-1: in the region around the center position o*, perform overlapping sampling according to the size of the target-region rectangular window, and let d_j denote the Euclidean distance from the j-th sampling point to the target center. When d_j is within the positive-sample radius r1, sampling yields 45 region-block images as positive samples; when d_j lies between r2 and r3, sampling yields 31 region-block images as negative samples. The sampling radii r1, r2 and r3 are set to 5, 10 and 20 pixels, respectively;
Step 3-2: extract the positive and negative sample features x_j and establish the training sample set of target image blocks to be tracked; the 76 target image blocks form the training sample set X = {(x_j, y_j)}, j = 1, ..., 76, where the class label of a positive sample is 1 and that of a negative sample is 0, y_j ∈ {1, 0};
Step 3-3: shuffle the order of the samples in the training sample set, take the first 50 samples as the labeled sample set X_s and the remaining 26 samples as the unlabeled sample set X_u, with X = {X_s, X_u}.
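A minimal sketch of steps 3-1 to 3-3 is given below. The exact distance conditions are shown in the original only as figure references, so treating positive samples as those within radius r1 and negative samples as those between r2 and r3 is an assumption, as are the sampling stride, the use of raw block intensities as features, and the helper name.

```python
import numpy as np

def sample_training_set(frame, center, win, r1=5, r2=10, r3=20,
                        stride=2, n_labeled=50, rng=None):
    """Overlapping sampling around the target center o* and labeled/unlabeled split."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = win
    r0, c0 = center
    pos, neg = [], []
    for dr in range(-r3, r3 + 1, stride):
        for dc in range(-r3, r3 + 1, stride):
            d = np.hypot(dr, dc)                     # Euclidean distance to o*
            r, c = r0 + dr, c0 + dc
            block = frame[r - h // 2: r + h // 2, c - w // 2: c + w // 2]
            if block.shape != (2 * (h // 2), 2 * (w // 2)):
                continue                             # block falls outside the image
            x = block.ravel()                        # block intensities as the feature vector
            if d <= r1:
                pos.append(x)                        # positive sample, label 1
            elif r2 <= d <= r3:
                neg.append(x)                        # negative sample, label 0
    X = np.array(pos + neg)
    y = np.array([1] * len(pos) + [0] * len(neg))
    order = rng.permutation(len(X))                  # shuffle before the labeled/unlabeled split
    X, y = X[order], y[order]
    return (X[:n_labeled], y[:n_labeled]), X[n_labeled:]   # (X_s, Y_s), X_u
```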
Step four: train the online semi-supervised extreme learning machine network model with the training sample set X obtained in step three. The specific steps are as follows:

Step 4-1: the semi-supervised extreme learning machine is a single-hidden-layer feedforward neural network model, as shown in FIG. 4; the network consists of three layers: an input layer, a hidden layer and an output layer. The input weights and hidden-layer biases are set randomly and independently of the training samples, so the algorithm has a simple structure and high computational efficiency. Let (a, b) denote the input weight a and threshold b of the hidden-layer nodes. The training samples consist of the labeled data set {X_s, Y_s} and the unlabeled data set X_u, where X_s and X_u are input samples and Y_s are the output samples corresponding to X_s. The mapping function of the hidden layer is g(x), which may take the form g(x) = 1/(1 + e^(-x)); the output weights are denoted β; h(x_i) = [G(a_1, b_1, x_i), ..., G(a_2000, b_2000, x_i)] denotes the hidden-layer output of the i-th sample, the number of hidden-layer nodes is 2000, and e_i denotes the learning error (residual) of the i-th input node;
Step 4-2: the objective function of the semi-supervised extreme learning machine to be trained is

  min over β: (1/2)||β||^2 + (1/2) Σ_{i=1..s} C_i ||e_i||^2 + (λ/2) Tr(F^T L F)
  subject to h(x_i)β = y_i − e_i, i = 1, ..., s, and f_i = h(x_i)β, i = 1, ..., s+u,

where C_i denotes the penalty parameter, λ denotes the balance parameter, L is the graph Laplacian computed from the labeled and unlabeled data, F is the output matrix of the network, and Tr denotes the trace operation;

Step 4-3: in matrix form, the semi-supervised extreme learning machine objective function is

  L_SSELM = (1/2)||β||^2 + (1/2)||C^(1/2)(Ỹ − Hβ)||^2 + (λ/2) Tr(β^T H^T L H β),

where Ỹ is the augmented output matrix whose first 50 rows equal Y_s and whose last 26 rows are zero, and C is a diagonal matrix whose first 50 diagonal elements are C_i and whose remaining elements are zero;

Step 4-4: taking the partial derivative of the above expression with respect to β gives

  ∂L_SSELM/∂β = β + H^T C (Hβ − Ỹ) + λ H^T L H β;

Step 4-5: setting the partial derivative to zero, the output weights β are solved as follows:

when the labeled data are more numerous than the hidden-layer nodes,

  β = (I + H^T C H + λ H^T L H)^(−1) H^T C Ỹ;

when the labeled data are fewer than the hidden-layer nodes,

  β = H^T (I + C H H^T + λ L H H^T)^(−1) C Ỹ,

where H^T is the transpose of the matrix H. This completes the training of the semi-supervised extreme learning machine network model.
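The closed-form solutions of steps 4-4 and 4-5 could be sketched as follows. The graph Laplacian here is built from a Gaussian-weighted k-nearest-neighbor graph, which is an assumption (the description only states that L is computed from the labeled and unlabeled data), and all function and variable names are illustrative.

```python
import numpy as np

def knn_laplacian(X, k=7, sigma=None):
    """Unnormalized graph Laplacian L = D - W from a Gaussian-weighted kNN graph."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    if sigma is None:
        sigma = np.sqrt(np.median(d2) + 1e-12)       # scale the kernel to the data
    W = np.zeros_like(d2)
    for i in range(len(X)):
        nn = np.argsort(d2[i])[1:k + 1]              # k nearest neighbors (skip self)
        W[i, nn] = np.exp(-d2[i, nn] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                           # symmetrize
    return np.diag(W.sum(axis=1)) - W

def train_sselm(Xs, Ys, Xu, m=2000, C=1.0, lam=0.1, rng=None):
    """Train a semi-supervised ELM; returns random weights (a, b) and output weights beta."""
    rng = rng if rng is not None else np.random.default_rng(0)
    X = np.vstack([Xs, Xu])
    s, u, d = len(Xs), len(Xu), X.shape[1]
    a = rng.uniform(-1, 1, (d, m))                   # random input weights
    b = rng.uniform(-1, 1, m)                        # random hidden-layer biases
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))           # g(x) = 1 / (1 + e^(-x))
    Y = np.zeros((s + u, 1)); Y[:s, 0] = Ys          # first s rows = Y_s, last u rows = 0
    Cmat = np.diag(np.r_[np.full(s, C), np.zeros(u)])  # penalty only on labeled samples
    L = knn_laplacian(X)
    if s > m:   # labeled data more numerous than hidden nodes
        beta = np.linalg.solve(np.eye(m) + H.T @ Cmat @ H + lam * H.T @ L @ H,
                               H.T @ Cmat @ Y)
    else:       # labeled data fewer than hidden nodes
        beta = H.T @ np.linalg.solve(np.eye(s + u) + Cmat @ H @ H.T + lam * L @ H @ H.T,
                                     Cmat @ Y)
    return a, b, beta
```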
Step five: in the (t+1)-th frame, update the model using the t-th-frame spatio-temporal model h_t obtained in step two and compute the spatio-temporal model h_{t+1} of the (t+1)-th frame; convolve the image I_{t+1} with h_{t+1} to obtain the spatio-temporal confidence map C_{t+1}(o) of the new target, and maximize the resulting confidence map C_{t+1}(o) to determine the target position o in the (t+1)-th frame, as shown in FIG. 3. The specific steps are as follows:
Step 5-1: in I_{t+1}, take a local background region of twice the target size around the target position o* and extract the intensity-position features within this region to form the intensity-position feature set;
Step 5-2, updating a spatio-temporal model of the target of interest to be tracked in the t frame:
Figure GDA00024707179500000811
where p is the learning rate, where,
Figure GDA00024707179500000812
is the interesting object space-time model to be tracked calculated in the t frame,
Figure GDA00024707179500000813
expressed in the frequency domain as:
Figure GDA00024707179500000814
wherein
Figure GDA0002470717950000091
Is that
Figure GDA0002470717950000092
Time domain fourier transform of (a). Time domain filter FwExpressed as:
Fw=ρ/(ejw-(1-ρ))
where j is an imaginary unit;
Step 5-3: compute the confidence map C_{t+1}(o) of the target of interest to be tracked in frame t + 1 by convolving I_{t+1}, weighted by the prior around o*, with the updated spatio-temporal model h_{t+1}; the convolution is evaluated efficiently with the fast Fourier transform;

Step 5-4: in frame t + 1, the position o of the target of interest maximizes the confidence map of frame t + 1:

  o = argmax C_{t+1}(o),

with maximum confidence value max_o C_{t+1}(o).
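A sketch of steps 5-1 to 5-4, again under the STC-style assumptions used in the earlier sketch; the learning-rate value and all names are illustrative, and win_size is the local background region (twice the target size).

```python
import numpy as np

def track_stc(frame, center, win_size, h_prev, h_new, rho=0.075, sigma=None):
    """Update the spatio-temporal model and locate the target in frame t+1."""
    hh, hw = win_size[0] // 2, win_size[1] // 2
    r0, c0 = center
    # step 5-2: h_{t+1} = (1 - rho) * h_t + rho * h_t^c
    h_cur = (1.0 - rho) * h_prev + rho * h_new
    # step 5-1: local background region and weighted prior around the old position o*
    patch = frame[r0 - hh: r0 + hh, c0 - hw: c0 + hw]
    rows, cols = np.mgrid[-hh:hh, -hw:hw]
    if sigma is None:
        sigma = 0.5 * (hh + hw)
    prior = patch * np.exp(-(rows ** 2 + cols ** 2) / (2.0 * sigma ** 2))
    # step 5-3: confidence map by convolution, evaluated in the Fourier domain
    conf = np.real(np.fft.ifft2(np.fft.fft2(h_cur) * np.fft.fft2(prior)))
    # step 5-4: the new position o maximizes the confidence map
    dr, dc = np.unravel_index(np.argmax(conf), conf.shape)
    new_center = (r0 + dr - hh, c0 + dc - hw)
    return new_center, conf.max(), h_cur
```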
Step six: judge whether the target is occluded. The specific steps are as follows:

Step 6-1: compare the maximum confidence value obtained in step 5-4 with the occlusion threshold th1; if it falls below th1, the target is occluded. th1 denotes the critical occlusion value and may be changed according to different scenes; when the algorithm is applied to a different scene, th1 is adjusted manually, and it normally fluctuates within a certain range. When the target is occluded, the maximum confidence value decreases rapidly, and the value after this rapid decrease is taken as th1; in this embodiment th1 is fixed accordingly.

Step 6-2: if the maximum confidence value does not fall below th1, the target is not occluded and the procedure returns to step 5-1; otherwise it proceeds to step 7-1.
Step seven: in the (t+1)-th frame, perform overlapping sampling around the target center position tracked in the t-th frame, extract the candidate-target features, and establish the test samples of target image blocks to be tracked; input the test samples into the trained online semi-supervised extreme learning machine, and the position of the maximum classification response among the test samples is the predicted new target position, as shown in FIG. 5. The specific steps are as follows:

Step 7-1: for the (t+1)-th frame image, take o* obtained in the t-th frame as the target position; in the region around the center position o*, perform overlapping sampling according to the size of the target-region rectangular window, and let the Euclidean distance from the j-th sampling point to o* be d_j. When d_j is within the sampling radius r1, sampling yields 232 region-block images as candidate targets, i.e. test data; their features are extracted and the test set is recorded as X_{t+1}. The sampling radius r1 is set to 20 pixels;
Step 7-2: the test output is

  T = H*·β,

where β is the output weight computed in the t-th frame and H* is the hidden-layer output matrix of the test samples;

Step 7-3: in frame t + 1, the position o of the target of interest to be tracked is the position of the maximum classification response of the semi-supervised extreme learning machine for frame t + 1:

  o = argmax T,

with maximum classification response value T_max.
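Steps 7-2 and 7-3 amount to one forward pass of the trained network over the candidate blocks; a sketch reusing the trained parameters a, b and beta from the training sketch above (the function name is illustrative) is:

```python
import numpy as np

def classify_candidates(X_test, a, b, beta):
    """T = H* . beta ; the candidate with the largest response is the new target."""
    H_star = 1.0 / (1.0 + np.exp(-(X_test @ a + b)))   # hidden-layer output of the test samples
    T = H_star @ beta                                   # classification responses
    j = int(np.argmax(T))                               # index of the maximum response
    return j, float(T[j])                               # best candidate and T_max
```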
Step eight: judge the maximum classification response against the update threshold of the online semi-supervised extreme learning machine network model. The specific steps are as follows:

Step 8-1: compare the maximum classification response T_max with the update threshold th2 of the semi-supervised extreme learning machine; if T_max > th2, the online semi-supervised extreme learning machine network model does not need to be updated, so th2 decides whether the model is updated. In this embodiment, th2 = 0.

Step 8-2: if T_max > 0, the online semi-supervised extreme learning machine network model does not need to be updated and the procedure returns to step 5-1; otherwise it proceeds to step nine.
Step nine: retrain the online semi-supervised extreme learning machine network model, as shown in FIG. 5, as follows: take the labeled data set X_s obtained in step 3-3 and, as the unlabeled data set, the test set obtained in step 7-1, i.e. X_u = X_{t+1}; then return to step 4-1 and retrain the online semi-supervised extreme learning machine network model.
And repeating the steps circularly until the tracking of the whole monitoring video sequence to be tracked is completed.
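Putting the pieces together, the coupled framework of FIG. 1 can be sketched as the loop below. The helper functions are the ones sketched earlier in this description; sample_candidates is a hypothetical helper analogous to sample_training_set (overlapping sampling within radius 20 around o* that also returns the candidate positions), and th1 and th2 follow the embodiment (th2 = 0, th1 scene-dependent).

```python
def track_video(frames, init_center, win_size, th1, th2=0.0):
    """Coupled tracking loop: STC confidence map + semi-supervised ELM fallback."""
    center = init_center
    h_model = learn_context_model(frames[0], center, win_size)        # step two
    (Xs, Ys), Xu = sample_training_set(frames[0], center, win_size)   # step three
    a, b, beta = train_sselm(Xs, Ys, Xu)                              # step four
    positions = [center]
    for t in range(1, len(frames)):
        h_new = learn_context_model(frames[t - 1], center, win_size)
        center, c_max, h_model = track_stc(frames[t], center, win_size,
                                           h_model, h_new)            # step five
        if c_max < th1:                                                # step six: occluded
            X_test, candidates = sample_candidates(frames[t], center, win_size)
            j, t_max = classify_candidates(X_test, a, b, beta)         # step seven
            center = candidates[j]
            if t_max <= th2:                                           # steps eight and nine
                a, b, beta = train_sselm(Xs, Ys, X_test)               # retrain with X_u = X_{t+1}
        positions.append(center)
    return positions
```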
For the above surveillance video to be tracked, the tracking performance of the particle filter method, the Meanshift method and the method of the present invention is compared in Table 1. The method of the present invention outperforms the particle filter and Meanshift methods in both the center-position deviation and the deviation mean square error, demonstrating the real-time performance and robustness of the target tracking.
Table 1. Tracking performance comparison of particle filter, Meanshift and the method of the present invention

                               Particle filter   Meanshift   Method of the invention
Center position deviation      75.4796           22.9740     10.1834
Deviation mean square error    47.8903           12.2607     7.9702
Fig. 6 shows an example of the tracking effect under occlusion in this embodiment; the target is tracked accurately through two severe occlusions, further demonstrating the real-time performance and robustness of the method of the present invention.
The above is the preferred embodiment of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (5)

1. A target tracking method based on a continuous space-time confidence map and a semi-supervised extreme learning machine is characterized by comprising the following steps:
step one, acquire an n-frame video A = {I_1, …, I_i, …, I_n} of the target to be tracked in the specific monitored scene to be tracked, where I_i denotes the i-th frame of the video image sequence to be tracked; preprocess the video sequence to be tracked with image filtering/denoising and contrast enhancement to reduce noise and highlight the region of interest to be tracked;
step two, in the t-th frame I_t of the video image sequence to be tracked, select the target O to be tracked with a rectangular window and determine its center position o*, where O denotes that a new target is present in the scene and o denotes the position of the new target; define the two-dimensional confidence map model C_t(o) of the target O to be tracked; enlarge the target region to be tracked to twice its size to form a local background region; within the local background region, extract the intensity-position feature w(k) = (I(k), k) at coordinate position k to form the intensity-position feature set, where I(k) denotes the image intensity at coordinate position k and the coordinates k are taken from the neighborhood of o*; establish the prior model P(w(k)|O) of the target to be tracked for the t-th frame and compute the spatio-temporal model of the t-th frame;
step three, perform overlapping sampling in the region around the center position of the target to be tracked to obtain N1 region-block images as positive samples and N2 region-block images as negative samples; extract the positive and negative sample features x_j, where the class label of a positive sample is 1 and that of a negative sample is 0, y_j ∈ {1, 0}; establish the labeled sample set X_s and the unlabeled sample set X_u, forming the training sample set X = {X_s, X_u} = {(x_j, y_j)}, j = 1, ..., N1 + N2;
Step four, training a semi-supervised extreme learning machine network model by using the training sample set X obtained in the step three;
step five, in I_{t+1}, update the model using the t-th-frame spatio-temporal model obtained in step two and compute the spatio-temporal model of the (t+1)-th frame; convolve I_{t+1} with the (t+1)-th-frame spatio-temporal model to obtain the spatio-temporal confidence map C_{t+1}(o) of the new target, and maximize C_{t+1}(o) to determine the target position o in the (t+1)-th frame;
step six, judging whether the target is shielded, if not, entering the step five, otherwise, entering the step seven;
step seven, in I_{t+1}, take o* obtained in I_t as the target position; in the region around the target position o*, perform overlapping sampling according to the size of the target-region rectangular window to obtain N region-block images as candidate targets, extract the candidate-target features, and establish the test sample set X_{t+1} of target image blocks to be tracked; input the test sample set into the semi-supervised extreme learning machine network model trained in step four to obtain the test output T for frame t+1, and take the position of the maximum classification response of the semi-supervised extreme learning machine network model as the target position o in the (t+1)-th frame;
step eight, judge the maximum classification response against the update threshold of the semi-supervised extreme learning machine network model; if the network model of the semi-supervised extreme learning machine does not need to be updated, return to step five, otherwise proceed to step nine;
step nine, take the labeled data set X_s obtained in step three and, as the unlabeled data set, the test sample set obtained in step seven, i.e. X_u = X_{t+1}; then return to step four and retrain the semi-supervised extreme learning machine network model;
and repeating the steps circularly until the tracking is completed on the whole video sequence.
2. The method for tracking an object according to claim 1, wherein the third step is specifically: in the region around the center position o* of the target to be tracked, perform overlapping sampling according to the size of the target-region rectangular window, and let d_j denote the Euclidean distance from the j-th sampling point to the target center; when d_j is within the positive-sample radius r1, sampling yields N1 region-block images as positive samples; when d_j lies between r2 and r3, sampling yields N2 region-block images as negative samples, where r1, r2 and r3 are the sampling radii; extract the positive and negative sample features x_j and establish the training sample set of target image blocks to be tracked; the (N1 + N2) target image blocks form the training sample set X = {(x_j, y_j)}, j = 1, ..., N1 + N2, where the class label of a positive sample is 1 and that of a negative sample is 0, y_j ∈ {1, 0}; shuffle the order of the samples in the training sample set, take the first fraction of the samples as the labeled sample set X_s and the remaining samples as the unlabeled sample set X_u, with X = {X_s, X_u}.
3. The method for tracking an object according to claim 1, wherein the fourth step is specifically: set the input weights and hidden-layer biases randomly, and let (a, b) denote the input weight a and threshold b of the hidden-layer nodes; the training samples consist of the labeled data set {X_s, Y_s} and the unlabeled data set X_u, where X_s and X_u are input samples and Y_s are the output samples corresponding to X_s; the mapping function of the hidden layer is g(x), which may take the form g(x) = 1/(1 + e^(-x)); the output weights are denoted β; h(x_i) = [G(a_1, b_1, x_i), ..., G(a_m, b_m, x_i)] denotes the hidden-layer output of the i-th sample, the number of hidden-layer nodes is m, and e_i denotes the learning error of the i-th input node;

the objective function of the semi-supervised extreme learning machine is

  min over β: (1/2)||β||^2 + (1/2) Σ_{i=1..s} C_i ||e_i||^2 + (λ/2) Tr(F^T L F)
  subject to h(x_i)β = y_i − e_i, i = 1, ..., s, and f_i = h(x_i)β, i = 1, ..., s+u,

where C_i denotes the penalty parameter, λ denotes the balance parameter, L is the graph Laplacian computed from the labeled and unlabeled data, F is the output matrix of the network, and Tr denotes the trace operation;

the semi-supervised extreme learning machine objective function is expressed in matrix form as

  L_SSELM = (1/2)||β||^2 + (1/2)||C^(1/2)(Ỹ − Hβ)||^2 + (λ/2) Tr(β^T H^T L H β),

where Ỹ is the augmented output matrix whose first s rows equal Y_s and whose last u rows are zero, C is a diagonal matrix whose first s diagonal elements are C_i and whose remaining elements are zero, and H is the hidden-layer output matrix of the network;

taking the partial derivative of the above expression with respect to β gives

  ∂L_SSELM/∂β = β + H^T C (Hβ − Ỹ) + λ H^T L H β;

setting the partial derivative to zero, the output weights β are obtained as follows:

when the labeled data are more numerous than the hidden-layer nodes,

  β = (I + H^T C H + λ H^T L H)^(−1) H^T C Ỹ;

when the labeled data are fewer than the hidden-layer nodes,

  β = H^T (I + C H H^T + λ L H H^T)^(−1) C Ỹ,

where H^T is the transpose of the matrix H.
4. The method of claim 1, wherein in the sixth step whether the target is occluded is judged by comparing the result of the confidence map, i.e. its maximum value, with the occlusion threshold th1: if the maximum confidence value falls below th1, the target is occluded; th1 denotes the critical occlusion value and may be changed according to different scenes; when the algorithm is applied to a different scene, th1 is adjusted manually, and it normally fluctuates within a certain range; when the target is occluded, the maximum confidence value decreases rapidly, and the value after this rapid decrease is taken as th1, which is used to judge whether the target is occluded.
5. The method for tracking an object as claimed in claim 1, wherein in the step eight whether the semi-supervised extreme learning machine network model needs to be updated is judged by comparing the maximum classification response T_max with the update threshold th2: if T_max > th2, the semi-supervised extreme learning machine network model does not need to be updated; th2 is thus the criterion for deciding whether the network model is updated.
CN201710047829.0A 2017-01-20 2017-01-20 Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine Active CN106815576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710047829.0A CN106815576B (en) 2017-01-20 2017-01-20 Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine

Publications (2)

Publication Number Publication Date
CN106815576A CN106815576A (en) 2017-06-09
CN106815576B true CN106815576B (en) 2020-07-07

Family

ID=59111417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710047829.0A Active CN106815576B (en) 2017-01-20 2017-01-20 Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine

Country Status (1)

Country Link
CN (1) CN106815576B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423762A (en) * 2017-07-26 2017-12-01 江南大学 Semi-supervised fingerprinting localization algorithm based on manifold regularization
CN109255321B (en) * 2018-09-03 2021-12-10 电子科技大学 Visual tracking classifier construction method combining history and instant information
CN110211104B (en) * 2019-05-23 2023-01-06 复旦大学 Image analysis method and system for computer-aided detection of lung mass
CN110675382A (en) * 2019-09-24 2020-01-10 中南大学 Aluminum electrolysis superheat degree identification method based on CNN-LapseLM
CN113378673B (en) * 2021-05-31 2022-09-06 中国科学技术大学 Semi-supervised electroencephalogram signal classification method based on consistency regularization
CN113408984B (en) * 2021-06-21 2024-03-22 北京思路智园科技有限公司 Dangerous chemical transportation tracking system and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8254633B1 (en) * 2009-04-21 2012-08-28 Videomining Corporation Method and system for finding correspondence between face camera views and behavior camera views
CN102663453B (en) * 2012-05-03 2014-05-14 西安电子科技大学 Human motion tracking method based on second generation Bandlet transform and top-speed learning machine
CN103942749B (en) * 2014-02-24 2017-01-04 西安电子科技大学 A kind of based on revising cluster hypothesis and the EO-1 hyperion terrain classification method of semi-supervised very fast learning machine
CN104992453B (en) * 2015-07-14 2018-10-23 国家电网公司 Target in complex environment tracking based on extreme learning machine
CN106296734B (en) * 2016-08-05 2018-08-28 合肥工业大学 Method for tracking target based on extreme learning machine and boosting Multiple Kernel Learnings

Also Published As

Publication number Publication date
CN106815576A (en) 2017-06-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant