CN106815576B - Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine


Info

Publication number
CN106815576B
CN106815576B
Authority
CN
China
Prior art keywords
target
tracked
semi
learning machine
extreme learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710047829.0A
Other languages
Chinese (zh)
Other versions
CN106815576A (en)
Inventor
年睿
邱书琦
常瑞杰
肖玫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN201710047829.0A priority Critical patent/CN106815576B/en
Publication of CN106815576A publication Critical patent/CN106815576A/en
Application granted granted Critical
Publication of CN106815576B publication Critical patent/CN106815576B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 20/46: Scenes; Scene-specific elements in video content; Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F 18/214: Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/217: Pattern recognition; Validation; Performance evaluation; Active pattern learning techniques
    • G06F 18/24: Pattern recognition; Classification techniques
    • G06V 10/25: Image preprocessing; Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method based on a continuous space-time confidence map and a semi-supervised extreme learning machine. Video image frames are continuous in time: the position of the target to be tracked does not change abruptly between adjacent frames. Video image frames are also continuous in space: a specific relation exists between the target and its surrounding background, and when the appearance of the target changes greatly, this relation helps to distinguish the target to be tracked from the background region. To address deformation and occlusion, the invention fully considers the information provided by the real target and fully mines the distribution similarity between labeled and unlabeled samples to improve tracking accuracy; it proposes a semi-supervised tracking method based on an extreme learning machine that exploits this distribution similarity, and combines the two methods in a coupled tracking framework.

Description

Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine
Technical Field
The invention relates to a target tracking method based on a continuous space-time confidence map and a semi-supervised extreme learning machine, belonging to the technical field of intelligent information processing and target tracking.
Background
Object tracking is an indispensable part of most vision systems. In specific applications such as video surveillance, automatic, fast and highly robust target tracking is of particular concern. It has broad application prospects in video surveillance, traffic detection, intelligent robots, underwater target detection and tracking, and related areas.
Target tracking is an extremely important part of the computer vision field. A moving-object tracking algorithm analyzes the information of each frame in the video image sequence to be tracked, mines the data in the video, learns the target behavior, and after a series of processing steps obtains and marks the position of the tracked target in each video image. Problems such as occlusion and deformation between objects, background complexity, illumination changes, and poor real-time performance and robustness remain to be solved in the tracking process. Classical tracking methods such as Meanshift and particle filtering depend on how rich the target information contained in the video is; in a practical video image sequence the information provided by the target is quite limited, so the target cannot be tracked stably, and under deformation and occlusion in the scene these classical algorithms tend to fail.
The main problems in the prior art are therefore: (1) poor real-time performance and robustness during tracking of the video scene, insufficient spatio-temporal position information of the target, and inconspicuous target features; (2) when an occluder appears or the target to be tracked deforms in the scene, and especially when the target is completely occluded or deforms severely, the tracked target may be lost.
Disclosure of Invention
The invention aims to provide a target tracking method based on a continuous space-time confidence map and a semi-supervised extreme learning machine, so as to make up for the defects of the prior art.
Video image frames are continuous in time; this temporal continuity means that the target to be tracked does not change greatly between adjacent frames and that its position does not jump abruptly. Video image frames are also continuous in space; this spatial continuity means that a specific relation exists between the target and its surrounding background, and when the appearance of the target changes greatly this relation helps to distinguish the target to be tracked from the background region. A tracking method based on continuous spatio-temporal confidence map learning is therefore proposed to overcome poor real-time performance and robustness, insufficient spatio-temporal position information of the target, and inconspicuous target features. To address deformation and occlusion, the information provided by the real target is fully considered and the distribution similarity between labeled and unlabeled samples is fully mined to improve tracking accuracy; a semi-supervised tracking method based on an extreme learning machine is proposed to mine this distribution similarity, and the two methods are combined in a coupled tracking framework to achieve tracking with good real-time performance and high robustness.
In order to achieve the purpose, the specific technical scheme adopted by the invention is realized by the following steps:
the first step, acquire an n-frame video A = {I_1, …, I_i, …, I_n} of the target to be tracked in the specific monitored scene to be tracked, where I_i denotes the i-th frame of the video image sequence to be tracked; preprocess the video sequence to be tracked with image filtering/denoising and contrast enhancement to reduce noise and highlight the region of interest to be tracked;
step two, in the t-th frame I_t of the video image sequence to be tracked, select the target O to be tracked with a rectangular window and determine its center position o*, where O denotes that a new target is present in the scene and o denotes the position of the new target; define the two-dimensional confidence map model C_t(o) of the target to be tracked. Enlarge the target region to be tracked to twice its size to form a local background region; within this local background region, extract the intensity-position feature w(k) = (I(k), k) at each coordinate position k to form the intensity-position feature set, where I(k) denotes the image intensity at coordinate position k and the coordinates k are taken from the neighborhood of o*. Establish the prior model P(w(k)|O) of the target to be tracked for the t-th frame and compute the spatio-temporal model of the t-th frame;
Step three, overlapping and sampling the area where the central position of the target to be tracked is located to obtain N1Each region block image as a positive sample and N2Taking the image of each region block as a negative sample, and extracting positive and negative sample data characteristics xjThe class label of the positive exemplar is 1, the class label of the negative exemplar is 0, yj∈ {1,0}, establishing labeled sample set
Figure GDA0002470717950000027
Figure GDA0002470717950000028
Training sample set X ═ Xs,Xu}={(xj,yj)},j=1,...,N1+N2
Step four, training a semi-supervised extreme learning machine network model by using the training sample set X obtained in the step three;
step five, in I_{t+1}, update the model using the t-th-frame spatio-temporal model obtained in step two and compute the spatio-temporal model of the (t+1)-th frame; convolve I_{t+1} with the (t+1)-th-frame spatio-temporal model to obtain the spatio-temporal confidence map C_{t+1}(o) of the new target, and maximize C_{t+1}(o) to determine the target position o in the (t+1)-th frame;
step six, judging whether the target is shielded, if not, entering the step five, otherwise, entering the step seven;
step seven, in I_{t+1}, take o* obtained in I_t as the target position; in the region around the target position o*, perform overlapping sampling according to the size of the target-region rectangular window to obtain N region-block images as candidate targets, extract the candidate-target features, and establish the test sample set X_{t+1} of target image blocks to be tracked; input the test sample set into the semi-supervised extreme learning machine network trained in step four to obtain the test output T for frame t+1, and take the position of the maximum classification response of the online semi-supervised extreme learning machine as the target position o in the (t+1)-th frame;
step eight, judge the maximum classification response against the update threshold of the online semi-supervised extreme learning machine network model; if the online semi-supervised extreme learning machine model does not need to be updated, return to step five, otherwise proceed to step nine;
step nine, take the labeled data set X_s obtained in step three and, as the unlabeled data set, the test sample set obtained in step seven, i.e. X_u = X_{t+1}; then return to step four and retrain the semi-supervised extreme learning machine network model;
and repeating the steps circularly until the tracking is completed on the whole video sequence.
Further, the third step is specifically as follows: in the region around the center position o* of the target to be tracked, perform overlapping sampling according to the size of the target-region rectangular window, and let d_j denote the Euclidean distance from the j-th sampling point to the target center. When d_j is within the positive-sample radius r1, sampling yields N1 region-block images as positive samples; when d_j lies between r2 and r3, sampling yields N2 region-block images as negative samples, where r1, r2 and r3 are the sampling radii. Extract the positive and negative sample features x_j and establish the training sample set of target image blocks to be tracked; the (N1 + N2) target image blocks form the training sample set X = {(x_j, y_j)}, j = 1, ..., N1 + N2, where the class label of a positive sample is 1 and that of a negative sample is 0, y_j ∈ {1, 0}. Shuffle the order of the samples in the training sample set, take the first fraction of the samples as the labeled sample set X_s and the remaining samples as the unlabeled sample set X_u, with X = {X_s, X_u}.
The fourth step is specifically as follows: set the input weights and hidden-layer biases randomly, and let (a, b) denote the input weight a and threshold b of the hidden-layer nodes. The training samples consist of the labeled data set {X_s, Y_s} and the unlabeled data set X_u, where X_s and X_u are input samples and Y_s are the output samples corresponding to X_s. The mapping function of the hidden layer is g(x), which may take the form g(x) = 1/(1 + e^(-x)); the output weights are denoted β; h(x_i) = [G(a_1, b_1, x_i), ..., G(a_m, b_m, x_i)] denotes the hidden-layer output of the i-th sample, the number of hidden-layer nodes is m, and e_i denotes the learning error (residual) of the i-th input node.

The objective function of the semi-supervised extreme learning machine is

  min over β: (1/2)||β||^2 + (1/2) Σ_{i=1..s} C_i ||e_i||^2 + (λ/2) Tr(F^T L F)
  subject to h(x_i)β = y_i − e_i, i = 1, ..., s, and f_i = h(x_i)β, i = 1, ..., s+u,

where C_i denotes the penalty parameter, λ denotes the balance parameter, L is the graph Laplacian computed from the labeled and unlabeled data, F is the output matrix of the network, and Tr denotes the trace operation.

In matrix form, the semi-supervised extreme learning machine objective function is

  L_SSELM = (1/2)||β||^2 + (1/2)||C^(1/2)(Ỹ − Hβ)||^2 + (λ/2) Tr(β^T H^T L H β),

where Ỹ is the augmented output matrix whose first s rows equal Y_s and whose last u rows are zero, C is a diagonal matrix whose first s diagonal elements are C_i and whose remaining elements are zero, and H is the hidden-layer output matrix of the network.

Taking the partial derivative of the above expression with respect to β gives

  ∂L_SSELM/∂β = β + H^T C (Hβ − Ỹ) + λ H^T L H β.

Setting the partial derivative to zero, the output weights β are obtained as follows:

when the labeled data are more numerous than the hidden-layer nodes,

  β = (I + H^T C H + λ H^T L H)^(−1) H^T C Ỹ;

when the labeled data are fewer than the hidden-layer nodes,

  β = H^T (I + C H H^T + λ L H H^T)^(−1) C Ỹ,

where H^T is the transpose of the matrix H.
In the sixth step, whether the target is occluded is judged by comparing the result of the confidence map, i.e. its maximum value, with the occlusion threshold th1: if the maximum confidence value falls below th1, the target is occluded. th1 denotes the critical occlusion value and may be changed according to different scenes; when the algorithm is applied to a different scene, th1 is adjusted manually, and it normally fluctuates within a certain range. When the target is occluded, the maximum confidence value decreases rapidly, and the value after this rapid decrease is taken as th1, which is used to judge whether the target is occluded.
In the eighth step, whether the online semi-supervised extreme learning machine network model needs to be updated is judged by comparing the maximum classification response T_max with the update threshold th2: if T_max > th2, the online semi-supervised extreme learning machine network model does not need to be updated; th2 is thus the criterion for deciding whether the network model is updated.
The invention has the following beneficial effects. The invention combines a tracking method based on continuous spatio-temporal confidence map learning with a tracking method based on a semi-supervised extreme learning machine, and solves the problems of poor real-time performance and robustness, insufficient spatio-temporal position information of the target, inconspicuous target features, and loss of the tracked target caused by deformation and occlusion. By computing the continuous spatio-temporal confidence map and comparing it with the occlusion threshold, a method for judging whether the target has entered an occluded region is obtained, which effectively solves the occlusion-judgment problem. By computing the maximum response value output by the semi-supervised extreme learning machine network and comparing it with the model-update threshold, a method for judging whether the network model needs to be updated is obtained, which effectively solves the problem of poor generalization of the network model. The invention greatly improves tracking accuracy and realizes a tracking process with good real-time performance and high robustness.
Drawings
FIG. 1 is a schematic diagram of an overall tracking process according to the present invention.
FIG. 2 is a region marker map of an object to be tracked in an embodiment.
FIG. 3 is a block diagram of a method for object tracking based on continuous spatiotemporal confidence maps in an exemplary embodiment.
Fig. 4 is a basic framework diagram of a semi-supervised extreme learning machine network.
FIG. 5 is a block diagram of a method for target tracking based on semi-supervised extreme learning in an exemplary embodiment.
Fig. 6 is an example of tracking effect under occlusion in the specific embodiment, where (a) in fig. 6 is a video frame with an object of interest to be tracked, and (b) in fig. 6, (c) in fig. 6, (d) in fig. 6, (e) in fig. 6, and (f) in fig. 6 are video frames for tracking the object of interest after the frame (a) in fig. 6, respectively.
Detailed Description
In order to make the objects, embodiments and advantages of the present invention clearer, the present invention is further illustrated by the following specific examples in conjunction with the accompanying drawings.
The specific flow chart of the invention is shown in fig. 1.
In this embodiment, a classical corridor surveillance video from the CAVIAR dataset (384 × 288 pixels, 25 frames per second) is adopted as the video to be tracked.
Step one: preprocess the video sequence to be tracked with image filtering/denoising and contrast enhancement to reduce noise and highlight the region of interest to be tracked. The specific steps are as follows:

Step 1-1: define the classical corridor surveillance video CAVIAR as A and split it into frames, obtaining a 200-frame video image sequence to be tracked, A = {I_1, ..., I_i, ..., I_200}, where I_i denotes the i-th frame of the corridor surveillance video to be tracked;

Step 1-2: apply filtering/denoising and contrast-enhancement preprocessing to the 200-frame video image sequence.
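Purely as an illustration of steps 1-1 and 1-2, the preprocessing could be sketched as follows in Python with OpenCV; the choice of Gaussian filtering and histogram equalization, and all function and parameter names, are assumptions of this sketch, since the description only requires filtering/denoising and contrast enhancement in general terms.

```python
import cv2
import numpy as np

def preprocess_sequence(video_path, n_frames=200):
    """Split the video into frames, denoise each frame and enhance its contrast.

    Gaussian filtering and histogram equalization are placeholder choices; the
    patent only requires some form of denoising and contrast enhancement.
    """
    cap = cv2.VideoCapture(video_path)
    frames = []
    for _ in range(n_frames):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        denoised = cv2.GaussianBlur(gray, (5, 5), 1.0)   # filtering / denoising
        enhanced = cv2.equalizeHist(denoised)            # contrast enhancement
        frames.append(enhanced.astype(np.float64) / 255.0)
    cap.release()
    return frames
```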
Step two: in the first frame I_{t=1} of the video image sequence to be tracked, select the target O to be tracked and determine its center position o*, where O denotes that a new target is present in the scene and o denotes the position of the new target; define the two-dimensional confidence map model C_t(o) of the target to be tracked; establish the prior model P(w(k)|O) of the target to be tracked for the t-th frame and compute the spatio-temporal model of the t-th frame, as shown in FIG. 3. The specific steps are as follows:
Step 2-1: in I_{t=1}, the user selects the target of interest O to be tracked through a rectangular window; the width of the target-region rectangular window is w and its height is h, and o denotes the position of the new target. The target region is enlarged to twice its size to form a local background region, as shown in FIG. 2. Within the local background region, the intensity-position feature w(k) = (I(k), k) at coordinate position k is extracted to form the intensity-position feature set, where I(k) denotes the image intensity at coordinate position k and the coordinates k are taken from the neighborhood of o*;
Step 2-2: the tracking problem is converted into the problem of computing a confidence map of the position of the target to be tracked:

  C_t(o) = P(o|O) = Σ_{w(k)} P(o|w(k), O) · P(w(k)|O),

where C_t(o) denotes the confidence map model of the t-th frame and relates the new target position o to the old target position o*; the closer the new target position is to the old target position, the larger the confidence value. P(o|w(k), O) denotes the spatio-temporal model and describes the relative position and direction between the new target o and the coordinate point k of the local background region. P(w(k)|O) denotes the prior model, which describes the appearance around the old target position and models the low-level contour information of the target O to be tracked according to the intensity and relative position of the coordinate point k in the local background region;
Step 2-3: compute the confidence map C_{t=1}(o) of frame I_{t=1} and obtain its maximum confidence value;
Step 2-4, calculating a prior model of the t-1 frame
Figure GDA00024707179500000610
Wherein
Figure GDA00024707179500000611
Is a scale parameter;
Step 2-5: from the confidence map model C_t(o) and the prior model P(w(k)|O) of frame t = 1, compute the spatio-temporal model h_t of the target of interest for frame t = 1 by deconvolution in the Fourier domain,

  h_t = F^(-1)( F(C_t(o)) / F(P(w(k)|O)) ),

where F denotes the fast Fourier transform and F^(-1) denotes the inverse fast Fourier transform.
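Since the closed-form expressions appear in the original only as figure references, the following sketch assumes the standard dense spatio-temporal context (STC) style formulation described above: a confidence map that peaks at the old target position, a prior given by the intensity weighted by a window around o*, and a context model learned by deconvolution in the Fourier domain. The parameter names alpha, beta and sigma, and the function name, are assumptions of this sketch.

```python
import numpy as np

def learn_context_model(frame, center, win_size, alpha=2.25, beta=1.0, sigma=None):
    """Learn a spatial context model h for one frame (STC-style sketch).

    frame    : 2-D array of preprocessed image intensities
    center   : (row, col) of the target center o*
    win_size : (height, width) of the local background region (twice the target size)
    """
    hh, hw = win_size[0] // 2, win_size[1] // 2
    r0, c0 = center
    patch = frame[r0 - hh: r0 + hh, c0 - hw: c0 + hw]        # local background region
    rows, cols = np.mgrid[-hh:hh, -hw:hw]
    dist2 = rows ** 2 + cols ** 2
    if sigma is None:
        sigma = 0.5 * (hh + hw)
    # confidence map: larger the closer a position is to the old target position o*
    conf = np.exp(-(np.sqrt(dist2) / alpha) ** beta)
    # prior model: image intensity weighted by a window centered on o*
    prior = patch * np.exp(-dist2 / (2.0 * sigma ** 2))
    # context model learned by deconvolution in the Fourier domain
    H = np.fft.fft2(conf) / (np.fft.fft2(prior) + 1e-8)
    return np.real(np.fft.ifft2(H))
```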
Step three: perform overlapping sampling in the region around the center position of the target to be tracked to obtain N1 region-block images as positive samples and N2 region-block images as negative samples; extract the positive and negative sample features x_j, establish the labeled sample set X_s and the unlabeled sample set X_u, and form the training sample set X = {(x_j, y_j)}, j = 1, ..., N1 + N2, as shown in FIG. 5. The specific steps are as follows:
Step 3-1: in the region around the center position o*, perform overlapping sampling according to the size of the target-region rectangular window, and let d_j denote the Euclidean distance from the j-th sampling point to the target center. When d_j is within the positive-sample radius r1, sampling yields 45 region-block images as positive samples; when d_j lies between r2 and r3, sampling yields 31 region-block images as negative samples. The sampling radii r1, r2 and r3 are set to 5, 10 and 20 pixels, respectively;
Step 3-2: extract the positive and negative sample features x_j and establish the training sample set of target image blocks to be tracked; the 76 target image blocks form the training sample set X = {(x_j, y_j)}, j = 1, ..., 76, where the class label of a positive sample is 1 and that of a negative sample is 0, y_j ∈ {1, 0};
Step 3-3: shuffle the order of the samples in the training sample set, take the first 50 samples as the labeled sample set X_s and the remaining 26 samples as the unlabeled sample set X_u, with X = {X_s, X_u}.
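A minimal sketch of steps 3-1 to 3-3 is given below. The exact distance conditions are shown in the original only as figure references, so treating positive samples as those within radius r1 and negative samples as those between r2 and r3 is an assumption, as are the sampling stride, the use of raw block intensities as features, and the helper name.

```python
import numpy as np

def sample_training_set(frame, center, win, r1=5, r2=10, r3=20,
                        stride=2, n_labeled=50, rng=None):
    """Overlapping sampling around the target center o* and labeled/unlabeled split."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = win
    r0, c0 = center
    pos, neg = [], []
    for dr in range(-r3, r3 + 1, stride):
        for dc in range(-r3, r3 + 1, stride):
            d = np.hypot(dr, dc)                     # Euclidean distance to o*
            r, c = r0 + dr, c0 + dc
            block = frame[r - h // 2: r + h // 2, c - w // 2: c + w // 2]
            if block.shape != (2 * (h // 2), 2 * (w // 2)):
                continue                             # block falls outside the image
            x = block.ravel()                        # block intensities as the feature vector
            if d <= r1:
                pos.append(x)                        # positive sample, label 1
            elif r2 <= d <= r3:
                neg.append(x)                        # negative sample, label 0
    X = np.array(pos + neg)
    y = np.array([1] * len(pos) + [0] * len(neg))
    order = rng.permutation(len(X))                  # shuffle before the labeled/unlabeled split
    X, y = X[order], y[order]
    return (X[:n_labeled], y[:n_labeled]), X[n_labeled:]   # (X_s, Y_s), X_u
```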
Step four: train the online semi-supervised extreme learning machine network model with the training sample set X obtained in step three. The specific steps are as follows:

Step 4-1: the semi-supervised extreme learning machine is a single-hidden-layer feedforward neural network model, as shown in FIG. 4; the network consists of three layers: an input layer, a hidden layer and an output layer. The input weights and hidden-layer biases are set randomly and independently of the training samples, so the algorithm has a simple structure and high computational efficiency. Let (a, b) denote the input weight a and threshold b of the hidden-layer nodes. The training samples consist of the labeled data set {X_s, Y_s} and the unlabeled data set X_u, where X_s and X_u are input samples and Y_s are the output samples corresponding to X_s. The mapping function of the hidden layer is g(x), which may take the form g(x) = 1/(1 + e^(-x)); the output weights are denoted β; h(x_i) = [G(a_1, b_1, x_i), ..., G(a_2000, b_2000, x_i)] denotes the hidden-layer output of the i-th sample, the number of hidden-layer nodes is 2000, and e_i denotes the learning error (residual) of the i-th input node;
Step 4-2: the objective function of the semi-supervised extreme learning machine to be trained is

  min over β: (1/2)||β||^2 + (1/2) Σ_{i=1..s} C_i ||e_i||^2 + (λ/2) Tr(F^T L F)
  subject to h(x_i)β = y_i − e_i, i = 1, ..., s, and f_i = h(x_i)β, i = 1, ..., s+u,

where C_i denotes the penalty parameter, λ denotes the balance parameter, L is the graph Laplacian computed from the labeled and unlabeled data, F is the output matrix of the network, and Tr denotes the trace operation;

Step 4-3: in matrix form, the semi-supervised extreme learning machine objective function is

  L_SSELM = (1/2)||β||^2 + (1/2)||C^(1/2)(Ỹ − Hβ)||^2 + (λ/2) Tr(β^T H^T L H β),

where Ỹ is the augmented output matrix whose first 50 rows equal Y_s and whose last 26 rows are zero, and C is a diagonal matrix whose first 50 diagonal elements are C_i and whose remaining elements are zero;

Step 4-4: taking the partial derivative of the above expression with respect to β gives

  ∂L_SSELM/∂β = β + H^T C (Hβ − Ỹ) + λ H^T L H β;

Step 4-5: setting the partial derivative to zero, the output weights β are solved as follows:

when the labeled data are more numerous than the hidden-layer nodes,

  β = (I + H^T C H + λ H^T L H)^(−1) H^T C Ỹ;

when the labeled data are fewer than the hidden-layer nodes,

  β = H^T (I + C H H^T + λ L H H^T)^(−1) C Ỹ,

where H^T is the transpose of the matrix H. This completes the training of the semi-supervised extreme learning machine network model.
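The closed-form solutions of steps 4-4 and 4-5 could be sketched as follows. The graph Laplacian here is built from a Gaussian-weighted k-nearest-neighbor graph, which is an assumption (the description only states that L is computed from the labeled and unlabeled data), and all function and variable names are illustrative.

```python
import numpy as np

def knn_laplacian(X, k=7, sigma=None):
    """Unnormalized graph Laplacian L = D - W from a Gaussian-weighted kNN graph."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    if sigma is None:
        sigma = np.sqrt(np.median(d2) + 1e-12)       # scale the kernel to the data
    W = np.zeros_like(d2)
    for i in range(len(X)):
        nn = np.argsort(d2[i])[1:k + 1]              # k nearest neighbors (skip self)
        W[i, nn] = np.exp(-d2[i, nn] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                           # symmetrize
    return np.diag(W.sum(axis=1)) - W

def train_sselm(Xs, Ys, Xu, m=2000, C=1.0, lam=0.1, rng=None):
    """Train a semi-supervised ELM; returns random weights (a, b) and output weights beta."""
    rng = rng if rng is not None else np.random.default_rng(0)
    X = np.vstack([Xs, Xu])
    s, u, d = len(Xs), len(Xu), X.shape[1]
    a = rng.uniform(-1, 1, (d, m))                   # random input weights
    b = rng.uniform(-1, 1, m)                        # random hidden-layer biases
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))           # g(x) = 1 / (1 + e^(-x))
    Y = np.zeros((s + u, 1)); Y[:s, 0] = Ys          # first s rows = Y_s, last u rows = 0
    Cmat = np.diag(np.r_[np.full(s, C), np.zeros(u)])  # penalty only on labeled samples
    L = knn_laplacian(X)
    if s > m:   # labeled data more numerous than hidden nodes
        beta = np.linalg.solve(np.eye(m) + H.T @ Cmat @ H + lam * H.T @ L @ H,
                               H.T @ Cmat @ Y)
    else:       # labeled data fewer than hidden nodes
        beta = H.T @ np.linalg.solve(np.eye(s + u) + Cmat @ H @ H.T + lam * L @ H @ H.T,
                                     Cmat @ Y)
    return a, b, beta
```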
Step five: in the (t+1)-th frame, update the model using the t-th-frame spatio-temporal model h_t obtained in step two and compute the spatio-temporal model h_{t+1} of the (t+1)-th frame; convolve the image I_{t+1} with h_{t+1} to obtain the spatio-temporal confidence map C_{t+1}(o) of the new target, and maximize the resulting confidence map C_{t+1}(o) to determine the target position o in the (t+1)-th frame, as shown in FIG. 3. The specific steps are as follows:
Step 5-1: in I_{t+1}, take a local background region of twice the target size around the target position o* and extract the intensity-position features within this region to form the intensity-position feature set;
Step 5-2, updating a spatio-temporal model of the target of interest to be tracked in the t frame:
Figure GDA00024707179500000811
where p is the learning rate, where,
Figure GDA00024707179500000812
is the interesting object space-time model to be tracked calculated in the t frame,
Figure GDA00024707179500000813
expressed in the frequency domain as:
Figure GDA00024707179500000814
wherein
Figure GDA0002470717950000091
Is that
Figure GDA0002470717950000092
Time domain fourier transform of (a). Time domain filter FwExpressed as:
Fw=ρ/(ejw-(1-ρ))
where j is an imaginary unit;
Step 5-3: compute the confidence map C_{t+1}(o) of the target of interest to be tracked in frame t + 1 by convolving I_{t+1}, weighted by the prior around o*, with the updated spatio-temporal model h_{t+1}; the convolution is evaluated efficiently with the fast Fourier transform;

Step 5-4: in frame t + 1, the position o of the target of interest maximizes the confidence map of frame t + 1:

  o = argmax C_{t+1}(o),

with maximum confidence value max_o C_{t+1}(o).
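A sketch of steps 5-1 to 5-4, again under the STC-style assumptions used in the earlier sketch; the learning-rate value and all names are illustrative, and win_size is the local background region (twice the target size).

```python
import numpy as np

def track_stc(frame, center, win_size, h_prev, h_new, rho=0.075, sigma=None):
    """Update the spatio-temporal model and locate the target in frame t+1."""
    hh, hw = win_size[0] // 2, win_size[1] // 2
    r0, c0 = center
    # step 5-2: h_{t+1} = (1 - rho) * h_t + rho * h_t^c
    h_cur = (1.0 - rho) * h_prev + rho * h_new
    # step 5-1: local background region and weighted prior around the old position o*
    patch = frame[r0 - hh: r0 + hh, c0 - hw: c0 + hw]
    rows, cols = np.mgrid[-hh:hh, -hw:hw]
    if sigma is None:
        sigma = 0.5 * (hh + hw)
    prior = patch * np.exp(-(rows ** 2 + cols ** 2) / (2.0 * sigma ** 2))
    # step 5-3: confidence map by convolution, evaluated in the Fourier domain
    conf = np.real(np.fft.ifft2(np.fft.fft2(h_cur) * np.fft.fft2(prior)))
    # step 5-4: the new position o maximizes the confidence map
    dr, dc = np.unravel_index(np.argmax(conf), conf.shape)
    new_center = (r0 + dr - hh, c0 + dc - hw)
    return new_center, conf.max(), h_cur
```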
Step six: judge whether the target is occluded. The specific steps are as follows:

Step 6-1: compare the maximum confidence value obtained in step 5-4 with the occlusion threshold th1; if it falls below th1, the target is occluded. th1 denotes the critical occlusion value and may be changed according to different scenes; when the algorithm is applied to a different scene, th1 is adjusted manually, and it normally fluctuates within a certain range. When the target is occluded, the maximum confidence value decreases rapidly, and the value after this rapid decrease is taken as th1; in this embodiment th1 is fixed accordingly.

Step 6-2: if the maximum confidence value does not fall below th1, the target is not occluded and the procedure returns to step 5-1; otherwise it proceeds to step 7-1.
Step seven: in the (t+1)-th frame, perform overlapping sampling around the target center position tracked in the t-th frame, extract the candidate-target features, and establish the test samples of target image blocks to be tracked; input the test samples into the trained online semi-supervised extreme learning machine, and the position of the maximum classification response among the test samples is the predicted new target position, as shown in FIG. 5. The specific steps are as follows:

Step 7-1: for the (t+1)-th frame image, take o* obtained in the t-th frame as the target position; in the region around the center position o*, perform overlapping sampling according to the size of the target-region rectangular window, and let the Euclidean distance from the j-th sampling point to o* be d_j. When d_j is within the sampling radius r1, sampling yields 232 region-block images as candidate targets, i.e. test data; their features are extracted and the test set is recorded as X_{t+1}. The sampling radius r1 is set to 20 pixels;
Step 7-2: the test output is

  T = H*·β,

where β is the output weight computed in the t-th frame and H* is the hidden-layer output matrix of the test samples;

Step 7-3: in frame t + 1, the position o of the target of interest to be tracked is the position of the maximum classification response of the semi-supervised extreme learning machine for frame t + 1:

  o = argmax T,

with maximum classification response value T_max.
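Steps 7-2 and 7-3 amount to one forward pass of the trained network over the candidate blocks; a sketch reusing the trained parameters a, b and beta from the training sketch above (the function name is illustrative) is:

```python
import numpy as np

def classify_candidates(X_test, a, b, beta):
    """T = H* . beta ; the candidate with the largest response is the new target."""
    H_star = 1.0 / (1.0 + np.exp(-(X_test @ a + b)))   # hidden-layer output of the test samples
    T = H_star @ beta                                   # classification responses
    j = int(np.argmax(T))                               # index of the maximum response
    return j, float(T[j])                               # best candidate and T_max
```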
Step eight: judge the maximum classification response against the update threshold of the online semi-supervised extreme learning machine network model. The specific steps are as follows:

Step 8-1: compare the maximum classification response T_max with the update threshold th2 of the semi-supervised extreme learning machine; if T_max > th2, the online semi-supervised extreme learning machine network model does not need to be updated, so th2 decides whether the model is updated. In this embodiment, th2 = 0.

Step 8-2: if T_max > 0, the online semi-supervised extreme learning machine network model does not need to be updated and the procedure returns to step 5-1; otherwise it proceeds to step nine.
Step nine: retrain the online semi-supervised extreme learning machine network model, as shown in FIG. 5, as follows: take the labeled data set X_s obtained in step 3-3 and, as the unlabeled data set, the test set obtained in step 7-1, i.e. X_u = X_{t+1}; then return to step 4-1 and retrain the online semi-supervised extreme learning machine network model.
And repeating the steps circularly until the tracking of the whole monitoring video sequence to be tracked is completed.
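Putting the pieces together, the coupled framework of FIG. 1 can be sketched as the loop below. The helper functions are the ones sketched earlier in this description; sample_candidates is a hypothetical helper analogous to sample_training_set (overlapping sampling within radius 20 around o* that also returns the candidate positions), and th1 and th2 follow the embodiment (th2 = 0, th1 scene-dependent).

```python
def track_video(frames, init_center, win_size, th1, th2=0.0):
    """Coupled tracking loop: STC confidence map + semi-supervised ELM fallback."""
    center = init_center
    h_model = learn_context_model(frames[0], center, win_size)        # step two
    (Xs, Ys), Xu = sample_training_set(frames[0], center, win_size)   # step three
    a, b, beta = train_sselm(Xs, Ys, Xu)                              # step four
    positions = [center]
    for t in range(1, len(frames)):
        h_new = learn_context_model(frames[t - 1], center, win_size)
        center, c_max, h_model = track_stc(frames[t], center, win_size,
                                           h_model, h_new)            # step five
        if c_max < th1:                                                # step six: occluded
            X_test, candidates = sample_candidates(frames[t], center, win_size)
            j, t_max = classify_candidates(X_test, a, b, beta)         # step seven
            center = candidates[j]
            if t_max <= th2:                                           # steps eight and nine
                a, b, beta = train_sselm(Xs, Ys, X_test)               # retrain with X_u = X_{t+1}
        positions.append(center)
    return positions
```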
For the above surveillance video to be tracked, the tracking performance of the particle filter method, the Meanshift method and the method of the present invention is compared in Table 1. The method of the present invention outperforms the particle filter and Meanshift methods in both the center-position deviation and the deviation mean square error, demonstrating the real-time performance and robustness of the target tracking.
Table 1. Tracking performance comparison of particle filter, Meanshift and the method of the present invention

                               Particle filter   Meanshift   Method of the invention
Center position deviation      75.4796           22.9740     10.1834
Deviation mean square error    47.8903           12.2607     7.9702
Fig. 6 shows an example of the tracking effect under occlusion in this embodiment; the target is tracked accurately through two severe occlusions, further demonstrating the real-time performance and robustness of the method of the present invention.
The above is the preferred embodiment of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (5)

1. A target tracking method based on a continuous space-time confidence map and a semi-supervised extreme learning machine is characterized by comprising the following steps:
step one, acquire an n-frame video A = {I_1, …, I_i, …, I_n} of the target to be tracked in the specific monitored scene to be tracked, where I_i denotes the i-th frame of the video image sequence to be tracked; preprocess the video sequence to be tracked with image filtering/denoising and contrast enhancement to reduce noise and highlight the region of interest to be tracked;
step two, in the t-th frame I_t of the video image sequence to be tracked, select the target O to be tracked with a rectangular window and determine its center position o*, where O denotes that a new target is present in the scene and o denotes the position of the new target; define the two-dimensional confidence map model C_t(o) of the target O to be tracked; enlarge the target region to be tracked to twice its size to form a local background region; within the local background region, extract the intensity-position feature w(k) = (I(k), k) at coordinate position k to form the intensity-position feature set, where I(k) denotes the image intensity at coordinate position k and the coordinates k are taken from the neighborhood of o*; establish the prior model P(w(k)|O) of the target to be tracked for the t-th frame and compute the spatio-temporal model of the t-th frame;
step three, perform overlapping sampling in the region around the center position of the target to be tracked to obtain N1 region-block images as positive samples and N2 region-block images as negative samples; extract the positive and negative sample features x_j, where the class label of a positive sample is 1 and that of a negative sample is 0, y_j ∈ {1, 0}; establish the labeled sample set X_s and the unlabeled sample set X_u, forming the training sample set X = {X_s, X_u} = {(x_j, y_j)}, j = 1, ..., N1 + N2;
Step four, training a semi-supervised extreme learning machine network model by using the training sample set X obtained in the step three;
step five, in I_{t+1}, update the model using the t-th-frame spatio-temporal model obtained in step two and compute the spatio-temporal model of the (t+1)-th frame; convolve I_{t+1} with the (t+1)-th-frame spatio-temporal model to obtain the spatio-temporal confidence map C_{t+1}(o) of the new target, and maximize C_{t+1}(o) to determine the target position o in the (t+1)-th frame;
step six, judging whether the target is shielded, if not, entering the step five, otherwise, entering the step seven;
step seven, in I_{t+1}, take o* obtained in I_t as the target position; in the region around the target position o*, perform overlapping sampling according to the size of the target-region rectangular window to obtain N region-block images as candidate targets, extract the candidate-target features, and establish the test sample set X_{t+1} of target image blocks to be tracked; input the test sample set into the semi-supervised extreme learning machine network model trained in step four to obtain the test output T for frame t+1, and take the position of the maximum classification response of the semi-supervised extreme learning machine network model as the target position o in the (t+1)-th frame;
step eight, judge the maximum classification response against the update threshold of the semi-supervised extreme learning machine network model; if the network model of the semi-supervised extreme learning machine does not need to be updated, return to step five, otherwise proceed to step nine;
step nine, take the labeled data set X_s obtained in step three and, as the unlabeled data set, the test sample set obtained in step seven, i.e. X_u = X_{t+1}; then return to step four and retrain the semi-supervised extreme learning machine network model;
and repeating the steps circularly until the tracking is completed on the whole video sequence.
2. The method for tracking an object according to claim 1, wherein the third step is specifically: in the region around the center position o* of the target to be tracked, perform overlapping sampling according to the size of the target-region rectangular window, and let d_j denote the Euclidean distance from the j-th sampling point to the target center; when d_j is within the positive-sample radius r1, sampling yields N1 region-block images as positive samples; when d_j lies between r2 and r3, sampling yields N2 region-block images as negative samples, where r1, r2 and r3 are the sampling radii; extract the positive and negative sample features x_j and establish the training sample set of target image blocks to be tracked; the (N1 + N2) target image blocks form the training sample set X = {(x_j, y_j)}, j = 1, ..., N1 + N2, where the class label of a positive sample is 1 and that of a negative sample is 0, y_j ∈ {1, 0}; shuffle the order of the samples in the training sample set, take the first fraction of the samples as the labeled sample set X_s and the remaining samples as the unlabeled sample set X_u, with X = {X_s, X_u}.
3. The method for tracking an object according to claim 1, wherein the fourth step is specifically: set the input weights and hidden-layer biases randomly, and let (a, b) denote the input weight a and threshold b of the hidden-layer nodes; the training samples consist of the labeled data set {X_s, Y_s} and the unlabeled data set X_u, where X_s and X_u are input samples and Y_s are the output samples corresponding to X_s; the mapping function of the hidden layer is g(x), which may take the form g(x) = 1/(1 + e^(-x)); the output weights are denoted β; h(x_i) = [G(a_1, b_1, x_i), ..., G(a_m, b_m, x_i)] denotes the hidden-layer output of the i-th sample, the number of hidden-layer nodes is m, and e_i denotes the learning error of the i-th input node;

the objective function of the semi-supervised extreme learning machine is

  min over β: (1/2)||β||^2 + (1/2) Σ_{i=1..s} C_i ||e_i||^2 + (λ/2) Tr(F^T L F)
  subject to h(x_i)β = y_i − e_i, i = 1, ..., s, and f_i = h(x_i)β, i = 1, ..., s+u,

where C_i denotes the penalty parameter, λ denotes the balance parameter, L is the graph Laplacian computed from the labeled and unlabeled data, F is the output matrix of the network, and Tr denotes the trace operation;

the semi-supervised extreme learning machine objective function is expressed in matrix form as

  L_SSELM = (1/2)||β||^2 + (1/2)||C^(1/2)(Ỹ − Hβ)||^2 + (λ/2) Tr(β^T H^T L H β),

where Ỹ is the augmented output matrix whose first s rows equal Y_s and whose last u rows are zero, C is a diagonal matrix whose first s diagonal elements are C_i and whose remaining elements are zero, and H is the hidden-layer output matrix of the network;

taking the partial derivative of the above expression with respect to β gives

  ∂L_SSELM/∂β = β + H^T C (Hβ − Ỹ) + λ H^T L H β;

setting the partial derivative to zero, the output weights β are obtained as follows:

when the labeled data are more numerous than the hidden-layer nodes,

  β = (I + H^T C H + λ H^T L H)^(−1) H^T C Ỹ;

when the labeled data are fewer than the hidden-layer nodes,

  β = H^T (I + C H H^T + λ L H H^T)^(−1) C Ỹ,

where H^T is the transpose of the matrix H.
4. The method of claim 1, wherein in the sixth step whether the target is occluded is judged by comparing the result of the confidence map, i.e. its maximum value, with the occlusion threshold th1: if the maximum confidence value falls below th1, the target is occluded; th1 denotes the critical occlusion value and may be changed according to different scenes; when the algorithm is applied to a different scene, th1 is adjusted manually, and it normally fluctuates within a certain range; when the target is occluded, the maximum confidence value decreases rapidly, and the value after this rapid decrease is taken as th1, which is used to judge whether the target is occluded.
5. The method for tracking an object as claimed in claim 1, wherein in the step eight whether the semi-supervised extreme learning machine network model needs to be updated is judged by comparing the maximum classification response T_max with the update threshold th2: if T_max > th2, the semi-supervised extreme learning machine network model does not need to be updated; th2 is thus the criterion for deciding whether the network model is updated.
CN201710047829.0A 2017-01-20 2017-01-20 Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine Active CN106815576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710047829.0A CN106815576B (en) 2017-01-20 2017-01-20 Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine

Publications (2)

Publication Number Publication Date
CN106815576A CN106815576A (en) 2017-06-09
CN106815576B true CN106815576B (en) 2020-07-07

Family

ID=59111417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710047829.0A Active CN106815576B (en) 2017-01-20 2017-01-20 Target tracking method based on continuous space-time confidence map and semi-supervised extreme learning machine

Country Status (1)

Country Link
CN (1) CN106815576B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423762A (en) * 2017-07-26 2017-12-01 江南大学 Semi-supervised fingerprinting localization algorithm based on manifold regularization
CN109255321B (en) * 2018-09-03 2021-12-10 电子科技大学 Visual tracking classifier construction method combining history and instant information
CN110211104B (en) * 2019-05-23 2023-01-06 复旦大学 Image analysis method and system for computer-aided detection of lung mass
CN110675382A (en) * 2019-09-24 2020-01-10 中南大学 Aluminum electrolysis superheat degree identification method based on CNN-LapseLM
CN113378673B (en) * 2021-05-31 2022-09-06 中国科学技术大学 Semi-supervised electroencephalogram signal classification method based on consistency regularization
CN113408984B (en) * 2021-06-21 2024-03-22 北京思路智园科技有限公司 Dangerous chemical transportation tracking system and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8254633B1 (en) * 2009-04-21 2012-08-28 Videomining Corporation Method and system for finding correspondence between face camera views and behavior camera views
CN102663453B (en) * 2012-05-03 2014-05-14 西安电子科技大学 Human motion tracking method based on second generation Bandlet transform and top-speed learning machine
CN103942749B (en) * 2014-02-24 2017-01-04 西安电子科技大学 A kind of based on revising cluster hypothesis and the EO-1 hyperion terrain classification method of semi-supervised very fast learning machine
CN104992453B (en) * 2015-07-14 2018-10-23 国家电网公司 Target in complex environment tracking based on extreme learning machine
CN106296734B (en) * 2016-08-05 2018-08-28 合肥工业大学 Method for tracking target based on extreme learning machine and boosting Multiple Kernel Learnings

Also Published As

Publication number Publication date
CN106815576A (en) 2017-06-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant