CN110570451A - multithreading visual target tracking method based on STC and block re-detection - Google Patents

Multithreading visual target tracking method based on STC and block re-detection

Info

Publication number
CN110570451A
CN110570451A (application CN201910716977.6A)
Authority
CN
China
Prior art keywords
target
image
frame image
stc
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910716977.6A
Other languages
Chinese (zh)
Other versions
CN110570451B (en)
Inventor
汪鼎文
陈曦
王泉德
孙世磊
瞿涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201910716977.6A
Publication of CN110570451A
Application granted
Publication of CN110570451B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/223 - Analysis of motion using block-matching
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/50 - Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multithreading visual target tracking method based on STC and block re-detection, which comprises the following steps: S1, reading the first frame image and determining the tracking target; S2, establishing a spatial context model for the first frame image using the STC algorithm; S3, performing a blocking operation on the rectangular template region containing the tracking target in the first frame image and training an SVM classifier; S4, reading the next frame image, learning a spatial context model from the previous frame image, and computing the target neighborhood context prior; S5, updating the spatio-temporal context model of the current frame image; S6, obtaining the confidence map of the current frame image; S7, judging the degree of occlusion of the tracking target in the current frame image according to the confidence probability; S8, selecting the corresponding processing strategy according to the judged occlusion condition of the tracked target; and S9, repeating steps S4-S8 until the current video or image sequence has been fully processed. The invention improves both the reliability and the efficiency of target tracking.

Description

Multithreading visual target tracking method based on STC and block re-detection
Technical Field
The invention relates to the field of target detection and tracking in computer vision, and in particular to a multithreading visual target tracking method based on STC and block re-detection.
Background
Visual target tracking is an important research direction in computer vision, with extremely wide applications in fields such as military unmanned aircraft, precision guidance, air early warning, civilian video surveillance, human-computer interaction, and autonomous driving. However, target tracking faces challenges such as target scale change, severe occlusion, fast motion, out-of-view targets, and illumination change, so a reliable, real-time visual target tracking method is of great practical significance.
Traditional visual target tracking methods make effective use of temporal context information, tracking the position and scale of the target in the current frame from the position and scale of the target in the previous frame image. On this basis, STC establishes a spatio-temporal relation between the tracked target and its local context within a Bayesian framework, converting the tracking problem into the computation of a confidence map and obtaining the target position by maximizing a target location likelihood function. STC handles slight occlusion, posture change, and illumination change of the target well and meets real-time processing requirements, but it easily fails under heavy occlusion, fast motion, and out-of-view conditions during long-term tracking. Considering that the time interval between two adjacent frames is short, the local area near the target changes little even when the tracked target is severely occluded. LCT therefore uses a kernel ridge regression method based on correlation filters to encode an appearance template consisting of the target object and its surroundings; the adaptive template built from the extracted features withstands severe occlusion, fast motion, and severe deformation, solving the drift that traditional correlation-filter-based trackers develop during long-term tracking. In addition, LCT trains another correlation filter to estimate target scale change, constructing a multi-scale target pyramid from HOG features to search for the optimal target scale in detail. For tracking failures that require re-detection, caused by long-term severe occlusion or the target leaving the field of view, LCT trains an online detector using a random fern classifier and scans windows when the detector is activated. However, target tracking with the LCT algorithm alone is extremely slow.
Disclosure of Invention
The invention aims to provide a multithreading visual target tracking method based on STC and block re-detection, so as to solve the problems of existing target tracking methods, which easily fail and run slowly under heavy occlusion, fast motion, and out-of-view conditions during long-term tracking.
The invention provides a multithreading visual target tracking method based on STC and block re-detection, which comprises the following steps:
S1, reading the first frame image of the video or image sequence and determining the tracking target;
S2, establishing a spatial context model for the first frame image using the STC algorithm;
S3, performing a blocking operation on the rectangular template region containing the tracking target in the first frame image, and training an SVM classifier with blocks of the manually selected target rectangle in the first frame image;
S4, reading the next frame image and learning a spatial context model from the previous frame image;
S5, updating the spatio-temporal context model of the current frame image using the spatial context model learned from the previous frame image in step S4;
S6, obtaining the confidence map of the current frame image, and obtaining the target position and confidence probability of the current frame image by maximizing the confidence map;
S7, judging the degree of occlusion of the tracking target in the current frame image according to the confidence probability obtained in step S6;
S8, selecting the corresponding algorithm to update the target position according to the occlusion degree judged in step S7;
and S9, repeating steps S4-S8 until the current video or image sequence has been fully processed.
Further, the blocking performed on the first frame image in step S3 includes vertical blocking, in which each sub-region is a rectangle with half the height and one-tenth the width of the template region, and horizontal blocking, in which each sub-region is a rectangle with half the width and one-tenth the height of the template region, giving forty sub-regions in total.
Further, step S4 specifically comprises:
Let the current frame image be the (t+1)-th frame image and the target position in the previous, i.e. the t-th, frame image be $x^*$. The spatial context model $h^{sc}_t(x)$ is learned from the t-th frame image as follows:
The target neighborhood context prior probability is computed by

$$p(c(z) \mid o) = I(z)\, w_\sigma(z - x^*)$$

where $p(c(z) \mid o)$ is the context prior of the target neighborhood and describes the appearance of the local image region, $z$ is a position in the target neighborhood, $c(z)$ is an element of the context feature set $X^c = \{ c(z) = (I(z), z) \mid z \in \Omega_c(x^*) \}$, $o$ denotes the tracking target appearing in the current scene, $I(z)$ is the image gray value at position $z$, and $w_\sigma(\cdot)$ is the weight function defined by

$$w_\sigma(z) = a\, e^{-\frac{|z|^2}{\sigma^2}}$$

where $a$ is a normalization constant keeping the value of $p(c(z) \mid o)$ within the interval $[0, 1]$, and $\sigma$ is a scale parameter.
The target location likelihood probability confidence map is

$$c(x) = p(x \mid o) = b\, e^{-\left|\frac{x - x^*}{\alpha}\right|^\beta} = \sum_{c(z) \in X^c} p(x \mid c(z), o)\, p(c(z) \mid o) = h^{sc}(x) \otimes \left( I(x)\, w_\sigma(x - x^*) \right)$$

where $\otimes$ denotes convolution, $x \in \mathbb{R}^2$ is the target position, $b$ is a normalization constant, $\alpha$ is a scale parameter, $\beta$ is a shape parameter, and $\Omega_c(x^*)$ denotes the local context region of $x^*$.
To perform fast convolution calculations using the FFT, the above equation is transformed to the frequency domain:

$$\mathcal{F}\left( b\, e^{-\left|\frac{x - x^*}{\alpha}\right|^\beta} \right) = \mathcal{F}\left( h^{sc}(x) \right) \odot \mathcal{F}\left( I(x)\, w_\sigma(x - x^*) \right)$$

where $\mathcal{F}$ denotes the FFT and $\odot$ element-wise multiplication. The spatial context model $h^{sc}(x)$ is therefore computed as

$$h^{sc}(x) = \mathcal{F}^{-1}\left( \frac{ \mathcal{F}\left( b\, e^{-\left|\frac{x - x^*}{\alpha}\right|^\beta} \right) }{ \mathcal{F}\left( I(x)\, w_\sigma(x - x^*) \right) } \right)$$

where $\mathcal{F}^{-1}$ is the inverse FFT.
Further, step S5 specifically comprises:
Using the spatial context model $h^{sc}_t$ learned from the t-th frame image in step S4, the spatio-temporal context model for the (t+1)-th frame image is updated as

$$H^{stc}_{t+1} = (1 - \rho)\, H^{stc}_t + \rho\, h^{sc}_t$$

where $\rho$ is a learning parameter, $h^{sc}_t$ is the spatial context model learned from the t-th frame image, and $H^{stc}_t$ is the spatio-temporal context model accumulated over the first t frames of the video or image sequence.
Further, the confidence map of the current frame image in step S6 is obtained as

$$c_{t+1}(x) = \mathcal{F}^{-1}\left( \mathcal{F}\left( H^{stc}_{t+1}(x) \right) \odot \mathcal{F}\left( I_{t+1}(x)\, w_{\sigma_t}(x - x^*_t) \right) \right)$$

where $x^*_t$ denotes the target location in the t-th frame.
Further, in step S7, when the confidence probability value is greater than a1, the tracked target is judged not occluded; when the confidence probability value is between a2 and a1, the tracked target is judged generally occluded; when the confidence probability value is less than a2, the tracked target is judged severely occluded or out of view; where 1 > a1 > a2 > 0, with a1 = 0.75 and a2 = 0.3.
Further, in step S8:
(1) If the tracked target is judged not occluded, the current frame image continues target tracking with STC, and the target position is obtained by maximizing the confidence map $c_{t+1}(x)$ obtained in step S6:

$$x^*_{t+1} = \arg\max_x c_{t+1}(x)$$

that is, the target position of the current frame image obtained by maximizing the confidence map in step S6 is the final target position.
The target scale is also updated: since the target size may change over the video image sequence, the scale parameter $\sigma$ of the weight function $w_\sigma$ should also be updated. The update strategy for $\sigma$ is

$$s'_t = \sqrt{ \frac{ c_t(x^*_t) }{ c_{t-1}(x^*_{t-1}) } }, \qquad \bar{s}_t = \frac{1}{n} \sum_{i=1}^{n} s'_{t-i}, \qquad s_{t+1} = (1 - \lambda)\, s_t + \lambda\, \bar{s}_t, \qquad \sigma_{t+1} = s_{t+1}\, \sigma_t$$

where $c_t(\cdot)$ is the confidence distribution, $s'_t$ is the target scale estimated from two consecutive frame images, $\bar{s}_t$ is the average of the scale estimates over the previous n consecutive frames, and $\lambda > 0$ is a given filter parameter.
Meanwhile, the rectangular template region containing the tracking target in the current frame image is blocked, the gray histogram of each rectangular sub-region is extracted using the integral histogram, and the image HOG features of the target region are used to train the SVM classifier.
(2) If the tracked target is judged generally occluded, the target position is still updated with STC; in addition, when the confidence probability value is less than 0.7, the rectangular template region containing the tracking target in the current frame image is blocked, the gray histogram of each rectangular sub-region is extracted using the integral histogram, and the image HOG features of the target region are used to train the SVM classifier.
(3) If the tracked target is judged severely occluded or out of view, the occlusion of the tracked target in the current frame image is judged further using the blocks, specifically:
A search range of radius r is defined, centered on the target position obtained from the previous frame image. For each position (x, y) in the search range, a rectangular target region centered on (x, y) corresponds to the target region of the previous frame; the blocking operation is applied to the rectangular target region at position (x, y), and the gray histogram of each block is extracted using the integral histogram.
The similarity between blocks is then computed with the EMD, yielding for each block a similarity map of roughly the size of the search area. Minimizing this map gives the EMD value of the sub-block most similar to the current sub-region block within the search area. The values for all blocks are sorted in ascending order; when the fifth value is smaller than a set threshold, the target occlusion is light and the target position is still updated with STC. Otherwise, the target is severely occluded or out of view, the SVM classifier is used for scoring, and the target position is relocated by maximizing the score map.
Further, the EMD is defined as follows:
Let

$$P = \{ (p_1, w_{p_1}), \ldots, (p_M, w_{p_M}) \}, \qquad Q = \{ (q_1, w_{q_1}), \ldots, (q_N, w_{q_N}) \}$$

where $p_i$ is a feature of one image, $q_j$ is a feature of the other image, $w_{p_i}$ is the weight of feature $p_i$, $w_{q_j}$ is the weight of feature $q_j$, and $d_{ij}$ is the distance between $p_i$ and $q_j$.
The flow $f_{ij}$ is obtained by solving the optimization problem

$$\min_f \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij}\, d_{ij}$$

subject to

$$f_{ij} \ge 0, \quad i = 1, 2, \ldots, M; \; j = 1, 2, \ldots, N;$$
$$\sum_{j=1}^{N} f_{ij} \le w_{p_i}, \qquad \sum_{i=1}^{M} f_{ij} \le w_{q_j}, \qquad \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij} = \min\left( \sum_{i=1}^{M} w_{p_i}, \; \sum_{j=1}^{N} w_{q_j} \right)$$

where M and N denote the numbers of (feature, weight) pairs in the sets P and Q respectively. After solving for $f_{ij}$, the EMD is defined as

$$\mathrm{EMD}(P, Q) = \frac{ \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij}\, d_{ij} }{ \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij} }$$

When

$$\sum_{i=1}^{M} w_{p_i} = \sum_{j=1}^{N} w_{q_j}$$

that is, when the two histograms have equal total weight, the EMD satisfies the triangle inequality, i.e. the EMD is a true distance.
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when executed, implements the method of any one of claims 1 to 8.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any one of claims 1 to 8 when executing the program.
Compared with the prior art, the invention has the following beneficial effects:
In the multithreading visual target tracking method based on STC and block re-detection, the main algorithm body uses STC target tracking, which handles the general tracking case; at the same time, drawing on the re-detection strategy of LCT, re-detection is performed when tracking fails because the target is heavily occluded, moves rapidly, or leaves the field of view. To keep the algorithm simple and effective, the rectangular target template image is blocked into several predefined rectangular sub-regions, the gray histograms within these sub-regions are computed using the integral histogram and compared with the gray histograms of the target template blocks of the first frame image, and an SVM classifier is trained, replacing the KNN classifier and the random fern classifier used in the LCT algorithm for target re-detection. The invention improves both the reliability and the efficiency of target tracking.
Drawings
Fig. 1 is an overall flowchart of a multithreading visual target tracking method based on STC and block re-detection according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the STC spatio-temporal context model and tracking of the (t+1)-th frame image according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the horizontal blocks and vertical blocks of an image according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in Fig. 1 and Fig. 2, an embodiment of the present invention provides a multithreading visual target tracking method based on STC and block re-detection, including the following steps:
S1, open the first frame image of a video or image sequence and determine the tracking target, either manually or through a target detection algorithm; at the same time, start an LCT thread for training an SVM classifier, where the trained SVM classifier is used for target re-detection.
S2, establish a spatial context model for the first frame image using the STC algorithm, and initialize the parameters related to blocking and classifier training in the LCT.
S3, perform a blocking operation on the rectangular template region containing the tracking target in the first frame image, as shown in Fig. 3. The blocking includes vertical blocking and horizontal blocking: each sub-region in the vertical blocking is a rectangle with half the height and one-tenth the width of the template region, and each sub-region in the horizontal blocking is a rectangle with half the width and one-tenth the height of the template region, giving forty sub-regions in total. The number of sub-regions can be adjusted according to the performance requirements of target tracking.
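By way of illustration, the following minimal Python sketch performs this blocking scheme; the function name, the block ordering, and the use of NumPy are illustrative choices, not part of the claimed method:

```python
import numpy as np

def partition_template(template: np.ndarray) -> list:
    """Split a grayscale template region into the forty predefined sub-regions:
    twenty vertical blocks (height/2 x width/10) followed by twenty horizontal
    blocks (height/10 x width/2). Integer division trims odd remainders."""
    h, w = template.shape
    blocks = []
    # Vertical blocking: 2 rows x 10 columns of (h/2, w/10) blocks.
    bh, bw = h // 2, w // 10
    for r in range(2):
        for c in range(10):
            blocks.append(template[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw])
    # Horizontal blocking: 10 rows x 2 columns of (h/10, w/2) blocks.
    bh, bw = h // 10, w // 2
    for r in range(10):
        for c in range(2):
            blocks.append(template[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw])
    return blocks  # forty sub-regions in total
```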
S4, read the next frame image, learn the spatial context model from the previous frame image, and compute the target neighborhood context prior. The specific processing is as follows:
Let the current frame image be the (t+1)-th frame image and the target position in the t-th frame image be $x^*$. The spatial context model $h^{sc}_t(x)$ is learned from the t-th frame image as follows:
The target neighborhood context prior probability $p(c(z) \mid o)$, which describes the appearance of the local image region, is computed as

$$p(c(z) \mid o) = I(z)\, w_\sigma(z - x^*)$$

where $z$ is an arbitrary position in the target neighborhood, $c(z)$ is an element of the context feature set $X^c = \{ c(z) = (I(z), z) \mid z \in \Omega_c(x^*) \}$ ($\Omega_c(x^*)$ denotes the local context region of $x^*$), $o$ denotes the presence of the tracking target in the image, $I(z)$ is the image gray value at position $z$, and $w_\sigma(\cdot)$ is the weight function defined by

$$w_\sigma(z) = a\, e^{-\frac{|z|^2}{\sigma^2}}$$

where $a$ is a normalization constant keeping the value of $p(c(z) \mid o)$ within the interval $[0, 1]$, and $\sigma$ is a scale parameter.
The target location likelihood probability confidence map is

$$c(x) = p(x \mid o) = b\, e^{-\left|\frac{x - x^*}{\alpha}\right|^\beta} = h^{sc}(x) \otimes \left( I(x)\, w_\sigma(x - x^*) \right)$$

where $\otimes$ denotes convolution, $x$ is the position of the target in the current frame, $b$ is a normalization constant, $\alpha$ is a scale parameter, and $\beta$ is a shape parameter. To perform fast convolution calculations using the FFT, the above equation is transformed to the frequency domain:

$$\mathcal{F}\left( b\, e^{-\left|\frac{x - x^*}{\alpha}\right|^\beta} \right) = \mathcal{F}\left( h^{sc}(x) \right) \odot \mathcal{F}\left( I(x)\, w_\sigma(x - x^*) \right)$$

where $\mathcal{F}$ denotes the FFT and $\odot$ element-wise multiplication. The spatial context model $h^{sc}(x)$ then follows as

$$h^{sc}(x) = \mathcal{F}^{-1}\left( \frac{ \mathcal{F}\left( b\, e^{-\left|\frac{x - x^*}{\alpha}\right|^\beta} \right) }{ \mathcal{F}\left( I(x)\, w_\sigma(x - x^*) \right) } \right)$$

where $\mathcal{F}^{-1}$ is the inverse FFT.
S5, update the spatio-temporal context model of the current frame image from the spatial context model learned from the previous frame image in step S4:

$$H^{stc}_{t+1} = (1 - \rho)\, H^{stc}_t + \rho\, h^{sc}_t$$

where $\rho$ is a learning parameter, $h^{sc}_t$ is the spatial context model learned from the t-th frame image, and $H^{stc}_t$ is the spatio-temporal context model accumulated over the first t frames of the video or image sequence.
The above equation can be seen as a temporal filtering process, which is easily observed in the frequency domain:

$$H^{stc}_\omega = F_\omega\, h^{sc}_\omega$$

where

$$H^{stc}_\omega \triangleq \sum_t H^{stc}_t\, e^{-j \omega t}$$

is the temporal Fourier transform of $H^{stc}_t$, $j$ is the imaginary unit, $\omega$ is the frequency, and $h^{sc}_\omega$ is defined similarly. The temporal filter is

$$F_\omega = \frac{\rho}{e^{j \omega} - (1 - \rho)}$$

which is a low-pass filter. Therefore, the spatio-temporal context model effectively filters out image noise caused by target state changes in the video image sequence and enhances the stability of the target tracking algorithm.
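The update itself reduces to one exponential moving average per pixel; a sketch, where the default rho = 0.075 is the learning rate reported for the original STC tracker and is only an assumed value here:

```python
def update_stc_model(H_stc, h_sc, rho=0.075):
    """Low-pass temporal filtering of the model:
    H^stc_{t+1} = (1 - rho) * H^stc_t + rho * h^sc_t."""
    return (1.0 - rho) * H_stc + rho * h_sc
```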
S6, compute the confidence map of the current frame as

$$c_{t+1}(x) = \mathcal{F}^{-1}\left( \mathcal{F}\left( H^{stc}_{t+1}(x) \right) \odot \mathcal{F}\left( I_{t+1}(x)\, w_{\sigma_t}(x - x^*_t) \right) \right)$$

where $x^*_t$ denotes the target position in the t-th frame; the target position in the current frame and its confidence probability are obtained by maximizing the confidence map.
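A sketch of this tracking step, consistent with the learning sketch above (all names are illustrative; the peak-to-coordinate mapping assumes the confidence map is indexed relative to the top-left corner of the context window):

```python
import numpy as np

def stc_track(H_stc, frame, prev_pos, ctx_size, sigma):
    """One STC tracking step: build the context prior around the previous
    target position, multiply with the model in the frequency domain, and
    take the confidence peak as the new target position."""
    cy, cx = prev_pos
    h, w = ctx_size
    ctx = frame[cy - h // 2:cy + h // 2,
                cx - w // 2:cx + w // 2].astype(np.float64)
    ys, xs = np.mgrid[-(h // 2):h // 2, -(w // 2):w // 2]
    prior = ctx * np.exp(-(xs ** 2 + ys ** 2) / sigma ** 2)
    # c_{t+1} = F^-1( F(H^stc) element-wise-times F(I * w_sigma) )
    conf = np.real(np.fft.ifft2(np.fft.fft2(H_stc) * np.fft.fft2(prior)))
    dy, dx = np.unravel_index(conf.argmax(), conf.shape)
    new_pos = (cy - h // 2 + dy, cx - w // 2 + dx)
    return new_pos, float(conf.max())
```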
S7, judge the degree of occlusion of the tracking target in the current frame image according to the confidence probability obtained in step S6: when the confidence probability value is greater than a1, the tracked target is judged not occluded, including the case of light occlusion; when the confidence probability value is between a2 and a1, the tracked target is judged generally occluded; when the confidence probability value is less than a2, the tracked target is judged severely occluded or out of view, where 1 > a1 > a2 > 0. The thresholds a1 and a2 are set according to the actual situation; in practice, a1 = 0.75 and a2 = 0.3.
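This three-way decision is a pair of threshold comparisons; a sketch with the thresholds of this embodiment:

```python
def occlusion_level(confidence: float, a1: float = 0.75, a2: float = 0.3) -> str:
    """Map the STC confidence probability to the three cases of step S7."""
    if confidence > a1:
        return "none"     # no (or only light) occlusion: keep tracking with STC
    if confidence >= a2:
        return "general"  # general occlusion: keep STC, prepare re-detection
    return "severe"       # severe occlusion or out of view: block re-detection
```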
S8, select the corresponding processing strategy according to the occlusion condition of the tracking target judged in step S7:
If the tracked target is judged not occluded, i.e. the confidence probability value is greater than 0.75, the current frame image continues target tracking with STC, and the target position in the (t+1)-th frame image is obtained by maximizing the confidence map $c_{t+1}(x)$ from step S6:

$$x^*_{t+1} = \arg\max_x c_{t+1}(x)$$

that is, the target position of the current frame image obtained by maximizing the confidence map in step S6 is the final target position.
The target scale is also updated: since the target size may change over the video image sequence, the scale parameter $\sigma$ of the weight function $w_\sigma$ should also be updated, using the strategy

$$s'_t = \sqrt{ \frac{ c_t(x^*_t) }{ c_{t-1}(x^*_{t-1}) } }, \qquad \bar{s}_t = \frac{1}{n} \sum_{i=1}^{n} s'_{t-i}, \qquad s_{t+1} = (1 - \lambda)\, s_t + \lambda\, \bar{s}_t, \qquad \sigma_{t+1} = s_{t+1}\, \sigma_t$$

where $c_t(\cdot)$ is the confidence distribution and $s'_t$ is the target scale estimated from two consecutive frame images. To avoid over-sensitivity of the scale and to reduce noise due to estimation error, $s_{t+1}$ is estimated by filtering: $\bar{s}_t$ is the average of the scale estimates over the previous n consecutive frames, and $\lambda > 0$ is a given filter parameter.
Meanwhile, the LCT update flag is set to 1 and the LCT thread continues to be updated. When the confidence probability value is greater than 0.75, the tracked target is not occluded or only lightly occluded, so the current frame is highly similar to the previous frame. After STC processes the current image frame t+1, the rectangular region containing the tracked target in the current frame image is blocked (the operation is similar to the blocking in step S3), and the gray histogram of each block, i.e. the number of occurrences of each gray value within the block, is extracted using the integral histogram. By computing the integral histogram at each position of an image once, the histogram of any rectangular region in the image can be solved quickly. In parallel with this processing, the image HOG features of the region containing the target are used to train the SVM classifier for target relocation when needed.
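The integral histogram that makes this fast can be sketched as follows: one cumulative table per gray bin, after which the histogram of any rectangle costs four lookups per bin (the bin count and function names are illustrative):

```python
import numpy as np

def integral_histogram(gray: np.ndarray, bins: int = 16) -> np.ndarray:
    """ih[y, x, b] = number of pixels of bin b inside gray[:y, :x]."""
    h, w = gray.shape
    bin_idx = (gray.astype(np.int32) * bins) // 256        # 0..bins-1
    onehot = np.zeros((h, w, bins), dtype=np.int32)
    np.put_along_axis(onehot, bin_idx[..., None], 1, axis=2)
    ih = np.zeros((h + 1, w + 1, bins), dtype=np.int32)
    ih[1:, 1:] = onehot.cumsum(axis=0).cumsum(axis=1)
    return ih

def block_histogram(ih, y0, x0, y1, x1):
    """Gray histogram of gray[y0:y1, x0:x1] in four lookups per bin."""
    return ih[y1, x1] - ih[y0, x1] - ih[y1, x0] + ih[y0, x0]
```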
If the tracked target is judged generally occluded, i.e. the confidence probability value is between 0.3 and 0.75, the target position is still updated with STC. Because the STC confidence probability drops from high to low over adjacent frames as the target goes from unoccluded to generally or fully occluded, if the LCT update flag is still True and the confidence probability value is greater than 0.7, the target image is beginning to exceed the STC processing range due to occlusion or deformation: the LCT is no longer updated, the LCT update flag is set to False, and the LCT estimation flag is set to True. When the confidence probability value is less than 0.7, the LCT continues to be updated, and the rectangular template region containing the tracking target in the current frame image is blocked (similarly to the blocking in step S3); the gray histogram of each rectangular sub-region is extracted using the integral histogram, and the image HOG features of the target region are used to train the SVM classifier for target relocation when needed.
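The patent does not prescribe an implementation for the HOG-plus-SVM training; the following sketch assumes that scikit-image HOG features and a linear SVM from scikit-learn are acceptable stand-ins, and that all patches share one size so the feature vectors align:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def train_redetection_svm(pos_patches, neg_patches):
    """Train the re-detection classifier on HOG features of target (positive)
    and background (negative) grayscale patches of identical size."""
    X = [hog(p, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
         for p in list(pos_patches) + list(neg_patches)]
    y = [1] * len(pos_patches) + [0] * len(neg_patches)
    return LinearSVC(C=1.0).fit(np.asarray(X), np.asarray(y))
```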
If the tracked target is judged severely occluded or out of view, i.e. the confidence probability value is less than 0.3, STC target tracking has failed. The occlusion of the tracked target in the current frame image is then judged further using the blocks, as follows:
Because the target's motion between adjacent frames is limited, a search range of radius r is defined, centered on the target position obtained from the previous frame image. For each position (x, y) in the search range, a rectangular target region centered on (x, y) corresponds to the target region of the previous frame; the blocking operation of step S3 is applied to the rectangular target region at position (x, y), and the gray histogram of each block is computed from the integral histogram of the whole image.
Then the EMD is used to compute the similarity between blocks. Two block feature sets P and Q are defined as

$$P = \{ (p_1, w_{p_1}), \ldots, (p_M, w_{p_M}) \}, \qquad Q = \{ (q_1, w_{q_1}), \ldots, (q_N, w_{q_N}) \}$$

where M and N are the numbers of features in P and Q respectively, $p_i \in \mathbb{R}$ and $q_j \in \mathbb{R}$ ($\mathbb{R}$ being the set of real numbers), $p_i$ denotes the i-th ($1 \le i \le M$) feature in P, $q_j$ denotes the j-th ($1 \le j \le N$) feature in Q, $w_{p_i}$ is the weight of feature $p_i$, $w_{q_j}$ is the weight of feature $q_j$, and $d_{ij}$ is the Euclidean distance between $p_i$ and $q_j$.
The similarity of two image regions is described by the EMD, defined as

$$\mathrm{EMD}(P, Q) = \frac{ \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij}\, d_{ij} }{ \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij} }$$

where the flow $f_{ij}$ solves the optimization problem

$$\min_f \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij}\, d_{ij}$$

subject to

$$f_{ij} \ge 0, \quad i = 1, 2, \ldots, M; \; j = 1, 2, \ldots, N;$$
$$\sum_{j=1}^{N} f_{ij} \le w_{p_i}, \qquad \sum_{i=1}^{M} f_{ij} \le w_{q_j}, \qquad \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij} = \min\left( \sum_{i=1}^{M} w_{p_i}, \; \sum_{j=1}^{N} w_{q_j} \right)$$

When $\sum_{i=1}^{M} w_{p_i} = \sum_{j=1}^{N} w_{q_j}$, the EMD satisfies the triangle inequality and can be used to judge the similarity between two image block histograms.
For each block, the EMD between that block and the block at each position in the search area is computed, forming a similarity map of the same size as the search area; minimizing the similarity map gives the EMD value of the sub-block most similar to that block within the search area (the smaller the EMD value, the more similar). The values for all blocks are sorted in ascending order. As verified by extensive experiments, when the fifth value is smaller than the set threshold, the target occlusion is relatively light, and the target position is still updated with STC. Otherwise, the target is severely occluded or out of view; the SVM classifier then estimates, for every image region of the same size as the target image region in the whole image, the probability that it is the target region, and the image region with the maximum probability is taken as the new target position, thereby achieving target relocation.
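For the EMD comparison itself, OpenCV's cv2.EMD accepts signatures whose rows are (weight, coordinate) pairs, which fits one-dimensional gray histograms directly; the decision threshold below is an assumed placeholder, since the patent leaves its value to experiment:

```python
import cv2
import numpy as np

def hist_emd(h1: np.ndarray, h2: np.ndarray) -> float:
    """EMD between two 1-D gray histograms; each signature row is
    (weight, bin centre)."""
    sig1 = np.column_stack([h1, np.arange(len(h1))]).astype(np.float32)
    sig2 = np.column_stack([h2, np.arange(len(h2))]).astype(np.float32)
    emd, _, _ = cv2.EMD(sig1, sig2, cv2.DIST_L2)
    return emd

def occlusion_is_light(best_emds, threshold=0.15):
    """best_emds holds, per template block, the minimum EMD over the search
    area. If the fifth-smallest value stays below the threshold, enough
    blocks still match and STC keeps updating the position."""
    return sorted(best_emds)[4] < threshold
```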
S9, repeat steps S4-S8 until the current video or image sequence has been fully processed. Specifically, the loop condition checks whether the current video or image sequence has been processed; if so, target tracking ends, otherwise the next frame image is extracted and computation restarts from step S4.
According to the multithreading visual target tracking method based on STC and block re-detection provided by the embodiment of the invention, the main algorithm body uses STC target tracking, which handles the general tracking case; at the same time, drawing on the re-detection strategy of LCT, re-detection is performed when tracking fails because the target is heavily occluded, moves rapidly, or leaves the field of view. To keep the algorithm simple and effective, the rectangular target template image is blocked into several predefined rectangular sub-regions, the gray histograms within these sub-regions are computed using the integral histogram and compared with the gray histograms of the target template blocks of the first frame image, and an SVM classifier is trained, replacing the KNN classifier and the random fern classifier used in the LCT algorithm for target re-detection. The invention improves both the reliability and the efficiency of target tracking.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, and that the program may be stored on a computer-readable storage medium, which may include a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A multithreading visual target tracking method based on STC and block re-detection, characterized by comprising the following steps:
S1, reading the first frame image of the video or image sequence and determining the tracking target;
S2, establishing a spatial context model for the first frame image using the STC algorithm;
S3, performing a blocking operation on the rectangular template region containing the tracking target in the first frame image, and training an SVM classifier with blocks of the manually selected target rectangle in the first frame image;
S4, reading the next frame image and learning a spatial context model from the previous frame image;
S5, updating the spatio-temporal context model of the current frame image using the spatial context model learned from the previous frame image in step S4;
S6, obtaining the confidence map of the current frame image, and obtaining the target position and confidence probability of the current frame image by maximizing the confidence map;
S7, judging the degree of occlusion of the tracking target in the current frame image according to the confidence probability obtained in step S6;
S8, selecting the corresponding algorithm to update the target position according to the occlusion degree judged in step S7;
and S9, repeating steps S4-S8 until the current video or image sequence has been fully processed.
2. The STC and block re-detection based multi-threaded visual target tracking method of claim 1, wherein: the blocking performed on the first frame image in step S3 includes vertical blocking, in which each sub-region is a rectangle with half the height and one-tenth the width of the template region, and horizontal blocking, in which each sub-region is a rectangle with half the width and one-tenth the height of the template region, giving forty sub-regions in total.
3. The STC and block re-detection based multi-threaded visual target tracking method of claim 1, wherein step S4 specifically comprises:
Let the current frame image be the (t+1)-th frame image and the target position in the previous, i.e. the t-th, frame image be $x^*$. The spatial context model $h^{sc}_t(x)$ is learned from the t-th frame image as follows:
The target neighborhood context prior probability is computed by

$$p(c(z) \mid o) = I(z)\, w_\sigma(z - x^*)$$

where $p(c(z) \mid o)$ is the context prior of the target neighborhood and describes the appearance of the local image region, $z$ is a position in the target neighborhood, $c(z)$ is an element of the context feature set $X^c = \{ c(z) = (I(z), z) \mid z \in \Omega_c(x^*) \}$, $o$ denotes the tracking target appearing in the current scene, $I(z)$ is the image gray value at position $z$, and $w_\sigma(\cdot)$ is the weight function defined by

$$w_\sigma(z) = a\, e^{-\frac{|z|^2}{\sigma^2}}$$

where $a$ is a normalization constant keeping the value of $p(c(z) \mid o)$ within the interval $[0, 1]$, and $\sigma$ is a scale parameter.
The target location likelihood probability confidence map is

$$c(x) = p(x \mid o) = b\, e^{-\left|\frac{x - x^*}{\alpha}\right|^\beta} = h^{sc}(x) \otimes \left( I(x)\, w_\sigma(x - x^*) \right)$$

where $\otimes$ denotes convolution, $x \in \mathbb{R}^2$ is the target position, $b$ is a normalization constant, $\alpha$ is a scale parameter, $\beta$ is a shape parameter, and $\Omega_c(x^*)$ denotes the local context region of $x^*$.
To perform fast convolution calculations using the FFT, the above equation is transformed to the frequency domain:

$$\mathcal{F}\left( b\, e^{-\left|\frac{x - x^*}{\alpha}\right|^\beta} \right) = \mathcal{F}\left( h^{sc}(x) \right) \odot \mathcal{F}\left( I(x)\, w_\sigma(x - x^*) \right)$$

where $\mathcal{F}$ denotes the FFT and $\odot$ element-wise multiplication. The spatial context model $h^{sc}(x)$ is therefore computed as

$$h^{sc}(x) = \mathcal{F}^{-1}\left( \frac{ \mathcal{F}\left( b\, e^{-\left|\frac{x - x^*}{\alpha}\right|^\beta} \right) }{ \mathcal{F}\left( I(x)\, w_\sigma(x - x^*) \right) } \right)$$

where $\mathcal{F}^{-1}$ is the inverse FFT.
4. The STC and block re-detection based multi-threaded visual target tracking method of claim 3, wherein step S5 specifically comprises:
Using the spatial context model $h^{sc}_t$ learned from the t-th frame image in step S4, the spatio-temporal context model for the (t+1)-th frame image is updated as

$$H^{stc}_{t+1} = (1 - \rho)\, H^{stc}_t + \rho\, h^{sc}_t$$

where $\rho$ is a learning parameter, $h^{sc}_t$ is the spatial context model learned from the t-th frame image, and $H^{stc}_t$ is the spatio-temporal context model accumulated over the first t frames of the video or image sequence.
5. The STC and block re-detection based multi-threaded visual target tracking method of claim 4, wherein the confidence map of the current frame image in step S6 is obtained as

$$c_{t+1}(x) = \mathcal{F}^{-1}\left( \mathcal{F}\left( H^{stc}_{t+1}(x) \right) \odot \mathcal{F}\left( I_{t+1}(x)\, w_{\sigma_t}(x - x^*_t) \right) \right)$$

where $x^*_t$ denotes the target location in the t-th frame.
6. The STC and block re-detection based multi-threaded visual target tracking method of claim 1, wherein: in step S7, when the confidence probability value is greater than a1, the tracked target is judged not occluded; when the confidence probability value is between a2 and a1, the tracked target is judged generally occluded; when the confidence probability value is less than a2, the tracked target is judged severely occluded or out of view; where 1 > a1 > a2 > 0, with a1 = 0.75 and a2 = 0.3.
7. The STC and block re-detection based multi-threaded visual target tracking method of claim 6, wherein in step S8:
(1) If the tracked target is judged not occluded, the current frame image continues target tracking with STC, and the target position is obtained by maximizing the confidence map $c_{t+1}(x)$ obtained in step S6:

$$x^*_{t+1} = \arg\max_x c_{t+1}(x)$$

that is, the target position of the current frame image obtained by maximizing the confidence map in step S6 is the final target position.
The target scale is also updated: since the target size may change over the video image sequence, the scale parameter $\sigma$ of the weight function $w_\sigma$ should also be updated. The update strategy for $\sigma$ is

$$s'_t = \sqrt{ \frac{ c_t(x^*_t) }{ c_{t-1}(x^*_{t-1}) } }, \qquad \bar{s}_t = \frac{1}{n} \sum_{i=1}^{n} s'_{t-i}, \qquad s_{t+1} = (1 - \lambda)\, s_t + \lambda\, \bar{s}_t, \qquad \sigma_{t+1} = s_{t+1}\, \sigma_t$$

where $c_t(\cdot)$ is the confidence distribution, $s'_t$ is the target scale estimated from two consecutive frame images, $\bar{s}_t$ is the average of the scale estimates over the previous n consecutive frames, and $\lambda > 0$ is a given filter parameter.
Meanwhile, the rectangular template region containing the tracking target in the current frame image is blocked, the gray histogram of each rectangular sub-region is extracted using the integral histogram, and the image HOG features of the target region are used to train the SVM classifier.
(2) If the tracked target is judged generally occluded, the target position is still updated with STC; in addition, when the confidence probability value is less than 0.7, the rectangular template region containing the tracking target in the current frame image is blocked, the gray histogram of each rectangular sub-region is extracted using the integral histogram, and the image HOG features of the target region are used to train the SVM classifier.
(3) If the tracked target is judged severely occluded or out of view, the occlusion of the tracked target in the current frame image is judged further using the blocks, specifically:
A search range of radius r is defined, centered on the target position obtained from the previous frame image. For each position (x, y) in the search range, a rectangular target region centered on (x, y) corresponds to the target region of the previous frame; the blocking operation is applied to the rectangular target region at position (x, y), and the gray histogram of each block is extracted using the integral histogram.
The similarity between blocks is then computed with the EMD, yielding for each block a similarity map of roughly the size of the search area. Minimizing this map gives the EMD value of the sub-block most similar to the current sub-region block within the search area. The values for all blocks are sorted in ascending order; when the fifth value is smaller than a set threshold, the target occlusion is light and the target position is still updated with STC. Otherwise, the target is severely occluded or out of view, the SVM classifier is used for scoring, and the target position is relocated by maximizing the score map.
8. The STC and block re-detection based multi-threaded visual target tracking method of claim 7, wherein the EMD is defined as follows:
Let

$$P = \{ (p_1, w_{p_1}), \ldots, (p_M, w_{p_M}) \}, \qquad Q = \{ (q_1, w_{q_1}), \ldots, (q_N, w_{q_N}) \}$$

where $p_i \in \mathbb{R}$, $q_j \in \mathbb{R}$, $p_i$ is a feature of one image, $q_j$ is a feature of the other image, $w_{p_i}$ is the weight of feature $p_i$, $w_{q_j}$ is the weight of feature $q_j$, and $d_{ij}$ is the distance between $p_i$ and $q_j$.
The flow $f_{ij}$ is obtained by solving the optimization problem

$$\min_f \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij}\, d_{ij}$$

subject to

$$f_{ij} \ge 0, \quad i = 1, 2, \ldots, M; \; j = 1, 2, \ldots, N;$$
$$\sum_{j=1}^{N} f_{ij} \le w_{p_i}, \qquad \sum_{i=1}^{M} f_{ij} \le w_{q_j}, \qquad \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij} = \min\left( \sum_{i=1}^{M} w_{p_i}, \; \sum_{j=1}^{N} w_{q_j} \right)$$

where M and N denote the numbers of (feature, weight) pairs in the sets P and Q respectively. After solving for $f_{ij}$, the EMD is defined as

$$\mathrm{EMD}(P, Q) = \frac{ \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij}\, d_{ij} }{ \sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij} }$$

When

$$\sum_{i=1}^{M} w_{p_i} = \sum_{j=1}^{N} w_{q_j}$$

that is, when the two histograms have equal total weight, the EMD satisfies the triangle inequality, i.e. the EMD is a true distance.
9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the program when executed implements the method of any one of claims 1 to 8.
10. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein: the processor, when executing the program, implements the method of any one of claims 1 to 8.
CN201910716977.6A 2019-08-05 2019-08-05 Multithreading visual target tracking method based on STC and block re-detection Active CN110570451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910716977.6A CN110570451B (en) 2019-08-05 2019-08-05 Multithreading visual target tracking method based on STC and block re-detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910716977.6A CN110570451B (en) 2019-08-05 2019-08-05 Multithreading visual target tracking method based on STC and block re-detection

Publications (2)

Publication Number Publication Date
CN110570451A (en) 2019-12-13
CN110570451B CN110570451B (en) 2022-02-01

Family

ID=68774611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910716977.6A Active CN110570451B (en) 2019-08-05 2019-08-05 Multithreading visual target tracking method based on STC and block re-detection

Country Status (1)

Country Link
CN (1) CN110570451B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053385A (en) * 2020-08-28 2020-12-08 西安电子科技大学 Remote sensing video shielding target tracking method based on deep reinforcement learning
CN112489085A (en) * 2020-12-11 2021-03-12 北京澎思科技有限公司 Target tracking method, target tracking device, electronic device, and storage medium
CN112541431A (en) * 2020-12-10 2021-03-23 中国科学院自动化研究所 High-resolution image target detection method and system
CN113034378A (en) * 2020-12-30 2021-06-25 香港理工大学深圳研究院 Method for distinguishing electric automobile from fuel automobile
CN113129333A (en) * 2020-01-16 2021-07-16 舜宇光学(浙江)研究院有限公司 Multi-target real-time tracking method and system and electronic equipment
CN113223054A (en) * 2021-05-28 2021-08-06 武汉卓目科技有限公司 Target tracking method and device for improving jitter property of ECO (equal cost offset) tracking frame
CN115712354A (en) * 2022-07-06 2023-02-24 陈伟 Man-machine interaction system based on vision and algorithm
CN116993785A (en) * 2023-08-31 2023-11-03 东之乔科技有限公司 Target object visual tracking method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631895A (en) * 2015-12-18 2016-06-01 重庆大学 Temporal-spatial context video target tracking method combining particle filtering
CN106875415A (en) * 2016-12-29 2017-06-20 北京理工雷科电子信息技术有限公司 The continuous-stable tracking of small and weak moving-target in a kind of dynamic background
CN107424175A (en) * 2017-07-20 2017-12-01 西安电子科技大学 A kind of method for tracking target of combination spatio-temporal context information
US20180046857A1 (en) * 2016-08-12 2018-02-15 Qualcomm Incorporated Methods and systems of updating motion models for object trackers in video analytics
CN108022254A (en) * 2017-11-09 2018-05-11 华南理工大学 A kind of space-time contextual target tracking based on sign point auxiliary

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631895A (en) * 2015-12-18 2016-06-01 重庆大学 Temporal-spatial context video target tracking method combining particle filtering
US20180046857A1 (en) * 2016-08-12 2018-02-15 Qualcomm Incorporated Methods and systems of updating motion models for object trackers in video analytics
CN106875415A (en) * 2016-12-29 2017-06-20 北京理工雷科电子信息技术有限公司 The continuous-stable tracking of small and weak moving-target in a kind of dynamic background
CN107424175A (en) * 2017-07-20 2017-12-01 西安电子科技大学 A kind of method for tracking target of combination spatio-temporal context information
CN108022254A (en) * 2017-11-09 2018-05-11 华南理工大学 A kind of space-time contextual target tracking based on sign point auxiliary

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIUCHEN LI et al.: "A framework of object tracking based on STC algorithm", 2017 First International Conference on Electronics Instrumentation & Information Systems (EIIS) *
LIU Peizhong et al.: "An online convolutional network tracking algorithm combining spatio-temporal context", Journal of Computer Research and Development *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129333A (en) * 2020-01-16 2021-07-16 舜宇光学(浙江)研究院有限公司 Multi-target real-time tracking method and system and electronic equipment
CN112053385A (en) * 2020-08-28 2020-12-08 西安电子科技大学 Remote sensing video shielding target tracking method based on deep reinforcement learning
CN112053385B (en) * 2020-08-28 2023-06-02 西安电子科技大学 Remote sensing video shielding target tracking method based on deep reinforcement learning
CN112541431A (en) * 2020-12-10 2021-03-23 中国科学院自动化研究所 High-resolution image target detection method and system
CN112489085A (en) * 2020-12-11 2021-03-12 北京澎思科技有限公司 Target tracking method, target tracking device, electronic device, and storage medium
CN113034378A (en) * 2020-12-30 2021-06-25 香港理工大学深圳研究院 Method for distinguishing electric automobile from fuel automobile
CN113034378B (en) * 2020-12-30 2022-12-27 香港理工大学深圳研究院 Method for distinguishing electric automobile from fuel automobile
CN113223054A (en) * 2021-05-28 2021-08-06 武汉卓目科技有限公司 Target tracking method and device for improving jitter property of ECO (equal cost offset) tracking frame
CN115712354A (en) * 2022-07-06 2023-02-24 陈伟 Man-machine interaction system based on vision and algorithm
CN116993785A (en) * 2023-08-31 2023-11-03 东之乔科技有限公司 Target object visual tracking method and device, electronic equipment and storage medium
CN116993785B (en) * 2023-08-31 2024-02-02 东之乔科技有限公司 Target object visual tracking method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110570451B (en) 2022-02-01

Similar Documents

Publication Publication Date Title
CN110570451B (en) Multithreading visual target tracking method based on STC and block re-detection
Aker et al. Using deep networks for drone detection
US20210248378A1 (en) Spatiotemporal action detection method
CN110363047B (en) Face recognition method and device, electronic equipment and storage medium
CN110728697B (en) Infrared dim target detection tracking method based on convolutional neural network
US8111873B2 (en) Method for tracking objects in a scene
CN103514441B (en) Facial feature point locating tracking method based on mobile platform
US20120076361A1 (en) Object detection device
US10896495B2 (en) Method for detecting and tracking target object, target object tracking apparatus, and computer-program product
CN109035295B (en) Multi-target tracking method, device, computer equipment and storage medium
CN103824070A (en) Rapid pedestrian detection method based on computer vision
WO2011067790A2 (en) Cost-effective system and method for detecting, classifying and tracking the pedestrian using near infrared camera
CN112926410A (en) Target tracking method and device, storage medium and intelligent video system
CN115240130A (en) Pedestrian multi-target tracking method and device and computer readable storage medium
CN105303571A (en) Time-space saliency detection method for video processing
CN114708300A (en) Anti-blocking self-adaptive target tracking method and system
CN112613565B (en) Anti-occlusion tracking method based on multi-feature fusion and adaptive learning rate updating
JP7488674B2 (en) OBJECT RECOGNITION DEVICE, OBJECT RECOGNITION METHOD, AND OBJECT RECOGNITION PROGRAM
CN111428567A (en) Pedestrian tracking system and method based on affine multi-task regression
JP6851246B2 (en) Object detector
CN114639117A (en) Cross-border specific pedestrian tracking method and device
De Padua et al. Particle filter-based predictive tracking of futsal players from a single stationary camera
CN106909934B (en) Target tracking method and device based on self-adaptive search
CN112489085A (en) Target tracking method, target tracking device, electronic device, and storage medium
Basit et al. Fast target redetection for CAMSHIFT using back-projection and histogram matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant