CN110929620B - Target tracking method and device and storage device - Google Patents

Target tracking method and device and storage device

Info

Publication number
CN110929620B
CN110929620B
Authority
CN
China
Prior art keywords
target
model
tracked
scale
current frame
Prior art date
Legal status
Active
Application number
CN201911121141.8A
Other languages
Chinese (zh)
Other versions
CN110929620A (en)
Inventor
李轶锟
王耀农
敦婧瑜
薛佳乐
张湾湾
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN201911121141.8A
Publication of CN110929620A; application granted; publication of CN110929620B

Classifications

    • G — Physics
    • G06 — Computing; Calculating or Counting
    • G06V — Image or Video Recognition or Understanding
    • G06V20/40 — Scenes; scene-specific elements in video content
    • G06V10/40 — Extraction of image or video features
    • G06V10/50 — Extraction of features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; projection analysis
    • G06V10/507 — Summing image-intensity values; histogram projection analysis

Abstract

The invention discloses a target tracking method, a target tracking device, and a storage device. The method comprises the following steps: acquiring position response information of a current frame based on a position filtering model among the correlation filtering models, and determining a target position prediction result based on the position response information; calculating a peak-to-sidelobe ratio based on the position response information; and, if the peak-to-sidelobe ratio is smaller than or equal to a preset threshold, correcting the target position prediction result based on a conditional random field model and outputting the corrected target position prediction result as the position tracking result of the target to be tracked in the current frame. In this way, the problem of reduced tracking accuracy when the target is occluded is solved.

Description

Target tracking method and device and storage device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a target tracking method, device and storage device.
Background
The correlation filtering method is a discriminative method widely used in the field of target tracking; it offers very good short-term tracking accuracy and a high processing speed. However, when the target is occluded, the target template is updated with erroneous background information that contaminates the model parameters, so the tracking accuracy drops severely and the target then needs to be re-localized accurately. Several technical solutions to this problem exist in the prior art. For example, patent CN109285179A discloses a moving-target tracking method based on multi-feature fusion in which a fixed learning rate is used when the scale filtering model is updated; with a fixed learning rate it is impossible to learn more information when the target is not occluded and little or no information when the target is occluded or the confidence is low. The document "Object tracking based on salient feature regions and conditional random fields" discloses the following: according to extracted Harris corner points, salient feature regions of interest are marked out to serve as independent sub-blocks during tracking; a conditional random field model over the salient feature regions is established using the local features of the sub-blocks and constraints in the spatio-temporal domain; the final position of the target is determined according to the weight of each block's influence on the target position; and the robustness of the traditional Mean-Shift algorithm under occlusion is thereby improved. The main disadvantages of this solution are: Harris corner detection is computationally expensive, and the number of categories must be determined in advance using K-means clustering.
When optical-flow information is introduced to account for the temporal relation, it is represented by an absolute-value distance, so the angular (rotation) information is ignored. Target tracking based on the Mean-Shift algorithm considers only colour features, so local-binary-pattern feature computation has to be added to the conditional random field, which further increases the computational load; moreover, the tracking accuracy of the Mean-Shift tracking algorithm is very limited even when the target is not occluded. The technical solutions provided in the prior art can therefore still be optimized, for example by further increasing the processing speed and the tracking accuracy.
Disclosure of Invention
The application provides a target tracking method, a target tracking device and a storage device, which can further improve the processing speed and the tracking accuracy when a target is tracked with a correlation-filtering-based method.
In order to solve the technical problem, the application adopts a technical scheme that: provided is a target tracking method, including:
acquiring position response information of a current frame based on a position filtering model among the correlation filtering models, and determining a target position prediction result based on the position response information;
calculating a peak-to-sidelobe ratio based on the position response information;
and, if the peak-to-sidelobe ratio is smaller than or equal to a preset threshold, correcting the target position prediction result based on a conditional random field model, and outputting the corrected target position prediction result as the position tracking result of the target to be tracked in the current frame.
In order to solve the above technical problem, another technical solution adopted by the present application is: providing a target tracking device comprising a processor and a memory coupled to the processor, wherein
the memory stores program instructions for implementing the above-described target tracking method; and
the processor is configured to execute the program instructions stored in the memory to achieve target tracking.
In order to solve the above technical problem, the present application adopts another technical solution: a storage device is provided which stores a program file capable of implementing the above-described target tracking method.
The beneficial effects of the present application are: the target tracking method, device and storage device acquire the position response information of a current frame through a position filtering model among the correlation filtering models and determine a target position prediction result based on the position response information; calculate a peak-to-sidelobe ratio based on the position response information; and, if the peak-to-sidelobe ratio is smaller than or equal to a preset threshold, correct the target position prediction result based on a conditional random field model and output the corrected target position prediction result as the position tracking result of the target to be tracked in the current frame. In this way, the problem of reduced tracking accuracy when the target is occluded is solved.
Drawings
FIG. 1 is a schematic flow chart diagram of a target tracking method according to a first embodiment of the present invention;
FIG. 2 is a sub-flowchart illustrating updating parameters in a position filtering model to obtain a latest position filtering model in the target tracking method according to the first embodiment of the present invention;
FIG. 3 is a schematic block diagram of a first embodiment of the invention;
FIG. 4 is a flowchart illustrating a target tracking method according to a second embodiment of the present invention;
FIG. 5 is a schematic view of a sub-process of updating parameters in a scale filtering model to obtain an updated scale filtering model in a target tracking method according to a second embodiment of the present invention;
FIG. 6 is a schematic diagram of a first configuration of a target tracking apparatus according to an embodiment of the present invention;
FIG. 7 is a second schematic diagram of a target tracking apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a storage device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. In the embodiments of the present application, all directional indicators (such as upper, lower, left, right, front, rear, etc.) are used only to explain the relative positional relationship, motion, etc. between the components at a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator changes accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Fig. 1 is a flowchart illustrating a target tracking method according to a first embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 1 if the results are substantially the same. In this embodiment, the correlation filter model used only includes the position filter model. As shown in fig. 1, the method comprises the steps of:
step S101: parameters of the position filtering model are initialized.
Optionally, in step S101, first, information of a target to be tracked in the video data stream when the target first appears is obtained by using a detection algorithm; then extracting a first characteristic of the target to be tracked from information when the target to be tracked appears for the first time; generating a first initial sample by using the first characteristic; and finally, initializing the parameters of the position filtering model by using the first initial sample.
Optionally, an image frame of the target to be tracked appearing for the first time is taken as a boundary frame to intercept the video data stream, so that the image frame is a first frame in the intercepted video data stream, and then a detection algorithm is used to acquire information of the target to be tracked in the intercepted first frame of the video data stream.
Optionally, the detection algorithm is a CNN detection algorithm.
Optionally, the first feature is composed of 27-dimensional fHOG features and a one-dimensional grayscale feature (28 dimensions in total), and the first initial sample is generated from the circulant matrix of the first feature.
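As an illustration of the circulant-matrix construction mentioned above, the following sketch (hypothetical helper name, 1-D for brevity; the patent operates on 28-channel 2-D feature maps) materialises the cyclic shifts that a correlation filter is implicitly trained on:

```python
import numpy as np

def circulant_samples(feature):
    """Build the implicit circulant sample set of a 1-D feature vector.

    Each row is a cyclic shift of the base feature; this is the sample
    matrix a correlation filter is trained on.  In practice the shifts
    are never materialised, because the DFT diagonalises the circulant
    matrix and makes training a per-frequency-bin operation.
    """
    n = feature.size
    return np.stack([np.roll(feature, k) for k in range(n)])
```

The explicit construction is shown only for intuition; the frequency-domain training below never forms this matrix.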
Optionally, the step of initializing the parameters of the filtering model comprises: training the initial sample so as to minimize the following equation (1):

ε = ‖ Σ_{l=1}^{d} h^l ∗ f^l − g ‖² + λ · Σ_{l=1}^{d} ‖ h^l ‖²    (1)

where f denotes the initial sample, d denotes the feature dimension, l ∈ {1, 2, …, d}, and f^l denotes the features of the l-th channel; h denotes the overall filter model, h^l denotes the filter model of the l-th channel, and g denotes the Gaussian function that is the expected output of the filter model; ∗ denotes convolution, and λ is a manually chosen coefficient that prevents the filter from overfitting and keeps the denominator from becoming zero during solving. To accelerate the computation, the whole process converts the time-domain convolution into a product in the complex frequency domain.
Alternatively, when the initial sample f is the first initial sample, the parameters in the position filtering model may be initialized by the method described above.
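The minimisation of equation (1) has a closed-form solution per frequency bin. The sketch below is an assumed DSST-style formulation with hypothetical names, not the patent's verbatim implementation; it keeps the per-channel numerators A^l and the shared denominator B that the later response and update equations refer to:

```python
import numpy as np

def train_filter(f, g, lam=1e-2):
    """Closed-form frequency-domain minimiser of equation (1), sketched.

    f   : (d, n) multi-channel feature sample
    g   : (n,)   desired Gaussian response
    lam : regulariser lambda avoiding a zero denominator
    Returns per-channel numerators A^l = conj(G) * F^l and the shared
    denominator B = sum_l conj(F^l) * F^l, so that the filter is
    H^l = A^l / (B + lam)."""
    F = np.fft.fft(f, axis=1)                 # per-channel DFT
    G = np.fft.fft(g)                         # DFT of the expected output
    A = np.conj(G)[None, :] * F               # per-channel numerator
    B = np.sum(np.conj(F) * F, axis=0).real   # shared denominator
    return A, B
```

Applying this filter to the training sample itself yields a response whose inverse DFT is approximately g, since Σ_l Ā^l F^l = G·B and G·B/(B+λ) ≈ G for small λ.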
Step S102: parameters of the conditional random field model are initialized.
Optionally, the conditional random field model to be initialized is given in equation (2):

P(D | I_t, I_{t−1}) = (1/Z) · exp(−E(D, I_t, I_{t−1}))    (2)

where D_i ∈ {0, 1}, i = 1, 2, …, n, indicates whether the i-th characteristic region belongs to the target; I_t and I_{t−1} denote the target regions at times t and t−1; and E is composed of a univariate potential function and of bivariate potential functions expressing the temporal and spatial relations, as in equation (3):

E = Σ_i [ κ · f_1(p_i, q_i) + Σ_{x_j ∈ N(x_i)} t_1 · f_2(x_i, x_j) + Σ_{x_j ∈ M(x_i)} t_2 · f_3(x_i, x_j) ]    (3)

where f_1(p_i, q_i) denotes the univariate potential function, f_2(x_i, x_j) the spatial bivariate potential function, and f_3(x_i, x_j) the temporal bivariate potential function; κ, t_1 and t_2 are the weight coefficients of the three potential functions; N(x_i) denotes the spatial neighbour nodes of the i-th node of the region to be tracked, and M(x_i) denotes its temporal neighbour nodes. The normalizing constant Z is given by equation (4):

Z = Σ_D exp(−E(D, I_t, I_{t−1}))    (4)

The univariate potential function takes the form of equation (5):

f_1(p_i, q_i) = sqrt( 1 − Σ_k sqrt( p_i(k) · q_i(k) ) )    (5)

where p_i(k) is the colour histogram of the i-th characteristic region of the target to be tracked and q_i(k) is the colour histogram of the characteristic region of the target to be matched. The univariate potential is in fact the Bhattacharyya distance between the corresponding characteristic regions.

The bivariate spatial potential function takes the form of equation (6):

f_2(x_i, x_j) = | ‖x_i − x_j‖ − Δ_{i,j} |    (6)

where x_i is the centre-point coordinate of the i-th characteristic region of the target region to be matched, x_j is the centre-point coordinate of a spatial neighbour of that region, and Δ_{i,j} is the Euclidean distance between the centre points of the two corresponding regions in the target region to be tracked.

The bivariate temporal potential function takes the form of equation (7):

f_3(x_i, x_j) = 1 − ( L_i · L_j ) / ( ‖L_i‖ · ‖L_j‖ )    (7)

where L_i is the optical-flow vector of the feature points in the i-th characteristic region of the target region to be matched, which is invariant to scale and rotation, and L_j is the optical-flow vector at the centre of a spatial neighbour of that region. The bivariate temporal potential measures the cosine similarity of the two optical-flow vectors; because angle information is taken into account, the evaluation is more accurate than in the prior art, which uses the Euclidean distance directly. Correcting the target tracking result with the conditional random field model introduces context information, which improves the reliability of target tracking and hence the tracking accuracy when the target is occluded.
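A minimal sketch of the three potential functions of equations (5)–(7) follows. Function names are hypothetical, and the histogram and optical-flow inputs are assumed to be precomputed; the temporal potential is written as 1 minus the cosine similarity, which is one common way to turn the similarity described above into a penalty:

```python
import numpy as np

def unary_potential(p, q):
    """Equation (5): Bhattacharyya distance between two colour histograms."""
    bc = np.sum(np.sqrt(p * q))               # Bhattacharyya coefficient
    return float(np.sqrt(max(1.0 - bc, 0.0)))

def spatial_potential(xi, xj, delta_ij):
    """Equation (6): deviation of the current centre distance from the
    template centre distance delta_ij measured in the tracked target."""
    d = np.linalg.norm(np.asarray(xi, float) - np.asarray(xj, float))
    return float(abs(d - delta_ij))

def temporal_potential(Li, Lj):
    """Cosine-similarity-based penalty on two optical-flow vectors
    (an assumed concrete form of equation (7)): 0 for parallel flows."""
    Li, Lj = np.asarray(Li, float), np.asarray(Lj, float)
    cos = Li @ Lj / (np.linalg.norm(Li) * np.linalg.norm(Lj))
    return float(1.0 - cos)
```

Identical histograms, distance-preserving neighbours and parallel flows all yield zero potential, i.e. no penalty.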
Optionally, in step S102, a Fast corner of the target to be tracked is extracted from information when the target to be tracked appears for the first time; then, classifying Fast corners of the target to be tracked by utilizing a hierarchical clustering algorithm to generate a plurality of initial characteristic areas with the same size; and initializing the parameters of the conditional random field model to be initialized by utilizing the initial characteristic region. Compared with the prior art, the Fast angular point detection algorithm is used for accelerating the operation speed, the characteristic points are clustered by a hierarchical clustering method, and the number of blocks does not need to be specified artificially, so that the conditional random field model is simplified, and the real-time performance of the tracking algorithm is improved.
Optionally, the parameters in the conditional random field model are trained by using a gradient descent method, and the maximum likelihood estimates of the three weight coefficients are obtained respectively.
Step S103: position response information of the current frame is obtained based on the latest position filtering model, and a target position prediction result is determined based on the position response information.
Optionally, the step of obtaining the response information of the current frame based on the latest filtering model comprises: inputting the current frame into the following equation (8) and outputting the response information of the current frame:

y_t = F⁻¹{ ( Σ_{l=1}^{d} Ā_{t−a}^l · Z_t^l ) / ( B_{t−a} + λ ) }    (8)

where t denotes the time corresponding to the current frame and t−a denotes the time corresponding to the latest filtering model; A_{t−a}^l denotes the numerator of the frequency-domain filter model formula in the l-th dimension at time t−a, and Ā_{t−a}^l denotes its conjugate; B_{t−a} denotes the denominator of the frequency-domain filter model formula at time t−a; z_t is the image patch corresponding to the target centre position at time t (i.e., in the current image frame), and Z_t is its complex-frequency-domain result obtained by the discrete Fourier transform. The time-domain result y_t is obtained from the complex-frequency-domain target position result (i.e., the response information) by the inverse discrete Fourier transform.
Optionally, if the filter model used in the step of obtaining the response information based on the latest filter model is a position filter model, the obtained response information is position response information.
More optionally, the position response information includes a maximum peak response value, and a position of the maximum peak response value is a center position of the target to be tracked in the current frame, so that the position response information may be used to determine a target position prediction result.
Step S104: a peak-to-side lobe ratio is calculated based on the position response information.
Optionally, the position response information further includes a response mean and a response variance.
Optionally, the step of calculating the peak-to-sidelobe ratio comprises: calculating the difference between the maximum peak response value and the response mean, and then calculating the ratio of this difference to the response variance; the resulting ratio is the peak-to-sidelobe ratio.
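The computation of step S104 can be sketched as follows; the division below uses the standard deviation (the square root of the reported response variance), which is the usual PSR convention:

```python
import numpy as np

def peak_to_sidelobe_ratio(response):
    """PSR of a response map: (max peak - mean) / standard deviation."""
    response = np.asarray(response, float)
    peak = response.max()
    return float((peak - response.mean()) / response.std())
```

A sharp, isolated peak yields a high PSR, while a diffuse response (as under occlusion) yields a low one, which is what the occlusion test in step S105 exploits.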
Step S105: comparing the peak-to-sidelobe ratio with the preset threshold.
In step S105, whether the target to be tracked is occluded in the current frame is determined by comparing the peak-to-sidelobe ratio with a preset threshold. If the peak-to-sidelobe ratio is greater than the preset threshold, the target to be tracked is not occluded in the current frame, i.e., the target position prediction result obtained in step S103 is accurate, and steps S106 and S107 are executed; otherwise, the target to be tracked is occluded in the current frame, i.e., the target position prediction result obtained in step S103 is inaccurate and needs to be corrected to improve the accuracy of the tracking result, and steps S108, S109 and S110 are executed.
It should be noted that the order of steps S106 and S107 may be exchanged, for example, step S107 is executed first, and then step S106 is executed. The order of steps S109 and S110 may be exchanged, for example, step S110 is executed first, and then step S109 is executed.
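The decision of step S105 reduces to a threshold comparison; the value 5.0 below is an assumed placeholder, since the patent only speaks of a preset threshold:

```python
def is_occluded(psr, threshold=5.0):
    """Occlusion test of step S105: a PSR at or below the preset
    threshold signals that the target is likely occluded.
    The default threshold is an assumption for illustration."""
    return psr <= threshold
```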
Step S106: and outputting the target position prediction result as a position tracking result of the target to be tracked in the current frame.
Step S107: and updating the parameters in the position filtering model to obtain the latest position filtering model.
In step S107, after the latest position filter model is acquired, the position response information of the next frame is acquired by the latest position filter model.
Optionally, as shown in fig. 2, step S107 comprises at least the following sub-steps:
step S1071: a first high confidence sample library is constructed.
In step S1071, the first initial sample is added to the first high-confidence sample library; whether the result of the inverse discrete Fourier transform of the position response information acquired in step S103 can also be added is then decided according to the determination result of step S105. If the determination in step S105 is that the peak-to-sidelobe ratio is greater than the preset threshold, i.e., the target to be tracked is not occluded in the current frame, the inverse-discrete-Fourier-transform result of the position response information of the current frame acquired in step S103 is added to the first high-confidence sample library; otherwise it is not added. This avoids the problem that, when the target is occluded, the tracking result contains background information which, if used to update the filter model parameters, would contaminate the filter model and reduce the tracking accuracy.
Step S1072: a first adaptive learning rate is calculated.
In step S1072, first, inverse discrete fourier transform is performed on the acquired position response information of the current frame, so as to be in the same data format as that of the samples in the first high-confidence sample library; then calculating the similarity between the result after inverse transformation and each sample in the first high-confidence sample library, and accumulating and summing to obtain a first summation value; calculating the ratio of the first summation value to the sum of the number of samples in the first high-confidence sample library; and finally, calculating the product of the ratio and a preset first parameter, wherein the obtained product value is the first self-adaptive learning rate.
Optionally, before the step of calculating the similarity between the inverse-transformed result and each sample in the first high-confidence sample library, the number of samples in the library is kept from exceeding a first preset number threshold according to a first preset retention condition. The first high-confidence sample library is constructed to evaluate whether the tracking result of each frame is accurate, so as to determine the learning rate. It will be appreciated that the number of samples retained in the library is related to the available computing power: the more samples, the larger the computation and the more accurate the evaluation; conversely, fewer samples reduce the computation at the cost of some evaluation accuracy. For this reason, the number of samples in the first high-confidence sample library is kept below the first preset number threshold according to the first preset retention condition.
Optionally, in order to conveniently adjust the learning rate, a first parameter is preset in the process of calculating the first adaptive learning rate, so as to avoid the situation that the finally calculated value of the first adaptive learning rate is too large, for example, exceeds 1.
In step S1072, the first adaptive learning rate ensures that more information is learned when the target is not occluded and no information is learned when it is occluded. This solves the problem that the result tracked under occlusion contains background information which, when used to update the filter model parameters, contaminates the filter model and reduces the tracking accuracy. Compared with the prior art, which uses a fixed learning rate when updating the filtering model, the adaptive learning rate in the present application ensures that the filter model parameters are not contaminated by the background and that the target can still be tracked effectively after an occlusion.
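A sketch of the first adaptive learning rate of step S1072 follows. The similarity measure and the base rate are assumptions: the patent only requires accumulating similarities against the sample library, dividing by the sample count, and multiplying by a preset first parameter:

```python
import numpy as np

def adaptive_learning_rate(response, sample_bank, base_rate=0.025):
    """First adaptive learning rate of step S1072, sketched.

    response    : inverse-DFT response map of the current frame
    sample_bank : list of high-confidence samples in the same format
    base_rate   : preset first parameter keeping the result well below 1
                  (0.025 is an assumed value)"""
    if not sample_bank:
        return base_rate

    def similarity(a, b):
        # Assumed measure: zero-mean, unit-variance correlation in [-1, 1].
        a = (a - a.mean()) / (a.std() + 1e-12)
        b = (b - b.mean()) / (b.std() + 1e-12)
        return float(np.mean(a * b))

    total = sum(similarity(response, s) for s in sample_bank)
    return base_rate * total / len(sample_bank)
```

When the current response matches the library well the rate approaches the base rate (learn more); when it matches poorly the rate shrinks toward zero (learn little or nothing).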
Step S1073: parameters in the position filtering model are updated based on the first adaptive learning rate to obtain a latest position filtering model.
Optionally, the step of updating the parameters in the filtering model with the adaptive learning rate comprises: updating the variable A_{t−a}^l in the filter model to obtain the variable A_t^l, and updating the variable B_{t−a} in the filter model to obtain the variable B_t, according to the following equation (9):

A_t^l = (1 − η_c) · A_{t−a}^l + η_c · Ḡ_t · F_t^l
B_t = (1 − η_c) · B_{t−a} + η_c · Σ_{k=1}^{d} F̄_t^k · F_t^k    (9)

where G and F^l denote the complex-frequency-domain variables obtained from g and f^l by the discrete Fourier transform, the subscript t denotes the time, and η_c denotes the adaptive learning rate.

Alternatively, if the adaptive learning rate in equation (9) above is the first adaptive learning rate, updating the variables A_{t−a}^l and B_{t−a} updates the parameters in the position filtering model and yields the latest position filtering model.
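The update of equation (9) is a running average of the frequency-domain numerator and denominator; a sketch under the same assumed DSST-style representation (hypothetical names):

```python
import numpy as np

def update_model(A_prev, B_prev, F, G, eta):
    """Equation (9): blend the previous numerator A and denominator B
    with the current frame's statistics at learning rate eta.

    A_prev : (d, n) complex numerators from time t-a
    B_prev : (n,)   real denominator from time t-a
    F, G   : DFTs of the current features and expected Gaussian output"""
    A_new = (1.0 - eta) * A_prev + eta * np.conj(G)[None, :] * F
    B_new = (1.0 - eta) * B_prev + eta * np.sum(np.conj(F) * F, axis=0).real
    return A_new, B_new
```

With eta = 0 the model is frozen (nothing learned, as desired under occlusion); with eta = 1 it is replaced by the current frame's statistics.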
Step S108: the target position prediction is modified based on the most recent conditional random field model.
Optionally, in step S108, a target prediction region is extracted according to a preset scale with the target position prediction result as a center; then extracting a Fast corner of the target prediction region; classifying Fast corners of the target prediction region by using a hierarchical clustering algorithm to generate a plurality of characteristic regions with equal size; selecting a characteristic region with correct tracking from the characteristic regions with the same size by using the latest conditional random field model; and finally, correcting the target position prediction result by utilizing the characteristic region with correct tracking to obtain a corrected target position prediction result.
Optionally, the step of selecting correctly tracked characteristic regions from the plurality of equal-sized characteristic regions using the latest conditional random field model comprises: calculating, according to the following equation (10), the probability that each characteristic region belongs to the target foreground; if the calculated probability is smaller than a preset probability threshold, the characteristic region is considered to have failed tracking, and otherwise it is considered correctly tracked.

P(D_i = 1 | I_t, I_{t−1}) = Σ_{D: D_i = 1} (1/Z) · exp(−E(D, I_t, I_{t−1}))    (10)
Optionally, the step of correcting the target position prediction result by using the correctly tracked characteristic regions comprises: first calculating, for each correctly tracked characteristic region, the offset Δx_i between that region and the target position prediction result x, and then correcting the target position prediction result by the following equation (11):

x_0 = x + (1/Q) · Σ_{i=1}^{Q} Δx_i    (11)

where Q denotes the number of correctly tracked characteristic regions and x_0 denotes the corrected target position prediction result.
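The correction step amounts to shifting the predicted centre by the mean offset of the Q correctly tracked regions; a sketch with hypothetical names:

```python
import numpy as np

def correct_position(predicted, offsets):
    """Shift the predicted target centre by the mean of the offsets
    measured from the correctly tracked characteristic regions."""
    offsets = np.asarray(offsets, float)
    return np.asarray(predicted, float) + offsets.mean(axis=0)
```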
Step S109: and outputting the corrected target position prediction result as a position tracking result of the target to be tracked in the current frame.
Step S110: parameters in the conditional random field model are updated to obtain the latest conditional random field model.
Optionally, in step S110, parameters in the conditional random field model are updated according to the tracking result of the current frame to obtain the latest conditional random field model.
Step S111: and judging whether the current frame is an integral multiple of a preset frame number threshold value.
In step S111, if the determination result is yes, the parameters in the conditional random field model are updated to obtain the latest conditional random field model, so as to ensure the validity of the conditional random field model; otherwise, execution continues from step S103.
Optionally, the value of the preset frame number threshold ranges from 1 to 25. More optionally, the preset frame number threshold is 20, and by setting the preset frame number threshold in a reasonable range, the calculation amount can be reduced, the processing speed can be increased, and the real-time performance of the tracking algorithm can be improved.
Referring also to fig. 3, which is a schematic diagram of the principle framework of the present embodiment: as shown in fig. 3, the target tracking method of the present embodiment first initializes the target to be tracked by using the detection module, i.e., obtains the information of the target to be tracked when it first appears. Optionally, the video data stream is truncated using the image frame in which the target to be tracked first appears as the boundary frame, so that this image frame becomes the first frame of the truncated video data stream.
Then, the fHOG features and the corner points of the target to be tracked are respectively extracted from this information. Initial samples are generated by cyclically shifting the extracted fHOG features to form a circulant matrix, and these initial samples are used to initialize the parameters of the position filtering model. The region where the target to be tracked is located is partitioned according to the extracted corner points to obtain a plurality of equal-size initial characteristic regions, which are used to initialize the parameters of the conditional random field model. After the parameters of the position filtering model and the conditional random field model are initialized, a high-confidence sample library is constructed and the initial samples are added to it.
Further, position response information of the second frame is acquired using the initialized position filtering model, and a target position prediction result is determined based on the position response information. A peak-to-side lobe ratio (PSR) is then calculated using the position response information and compared to a given threshold.
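A minimal sketch of the PSR computation, assuming the common definition (peak minus sidelobe mean, divided by sidelobe standard deviation) with a small window around the peak excluded from the sidelobe; the window size is an illustrative assumption:

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    """Compute the PSR of a correlation response map:
    (peak - sidelobe mean) / sidelobe std, where the sidelobe is the
    map with a (2*exclude+1)-sided window around the peak removed."""
    response = np.asarray(response, dtype=float)
    peak_idx = np.unravel_index(np.argmax(response), response.shape)
    peak = response[peak_idx]
    mask = np.ones_like(response, dtype=bool)
    r0 = max(peak_idx[0] - exclude, 0)
    c0 = max(peak_idx[1] - exclude, 0)
    mask[r0:peak_idx[0] + exclude + 1, c0:peak_idx[1] + exclude + 1] = False
    sidelobe = response[mask]
    # small epsilon avoids division by zero on degenerate (flat) maps
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-12)
```

A sharp, isolated peak yields a large PSR (confident, unoccluded tracking), while a flat or multi-modal response yields a small one.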
If the peak-to-sidelobe ratio is greater than the given threshold, the target to be tracked is judged not to be occluded in the second frame; the obtained target position prediction result is accurate and can be output directly as the position tracking result of the target in the second frame. The correlation between the tracking result and the samples in the high-confidence sample library is then evaluated to determine an adaptive learning rate, and the parameters of the position filtering model are updated according to this learning rate to obtain the latest position filtering model, which is used to obtain the position response information of the next frame. Finally, the tracking result is added to the high-confidence sample library to update it.
If the peak-to-sidelobe ratio is less than or equal to the given threshold, the target to be tracked is judged to be occluded in the second frame, and the obtained target position prediction result is inaccurate. The initialized conditional random field model is then used to correct the prediction: the correctly tracked characteristic regions are used to correct the incorrectly tracked ones, and the corrected target position is output as the position tracking result of the target in the second frame. Finally, the parameters of the conditional random field model are updated with the corrected tracking result to obtain the latest conditional random field model, which is used to correct the predicted position the next time the target is occluded. The specific correction process is described in step S108 and, for brevity, is not repeated here.
By analogy, the latest position filtering model is used continuously to obtain the position response information of the target in each subsequent frame. When the frame index is an integer multiple of 20, the parameters of the conditional random field model are updated to obtain the latest conditional random field model so as to ensure its validity, even if the target is not occluded in that frame, that is, even if the tracking result for that frame is accurate. Otherwise, the parameters of the conditional random field model are left unchanged, and the latest position filtering model continues to be used to obtain the position response information of the next frame.
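The per-frame control flow described above — update the position filter when the PSR is high, correct with the conditional random field when it is low, and refresh the conditional random field periodically — can be outlined as follows (the function name, action names, and the assumption that frame indices start at 1 are all illustrative):

```python
def track_frame_actions(frame_idx, psr, threshold, crf_period=20):
    """Return the set of model-update actions for one frame, following
    the occlusion test and the periodic CRF refresh described above."""
    actions = set()
    if psr > threshold:
        # not occluded: the prediction is trusted, so update the filter
        actions.add("update_position_filter")
        if frame_idx % crf_period == 0:
            # periodic refresh keeps the CRF model valid
            actions.add("update_crf")
    else:
        # occluded: correct the prediction with the CRF, then update it
        actions.add("correct_with_crf")
        actions.add("update_crf")
    return actions
```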
The target tracking method of the first embodiment of the invention is implemented by respectively initializing the parameters of a position filtering model and a conditional random field model; acquiring position response information of the current frame based on the latest position filtering model and determining a target position prediction result based on the position response information; calculating a peak-to-sidelobe ratio based on the position response information; and comparing the peak-to-sidelobe ratio with a preset threshold. If the peak-to-sidelobe ratio is greater than the preset threshold, the target position prediction result is output as the position tracking result of the target to be tracked in the current frame, and the parameters of the position filtering model are updated to obtain the latest position filtering model; at the same time, whether the index of the current frame is an integer multiple of a preset frame number threshold is determined. If the peak-to-sidelobe ratio is less than or equal to the preset threshold, the target position prediction result is corrected based on the latest conditional random field model, the corrected result is output as the position tracking result of the target to be tracked in the current frame, and the parameters of the conditional random field model are updated to obtain the latest conditional random field model. If the index of the current frame is an integer multiple of the preset frame number threshold, the parameters of the conditional random field model are updated to obtain the latest conditional random field model. In this way, the tracking accuracy and processing speed when a correlation filtering model is used to track a target are further improved, and the real-time performance of the algorithm is guaranteed.
Fig. 4 is a flowchart illustrating a target tracking method according to a second embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 4 if the substantially same result is obtained. It should be noted that, in this embodiment, the correlation filter model used includes a position filter model and a scale filter model. As shown in fig. 4, the method includes the steps of:
step S201: and respectively initializing the parameters of the position filtering model and the parameters of the scale filtering model.
In this embodiment, the step of initializing the parameters of the position filtering model in step S201 in fig. 4 is similar to step S101 in fig. 1, and for brevity, is not described herein again.
In this embodiment, the step of initializing the parameter of the scale filtering model in step S201 in fig. 4 is similar to step S101 in fig. 1, except that when the scale filtering model is initialized, the second feature of the target to be tracked needs to be extracted from the information when the target to be tracked first appears; generating a second initial sample by using the second characteristic; and finally, initializing the parameters of the scale filtering model by using the second initial sample.
Optionally, the second feature is composed of 31-dimensional fHOG features, and the second initial sample is generated by cyclically shifting the second feature.
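As a sketch of the cyclic-shift sample generation mentioned above, assuming a 1-D feature vector for simplicity (real fHOG features are multi-channel, which this illustration omits):

```python
import numpy as np

def circulant_samples(feature):
    """Generate training samples by cyclically shifting a 1-D feature
    vector; the stacked shifts form the rows of a circulant matrix,
    as used when building the initial samples from extracted features."""
    feature = np.asarray(feature)
    n = feature.shape[0]
    # row i is the feature vector shifted right by i positions
    return np.stack([np.roll(feature, i) for i in range(n)])
```

For a feature [1, 2, 3] this yields the circulant matrix [[1, 2, 3], [3, 1, 2], [2, 3, 1]].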
Alternatively, when the initial sample f in the formula (1) is replaced by the second initial sample, the parameters in the scale filtering model can be initialized by the method described above.
Step S202: parameters of the conditional random field model are initialized.
In this embodiment, step S202 in fig. 4 is similar to step S102 in fig. 1, and for brevity, is not described herein again.
Step S203: and obtaining the scale response information of the current frame based on the latest scale filtering model, and determining the target scale prediction result based on the scale response information.
In this embodiment, the operations in step S203 of fig. 4 of obtaining the position response information of the current frame based on the latest position filtering model and determining a target position prediction result based on that information are similar to step S103 in fig. 1 and, for brevity, are not described here again.
In this embodiment, the step of obtaining the scale response information of the current frame based on the latest scale filtering model in step S203 in fig. 4, and determining the target scale prediction result based on the scale response information is similar to step S103 in fig. 1, except that the filtering model in formula (8) is the scale filtering model, the obtained response information is the scale response information, and the target scale prediction result is determined by the scale response information.
Step S204: a peak-to-side lobe ratio is calculated based on the position response information.
In this embodiment, step S204 in fig. 4 is similar to step S104 in fig. 1, and for brevity, is not described herein again.
Step S205: and judging the size of the peak value sidelobe ratio and a preset threshold value.
In this embodiment, step S205 in fig. 4 is similar to step S105 in fig. 1, and for brevity, is not described herein again. It should be noted that when the target to be tracked is determined not to be occluded in the current frame, both the target position prediction result and the target scale prediction result obtained in step S203 are accurate, and steps S206 and S207 are performed. When the target is determined to be occluded in the current frame, both prediction results obtained in step S203 are inaccurate. This embodiment describes only the correction of the target position prediction result; the target scale prediction result can be corrected by any suitable method in the art, which is not described here.
Step S206: and outputting the target position prediction result as a position tracking result of the target to be tracked in the current frame, and outputting the scale position prediction result as a scale tracking result of the target to be tracked in the current frame.
Step S207: updating the parameters in the position filtering model to obtain the latest position filtering model, and updating the parameters in the scale filtering model to obtain the latest scale filtering model.
In this embodiment, the step of updating the parameters in the position filtering model in step S207 in fig. 4 to obtain the latest position filtering model is similar to step S107 in fig. 1, and for brevity, is not repeated herein.
Referring to fig. 5, in the present embodiment, the step of updating the parameters in the scale filtering model in step S207 to obtain the latest scale filtering model at least includes the following sub-steps:
step S2071: a second high confidence sample library is constructed.
In step S2071, the second initial sample is first added to the second high-confidence sample library. Whether the inverse-discrete-Fourier-transform result of the scale response information acquired in step S203 can also be added to the library is then decided by the result of step S205: if the peak-to-sidelobe ratio is greater than the preset threshold, that is, the target to be tracked is not occluded in the current frame, the result of performing the inverse discrete Fourier transform on the scale response information of the current frame is added to the second high-confidence sample library; otherwise it is not added. This avoids the problem that, when the target is occluded, the tracking result contains background information, and using that background information to update the parameters of the filtering model would contaminate the model and reduce the tracking accuracy.
Step S2072: a second adaptive learning rate is calculated.
In step S2072, the obtained scale response information of the current frame is first subjected to an inverse discrete Fourier transform so that it has the same data form as the samples in the second high-confidence sample library. The similarity between the inverse-transformed result and each sample in the library is then calculated and accumulated to obtain a second summation value. The ratio between this summation value and the number of samples in the library is computed, and finally the product of this ratio and a preset second parameter is calculated; the resulting product is the second adaptive learning rate.
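The steps above can be sketched as follows. Normalized cross-correlation stands in for the unspecified similarity measure, and the base-rate value is an illustrative assumption:

```python
import numpy as np

def adaptive_learning_rate(response, sample_bank, base_rate=0.02):
    """Compute an adaptive learning rate: inverse-DFT the response map,
    average its similarity to every sample in the high-confidence bank,
    and scale the mean similarity by a preset parameter."""
    spatial = np.real(np.fft.ifft2(response))  # back to the spatial domain

    def ncc(a, b):
        # zero-mean normalized cross-correlation as the similarity measure
        a = (a - a.mean()).ravel()
        b = (b - b.mean()).ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
        return float(np.dot(a, b) / denom)

    total = sum(ncc(spatial, s) for s in sample_bank)  # summation value
    return base_rate * total / len(sample_bank)        # ratio * parameter
```

When the response is highly similar to the stored high-confidence samples (unoccluded target), the rate approaches the base rate; when similarity drops (occlusion), the rate shrinks and little is learned.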
Optionally, before the step of calculating the similarity between the inverse-transformed result and each sample in the second high-confidence sample library, the number of samples in the library is kept from exceeding a second preset number threshold based on a second preset retention condition. The second high-confidence sample library is constructed to evaluate whether the tracking result of each frame is accurate, so as to further determine the learning rate. It is understood that the number of samples retained in the library is related to the computing power of the computer: the larger the number of samples, the larger the amount of computation and the more accurate the evaluation result; conversely, a smaller number of samples reduces the computation at the cost of some evaluation accuracy. For this reason, the number of samples in the second high-confidence sample library is limited to the second preset number threshold according to the second preset retention condition.
Optionally, in order to adjust the learning rate conveniently, a second parameter is preset in the calculation of the second adaptive learning rate, so as to prevent the final value of the second adaptive learning rate from becoming too large, for example, exceeding 1.
In step S2072, calculating the second adaptive learning rate allows more information to be learned when the target is not occluded and little or no information to be learned when it is occluded. This avoids the problem that the tracking result contains background information during occlusion, and that updating the parameters of the scale filtering model with this background information would contaminate the model and reduce the tracking accuracy. Compared with the prior art, which directly uses a fixed learning rate when updating the filtering model, the adaptive learning rate of the present application ensures that the parameters of the scale filtering model are not contaminated by the background, so that the target can still be tracked effectively after an occlusion.
Step S2073: and updating the parameters in the scale filtering model based on the second adaptive learning rate to obtain the latest scale filtering model.
In this embodiment, step S2073 in fig. 5 is similar to step S2073 in fig. 2, except that the adaptive learning rate in the above formula (9) is the second adaptive learning rate, and for brevity, the details are not repeated here.
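The update in step S2073 can be sketched as the linear-interpolation form commonly used for correlation-filter updates; equation (9) itself is not reproduced in this excerpt, so the exact form below is an assumption:

```python
import numpy as np

def update_filter(old_params, new_params, eta):
    """Blend the previous filter parameters with the parameters fitted
    on the current frame, weighted by the adaptive learning rate eta
    (eta = 0 keeps the old model; eta = 1 replaces it entirely)."""
    old_params = np.asarray(old_params, dtype=float)
    new_params = np.asarray(new_params, dtype=float)
    return (1.0 - eta) * old_params + eta * new_params
```

With an adaptive eta that shrinks toward zero under occlusion, the model is effectively frozen while the target is hidden and resumes learning once it reappears.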
Step S208: the target position prediction is modified based on the most recent conditional random field model.
Step S209: and outputting the corrected target position prediction result as a position tracking result of the target to be tracked in the current frame, and outputting the scale position prediction result as a scale tracking result of the target to be tracked in the current frame.
Step S210: parameters in the conditional random field model are updated to obtain the latest conditional random field model.
Step S211: and judging whether the current frame is an integral multiple of a preset frame number threshold value.
In the present embodiment, steps S208 to S211 in fig. 4 are similar to steps S108 to S111 in fig. 1, respectively, and are not repeated herein for brevity.
The target tracking method of the second embodiment of the invention is implemented by respectively initializing the parameters of a position filtering model, a scale filtering model, and a conditional random field model; acquiring position response information of the current frame based on the latest position filtering model and determining a target position prediction result based on the position response information; acquiring scale response information of the current frame based on the latest scale filtering model and determining a target scale prediction result based on the scale response information; calculating a peak-to-sidelobe ratio based on the position response information; and comparing the peak-to-sidelobe ratio with a preset threshold. If the peak-to-sidelobe ratio is greater than the preset threshold, the target position prediction result is output as the position tracking result of the target to be tracked in the current frame, the parameters of the position filtering model are updated to obtain the latest position filtering model, the target scale prediction result is output as the scale tracking result of the target to be tracked in the current frame, and the parameters of the scale filtering model are updated to obtain the latest scale filtering model; at the same time, whether the index of the current frame is an integer multiple of a preset frame number threshold is determined. If the peak-to-sidelobe ratio is less than or equal to the preset threshold, the target position prediction result is corrected based on the latest conditional random field model, the corrected result is output as the position tracking result of the target to be tracked in the current frame, the target scale prediction result is output as the scale tracking result of the target to be tracked in the current frame, and the parameters of the conditional random field model are updated to obtain the latest conditional random field model.
If the index of the current frame is an integer multiple of the preset frame number threshold, the parameters of the conditional random field model are updated to obtain the latest conditional random field model. In this way, the tracking accuracy and processing speed when a correlation filtering model is used to track a target are further improved.
Fig. 6 is a first structural diagram of the target tracking device according to the embodiment of the present invention. As shown in fig. 6, the apparatus 30 includes an obtaining module 31, a calculating module 32, a first determining module 33, a first updating module 34, a correcting module 35, a second updating module 36, and an outputting module 37.
And an obtaining module 31, configured to obtain location response information of the current frame based on a location filtering model in the relevant filtering models, and determine a target location prediction result based on the location response information.
The calculation module 32 is coupled to the acquisition module 31 for calculating a peak-to-side lobe ratio based on the position response information.
The first determining module 33 is coupled to the calculating module 32 for determining the peak-to-side lobe ratio and a predetermined threshold.
The first updating module 34 is coupled to the first determining module 33, and configured to update parameters in the correlation filtering model to obtain an updated correlation filtering model when the peak-to-side lobe ratio is greater than a preset threshold.
The obtaining module 31 is further configured to obtain the position response information of the current frame according to the position filtering model in the updated relevant filtering models.
The modification module 35 is coupled to the first determining module 33, and configured to modify the target position prediction result based on the conditional random field model when the peak-to-side lobe ratio is less than or equal to a preset threshold.
A second update module 36 is coupled to the correction module 35 for updating parameters in the conditional random field model to obtain an updated conditional random field model.
The correction module 35 is also configured to correct the target position prediction based on the updated conditional random field model.
The output module 37 is coupled to the first determining module 33 and the obtaining module 31, and configured to output the target position prediction result as a position tracking result of the target to be tracked in the current frame when the peak-to-side lobe ratio is greater than a preset threshold.
The output module 37 is further coupled to the modifying module 35, and configured to output the modified target position prediction result as the position tracking result of the target to be tracked in the current frame when the peak-to-side lobe ratio is less than or equal to the preset threshold.
Optionally, the apparatus 30 further includes a first initializing module, configured to initialize parameters of the position filtering model, where the initializing of the parameters of the position filtering model by the first initializing module may be to obtain information when an object to be tracked in the video data stream first appears by using a detection algorithm; extracting a first characteristic of the target to be tracked from information when the target to be tracked appears for the first time; generating a first initial sample using the first feature; the parameters of the position filtering model are initialized with the first initial sample.
Optionally, the first updating module 34 updates parameters in the position filtering model based on the first adaptive learning rate to obtain an updated position filtering model.
Optionally, the step of the first updating module 34 obtaining the first adaptive learning rate includes: performing inverse discrete Fourier transform on the acquired position response information of the current frame; calculating the similarity between the result after the inverse discrete Fourier transform and each sample in the first high-confidence sample library, and accumulating and summing to obtain a first summation value; calculating a ratio between the first summation value and the sum of the number of samples in the first high confidence sample bank; calculating a product between the ratio and a preset first parameter, wherein the product is the first self-adaptive learning rate; wherein the first high-confidence sample library is constructed after the step of generating first initial samples using the first features, the first high-confidence sample library including at least the first initial samples.
Optionally, the first updating module 34 adds the obtained result of performing inverse discrete fourier transform on the position response information of the current frame to the first high-confidence sample library.
Optionally, the first updating module 34 keeps the number of samples in the first high-confidence sample library from exceeding a first preset number threshold based on a first preset retention condition.
Optionally, the obtaining module 31 is further configured to obtain scale response information of the current frame based on a scale filtering model in the relevant filtering models, and determine a target scale prediction result based on the scale response information.
Optionally, the output module 37 is further configured to output the target scale prediction result as a scale tracking result of the target to be tracked in the current frame.
Optionally, the first initialization module is further configured to initialize parameters of the scale filtering model, and the operation of the first initialization module to initialize the parameters of the scale filtering model may be: extracting a second feature of the target to be tracked from the information when the target to be tracked appears for the first time; generating a second initial sample using the second feature; initializing parameters of a scale filtering model using the second initial sample.
Optionally, the first updating module 34 is further configured to update the parameters in the scale filtering model based on the second adaptive learning rate to obtain an updated scale filtering model.
Optionally, the step of the first updating module 34 obtaining the second adaptive learning rate includes: performing inverse discrete Fourier transform on the acquired scale response information of the current frame; calculating the similarity between the result after the inverse discrete Fourier transform and each sample in the second high-confidence sample library, and accumulating and summing to obtain a second summation value; calculating a ratio between the second summation value and the sum of the number of samples in the second high confidence sample bank; calculating a product between the ratio and a preset second parameter, wherein the product is the second self-adaptive learning rate; wherein the second high-confidence sample library is constructed after the step of generating second initial samples using the second features, the second high-confidence sample library including at least the second initial samples.
Optionally, the first updating module 34 adds the obtained result of performing inverse discrete fourier transform on the scale response information of the current frame to the second high-confidence sample library.
Optionally, the first updating module 34 keeps the number of samples in the second high-confidence sample library from exceeding a second preset number threshold based on a second preset retention condition.
Alternatively, the operation of the correction module 35 correcting the target position prediction result based on the updated conditional random field model may be: extracting a target prediction region at a preset scale, centered on the target position prediction result; extracting Fast corner points from the target prediction region; classifying the Fast corner points of the target prediction region by a hierarchical clustering algorithm to generate a plurality of characteristic regions of equal size; selecting the correctly tracked characteristic regions from these equal-size characteristic regions by using the updated conditional random field model; and correcting the target position prediction result by using the correctly tracked characteristic regions to obtain a corrected target position prediction result.
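A rough sketch of the corner-grouping step, substituting a minimal single-linkage agglomerative procedure for the unspecified hierarchical clustering algorithm (pure NumPy; real FAST corner detection, e.g. via OpenCV, is assumed to have produced the input points):

```python
import numpy as np

def cluster_corners(points, num_regions):
    """Group corner points into num_regions clusters by repeatedly
    merging the two clusters with the closest members (single-linkage
    agglomerative clustering), then return the cluster centers around
    which equal-size characteristic regions would be cropped."""
    clusters = [[np.asarray(p, dtype=float)] for p in points]
    while len(clusters) > num_regions:
        best = (0, 1, np.inf)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(np.linalg.norm(a - b)
                        for a in clusters[i] for b in clusters[j])
                if d < best[2]:
                    best = (i, j, d)
        i, j, _ = best
        clusters[i].extend(clusters.pop(j))  # merge the closest pair
    return [np.mean(c, axis=0) for c in clusters]
```

For example, corners at (0, 0), (1, 0), (10, 10), (11, 10) grouped into two regions yield centers near (0.5, 0) and (10.5, 10).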
Optionally, the apparatus 30 further comprises a second initialization module, the second initialization module is configured to initialize parameters of the conditional random field model, and the operation of initializing the parameters of the conditional random field model by the second initialization module may be to extract Fast corner points of the target to be tracked from information of the target to be tracked when the target first appears; classifying Fast corners of the target to be tracked by utilizing a hierarchical clustering algorithm to generate a plurality of initial characteristic areas with equal size; parameters of the conditional random field model are initialized using the initial property regions.
Optionally, the conditional random field model includes a time-domain bivariate potential function, where the time-domain bivariate potential function is a cosine similarity calculation formula between a first variable and a second variable, the first variable is an optical flow vector of a feature point in a selected region of a characteristic region, and the second variable is a spatial neighborhood central optical flow vector of the characteristic region.
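As described, the time-domain bivariate potential reduces to a cosine similarity between two optical-flow vectors, which can be sketched as:

```python
import numpy as np

def temporal_pairwise_potential(flow_vec, neighborhood_center_flow):
    """Cosine similarity between a feature point's optical-flow vector
    and the spatial-neighborhood central flow vector of its
    characteristic region; near 1 means the point moves consistently
    with its region, near 0 or negative flags inconsistent motion."""
    a = np.asarray(flow_vec, dtype=float)
    b = np.asarray(neighborhood_center_flow, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return float(np.dot(a, b) / denom)
```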
Optionally, the apparatus 30 further includes a second determining module, configured to determine whether the current frame is an integer multiple of the preset frame number threshold when the peak-to-side lobe ratio is greater than the preset threshold.
Optionally, if the determination result of the second determination module is yes, the parameters in the conditional random field model are updated to obtain an updated conditional random field model.
Optionally, the preset frame number threshold value ranges from 1 to 25.
Referring to fig. 7, fig. 7 is a second structural schematic diagram of a target tracking device according to an embodiment of the invention. As shown in fig. 7, the apparatus 40 includes a processor 41 and a memory 42 coupled to the processor 41.
The memory 42 stores program instructions for implementing the target tracking method described in any of the above embodiments.
Processor 41 is operative to execute program instructions stored in memory 42 to enable target tracking.
The processor 41 may also be referred to as a CPU (Central Processing Unit). The processor 41 may be an integrated circuit chip having signal processing capabilities. The processor 41 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a storage device according to an embodiment of the invention. The storage device of the embodiment of the present invention stores a program file 51 capable of implementing all the methods described above. The program file 51 may be stored in the storage device in the form of a software product and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage device includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, or terminal devices such as a computer, a server, a mobile phone, or a tablet.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is only a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. The above are only embodiments of the present application, and not intended to limit the scope of the present application, and all equivalent structures or equivalent processes performed by the present application and the contents of the attached drawings, which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (17)

1. A target tracking method, comprising:
acquiring position response information of a current frame based on a position filter model in a correlation filter model, and determining a target position prediction result based on the position response information;
calculating a peak-to-sidelobe ratio based on the position response information;
if the peak-to-sidelobe ratio is less than or equal to a preset threshold, correcting the target position prediction result based on a conditional random field model, and outputting the corrected target position prediction result as a position tracking result of a target to be tracked in the current frame;
if the peak-to-sidelobe ratio is greater than the preset threshold, constructing a first high-confidence sample bank, performing an inverse discrete Fourier transform on the acquired position response information of the current frame, calculating the similarity between the inverse-transformed result and each sample in the first high-confidence sample bank and accumulating the similarities to obtain a first summation value, calculating the ratio of the first summation value to the number of samples in the first high-confidence sample bank, and calculating the product of that ratio and a preset first parameter, the product being a first adaptive learning rate; and updating parameters of the position filter model based on the first adaptive learning rate to obtain an updated position filter model.
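As a non-authoritative illustration of the computations recited in claim 1, the peak-to-sidelobe ratio and the first adaptive learning rate can be sketched as below. The sidelobe exclusion window, the use of cosine similarity as the (unspecified) similarity measure, and all function names are assumptions of this sketch, not fixed by the claim:

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    """PSR of a correlation response map: (peak - mean(sidelobe)) / std(sidelobe),
    where the sidelobe excludes a small window around the peak (window size assumed)."""
    peak_idx = np.unravel_index(int(np.argmax(response)), response.shape)
    peak = response[peak_idx]
    mask = np.ones_like(response, dtype=bool)
    r0 = max(peak_idx[0] - exclude, 0)
    c0 = max(peak_idx[1] - exclude, 0)
    mask[r0:peak_idx[0] + exclude + 1, c0:peak_idx[1] + exclude + 1] = False
    sidelobe = response[mask]
    return float((peak - sidelobe.mean()) / (sidelobe.std() + 1e-12))

def cosine_similarity(a, b):
    """Assumed similarity measure between the IDFT result and a bank sample."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def first_adaptive_learning_rate(response_freq, sample_bank, first_param=0.02):
    """Claim 1's first adaptive learning rate: IDFT of the position response,
    similarity accumulated over the high-confidence sample bank (first summation
    value), divided by the bank size, times a preset first parameter."""
    spatial = np.real(np.fft.ifft2(response_freq))          # inverse DFT of response
    total = sum(cosine_similarity(spatial, s) for s in sample_bank)  # first summation value
    ratio = total / len(sample_bank)                        # mean similarity
    return first_param * ratio                              # first adaptive learning rate
```

A response identical to a bank sample yields a mean similarity near 1, so the learning rate approaches the preset first parameter; dissimilar responses shrink it, which is the adaptive behaviour the claim describes.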
2. The method of claim 1, wherein the step of acquiring the position response information of the current frame based on the position filter model in the correlation filter model comprises: acquiring the position response information of the current frame based on the position filter model in the updated correlation filter model.
3. The method of claim 2, wherein before the step of acquiring the position response information of the current frame based on the position filter model in the correlation filter model, the method comprises:
acquiring information of the target to be tracked when the target to be tracked first appears in the video data stream, using a detection algorithm;
extracting a first feature of the target to be tracked from the information at its first appearance;
generating a first initial sample using the first feature;
initializing parameters of the position filter model using the first initial sample;
wherein the first high-confidence sample bank is constructed after the step of generating the first initial sample using the first feature, and the first high-confidence sample bank includes at least the first initial sample.
4. The method of claim 1, wherein before the step of calculating the similarity between the inverse-transformed result and each sample in the first high-confidence sample bank, the method further comprises: keeping, based on a first preset retention condition, the number of samples in the first high-confidence sample bank from exceeding a first preset number threshold.
5. The method of claim 3, wherein before the step of correcting the target position prediction result based on the conditional random field model, the method comprises: updating parameters in the conditional random field model to obtain an updated conditional random field model;
and the step of correcting the target position prediction result based on the conditional random field model comprises: correcting the target position prediction result based on the updated conditional random field model.
6. The method of claim 5, wherein the step of correcting the target position prediction result based on the conditional random field model comprises:
extracting a target prediction region at a preset scale, centred on the target position prediction result;
extracting FAST corner points of the target prediction region;
classifying the FAST corner points of the target prediction region using a hierarchical clustering algorithm to generate a plurality of equally sized feature regions;
selecting a correctly tracked feature region from the equally sized feature regions using the updated conditional random field model;
and correcting the target position prediction result using the correctly tracked feature region to obtain a corrected target position prediction result.
7. The method of claim 6, wherein before the step of correcting the target position prediction result based on the conditional random field model, the method comprises:
extracting FAST corner points of the target to be tracked from the information at the time the target to be tracked first appears;
classifying the FAST corner points of the target to be tracked using a hierarchical clustering algorithm to generate a plurality of equally sized initial feature regions;
initializing parameters of the conditional random field model using the initial feature regions.
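The corner-clustering step recited in claims 6 and 7 can be sketched with a minimal single-linkage hierarchical clustering of corner coordinates into k groups, each then represented by an equally sized box. FAST detection itself (for example via OpenCV's `FastFeatureDetector`) is assumed to have already produced the corner coordinates; the single-linkage criterion, the fixed box size, and the function names are illustrative assumptions:

```python
import numpy as np

def agglomerate(points, k):
    """Single-linkage agglomerative (hierarchical) clustering of 2-D corner
    coordinates down to k clusters; returns a cluster label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-linkage: distance between closest members
                d = min(np.linalg.norm(points[a] - points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]   # merge the two closest clusters
        del clusters[j]
    labels = np.empty(len(points), dtype=int)
    for lbl, members in enumerate(clusters):
        labels[members] = lbl
    return labels

def equal_size_regions(points, k, size):
    """Equally sized feature regions: a size x size box centred on each
    cluster centroid, returned as (cx, cy, w, h) tuples."""
    labels = agglomerate(points, k)
    regions = []
    for lbl in range(k):
        c = points[labels == lbl].mean(axis=0)
        regions.append((float(c[0]), float(c[1]), size, size))
    return regions
```

For the small corner counts typical of a target prediction region, the quadratic pairwise search above is adequate; a production implementation would more likely use `scipy.cluster.hierarchy`.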
8. The method of claim 6 or 7, wherein the conditional random field model includes a time-domain bivariate (pairwise) potential function, the time-domain bivariate potential function being a cosine similarity between a first variable and a second variable, the first variable being the optical flow vector of a selected feature point of a feature region, and the second variable being the optical flow vector at the centre of the spatial neighbourhood of the feature region.
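The time-domain bivariate potential of claim 8 reduces to a cosine similarity between two optical flow vectors. A minimal sketch, with the zero-flow convention and function name being assumptions of this illustration:

```python
import numpy as np

def temporal_pairwise_potential(flow_point, flow_neighborhood_center):
    """Time-domain bivariate potential from claim 8: cosine similarity between
    a feature point's optical flow vector and the optical flow vector at the
    centre of its spatial neighbourhood. Returns 0.0 for zero flow (assumed)."""
    a = np.asarray(flow_point, dtype=float)
    b = np.asarray(flow_neighborhood_center, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(a @ b / denom)
```

Flow vectors that move coherently with their neighbourhood score near 1, so the potential rewards feature regions whose motion agrees with the surrounding region, consistent with selecting "correctly tracked" regions.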
9. The method of claim 2, wherein if the peak-to-sidelobe ratio is greater than the preset threshold, the method further comprises: outputting the target position prediction result as the position tracking result of the target to be tracked in the current frame.
10. The method of claim 9, wherein after the step of outputting the target position prediction result as the position tracking result of the target to be tracked in the current frame, the method comprises:
determining whether the frame number of the current frame is an integer multiple of a preset frame-number threshold;
if so, updating parameters in the conditional random field model to obtain an updated conditional random field model.
11. The method of claim 3, wherein the correlation filter model further comprises a scale filter model, and the target tracking method further comprises:
acquiring scale response information of the current frame based on the scale filter model in the correlation filter model;
determining a target scale prediction result based on the scale response information, and outputting the target scale prediction result as a scale tracking result of the target to be tracked in the current frame.
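Claim 11 does not fix how the scale prediction is read off the scale response; one common convention in DSST-style correlation filter trackers, shown here purely as an assumed sketch, is to pick the candidate scale factor whose filter response is maximal:

```python
import numpy as np

def predict_scale(scale_responses, scale_factors):
    """Assumed scale prediction: one response map per candidate scale factor;
    the factor whose response peak is highest becomes the scale tracking result."""
    best = int(np.argmax([float(np.max(r)) for r in scale_responses]))
    return scale_factors[best]
```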
12. The method of claim 11, wherein before the step of acquiring the scale response information of the current frame based on the scale filter model in the correlation filter model, the method comprises:
extracting a second feature of the target to be tracked from the information at the time the target to be tracked first appears;
generating a second initial sample using the second feature;
initializing parameters of the scale filter model using the second initial sample.
13. The method of claim 12, wherein the step of updating parameters in the correlation filter model to obtain an updated correlation filter model comprises: updating the parameters in the scale filter model based on a second adaptive learning rate to obtain an updated scale filter model.
14. The method of claim 13, wherein obtaining the second adaptive learning rate comprises:
performing an inverse discrete Fourier transform on the acquired scale response information of the current frame;
calculating the similarity between the inverse-transformed result and each sample in a second high-confidence sample bank, and accumulating the similarities to obtain a second summation value;
calculating the ratio of the second summation value to the number of samples in the second high-confidence sample bank;
calculating the product of that ratio and a preset second parameter, the product being the second adaptive learning rate;
wherein the second high-confidence sample bank is constructed after the step of generating the second initial sample using the second feature, and the second high-confidence sample bank includes at least the second initial sample.
15. The method of claim 14, wherein before the step of calculating the similarity between the inverse-transformed result and each sample in the second high-confidence sample bank, the method further comprises: keeping, based on a second preset retention condition, the number of samples in the second high-confidence sample bank from exceeding a second preset number threshold.
16. A target tracking apparatus, comprising a processor and a memory coupled to the processor, wherein
the memory stores program instructions for implementing the target tracking method of any one of claims 1-15;
the processor is configured to execute the program instructions stored in the memory to achieve target tracking.
17. A storage device, characterized in that it stores a program file capable of implementing the target tracking method according to any one of claims 1 to 15.
CN201911121141.8A 2019-11-15 2019-11-15 Target tracking method and device and storage device Active CN110929620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911121141.8A CN110929620B (en) 2019-11-15 2019-11-15 Target tracking method and device and storage device

Publications (2)

Publication Number Publication Date
CN110929620A CN110929620A (en) 2020-03-27
CN110929620B true CN110929620B (en) 2023-04-07

Family

ID=69853129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911121141.8A Active CN110929620B (en) 2019-11-15 2019-11-15 Target tracking method and device and storage device

Country Status (1)

Country Link
CN (1) CN110929620B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723872B (en) * 2020-06-24 2023-04-07 浙江大华技术股份有限公司 Pedestrian attribute identification method and device, storage medium and electronic device
CN112036285B (en) * 2020-08-25 2024-04-09 安徽江淮汽车集团股份有限公司 Visual target detection method, device, equipment and storage medium
CN112257540A (en) * 2020-10-16 2021-01-22 齐鲁工业大学 Self-adaptive anti-occlusion dynamic target real-time tracking method and system

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927745A (en) * 2014-03-28 2014-07-16 北京中海新图科技有限公司 Tracking and matching parallel computing method for wearable device
CN104751492A (en) * 2015-04-17 2015-07-01 中国科学院自动化研究所 Target area tracking method based on dynamic coupling condition random fields
WO2015163830A1 (en) * 2014-04-22 2015-10-29 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Target localization and size estimation via multiple model learning in visual tracking
CN106887011A (en) * 2017-01-20 2017-06-23 北京理工大学 A kind of multi-template method for tracking target based on CNN and CF
CN108090919A (en) * 2018-01-02 2018-05-29 华南理工大学 Improved kernel correlation filtering tracking method based on super-pixel optical flow and adaptive learning factor
CN108269269A (en) * 2016-12-30 2018-07-10 纳恩博(北京)科技有限公司 Method for tracking target and device
CN108734723A (en) * 2018-05-11 2018-11-02 江南大学 A kind of correlation filtering method for tracking target based on adaptive weighting combination learning
CN108986140A (en) * 2018-06-26 2018-12-11 南京信息工程大学 Target scale adaptive tracking method based on correlation filtering and color detection
CN109146911A (en) * 2018-07-23 2019-01-04 北京航空航天大学 A kind of method and device of target following
CN109285179A (en) * 2018-07-26 2019-01-29 昆明理工大学 A kind of motion target tracking method based on multi-feature fusion
CN109325966A (en) * 2018-09-05 2019-02-12 华侨大学 A method of vision tracking is carried out by space-time context
CN110047096A (en) * 2019-04-28 2019-07-23 中南民族大学 A kind of multi-object tracking method and system based on depth conditions random field models

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
Adaptive Learning Rate for Visual Tracking Using Correlation Filters; C. S. Asha, A. V. Narasimhadhan; Procedia Computer Science; Dec. 31, 2016; vol. 89; 616-619 *
Effective Visual Tracking Using Multi-Block and Scale Space Based on Kernelized Correlation Filters; Soowoong Jeong, Guisik Kim, Sangkeun Lee; Sensors; Jan. 23, 2017; vol. 17, no. 3; 1-17 *
Image Forensic for Digital Image Copy Move Forgery Detection; Yang Yew Yeap, U. U. Sheikh, Ab Al-Hadi Ab Rahman; 2018 IEEE 14th International Colloquium on Signal Processing & its Applications; May 31, 2018; 239-244 *
Deep Correlation Filter Target Tracking Algorithm Based on Conditional Random Field; Huang Shucheng, Zhang Yu, Zhang Tianzhu, Xu Changsheng, Wang Zhi; Journal of Software; Apr. 30, 2019; vol. 30, no. 4; 927-940 *
Research on Target Tracking Algorithms Based on Probabilistic Graphical Models; Cai Bo; China Master's Theses Full-text Database, Information Science and Technology; Mar. 15, 2017; no. 3; I138-5717 *
Visual Target Tracking Based on Correlation Filtering; Wang Shaojian; China Master's Theses Full-text Database, Information Science and Technology; Feb. 15, 2019; no. 2; I138-1711 *
Target Tracking Algorithm Combining Multi-Feature Fusion and Scale Estimation; Zhou Tao, Di Xiaoni, Li Yanqi; Infrared Technology; May 31, 2019; vol. 41, no. 5; 469-473 *
Multi-Feature Fusion and Scale-Adaptive Kernelized Correlation Filter Tracking Algorithm; Feng Han, Wang Yongxiong, Zhang Sunjie; Computer and Digital Engineering; May 31, 2019; vol. 47, no. 5; 1126-1127 *
Research on Target Detection Technology; Liu Huiqin; China Master's Theses Full-text Database, Information Science and Technology; Aug. 15, 2016; no. 8; I138-985 *

Also Published As

Publication number Publication date
CN110929620A (en) 2020-03-27

Similar Documents

Publication Publication Date Title
CN108470354B (en) Video target tracking method and device and implementation device
CN110929620B (en) Target tracking method and device and storage device
Hu et al. Anomaly detection using local kernel density estimation and context-based regression
CN109035299B (en) Target tracking method and device, computer equipment and storage medium
WO2020107716A1 (en) Target image segmentation method and apparatus, and device
CN106934381B (en) Face recognition tracking method
CN110766724B (en) Target tracking network training and tracking method and device, electronic equipment and medium
CN108898624B (en) Moving object tracking method and device, electronic equipment and storage medium
US7835542B2 (en) Object tracking systems and methods utilizing compressed-domain motion-based segmentation
CN111080673B (en) Anti-occlusion target tracking method
CN107688829A (en) A kind of identifying system and recognition methods based on SVMs
CN107633226B (en) Human body motion tracking feature processing method
Cheng et al. A novel dynamic system in the space of SPD matrices with applications to appearance tracking
CN108399627B (en) Video inter-frame target motion estimation method and device and implementation device
CN110008844B (en) KCF long-term gesture tracking method fused with SLIC algorithm
CN108053424B (en) Target tracking method and device, electronic equipment and storage medium
CN113191180A (en) Target tracking method and device, electronic equipment and storage medium
CN110349188A (en) Multi-object tracking method, device and storage medium based on TSK fuzzy model
Zhou et al. Adaptive irregular graph construction-based salient object detection
CN112991394B (en) KCF target tracking method based on cubic spline interpolation and Markov chain
CN106997599A (en) A kind of video moving object subdivision method of light sensitive
CN109166138B (en) Target tracking method and device based on high-order cumulant and storage medium
Jia et al. A novel visual indoor positioning method with efficient image deblurring
CN107067411B (en) Mean-shift tracking method combined with dense features
CN108475339B (en) Method and system for classifying objects in an image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant