CN109102522B - Target tracking method and device

Target tracking method and device

Info

Publication number
CN109102522B
Authority
CN
China
Prior art keywords
target
filter
frame
image
tracking
Prior art date
Legal status
Active
Application number
CN201810768738.0A
Other languages
Chinese (zh)
Other versions
CN109102522A (en)
Inventor
魏振忠
闵玥
谈可
张广军
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN201810768738.0A
Publication of CN109102522A
Application granted
Publication of CN109102522B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target tracking method and a target tracking device. The method comprises the following steps: initializing all filters according to a given target position in the initial frame image; extracting candidate boxes from the current frame image, matching each to a filter, and computing the responses. If a response peak is greater than the threshold, the target position is determined from the response peak; if no response peak is greater than the threshold, filter updating is suspended and an occlusion state is entered, in which the target detection area of the current frame is determined from the previous frame's target bounding box. Candidate boxes are then extracted around the high-scoring detection boxes, matched to filters, and their response peaks computed; if both the detection score and the response peak are greater than their thresholds, the target is considered retrieved, otherwise it remains occluded. If occlusion persists over consecutive frames, the detection range is expanded to the full image until the target is retrieved. The method adapts robustly to target pose changes, environmental illumination changes, occlusion and other conditions, facilitating whole-process tracking and observation of the target.

Description

Target tracking method and device
Technical Field
The invention relates to a target tracking method and device, belongs to the technical field of image processing, and in particular relates to a detection-assisted multi-filter algorithm and device for stable target tracking.
Background
Target tracking is widely used in computer vision research, surveillance systems, civil security inspection, infrared guidance and other fields. The essence of target tracking is to determine the position and geometric information of a target in an image sequence. Interference from similar backgrounds and occlusion of the tracked target, including by objects similar to it, greatly increase the difficulty of long-term stable and accurate tracking, making it a research hotspot in computer vision.
The appearance models adopted by target tracking methods fall into two main categories: generative models and discriminative models. The essential difference is that a generative model does not exploit background information while a discriminative model does. That is, a generative model uses the positive samples to establish a prior distribution over the target's appearance, characterizing only the target itself and ignoring the background image information, whereas a discriminative model also uses negative samples containing background to train a classifier that separates positive from negative samples well and generalizes. Because background information can be exploited, most current tracking algorithms are discriminative, and correlation filter tracking algorithms in particular are discriminative models;
at present, much work has been done on correlation filter tracking algorithms and several effective improvements have been proposed. For example, "Henriques J F, Rui C, Martins P, et al., 'High-Speed Tracking with Kernelized Correlation Filters,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2015" introduces kernelized correlation filters; "Danelljan M, Häger G, Khan F S, 'Accurate scale estimation for robust visual tracking,' in British Machine Vision Conference, Nottingham, United Kingdom, 2014, pp. 65.1-65.11" adapts to scale changes of the tracked target; "Galoogahi H K, Sim T, Lucey S, 'Correlation Filters with limited boundaries,' in IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015, pp. 4630-4638" reduces the boundary effect; and "Danelljan M, Robinson A, Khan F S, et al., 'Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking,' in European Conference on Computer Vision, Amsterdam, Netherlands, 2016, pp. 472-489" interpolates feature maps into the continuous domain and trains a unified continuous filter that fuses feature map information at different resolutions. Among these, the KCF tracking algorithm with its Gaussian kernel function is a tracker of excellent performance and is widely applied in practical engineering. However, because KCF cannot identify occlusion and stores only the most recent appearance of the tracked target, the target bounding box drifts easily when the target deforms severely, and many erroneous training samples are introduced when the target is occluded, both of which eventually cause tracking to fail.
Researchers have also addressed severe deformation and occlusion of the tracked target. For example, "Ma C, Yang X, Zhang C, et al., 'Long-term correlation tracking,' in IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015, pp. 5388-5396" re-detects the tracked target and performs conservative scale estimation using regression models with different learning rates; "Danelljan M, Hager G, Khan F S, et al., 'Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking,' in IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 1430-1438" learns sample weights to down-weight corrupted training samples; an ECCV 2008 method (Marseille, France, 2008, pp. 788-801) determines occlusion from tracking-trajectory information to assist tracking; and "Cehovin L, Kristan M, Leonardis A, 'Robust visual tracking using an adaptive coupled-layer visual model,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 4, pp. 941-953, 2013" models occlusion as the non-zero elements, at occluded image patch positions, of a sparse representation of the target template. Although these methods improve the robustness of tracking to occlusion to a certain extent, they are computationally expensive and cannot meet the speed requirement of real-time tracking. To avoid these problems, a multi-filter KCF tracking algorithm can be considered: it classifies the training samples so that the typical historical appearances of the tracked target are recorded and contamination of correct training samples by erroneous ones is reduced, and, combined with an introduced detection algorithm, it identifies occlusion, retrieves the target after it reappears, and corrects the position of the target bounding box.
Disclosure of Invention
The invention aims to provide a target tracking method and a target tracking device.
The technical scheme adopted by the invention is as follows: a target tracking device comprising a visible light zoom imaging system, a control computer and a two-axis servo system, wherein:
the visible light zoom imaging system consists of an industrial camera and a visible light zoom lens. The industrial camera captures the target image, the image data are transmitted to the control computer through the conversion and transmission system, and the zoom lens adjusts its focal length according to feedback from the control computer, so that the size of the target in the captured image remains constant;
the control computer determines the position of the tracked target box in the current image using the tracking algorithm, adjusts the lens focal length according to the proportion of the image occupied by the target box, and drives the two-axis servo system with the deviation between the target position given by the tracking algorithm and the centre of the field of view, i.e. the offset of the target centre from the image centre, so that the servo system keeps the target in the central area of the image.
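For illustration, this feedback law can be sketched in a few lines of Python; the function name, the square-root zoom law and the target fill fraction below are illustrative assumptions, not values given by the invention:

```python
def servo_and_zoom_commands(box, frame_w, frame_h, target_fill=0.2):
    """box = (x, y, w, h) of the tracked target in pixels."""
    cx, cy = box[0] + box[2] / 2.0, box[1] + box[3] / 2.0
    # Pixel offset of the target centre from the field-of-view centre;
    # the two-axis servo pans and tilts to null this error.
    dx = cx - frame_w / 2.0
    dy = cy - frame_h / 2.0
    # Fraction of the frame occupied by the target box; the zoom lens is
    # driven so this stays near target_fill, keeping apparent size constant.
    fill = (box[2] * box[3]) / float(frame_w * frame_h)
    zoom_ratio = (target_fill / fill) ** 0.5 if fill > 0 else 1.0
    return (dx, dy), zoom_ratio
```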
The two-axis servo system consists mainly of a turntable body and an electric cabinet. The turntable body is the final actuating mechanism of the system; it adopts a vertical U-shaped structure and performs the angular motions in the azimuth and pitch directions. The control part of the servo system is housed below the interior of the mechanical body; it receives control signals from the control computer and performs real-time motion control of each gimbal frame.
The target tracking method comprises the following steps:
Step 1: in the initial frame image, extract the feature image of the target from the given target region, train to obtain the feature image and weight coefficients of the first filter, and initialize the other filters, performing the following operations in sequence:
Extracting the target feature image: extract the corresponding HOG feature image from the given target region.
Training the weight coefficients: the trained filter is a linear combination, after a nonlinear transformation, of all the images obtained by cyclically shifting the original training image; the coefficients of this linear combination are the training weight coefficients, computed as

$$\hat{\alpha} = \frac{\hat{y}}{\hat{k}^{xx} + \lambda}, \qquad k^{xx} = \exp\!\left(-\frac{1}{\sigma^{2}}\Big(2\lVert x\rVert^{2} - 2F^{-1}\big(\textstyle\sum_{c}\hat{x}_{c}^{*}\odot\hat{x}_{c}\big)\Big)\right),$$

where x is the c-channel HOG feature image, y is the desired Gaussian-shaped response, the hat denotes the discrete Fourier transform, $F^{-1}$ the inverse discrete Fourier transform, $\odot$ element-wise multiplication, $*$ complex conjugation, λ the regularization coefficient and σ the standard deviation of the Gaussian kernel; see "Henriques J F, Rui C, Martins P, et al., 'High-Speed Tracking with Kernelized Correlation Filters,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2015". The feature images and weight coefficients of the remaining filters are initialized to those of the first filter, but their corresponding filter weights are smaller than that of the first filter.
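As an illustration of this training step, the following is a minimal numpy sketch of the Gaussian-kernel correlation and the closed-form weight coefficients from the cited KCF paper; the per-element normalization inside the exponential follows the KCF reference implementation, the HOG extraction is assumed to be done elsewhere, and the dictionary layout used to store a filter is an assumption of this sketch:

```python
import numpy as np

def gaussian_correlation(x1, x2, sigma):
    """Gaussian kernel correlation of two H x W x C feature images,
    computed in the Fourier domain (Henriques et al., TPAMI 2015)."""
    X1 = np.fft.fft2(x1, axes=(0, 1))
    X2 = np.fft.fft2(x2, axes=(0, 1))
    # F^{-1}( sum over channels of conj(x1_hat) elementwise x2_hat )
    cross = np.real(np.fft.ifft2(np.sum(np.conj(X1) * X2, axis=2)))
    d = (x1 ** 2).sum() + (x2 ** 2).sum() - 2.0 * cross
    return np.exp(-np.maximum(d, 0.0) / (sigma ** 2 * x1.size))

def train_filter(x, y, sigma=0.5, lam=1e-4):
    """Closed-form weights alpha_hat = y_hat / (k_hat^{xx} + lambda).
    x: H x W x C HOG feature image; y: H x W Gaussian-shaped response."""
    k = gaussian_correlation(x, x, sigma)
    alpha_f = np.fft.fft2(y) / (np.fft.fft2(k) + lam)
    return {'x': x, 'alpha_f': alpha_f, 'w': 1.0}  # 'w' is the filter weight
```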
Step 2: obtain candidate-box feature maps at different scales in the current frame image and match each to a filter. Let each filter $f_i$ have the corresponding training feature image $I_{f_i}$. For the current candidate-box feature image I at a given scale, the similarity between I and the training feature images of all N filters is computed in order to select the best-matched filter; the similarity measure is the sum of squared differences

$$S = \sum\big(I - I_{f_i}\big)^{2},$$

and the filter whose training feature image attains the minimum S is the best-matched filter.
Computing the response peak: the candidate-box feature image is correlated with the matched filter, and the maximum of the result over all positions is the response peak.
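A sketch of the matching and response computation, reusing the helpers above (the dictionary layout of a filter is again an assumption of the sketch):

```python
def best_matched_filter(candidate, filters):
    """Choose the filter whose stored training feature image is most
    similar to the candidate feature image, i.e. attains minimum S."""
    s = [((candidate - f['x']) ** 2).sum() for f in filters]
    i = int(np.argmin(s))
    return i, filters[i]

def response_peak(candidate, filt, sigma=0.5):
    """Correlate the candidate feature image with the matched filter and
    take the maximum of the response map over all positions."""
    k = gaussian_correlation(filt['x'], candidate, sigma)
    resp = np.real(np.fft.ifft2(np.fft.fft2(k) * filt['alpha_f']))
    return float(resp.max())
```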
If a response peak is greater than the threshold, the best-matched filter and the corresponding candidate-box feature image x are selected according to the maximum response peak, and the weight coefficient α is computed from the selected candidate-box feature image by the method of step 1. The candidate-box feature image and the computed weight coefficient represent the newly trained filter information, while the stored history of the best-matched filter is represented by its historical training feature image x' and historical weight vector α'. The best-matched filter is updated as follows: the historical training image x' is updated to θx + (1-θ)x', and the historical weight vector α' is updated to θα + (1-θ)α'.
The filter weight $\omega_\beta$ of the best-matched filter $f_\beta$ is then increased, and the filter weights $\omega_i$ of the other filters are correspondingly decreased. If no response peak is greater than the threshold, the tracker enters the occlusion state.
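The history update is given explicitly above, but the exact reweighting rule is not recoverable from the text, so in the sketch below the additive boost followed by renormalization is only a plausible stand-in:

```python
def update_matched_filter(filt, x_new, alpha_f_new, theta=0.02):
    """x' <- theta*x + (1-theta)*x'; alpha' <- theta*alpha + (1-theta)*alpha'.
    theta is a learning-rate assumption of this sketch."""
    filt['x'] = theta * x_new + (1.0 - theta) * filt['x']
    filt['alpha_f'] = theta * alpha_f_new + (1.0 - theta) * filt['alpha_f']

def reweight_filters(filters, best_idx, boost=0.1):
    """Raise omega_beta for the best-matched filter and lower the other
    omega_i, keeping the weights summing to one (stand-in rule)."""
    w = np.array([f['w'] for f in filters], dtype=float)
    w[best_idx] += boost
    w /= w.sum()
    for f, wi in zip(filters, w):
        f['w'] = float(wi)
```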
Step 3: in the occlusion state, filter updating is suspended, and the target detection area in frame k is determined from the position of the target bounding box in frame k-1. Based on the positions of the high-scoring detection boxes, candidate-box feature images at different scales are extracted, matched to filters, and their response peaks computed; if both the detection score and the filter response peak are greater than their thresholds, the target is considered retrieved and the occlusion state is exited, otherwise the next frame directly enters the occlusion state again.
The target detection area in frame k is still a rectangular bounding box whose centre image coordinates coincide with the centre of the target bounding box in frame k-1 and whose length and width are specific multiples of the length and width of that box. When determining candidate-box feature images from a rectangular high-scoring detection box, each candidate box shares the centre image coordinates of the detection box, and its length and width are a specific multiple η of the detection box's length and width; candidate boxes at different scales are obtained by taking different values of η and extracting the corresponding feature images.
Step 4: if the target remains occluded for 20 consecutive frames without being retrieved, the emergency state is entered and the detection range is expanded to the full image, i.e. detection is no longer limited to the previous frame's target bounding box and its surrounding area but covers the whole image; the subsequent target-retrieval steps of matching the high-scoring detection boxes to filters and computing response peaks are the same as in step 2.
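Steps 2-4 amount to a small state machine over the tracker's condition; the 20-frame limit comes from the text, while the threshold handling below is a simplifying assumption:

```python
class OcclusionStateMachine:
    """Tracks the 'tracking' / 'occluded' / 'emergency' condition of steps 2-4."""

    def __init__(self, peak_thresh, det_thresh, max_occluded=20):
        self.state = 'tracking'
        self.occluded_frames = 0
        self.peak_thresh = peak_thresh
        self.det_thresh = det_thresh
        self.max_occluded = max_occluded

    def step(self, peak, det_score=None):
        if self.state == 'tracking':
            if peak <= self.peak_thresh:      # no response peak clears the threshold
                self.state, self.occluded_frames = 'occluded', 1
        else:
            # occluded or emergency: the filters are frozen, the detector runs
            recovered = (det_score is not None
                         and det_score > self.det_thresh
                         and peak > self.peak_thresh)
            if recovered:
                self.state, self.occluded_frames = 'tracking', 0
            else:
                self.occluded_frames += 1
                if self.occluded_frames > self.max_occluded:
                    self.state = 'emergency'  # widen detection to the full image
        return self.state
```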
Step 5: after the target is retrieved, the occlusion or emergency state ends; if the filter weight of some filter is below the threshold, that filter is updated, otherwise the best-matched filter is updated; the weight of the updated filter is increased and the filter weights of the other filters are correspondingly decreased.
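The choice of which filter to update on recovery can be written as a one-line policy; picking the minimum-weight filter when one falls below the threshold is an assumption of the sketch:

```python
def filter_index_to_update(filters, best_idx, w_thresh):
    """Step 5: recycle a filter whose weight fell below the threshold for
    the re-found appearance; otherwise update the best-matched filter."""
    w = np.array([f['w'] for f in filters])
    return int(w.argmin()) if float(w.min()) < w_thresh else best_idx
```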
The theoretical basis of the invention is the detection-assisted multi-filter tracking algorithm implemented in steps 1-5; a block diagram of the complete algorithm is shown in FIG. 4. The main innovation is that all historical training images are classified by similarity into different training image sets, and multiple filters corresponding to the different sets (i.e. different historical appearances of the target) are obtained by training; the image position of the current tracked target is determined by the joint action of the filters, with weight coefficients determined by how well the current appearance matches each filter, and the currently determined target bounding box is used to update only the corresponding matched filter. A further innovation of the invention is the introduction of detection-assisted tracking for occlusion judgment and target retrieval.
Advantages and effects of the invention: the method memorizes multiple historical appearances so as to adapt to a discontinuous appearance model of the tracked target, and it separates erroneous training samples from correct ones, so that even if an erroneous sample is introduced it does not contaminate the stored history of the other, correct filters. Detection is introduced for correction, ensuring the accuracy of the tracking box. Combining detection with the multiple target appearances stored by the filters allows occlusion, even occlusion by objects similar to the target, to be judged, so the introduction of erroneous samples is avoided; during target retrieval the detection range can be narrowed (for short occlusions only the original target candidate box and its surrounding area need be searched), interference from objects similar to the tracked target is excluded, and the target is retrieved in any of its typical historical forms.
Drawings
FIG. 1 is a flow diagram of a device detection tracking module;
FIG. 2 is a schematic diagram of a two-axis servo system;
FIG. 3 is a visible light zoom imaging system;
FIG. 4 is a block diagram of a complete algorithm implementation of the present invention;
FIG. 5 is a schematic diagram of object loss and recovery in an embodiment of the present invention;
fig. 6 is a schematic view of the tracking effect of the airplane in the embodiment of the invention.
Detailed Description
The device of the invention comprises a visible light zoom imaging system, a control computer and a two-axis servo system, wherein:
The visible light zoom imaging system consists of an industrial camera and a visible light zoom lens. The industrial camera captures the target image, the image data are transmitted to the control computer through the conversion and transmission system, and the zoom lens adjusts its focal length according to feedback from the control computer, so that the size of the target in the captured image remains constant;
The control computer determines the position of the tracked target box in the current image using the tracking algorithm, adjusts the lens focal length according to the proportion of the image occupied by the target box, and drives the two-axis servo system with the deviation between the target position given by the tracking algorithm and the centre of the field of view, i.e. the offset of the target centre from the image centre, so that the servo system keeps the target in the central area of the image.
The two-axis servo system consists mainly of a turntable body and an electric cabinet. The turntable body is the final actuating mechanism of the system; it adopts a vertical U-shaped structure and performs the angular motions in the azimuth and pitch directions. The control part of the servo system is housed below the interior of the mechanical body; it receives control signals from the control computer and performs real-time motion control of each gimbal frame.
The stable target tracking method comprises the following steps:
Step 1: in the initial frame image, extract the feature image of the target from the given target region, train to obtain the feature image and weight coefficients of the first filter, and initialize the other filters, performing the following operations in sequence:
Extracting the target feature image: extract the corresponding HOG feature image from the given target region.
Training the weight coefficients: to express the feature image information more fully and further improve tracking accuracy, the KCF authors nonlinearly map the original feature image into a higher-dimensional feature space and train the corresponding filter there. The trained filter is a linear combination, after the same nonlinear mapping, of all the images obtained by cyclically shifting the original training image, and the sum of the products of corresponding elements (the dot product) of the mapped feature image and the filter is the response score for the target centred in the original image. The coefficients of the linear combination are the training weight coefficients, computed as

$$\hat{\alpha} = \frac{\hat{y}}{\hat{k}^{xx} + \lambda}, \qquad k^{xx} = \exp\!\left(-\frac{1}{\sigma^{2}}\Big(2\lVert x\rVert^{2} - 2F^{-1}\big(\textstyle\sum_{c}\hat{x}_{c}^{*}\odot\hat{x}_{c}\big)\Big)\right),$$

where x is the c-channel HOG feature image, y is the desired Gaussian-shaped response, the hat denotes the discrete Fourier transform, $F^{-1}$ the inverse discrete Fourier transform, $\odot$ element-wise multiplication, $*$ complex conjugation, λ the regularization coefficient and σ the standard deviation of the Gaussian kernel; see "Henriques J F, Rui C, Martins P, et al., 'High-Speed Tracking with Kernelized Correlation Filters,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2015". The feature images and weight coefficients of the remaining filters are initialized to those of the first filter, but their corresponding filter weights are smaller than that of the first filter.
Step 2: based on the position of the target bounding box in the previous frame, obtain candidate-box feature maps at different scales in the current frame image and match each to a filter. Let each filter $f_i$ have the corresponding training feature image $I_{f_i}$. For the current candidate-box feature image I at a given scale, the similarity between I and the training feature images of all N filters is computed in order to select the best-matched filter; the similarity measure is the sum of squared differences

$$S = \sum\big(I - I_{f_i}\big)^{2},$$

and the filter whose training feature image attains the minimum S is the best-matched filter.
Computing the response peak: the candidate-box feature image is correlated with the matched filter, and the maximum of the result over all positions is the response peak.
If a response peak is greater than the threshold, the best-matched filter and the corresponding candidate-box feature image x are selected according to the maximum response peak, and the weight coefficient α is computed from the selected candidate-box feature image by the method of step 1. The candidate-box feature image and the computed weight coefficient represent the newly trained filter information, while the stored history of the best-matched filter is represented by its historical training feature image x' and historical weight vector α'. The best-matched filter is updated as follows: the historical training image x' is updated to θx + (1-θ)x', and the historical weight vector α' is updated to θα + (1-θ)α'. The filter weight $\omega_\beta$ of the best-matched filter $f_\beta$ is then increased, and the filter weights $\omega_i$ of the other filters are correspondingly decreased. If no response peak is greater than the threshold, the tracker enters the occlusion state.
Step 3: in the occlusion state, filter updating is suspended; the detection area is computed from the position of the previous frame's target bounding box and set to M times that box, giving the detection area image. The target detection area in the current frame is still a rectangular bounding box whose centre image coordinates coincide with the centre of the previous frame's target bounding box and whose length and width are specific multiples of that box's length and width. When determining candidate-box feature images from a rectangular high-scoring detection box, each candidate box shares the centre image coordinates of the detection box, and its length and width are a specific multiple η of the detection box's length and width; candidate boxes at different scales are obtained by taking different values of η and extracting the corresponding feature images. An SSD detector detects all objects of the same class as the tracked target within the detection area; if every detection score is below the threshold, the target is considered occluded, otherwise the detection boxes with score d greater than the threshold v1 are kept. When there is significant interference from objects of the same class, the detection boxes are matched to the filters again and the maximum response is computed to confirm that the same target has been found: based on the positions of the high-scoring detection boxes, candidate-box feature images at different scales are extracted, matched to filters, and their response peaks computed; if both the detection score and the filter response peak are greater than their thresholds, the target is considered retrieved and the occlusion state is exited, otherwise the next frame again enters the occlusion state directly. In a tracking environment without interference from similar objects, the step of re-matching the detection boxes to filters and computing the maximum response can be omitted.
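A sketch of this detection-assisted recovery, reusing the helpers from the earlier sketches; `detect` stands for a class detector such as the SSD of this embodiment and is assumed to yield (box, score) pairs, and `extract_hog` is a hypothetical helper returning the H x W x C feature image of a box:

```python
def try_recover(detect, frame, region, filters, v1, peak_thresh, sigma=0.5):
    """Return the recovered target box, or None if still occluded."""
    for box, score in detect(frame, region):
        if score <= v1:                    # detection score below threshold v1
            continue
        for cand in candidate_boxes(box):  # different-scale candidates via eta
            feat = extract_hog(frame, cand)            # hypothetical helper
            _, filt = best_matched_filter(feat, filters)
            if response_peak(feat, filt, sigma) > peak_thresh:
                return cand                # detection and response both agree
    return None
```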
Step 4: if the target remains occluded for 20 consecutive frames without being retrieved, the emergency state is entered and the detection range is expanded to the full image, i.e. detection is no longer limited to the previous frame's target bounding box and its surrounding area but covers the whole image; the subsequent target-retrieval steps of matching the high-scoring detection boxes to filters and computing response peaks are the same as in step 2.
Step 5: after the target is retrieved, the occlusion or emergency state ends; if the filter weight of some filter is below the threshold, that filter is updated, otherwise the best-matched filter is updated; the weight of the updated filter is increased and the filter weights of the other filters are correspondingly decreased.
Examples
The technical solution of the present invention is further described in detail by the following specific examples.
The device consists of a visible light zoom imaging system, a control computer and a two-axis servo system. The visible light zoom imaging system consists of an industrial camera and a visible light zoom lens. The industrial camera captures the target image and transmits the image data to the control computer through the conversion and transmission system;
the control computer determines the position of a target enclosure frame in the current frame image by using a tracking algorithm, and controls the two-axis servo system through the deviation of the center of the target enclosure frame and the center point of a view field, namely, the offset of the center of the target from the center of the image, so that the two-axis servo system is controlled to keep the target in the central area of the image. The control computer also controls the focal length of the lens according to the proportion of the target enclosure frame in the current frame image to the whole image, so that the size of the target in the shot image is kept constant. The system detection tracking module flow chart is shown in fig. 1.
For testing, the two-axis servo system performs the corresponding two-dimensional angular motion under set commands. The turntable body is the final actuating mechanism of the system; it adopts a vertical U-shaped structure and performs the angular motions in the azimuth and pitch directions. The control part of the servo system is housed below the interior of the mechanical body and provides real-time motion control, monitoring, protection and other functions for each gimbal frame. Control commands are input through a joystick to realize the various control functions of the turntable, as shown in fig. 2.
The visible light zoom imaging system consists of a visible light zoom lens and an industrial camera. The zoom lens comprises, in order, a focusing assembly, a zoom assembly, a rear fixed assembly, a sealing assembly, a camera assembly and so on. The industrial camera is a JAI (Japan) SP-5000 high-definition color camera, as shown in FIG. 3.
The tracking algorithm of the invention was used to track a pedestrian image sequence containing similar-background interference and occlusions of the target, including by objects similar to the target. The images are 640 × 480 JPEG images with 24-bit depth; the sequence is 3624 frames long, the tracked pedestrian is occluded 9 times, and the occluded frames total 1379. The experimental platform was Ubuntu 16.04, and all experiments were run on a computer with an Intel Core i7 CPU (2.81 GHz main frequency), 8 GB of memory and an NVIDIA GeForce GTX 1050 Ti graphics card. All other parameters used the original authors' default code settings.
Fig. 4 presents the system flow diagram of the detection-assisted multi-filter tracking method. The effect of retrieving the occluded target is shown in fig. 5: each row shows one occlusion of the target pedestrian, with the occlusion correctly judged and the pedestrian retrieved upon reappearing.
When there is little interference from objects similar to the target in the environment where the tracking algorithm is applied, the step of matching the high-scoring detection boxes to filters and computing their responses can be omitted and the detection result used directly as the target position. In the experiment, ECO with deep features, KCF with HOG features and the detection-assisted multi-filter improvement of KCF were used to track 10 airplane videos with no occlusion interference but with obvious changes in target appearance. Screenshots of the tracking process are shown in fig. 6; in each row, from left to right, are the tracking-box positions of ECO, KCF and the method herein. The specific ECO method is described in "Danelljan M, Bhat G, Khan F S, et al., 'ECO: Efficient Convolution Operators for Tracking,' in IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA, 2017, pp. 6931-6939"; the specific KCF method in "Henriques J F, Rui C, Martins P, et al., 'High-Speed Tracking with Kernelized Correlation Filters,' IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2015".
It can be seen that although KCF is fast, its tracking is relatively unstable: before the airplane takes off, KCF's tracking box has already drifted noticeably because of strong airflow disturbance over the field and poor picture quality. ECO is more stable, but after take-off it cannot adapt to the severe deformation caused by the obvious change in the airplane's appearance, and its tracking box gradually drifts off the target. The method proposed herein tracks the airplane well throughout, with essentially no drift of the tracking box. The specific experimental results and the comparison of the tracking methods are shown in Table 1, where FPS is the number of frames tracked per second (related to picture size and resolution), the average CLE is the average centre location error, defined as the mean Euclidean distance between the target centre located by the algorithm and the reference target centre, and the average OR is the average overlap ratio, defined as the mean overlap (intersection over union) between the bounding box located by the algorithm and the reference bounding box.
Table 1. Comparison of tracking data for ECO, KCF and the algorithm herein

Video name          Land1  Land2  Land3  Land4  Land5  Launch1  Launch2  Launch3  Launch4  Launch5  Mean
Length (frames)     4800   7200   1997   2760   5101   4800     3600     3449     3960     4640     4230.7
FPS (ECO)           <10    <10    <10    <10    <10    <10      <10      <10      <10      <10      <10
FPS (KCF)           55     70     57     47     64     61       62       34       73       87.5     61.05
FPS (improved)      48     34     69     74.6   62     33.6     50       34       43.5     45       49.37
Avg CLE (ECO)       90.5   74.9   131.6  94.5   81     51.7     43       81.8     36       46.3     73.13
Avg CLE (KCF)       55     37     94     48.5   45     60       74       105      30.5     43.8     59.28
Avg CLE (improved)  26.43  23     26     34.6   30.5   15       25       17.6     16.7     31.8     24.663
Avg OR (ECO)        0.76   0.72   0.65   0.8    0.75   0.76     0.82     0.63     0.82     0.75     0.746
Avg OR (KCF)        0.7    0.77   0.73   0.73   0.7    0.73     0.78     0.71     0.73     0.68     0.726
Avg OR (improved)   0.92   0.92   0.88   0.87   0.83   0.94     0.9      0.93     0.93     0.81     0.893
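The two accuracy metrics of Table 1 are straightforward to compute; a minimal sketch, assuming boxes are (x, y, w, h) tuples and centres are (x, y) pairs:

```python
def average_cle(pred_centers, gt_centers):
    """Mean Euclidean distance between predicted and reference target centres."""
    d = [np.hypot(px - gx, py - gy)
         for (px, py), (gx, gy) in zip(pred_centers, gt_centers)]
    return float(np.mean(d))

def average_or(pred_boxes, gt_boxes):
    """Mean overlap ratio (intersection over union) of predicted and reference boxes."""
    ious = []
    for (x1, y1, w1, h1), (x2, y2, w2, h2) in zip(pred_boxes, gt_boxes):
        iw = max(0.0, min(x1 + w1, x2 + w2) - max(x1, x2))
        ih = max(0.0, min(y1 + h1, y2 + h2) - max(y1, y2))
        inter = iw * ih
        union = w1 * h1 + w2 * h2 - inter
        ious.append(inter / union if union > 0 else 0.0)
    return float(np.mean(ious))
```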
Although the 10 video segments contain images of the same size (all 1920 × 1080 pixel JPEG images of 24-bit depth), the size of the target and the proportion of the frame it occupies differ between videos, and this proportion largely determines the computational load and tracking speed. KCF is fastest on most video segments because each frame it trains the filter on only one target-bounding-box image: the filter is a weighted combination of cyclic shifts of that image, and the weight coefficients can be computed directly. Real time generally requires a frame rate above 25 frames per second, and KCF fully meets, indeed greatly exceeds, that speed, but its accuracy is not guaranteed: its average OR stays around 0.7, it drifts obviously when the target's appearance changes drastically, and its average CLE even reaches 100 pixels on the Launch3 video segment.
ECO must train a continuous filter each time from all the historically stored target-bounding-box images; the filter cannot be computed directly and must be iteratively optimized with a time-consuming conjugate-gradient method, so ECO is the slowest tracking algorithm on all video segments and falls completely short of the real-time requirement. Its average OR is around 0.75, it drifts obviously when the target's appearance changes drastically, and its average CLE even reaches 130 pixels on Land3.
The speed of the detection-assisted multi-filter improved algorithm varies considerably because it combines detection with KCF, whose speeds differ: detection is slower than KCF, so a video segment in which the airplane's appearance changes drastically many times has many typical historical appearances and requires detection to be introduced many times, which takes longer. The improved algorithm approaches or even exceeds KCF on some segments because, once KCF drifts, its box encloses large background regions that do not belong to the target, which increases the computation and slows it down. The frame rate of the improved algorithm is always above 30, sufficient for real time, and its accuracy is clearly improved: the average OR is always above 0.8 and frequently reaches 0.9, and the average CLE is always within 35 pixels.
The above description is only an embodiment of the present invention and is not intended to limit its scope; all equivalent structures and equivalent process changes made using the contents of this specification and the drawings, whether applied directly or indirectly in other related technical fields, are included in the scope of the present invention.

Claims (5)

1. A target tracking method using a target tracking apparatus, the apparatus comprising: a visible light zoom imaging system, a control computer and a two-axis servo system; wherein,
the visible light zoom imaging system consists of an industrial camera and a visible light zoom lens, wherein the industrial camera captures the target image and transmits the image data to the control computer through a conversion and transmission system, and the visible light zoom lens adjusts the lens focal length according to feedback from the control computer, so that the size of the target in the captured image remains constant;
the control computer determines the position of the tracked target box in the current image using the tracking algorithm, adjusts the lens focal length according to the proportion of the image occupied by the target box, and drives the two-axis servo system with the deviation between the target position given by the tracking algorithm and the centre of the field of view, i.e. the offset of the target centre from the image centre, so that the servo system keeps the target in the central area of the image;
the two-axis servo system consists mainly of a turntable body and an electric cabinet; the turntable body is the final actuating mechanism of the system, adopts a vertical U-shaped structure and performs the angular motions in the azimuth and pitch directions respectively; the control part of the servo system is housed below the interior of the mechanical body, receives control signals from the control computer and performs real-time motion control of each gimbal frame; characterized in that the method comprises the following implementation steps:
step one, in the initial frame image, extracting the feature image of the target from the given target region, training to obtain the feature image and weight coefficients of the first filter, and initializing the other filters and their filter weights;
step two, obtaining candidate-box feature maps at different scales in the current frame image, matching each to a filter and computing response peaks; if a response peak is greater than the threshold, selecting the candidate-box feature map and filter of the corresponding scale according to the maximum response peak, updating the corresponding filter and increasing its filter weight; if no response peak is greater than the threshold, entering the occlusion state;
step three, suspending filter updating in the occlusion state, and determining the target detection area in the current frame from the position of the previous frame's target bounding box; extracting candidate-box feature images at different scales based on the positions of the high-scoring detection boxes, matching filters and computing response peaks; if both the detection score and the response peak are greater than their thresholds, considering the target retrieved, otherwise the next frame directly enters the occlusion state;
in step three, filter updating is suspended in the occlusion state, and the target detection area in frame k is determined from the position of the target bounding box in frame k-1; candidate-box feature images at different scales are extracted based on the positions of the high-scoring detection boxes, matched to filters, and their response peaks computed; if both the detection score and the filter response peak are greater than their thresholds, the target is considered retrieved and the occlusion state is exited, otherwise the next frame directly enters the occlusion state;
the target detection area in frame k is still a rectangular bounding box whose centre image coordinates coincide with the centre of the target bounding box in frame k-1 and whose length and width are specific multiples of the length and width of that box; when determining candidate-box feature images from a rectangular high-scoring detection box, each candidate box shares the centre image coordinates of the detection box, its length and width being a specific multiple η of the detection box's length and width, and candidate boxes at different scales are obtained by taking different values of η and extracting the corresponding feature images;
step four, entering the emergency state if the target remains occluded for 20 consecutive frames without being retrieved, and expanding the detection range to the full image;
step five, after the target is retrieved, ending the occlusion or emergency state; if the filter weight of some filter is below the threshold, updating that filter, otherwise updating the best-matched filter; increasing the filter weight of the updated filter and correspondingly decreasing the filter weights of the other filters.
2. The target tracking method of claim 1, wherein:
in step one, in the initial frame image, the feature image of the target is extracted from the given target region, the feature image and weight coefficients of the first filter are obtained by training, and the other filters are initialized, the following operations being performed in sequence:
extracting the target feature image: extracting the corresponding HOG feature image from the given target region;
training the weight coefficients: the trained filter is a linear combination, after a nonlinear transformation, of all the images obtained by cyclically shifting the original training image, the coefficients of this linear combination being the training weight coefficients; the feature images and weight coefficients of the remaining filters are the same as those of the first filter, but their corresponding filter weights are smaller than that of the first filter.
3. The target tracking method of claim 1, wherein:
in step two, candidate-box feature maps are obtained at different scales in the current frame image and matched to filters, each filter $f_i$ having the corresponding training feature image $I_{f_i}$; for the current candidate-box feature image I at a given scale, the similarity between I and the training feature images of all N filters is computed in order to select the best-matched filter, using the sum of squared differences

$$S = \sum\big(I - I_{f_i}\big)^{2},$$

the filter whose training feature image attains the minimum S being the matched filter;
computing the response peak: the current-scale candidate-box feature image I is correlated with the matched filter, the maximum of the result over all positions being the response peak;
if a response peak is greater than the threshold, the best-matched filter and the corresponding candidate-box feature image x are selected according to the maximum response peak and the weight coefficient α is computed from the selected candidate-box feature image; the candidate-box feature image and the computed weight coefficient represent the newly trained filter information, the stored history of the best-matched filter being represented by its historical training feature image x' and historical weight vector α'; the best-matched filter is updated as follows: the historical training image x' is updated to θx + (1-θ)x' and the historical weight vector α' is updated to θα + (1-θ)α';
and the filter weight $\omega_\beta$ of the best-matched filter $f_\beta$ is increased while the filter weights $\omega_i$ of the other filters are correspondingly decreased; if no response peak is greater than the threshold, the occlusion state is entered.
4. The target tracking method of claim 2, wherein:
in step four, if the target remains occluded for 20 consecutive frames without being retrieved, the emergency state is entered and the detection range is expanded to the full image, i.e. detection is not limited to the previous frame's target bounding box and its surrounding area but covers the whole image, the subsequent target-retrieval steps of matching the high-scoring detection boxes to filters and computing response peaks being unchanged.
5. The target tracking method of claim 2, wherein:
in step five, after the target is retrieved, the occlusion or emergency state ends; if the filter weight of some filter is below the threshold, that filter is updated, otherwise the best-matched filter is updated, the filter weight of the updated filter being increased and the filter weights of the other filters correspondingly decreased.
CN201810768738.0A 2018-07-13 2018-07-13 Target tracking method and device Active CN109102522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810768738.0A CN109102522B (en) 2018-07-13 2018-07-13 Target tracking method and device

Publications (2)

Publication Number Publication Date
CN109102522A CN109102522A (en) 2018-12-28
CN109102522B true CN109102522B (en) 2021-08-31

Family

ID=64846336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810768738.0A Active CN109102522B (en) 2018-07-13 2018-07-13 Target tracking method and device

Country Status (1)

Country Link
CN (1) CN109102522B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009660B (en) * 2019-03-06 2021-02-12 浙江大学 Object position tracking method based on correlation filter algorithm
CN109977928B (en) * 2019-04-25 2021-03-23 中国科学院自动化研究所 Robot target pedestrian retrieval method
CN110210304B (en) * 2019-04-29 2021-06-11 北京百度网讯科技有限公司 Method and system for target detection and tracking
CN110189365B (en) * 2019-05-24 2023-04-07 上海交通大学 Anti-occlusion correlation filtering tracking method
CN110400347B (en) * 2019-06-25 2022-10-28 哈尔滨工程大学 Target tracking method for judging occlusion and target relocation
CN110290351B (en) * 2019-06-26 2021-03-23 广东康云科技有限公司 Video target tracking method, system, device and storage medium
CN110517483B (en) * 2019-08-06 2021-05-18 新奇点智能科技集团有限公司 Road condition information processing method and digital rail side unit
CN110490907B (en) * 2019-08-21 2023-05-16 上海无线电设备研究所 Moving target tracking method based on multi-target feature and improved correlation filter
CN110599519B (en) * 2019-08-27 2022-11-08 上海交通大学 Anti-occlusion related filtering tracking method based on domain search strategy
CN112489077A (en) * 2019-09-12 2021-03-12 阿里巴巴集团控股有限公司 Target tracking method and device and computer system
CN110909604B (en) * 2019-10-23 2024-04-19 深圳市重投华讯太赫兹科技有限公司 Security check image detection method, terminal equipment and computer storage medium
CN112585944A (en) * 2020-01-21 2021-03-30 深圳市大疆创新科技有限公司 Following method, movable platform, apparatus and storage medium
CN112084914B (en) * 2020-08-31 2024-04-26 的卢技术有限公司 Multi-target tracking method integrating space motion and apparent feature learning
US11566521B2 (en) 2020-09-22 2023-01-31 Trans Astronautica Corporation Systems and methods for radiant gas dynamic mining of permafrost
CN112862863B (en) * 2021-03-04 2023-01-31 广东工业大学 Target tracking and positioning method based on state machine
CN114842048B (en) * 2022-04-11 2024-08-02 北京航天晨信科技有限责任公司 Target tracking method, system, readable storage medium and computer device
US11748897B1 (en) * 2022-06-24 2023-09-05 Trans Astronautica Corporation Optimized matched filter tracking of space objects

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651913A (en) * 2016-11-29 2017-05-10 开易(北京)科技有限公司 Target tracking method based on correlation filtering and color histogram statistics and ADAS (Advanced Driving Assistance System)
CN106686306A (en) * 2016-12-22 2017-05-17 西安工业大学 Target tracking device and target tracking method
CN107657630A (en) * 2017-07-21 2018-02-02 南京邮电大学 A kind of modified anti-shelter target tracking based on KCF
CN107748873A (en) * 2017-10-31 2018-03-02 河北工业大学 A kind of multimodal method for tracking target for merging background information
CN108037510A (en) * 2017-12-07 2018-05-15 武汉华之洋科技有限公司 A kind of photoelectronic reconnaissance equipment for unmanned boat
CN108010067A (en) * 2017-12-25 2018-05-08 北京航空航天大学 A kind of visual target tracking method based on combination determination strategy
CN107993257A (en) * 2017-12-28 2018-05-04 中国科学院西安光学精密机械研究所 Intelligent IMM Kalman filtering feedforward compensation target tracking method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"High-Speed Tracking with Kernelized Correlation Filters";Joa~o F. Henriques, at el.;《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》;20150331;第37卷(第3期);583-596 *
"基于似物性采样和核化相关滤波器的目标跟踪算法研究";王鹏飞;《中国优秀硕士学位论文全文数据库 信息科技辑》;20180215(第2期);I138-1994 *
"基于核相关滤波器的目标跟踪方法研究";江维创;《中国优秀硕士学位论文全文数据库 信息科技辑》;20171215(第12期);I138-1411 *
"电视导引头相关滤波跟踪算法研究";马晓楠;《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》;20170715(第7期);C032-143 *

Also Published As

Publication number Publication date
CN109102522A (en) 2018-12-28

Similar Documents

Publication Publication Date Title
CN109102522B (en) Target tracking method and device
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN106780620B (en) Table tennis motion trail identification, positioning and tracking system and method
CN113807187B (en) Unmanned aerial vehicle video multi-target tracking method based on attention feature fusion
CN110443827B (en) Unmanned aerial vehicle video single-target long-term tracking method based on improved twin network
CN104574445B (en) A kind of method for tracking target
CN111354017A (en) Target tracking method based on twin neural network and parallel attention module
CN109102525B (en) Mobile robot following control method based on self-adaptive posture estimation
CN110490907B (en) Moving target tracking method based on multi-target feature and improved correlation filter
CN109448023B (en) Satellite video small target real-time tracking method
CN108009494A (en) A kind of intersection wireless vehicle tracking based on unmanned plane
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN111383252B (en) Multi-camera target tracking method, system, device and storage medium
CN111680713B (en) Unmanned aerial vehicle ground target tracking and approaching method based on visual detection
CN109323697B (en) Method for rapidly converging particles during starting of indoor robot at any point
CN110827321B (en) Multi-camera collaborative active target tracking method based on three-dimensional information
CN110992378B (en) Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot
CN113643329B (en) Twin attention network-based online update target tracking method and system
CN113538509B (en) Visual tracking method and device based on adaptive correlation filtering feature fusion learning
CN111415370A (en) Embedded infrared complex scene target real-time tracking method and system
CN108986139B (en) Feature integration method with significance map for target tracking
CN111915653B (en) Dual-station visual target tracking method
CN106408593A (en) Video-based vehicle tracking method and device
Wu et al. Joint feature embedding learning and correlation filters for aircraft tracking with infrared imagery
CN112884799A (en) Target tracking method in complex scene based on twin neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant