CN113920159A - Infrared aerial small target tracking method based on full convolution twin network


Info

Publication number
CN113920159A
CN113920159A (application CN202111081287.1A)
Authority
CN
China
Prior art keywords
target
value
response
tracking
APCE
Prior art date
Legal status
Granted
Application number
CN202111081287.1A
Other languages
Chinese (zh)
Other versions
CN113920159B (en)
Inventor
刘刚
张文波
曹紫绚
董猛
刘龙哲
田慧
权冰洁
Current Assignee
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Henan University of Science and Technology
Priority to CN202111081287.1A
Publication of CN113920159A
Application granted
Publication of CN113920159B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/277 - Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10048 - Infrared image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention provides an infrared aerial small target tracking method based on a full convolution twin network, aiming at the practical problems, such as background clutter interference and occlusion, encountered by an infrared imaging guidance system when tracking small aerial targets. The invention can adapt to complex and varied infrared aerial scenes and achieves effective, stable and real-time tracking of small infrared aerial targets.

Description

Infrared aerial small target tracking method based on full convolution twin network
Technical Field
The invention belongs to the technical field of infrared aerial target tracking, and particularly relates to an infrared aerial small target tracking method based on a full convolution twin network.
Background
Aerial target tracking is one of the key technologies of an infrared imaging guidance system. The demands such systems place on aerial target tracking keep growing, mainly because targets are far away and the natural environment is complex and changeable: during the tracking stage, most targets in the field of view occupy only a limited number of pixels (small targets), their feature information is weak, background clutter interference is heavy, and occlusion occurs, all of which greatly hampers target tracking. How to effectively handle the tracking failures caused by background clutter interference and occlusion of the target during infrared aerial small target tracking, while improving the accuracy and real-time performance of the tracking method, is therefore a technical problem urgently awaiting a solution from those skilled in the art.
At present, most infrared target tracking methods are traditional algorithms in which targeted feature extraction schemes are designed by hand for specific scenes. For complex infrared aerial scenes, however, existing traditional tracking algorithms can hardly adapt to every situation. In recent years, deep learning has developed rapidly; scholars at home and abroad have widely applied deep features to target tracking algorithms, and some deep-learning-based trackers now outperform traditional correlation filtering trackers. However, the back-propagation of a convolutional neural network is computationally expensive, so tracking algorithms that update network parameters online run slowly and cannot meet real-time tracking requirements.
To address these problems, twin-network-based target tracking algorithms use similarity verification to recast tracking as a template matching problem; thanks to their strong end-to-end trainability and real-time tracking speed, they have become an important research direction in the target tracking field. The target tracking algorithm based on a fully-convolutional twin network (SiamFC) is the classic algorithm of this kind: it performs a cross-correlation operation between the depth features of the region to be searched and those of the target template, measures similarity by the resulting response values, and selects the position of the maximum response as the target center point, achieving good tracking accuracy and speed.
Disclosure of Invention
Based on the above problems, the invention provides an infrared aerial small target tracking method based on a full convolution twin network, which aims to solve infrared small target tracking in complex aerial scenes, in particular the cases where the tracked target suffers background clutter interference or occlusion.
In order to achieve this purpose, the invention adopts the following technical scheme: an infrared aerial small target tracking method based on a full convolution twin network, comprising the following steps:
S1, inputting the image sequence into the full convolution twin network, selecting the first frame of the sequence as the target template z, taking a region to be searched x from each subsequent frame, and passing both through the parameter-sharing convolutional neural network φ(·) to extract depth features, obtaining the feature map φ(z) of the target template and the feature map φ(x) of the region to be searched;
S2, using the feature map of the target template
Figure BDA0003264139380000024
Checking the area to be searched for by convolutionCharacteristic diagram
Figure BDA0003264139380000025
Performing convolution operation to obtain a characteristic response graph M;
S3, evaluating and judging the current frame target tracking state using the average peak-to-correlation energy and the maximum peak of the current frame response map: if current target tracking is judged to be in the normal state, executing step S4; if the target is judged to be interfered by background clutter, executing step S5; if the target is judged to be occluded, executing step S6;
S4, in the normal tracking state, applying a Hamming window of the same size to the current frame (frame t) response map M_t to suppress the boundary effect, and selecting the maximum peak point of the response map as the target point;
S5, when the target is interfered by background clutter, forming a candidate point set from the multiple peak points of the current frame response map, calculating the feature similarity score of each candidate target against the real target of the historical frames using the depth features and the local contrast feature, and selecting the candidate point with the highest similarity as the current frame target point;
and S6, when the target is occluded, predicting the current frame target position in the occluded state with a Kalman filter constructed from the target position information of the historical frames; once the target is judged to have left occlusion, normal tracking resumes and step S4 is executed.
Further, step S3 includes:
S3.1, measuring the fluctuation of the response map with the average peak-to-correlation energy (APCE) index, specifically defined as:

APCE = \frac{\left|F_{max}-F_{min}\right|^{2}}{\operatorname{mean}_{i,j}\left[\left(F_{i,j}-F_{min}\right)^{2}\right]}

where F_max and F_min respectively denote the maximum and minimum values in the response map, i and j denote the abscissa and ordinate of the response map, and F_{i,j} is the response value at (i, j) in the response map; under normal tracking, the response map fluctuates little and the APCE value is large; when the target is interfered by background clutter or occluded, the response map fluctuates severely and the APCE value drops sharply compared with normal tracking; a smaller APCE value indicates a more unstable tracking state;
S3.2, defining λ_APCE and λ_Fmax as the ratios of the current frame response map's APCE value and maximum peak, respectively, to the corresponding average values over the historical frames, quantifying the degree of change of the current frame's APCE value and maximum peak, namely:

\lambda_{APCE} = \frac{APCE_t}{\frac{1}{n_1}\sum_{i=t-n_1}^{t-1} APCE_i}, \qquad \lambda_{F_{max}} = \frac{F_{max-t}}{\frac{1}{n_1}\sum_{i=t-n_1}^{t-1} F_{max-i}}
where APCE_t and F_max-t are the response map APCE value and maximum peak of the current frame, APCE_i and F_max-i those of historical frame i, and n_1 is the number of reference historical frames; during tracking, the current tracking state is judged from the λ_APCE and λ_Fmax values combined with the historical frame response map information.
Further, the method in step S3.2 for judging the current tracking state from the λ_APCE and λ_Fmax values combined with the historical frame response map information comprises the following steps:
a. if the change rate λ_APCE of the current frame response map's APCE value is greater than a certain threshold, judging the current tracking state to be the normal tracking state; otherwise executing step b to discriminate the other tracking states;
b. if the maximum peak change rate λ_Fmax of the current frame response map is less than a threshold, the maxima of the n_4 consecutive frame response maps before the current frame decrease gradually, and the decrease relative to the previous frame is greater than a certain threshold, judging the current tracking state to be the occluded state; otherwise judging it to be the background clutter interference state.
Further, step S5 includes:
S5.1, defining the feature similarity score S_j of each candidate target as:

S_j = \beta\left|F_j - \frac{1}{n_2}\sum_{i=t-n_2}^{t-1} F_{max-i}\right| + (1-\beta)\left|C_j - \frac{1}{n_2}\sum_{i=t-n_2}^{t-1} C_{max-i}\right|, \quad j \in D

where D is the candidate target point set, of size n_3; j is the candidate target index; F_j and C_j are the response value and local contrast of candidate target j in the current frame; F_max-i and C_max-i are the peak value and local contrast of the maximum peak in the response map of historical frame i; n_2 is the number of reference historical frames; β and 1 − β are the weights of the response value and the local contrast respectively; the feature similarity score S_j, constructed from the depth feature response values and the local contrast feature of the target, measures the similarity between candidate target j and the real target, and a smaller S_j indicates that the candidate target's feature values are closer to the real target;
S5.2, searching the multiple peak points in the current frame response map with a maximum filter, and selecting the points with the top n_3 peak values as candidate target center points;
S5.3, calculating the local contrast of each candidate target using the same target box size;
and S5.4, calculating the corresponding feature similarity score S_j from the local contrast and response value of each candidate target, taking the candidate target point with the smallest S_j as the target point, and obtaining the current frame target center point position through position transformation.
Compared with the prior art, the invention has the following beneficial effects: the infrared aerial small target tracking method of the invention extracts depth features with a full convolution twin network and obtains a depth feature response map. The current target tracking state is judged from the average peak-to-correlation energy of the response map and the change of its maximum peak. During normal tracking, the maximum peak point of the response map is selected as the target center point; when background clutter interference is detected, the clutter is rejected using the depth feature response values together with the local contrast; when occlusion is detected, the position is predicted with Kalman filtering. The proposed tracking method can effectively handle infrared aerial small targets under complex background interference and occlusion, performs well on infrared aerial small target tracking, and meets the real-time requirement of tracking.
Drawings
FIG. 1 is a schematic overall flow diagram of the tracking method of the present invention;
FIG. 2 is a diagram of a full convolution twin network in accordance with the present invention;
FIG. 3 is a graph of the success rate of the method of the present invention (Our) and 9 other comparison algorithms on an infrared aerial small target test set;
FIG. 4 is a graph of the accuracy of the method of the present invention (Our) and 9 other comparison algorithms on an infrared aerial small target test set;
FIG. 5 is a graph of the success rate of the method of the present invention (Our) and 9 other comparison algorithms on infrared aerial small target test data with the complex background interference attribute;
FIG. 6 is a graph of the accuracy of the method of the present invention (Our) and 9 other comparison algorithms on infrared aerial small target test data with the complex background interference attribute;
FIG. 7 is a graph of the success rate of the method of the present invention (Our) and 9 other comparison algorithms on infrared aerial small target test data with the target occluded attribute;
FIG. 8 is a graph of the accuracy of the method of the present invention (Our) and 9 other comparison algorithms on infrared aerial small target test data with the target occluded attribute.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts belong to the protection scope of the present invention.
The principle of the invention is as follows: the infrared aerial small target tracking method of the invention extracts depth features with a full convolution twin network and obtains a depth feature response map. The current target tracking state is judged from the average peak-to-correlation energy of the response map and the change of its maximum peak. During normal tracking, the maximum peak point of the response map is selected as the target center point; when background clutter interference is detected, the clutter is rejected using the depth feature response values together with the local contrast; when occlusion is detected, the position is predicted with Kalman filtering. The proposed tracking method can effectively handle infrared aerial small targets under complex background interference and occlusion, performs well on infrared aerial small target tracking, and meets the real-time requirement of tracking.
The invention discloses an infrared aerial small target tracking method based on a full convolution twin network which, as shown in FIG. 1, comprises the following steps:
S1, inputting the image sequence into the full convolution twin network, selecting the first frame of the sequence as the target template z, taking a region to be searched x from each subsequent frame, and passing both through the parameter-sharing convolutional neural network φ(·) to extract depth features, obtaining the feature maps φ(z) and φ(x).
specifically, the deep feature extraction network in step S1 is designed on the basis of AlexNet. Five convolutional layers in total, wherein the maximum pooling layer is used after the first two convolutional layers, each convolutional layer except the last convolutional layer uses ReLU, and the last three convolutional layers use packet convolution. After each linear layer there is a Batch Normalization layer (Batch Normalization), the convolutional layer has no padding operation, and the net total step size is 8. In addition, the present invention trains features through off-lineAnd extracting a network, wherein in an off-line training stage, a visible light data set (ILSVRC2015) is used for training, and then, the training is further carried out on the infrared aerial small target image sequence to better acquire the depth characteristics of the infrared aerial small target. The network parameter optimization is carried out by using a stochastic gradient descent algorithm in the training process, 50 rounds of training are carried out, and the learning rate is set to be 10-2The batch size of the training images is set to be 8, and the optimal result in the last 10 rounds is obtained.
S2, using the feature map φ(z) of the target template as a convolution kernel, performing a convolution operation on the feature map φ(x) of the region to be searched to obtain the feature response map M.
Specifically, in the actual tracking process, as shown in the full convolution twin network structure diagram of FIG. 2, the target template and the region to be searched are respectively cropped and resized into a 127 × 127 × 3 template z and a 255 × 255 × 3 search region x; after both pass through the feature extraction network φ(·), a 6 × 6 × 128 feature map φ(z) and a 22 × 22 × 128 feature map φ(x) are obtained. Taking the 6 × 6 × 128 feature map as a convolution kernel and performing a convolution operation with the 22 × 22 × 128 feature map yields a 17 × 17 feature response map, which is enlarged by bicubic interpolation into the final 272 × 272 feature response map.
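One way to realize the cross-correlation and upsampling of step S2, under the tensor shapes just given, is a minimal sketch using torch.nn.functional.conv2d with the template features acting as the kernel (an implementation assumption):

```python
import torch
import torch.nn.functional as F

def response_map(feat_z: torch.Tensor, feat_x: torch.Tensor) -> torch.Tensor:
    """Cross-correlate the template feature map (used as a convolution
    kernel) with the search-region feature map, then enlarge the 17x17
    response to 272x272 by bicubic interpolation as described."""
    # feat_z: (1, 128, 6, 6), feat_x: (1, 128, 22, 22)
    m = F.conv2d(feat_x, feat_z)                            # -> (1, 1, 17, 17)
    m = F.interpolate(m, size=(272, 272), mode="bicubic",
                      align_corners=False)                  # -> (1, 1, 272, 272)
    return m[0, 0]
```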
And S3, evaluating and judging the current frame target tracking state using the average peak-to-correlation energy and the maximum peak of the current frame response map. If current target tracking is judged to be in the normal state, step S4 is executed; if the target is judged to be interfered by background clutter, step S5 is executed; if the target is judged to be occluded, step S6 is executed.
Specifically, step S3 includes:
S3.1, measuring the fluctuation of the response map with the average peak-to-correlation energy (APCE) index, specifically defined as:

APCE = \frac{\left|F_{max}-F_{min}\right|^{2}}{\operatorname{mean}_{i,j}\left[\left(F_{i,j}-F_{min}\right)^{2}\right]}

where F_max and F_min respectively denote the maximum and minimum values in the response map, i and j denote the abscissa and ordinate of the response map, and F_{i,j} is the response value at (i, j) in the response map. Under normal tracking conditions, the response map fluctuates little, visually appears "unimodal", and the APCE value is large. When the target is interfered by background clutter or occluded, the response map fluctuates severely, visually appears "multi-peaked", and the APCE value drops sharply compared with normal tracking. Analysing the fluctuation of the response map and computing its APCE value therefore effectively reflects the current tracking state.
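The APCE index itself is straightforward to compute from a response map; a minimal sketch:

```python
import numpy as np

def apce(response: np.ndarray) -> float:
    """Average peak-to-correlation energy: |Fmax - Fmin|^2 over the mean
    squared deviation of all response values from the minimum. Larger
    values indicate a sharper, more unimodal response map."""
    f_max, f_min = float(response.max()), float(response.min())
    return (f_max - f_min) ** 2 / float(np.mean((response - f_min) ** 2))
```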
The cause of a tracking failure is judged from how the maximum peak of the response map changes. When the target is interfered by background clutter, the maximum peak of the current frame response map changes abruptly relative to that of the previous frame, while the maximum peaks of the frames before the current frame remain stable for a period of time. As the target goes from slightly occluded to fully occluded, the maximum peak of the corresponding response map decreases gradually and reaches its minimum at full occlusion; that is, when the current frame target is fully occluded, the maximum peak of the current frame response map is at its minimum and the maxima of the response maps before the current frame decrease gradually. Therefore, when the APCE value is small, the change state of the response map's maximum peak can be used to classify the factor causing the tracking failure.
S3.2, the invention defines λ_APCE and λ_Fmax as the ratios of the current frame response map's APCE value and maximum peak, respectively, to the corresponding average values over the historical frames, quantifying the degree of change of the current frame's APCE value and maximum peak, namely:

\lambda_{APCE} = \frac{APCE_t}{\frac{1}{n_1}\sum_{i=t-n_1}^{t-1} APCE_i}, \qquad \lambda_{F_{max}} = \frac{F_{max-t}}{\frac{1}{n_1}\sum_{i=t-n_1}^{t-1} F_{max-i}}
where APCE_t and F_max-t are the response map APCE value and maximum peak of the current frame (frame t), APCE_i and F_max-i those of historical frame i, and n_1 is the number of reference historical frames; n_1 = 10 was determined experimentally. During tracking, the current tracking state is judged from the λ_APCE and λ_Fmax values combined with the historical frame response map information.
In the present invention, the current tracking state is judged during tracking from the λ_APCE and λ_Fmax values combined with the historical frame response map information; the specific implementation steps are as follows:
a. If the change rate λ_APCE of the current frame response map's APCE value is greater than a threshold α_1, the current tracking state is judged to be the normal tracking state; otherwise step b is executed to discriminate the other tracking states. α_1 = 0.55 was determined experimentally.
b. If the maximum peak change rate λ_Fmax of the current frame response map is less than a threshold α_2, the maxima of the n_4 consecutive frame response maps before the current frame decrease gradually, and the decrease relative to the previous frame is greater than α_3, the current tracking state is judged to be the occluded state; otherwise it is judged to be the background clutter interference state. α_2 = 0.68, α_3 = 0.08 and n_4 = 4 were determined experimentally.
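Steps a and b can be sketched as follows with the experimentally determined thresholds above; reading "decreases gradually with a drop larger than α_3 versus the previous frame" as "every successive drop exceeds α_3" is one possible interpretation of the text:

```python
import numpy as np

# Thresholds and window sizes as reported in the experiments above.
ALPHA1, ALPHA2, ALPHA3 = 0.55, 0.68, 0.08
N1, N4 = 10, 4

def judge_state(apce_t: float, fmax_t: float,
                apce_hist: list, fmax_hist: list) -> str:
    """Classify the current frame as 'normal', 'occluded' or 'clutter'
    from the APCE and maximum-peak change rates; the history lists hold
    values of the preceding frames, most recent last (a sketch)."""
    lam_apce = apce_t / np.mean(apce_hist[-N1:])
    lam_fmax = fmax_t / np.mean(fmax_hist[-N1:])
    if lam_apce > ALPHA1:
        return "normal"
    # Occlusion: peak ratio small and the last N4 maxima fall steadily,
    # each drop exceeding ALPHA3 (interpretation assumed, see lead-in).
    recent = fmax_hist[-N4:]
    drops = [a - b for a, b in zip(recent, recent[1:])] + [recent[-1] - fmax_t]
    if lam_fmax < ALPHA2 and all(d > ALPHA3 for d in drops):
        return "occluded"
    return "clutter"
```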
S4, in the normal tracking state, a Hamming window of the same size is applied to the current frame (frame t) response map M_t to suppress the boundary effect, and the maximum peak point of the response map is selected as the target point.
Specifically, the maximum peak point in the response map is selected as the target point, the offset of this point relative to the center of the response map is obtained, and the offset multiplied by the total network stride gives the center position of the current frame target.
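A sketch of this peak localization, assuming it operates on the 17 × 17 pre-interpolation response map (on the 272 × 272 upsampled map, the per-pixel stride would shrink by the interpolation factor):

```python
import numpy as np

def locate_target(response: np.ndarray, stride: int = 8) -> np.ndarray:
    """Step S4: weight the response map with a same-size Hamming window
    to suppress the boundary effect, take the maximum peak, and convert
    its offset from the map center into an image displacement via the
    total network stride."""
    win = np.outer(np.hamming(response.shape[0]), np.hamming(response.shape[1]))
    peak = np.unravel_index(np.argmax(response * win), response.shape)
    center = (np.array(response.shape) - 1) / 2.0
    return (np.array(peak) - center) * stride  # (dy, dx) of the target center
```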
S5, when the target is interfered by background clutter, a candidate point set is formed from the multiple peak points of the current frame response map, the feature similarity score of each candidate target against the real target of the historical frames is calculated using the depth features and the local contrast feature, and the candidate point with the highest similarity is selected as the current frame target point.
Specifically, step S5 includes:
S5.1, the local contrast in step S5 is the ratio of the target's mean gray level to the mean gray level of the local neighborhood background, defined as:

C = \frac{\frac{1}{N_{\Omega}}\sum_{(i,j)\in\Omega} I(i,j)}{\frac{1}{N_{\Psi}}\sum_{(i,j)\in\Psi} I(i,j)}

where Ω denotes the target region, Ψ the neighborhood of the target region, N_Ω and N_Ψ the numbers of pixels in the respective regions, and I(i, j) the gray level of the pixel at (i, j) in the original image.
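A sketch of this local contrast computation; the width of the neighborhood ring Ψ around the target box is an assumption, as the patent does not specify it:

```python
import numpy as np

def local_contrast(image: np.ndarray, box: tuple, margin: int = 2) -> float:
    """Ratio of the mean gray level of the target region (Omega) to the
    mean gray level of the surrounding neighborhood ring (Psi); the
    ring width `margin` is an assumption."""
    x, y, w, h = box
    target = image[y:y + h, x:x + w].astype(np.float64)
    y0, y1 = max(y - margin, 0), min(y + h + margin, image.shape[0])
    x0, x1 = max(x - margin, 0), min(x + w + margin, image.shape[1])
    neigh = image[y0:y1, x0:x1].astype(np.float64)
    ring_mean = (neigh.sum() - target.sum()) / (neigh.size - target.size)
    return float(target.mean() / ring_mean)
```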
In step S5, the depth feature response values and the local contrast feature of the target are used to construct the feature similarity score S_j between each candidate target and the real target of the historical frames, defined for each candidate target as:

S_j = \beta\left|F_j - \frac{1}{n_2}\sum_{i=t-n_2}^{t-1} F_{max-i}\right| + (1-\beta)\left|C_j - \frac{1}{n_2}\sum_{i=t-n_2}^{t-1} C_{max-i}\right|, \quad j \in D

where D is the candidate target point set, of size n_3; j is the candidate target index; F_j and C_j are the response value and local contrast of candidate target j in the current frame; F_max-i and C_max-i are the peak value and local contrast of the maximum peak in the response map of historical frame i; n_2 is the number of reference historical frames; and β and 1 − β are the weights of the response value and the local contrast respectively. n_3 = 8, n_2 = 5 and β = 0.4 were determined experimentally. The feature similarity score S_j measures the similarity between candidate target j and the real target; a smaller S_j indicates that the candidate target's feature values are closer to the real target.
S5.2, the multiple peak points in the current frame response map are searched with a maximum filter, and the points with the top n_3 peak values are selected as candidate target center points.
And S5.3, the local contrast of each candidate target is calculated using the same target box size.
S5.4, the corresponding feature similarity score S_j is calculated from the local contrast and response value of each candidate target; the candidate target point with the smallest S_j is taken as the target point, and the current frame target center point position is obtained through position transformation.
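Steps S5.1, S5.2 and S5.4 can be sketched as follows, assuming S_j takes the weighted absolute-difference form reconstructed above and using scipy's maximum filter for peak detection (the filter footprint is an assumption):

```python
import numpy as np
from scipy.ndimage import maximum_filter

BETA, N2, N3 = 0.4, 5, 8  # weight and window sizes reported above

def top_peaks(response: np.ndarray, size: int = 5):
    """Step S5.2: candidate center points are the local maxima found with
    a maximum filter, keeping the N3 highest (footprint `size` assumed)."""
    is_peak = response == maximum_filter(response, size=size)
    ys, xs = np.nonzero(is_peak)
    order = np.argsort(response[ys, xs])[::-1][:N3]
    return list(zip(ys[order], xs[order]))

def similarity_score(f_j: float, c_j: float,
                     fmax_hist: list, cmax_hist: list) -> float:
    """S_j in the weighted absolute-difference form reconstructed above:
    distance of the candidate's response value and local contrast from
    their historical-frame means; the candidate with the smallest score
    is taken as the target point (step S5.4)."""
    f_bar = np.mean(fmax_hist[-N2:])
    c_bar = np.mean(cmax_hist[-N2:])
    return BETA * abs(f_j - f_bar) + (1 - BETA) * abs(c_j - c_bar)
```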
S6, when the target is occluded, the current frame target position in the occluded state is predicted by Kalman filtering constructed from the target position information of the historical frames; when the target is judged to have left occlusion, normal tracking is performed and step S4 is executed.
Specifically, during target tracking a Kalman filter is initialized and then, in an iterative process, predicts and optimally estimates the target's motion state from subsequent tracking results. Thus, when the target is occluded, the current frame target position can be predicted by Kalman filtering from the historical frame target position information. During occlusion, if the APCE value change rate λ_APCE of the current frame is greater than the threshold α_1 and the maximum peak change rate λ_Fmax is greater than a threshold α_4, the target is judged to have left occlusion and normal tracking is performed. α_1 = 0.55 and α_4 = 0.88 were determined experimentally.
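As one possible realization, a minimal constant-velocity Kalman filter over the target center is sketched below; the patent states only that a Kalman filter built from historical positions predicts the position under occlusion, so the motion model and noise levels here are assumptions:

```python
import numpy as np

class ConstantVelocityKF:
    """State [x, y, vx, vy]; observes the target center (x, y)."""

    def __init__(self, x: float, y: float, dt: float = 1.0,
                 q: float = 1e-2, r: float = 1.0):
        self.s = np.array([x, y, 0.0, 0.0])   # initial state
        self.P = np.eye(4)                    # state covariance
        self.F = np.eye(4)                    # constant-velocity transition
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                 # observe position only
        self.Q = q * np.eye(4)                # process noise (assumed)
        self.R = r * np.eye(2)                # measurement noise (assumed)

    def predict(self) -> np.ndarray:
        """Predicted center; used as the target position while occluded."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, x: float, y: float) -> None:
        """Fold in a tracked position from a normally tracked frame."""
        z = np.array([x, y])
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ (z - self.H @ self.s)
        self.P = (np.eye(4) - K @ self.H) @ self.P
```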
The above steps S1 and S2 constitute the target depth feature extraction process, step S3 the target tracking state judgment, and steps S4, S5 and S6 the respective handling for normal tracking, background clutter interference and occlusion; combined, they form a complete target tracking procedure. In actual tracking, the entire target tracking is completed by repeating steps S1 to S3 followed by S4, S5 or S6 as appropriate.
The tracking effect of the invention (Our) is verified by simulation experiments: tracking performance is compared with 9 classic tracking algorithms on a public dataset for detection and tracking of dim-small aircraft targets in infrared imagery against ground/air backgrounds. The comparison algorithms comprise correlation-filter-based algorithms including STRCF, the correlation filtering algorithm ECO-HC fusing depth features, the deep learning algorithm MDNet based on online fine-tuning, and the twin-network-based deep learning algorithms SiamFC, SiamRPN, DaSiamRPN, SiamDW and SiamFC++. The deep-learning-based algorithms are trained with the same training set.
For the simulation results, refer to FIG. 3 and FIG. 4, which are respectively the success rate plot and the precision plot of the proposed tracking method and the other 9 comparison algorithms on the test set. In FIG. 3 and FIG. 4, the uppermost curves are respectively the tracking success rate curve and the tracking precision curve of the invention; the proposed tracking method is clearly superior to the other 9 algorithms and ranks first in both tracking success rate and tracking precision.
To further analyze the performance of the tracking method under complex background and occlusion in detail, test data with the two attributes of complex background interference and target occlusion were selected from the test set, and the tracking method was compared with the other 9 comparison algorithms on each attribute. In FIG. 5 and FIG. 6, the uppermost curves are respectively the tracking success rate curve and the tracking precision curve of the invention under complex background interference: the proposed method ranks first under this condition. In FIG. 7 and FIG. 8, the uppermost curves are respectively the tracking success rate curve and the tracking precision curve under occlusion: the proposed method again ranks first. In addition, the tracking method runs at a test speed of 145 frames/s, meeting the real-time requirement.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (4)

1. An infrared aerial small target tracking method based on a full convolution twin network, characterized by comprising the following steps:
S1, inputting the image sequence into the full convolution twin network, selecting the first frame of the sequence as the target template z, taking a region to be searched x from each subsequent frame, and passing both through the parameter-sharing convolutional neural network φ(·) to extract depth features, obtaining the feature map φ(z) of the target template and the feature map φ(x) of the region to be searched;
S2, using the feature map of the target template
Figure FDA0003264139370000012
Treating feature maps of search regions for convolution kernels
Figure FDA0003264139370000011
Performing convolution operation to obtain a characteristic response graph M;
S3, evaluating and judging the current frame target tracking state using the average peak-to-correlation energy and the maximum peak of the current frame response map: if current target tracking is judged to be in the normal state, executing step S4; if the target is judged to be interfered by background clutter, executing step S5; if the target is judged to be occluded, executing step S6;
S4, in the normal tracking state, applying a Hamming window of the same size to the current frame (frame t) response map M_t to suppress the boundary effect, and selecting the maximum peak point of the response map as the target point;
S5, when the target is interfered by background clutter, forming a candidate point set from the multiple peak points of the current frame response map, calculating the feature similarity score of each candidate target against the real target of the historical frames using the depth features and the local contrast feature, and selecting the candidate point with the highest similarity as the current frame target point;
and S6, when the target is occluded, predicting the current frame target position in the occluded state with a Kalman filter constructed from the target position information of the historical frames, and, if the target is judged to have left occlusion, performing normal tracking and executing step S4.
2. The infrared aerial small target tracking method based on a full convolution twin network as claimed in claim 1, characterized in that step S3 includes:
S3.1, measuring the fluctuation of the response map with the average peak-to-correlation energy (APCE) index, specifically defined as:

APCE = \frac{\left|F_{max}-F_{min}\right|^{2}}{\operatorname{mean}_{i,j}\left[\left(F_{i,j}-F_{min}\right)^{2}\right]}

where F_max and F_min respectively denote the maximum and minimum values in the response map, i and j denote the abscissa and ordinate of the response map, and F_{i,j} is the response value at (i, j) in the response map; under normal tracking, the response map fluctuates little and the APCE value is large; when the target is interfered by background clutter or occluded, the response map fluctuates severely and the APCE value drops sharply compared with normal tracking; a smaller APCE value indicates a more unstable tracking state;
S3.2, defining λ_APCE and λ_Fmax as the ratios of the current frame response map's APCE value and maximum peak, respectively, to the corresponding average values over the historical frames, quantifying the degree of change of the current frame's APCE value and maximum peak, namely:

\lambda_{APCE} = \frac{APCE_t}{\frac{1}{n_1}\sum_{i=t-n_1}^{t-1} APCE_i}, \qquad \lambda_{F_{max}} = \frac{F_{max-t}}{\frac{1}{n_1}\sum_{i=t-n_1}^{t-1} F_{max-i}}
where APCE_t and F_max-t are the response map APCE value and maximum peak of the current frame, APCE_i and F_max-i those of historical frame i, and n_1 is the number of reference historical frames; during tracking, the current tracking state is judged from the λ_APCE and λ_Fmax values combined with the historical frame response map information.
3. The infrared aerial small target tracking method based on a full convolution twin network as claimed in claim 2, characterized in that the method in step S3.2 for judging the current tracking state from the λ_APCE and λ_Fmax values combined with the historical frame response map information comprises the following steps:
a. if the change rate λ_APCE of the current frame response map's APCE value is greater than a certain threshold, judging the current tracking state to be the normal tracking state; otherwise executing step b to discriminate the other tracking states;
b. if the maximum peak change rate λ_Fmax of the current frame response map is less than a threshold, the maxima of the n_4 consecutive frame response maps before the current frame decrease gradually, and the decrease relative to the previous frame is greater than a certain threshold, judging the current tracking state to be the occluded state; otherwise judging it to be the background clutter interference state.
4. The infrared aerial small target tracking method based on a full convolution twin network as claimed in claim 1, characterized in that step S5 includes:
S5.1, defining the feature similarity score S_j of each candidate target as:

S_j = \beta\left|F_j - \frac{1}{n_2}\sum_{i=t-n_2}^{t-1} F_{max-i}\right| + (1-\beta)\left|C_j - \frac{1}{n_2}\sum_{i=t-n_2}^{t-1} C_{max-i}\right|, \quad j \in D

where D is the candidate target point set, of size n_3; j is the candidate target index; F_j and C_j are the response value and local contrast of candidate target j in the current frame; F_max-i and C_max-i are the peak value and local contrast of the maximum peak in the response map of historical frame i; n_2 is the number of reference historical frames; β and 1 − β are the weights of the response value and the local contrast respectively; the feature similarity score S_j, constructed from the depth feature response values and the local contrast feature of the target, measures the similarity between candidate target j and the real target, and a smaller S_j indicates that the candidate target's feature values are closer to the real target;
S5.2, searching the multiple peak points in the current frame response map with a maximum filter, and selecting the points with the top n_3 peak values as candidate target center points;
S5.3, calculating the local contrast of each candidate target using the same target box size;
and S5.4, calculating the corresponding feature similarity score S_j from the local contrast and response value of each candidate target, taking the candidate target point with the smallest S_j as the target point, and obtaining the current frame target center point position through position transformation.
CN202111081287.1A 2021-09-15 2021-09-15 Infrared aerial small target tracking method based on full convolution twin network Active CN113920159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111081287.1A CN113920159B (en) 2021-09-15 2021-09-15 Infrared aerial small target tracking method based on full convolution twin network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111081287.1A CN113920159B (en) 2021-09-15 2021-09-15 Infrared aerial small target tracking method based on full convolution twin network

Publications (2)

Publication Number Publication Date
CN113920159A (en) 2022-01-11
CN113920159B CN113920159B (en) 2024-05-10

Family

ID=79235149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111081287.1A Active CN113920159B (en) 2021-09-15 Infrared aerial small target tracking method based on full convolution twin network

Country Status (1)

Country Link
CN (1) CN113920159B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200051250A1 (en) * 2018-08-08 2020-02-13 Beihang University Target tracking method and device oriented to airborne-based monitoring scenarios
CN110544269A (en) * 2019-08-06 2019-12-06 西安电子科技大学 twin network infrared target tracking method based on characteristic pyramid
WO2021035807A1 (en) * 2019-08-23 2021-03-04 深圳大学 Target tracking method and device fusing optical flow information and siamese framework
CN110728697A (en) * 2019-09-30 2020-01-24 华中光电技术研究所(中国船舶重工集团有限公司第七一七研究所) Infrared dim target detection tracking method based on convolutional neural network
KR20210096473A (en) * 2020-01-28 2021-08-05 인하대학교 산학협력단 Robust visual object tracking based on global and local search with confidence estimation
CN112069896A (en) * 2020-08-04 2020-12-11 河南科技大学 Video target tracking method based on twin network fusion multi-template features
CN112581502A (en) * 2020-12-23 2021-03-30 北京环境特性研究所 Target tracking method based on twin network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
柳恩涵; 张锐; 赵硕; 王茹: "An infrared pedestrian target tracking method based on video prediction", Journal of Harbin Institute of Technology, no. 10, 25 September 2020 (2020-09-25) *
火元莲; 李明; 曹鹏飞; 石明: "Moving target tracking based on depth features and an anti-occlusion strategy", Journal of Northwest Normal University (Natural Science Edition), no. 04, 15 July 2020 (2020-07-15) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114862910A (en) * 2022-04-25 2022-08-05 南京航空航天大学 Multi-vehicle target tracking method based on deep learning
CN116630373A (en) * 2023-07-19 2023-08-22 江南大学 Infrared weak and small target tracking method based on style recalibration and improved twin network
CN116630373B (en) * 2023-07-19 2023-09-22 江南大学 Infrared weak and small target tracking method based on style recalibration and improved twin network
CN118279568A (en) * 2024-05-31 2024-07-02 西北工业大学 Multi-target identity judging method for distributed double-infrared sensor time sequence twin network

Also Published As

Publication number Publication date
CN113920159B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
CN108665481B (en) Self-adaptive anti-blocking infrared target tracking method based on multi-layer depth feature fusion
CN108830145B (en) People counting method based on deep neural network and storage medium
CN113920159A (en) Infrared aerial small target tracking method based on full convolution twin network
CN112258554B (en) Double-current hierarchical twin network target tracking method based on attention mechanism
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
CN107154024A (en) Dimension self-adaption method for tracking target based on depth characteristic core correlation filter
CN114863097B (en) Infrared dim target detection method based on attention mechanism convolutional neural network
CN110555870B (en) DCF tracking confidence evaluation and classifier updating method based on neural network
CN111462191B (en) Non-local filter unsupervised optical flow estimation method based on deep learning
CN113706581B (en) Target tracking method based on residual channel attention and multi-level classification regression
Wang et al. GKFC-CNN: Modified Gaussian kernel fuzzy C-means and convolutional neural network for apple segmentation and recognition
CN107944354B (en) Vehicle detection method based on deep learning
CN111160407A (en) Deep learning target detection method and system
CN107452022A (en) A kind of video target tracking method
CN110942471A (en) Long-term target tracking method based on space-time constraint
CN107977683A (en) Joint SAR target identification methods based on convolution feature extraction and machine learning
CN112561796A (en) Laser point cloud super-resolution reconstruction method based on self-attention generation countermeasure network
CN111160229A (en) Video target detection method and device based on SSD (solid State disk) network
CN111027586A (en) Target tracking method based on novel response map fusion
CN113393457A (en) Anchor-frame-free target detection method combining residual dense block and position attention
CN110795599B (en) Video emergency monitoring method and system based on multi-scale graph
CN116381672A (en) X-band multi-expansion target self-adaptive tracking method based on twin network radar
CN115830537A (en) Crowd counting method
CN114998890A (en) Three-dimensional point cloud target detection algorithm based on graph neural network
CN114913337A (en) Camouflage target frame detection method based on ternary cascade perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant