CN113920159B - Infrared air small and medium target tracking method based on full convolution twin network - Google Patents


Info

Publication number
CN113920159B
CN113920159B (granted publication of application CN202111081287.1A; published as CN113920159A)
Authority
CN
China
Prior art keywords
target
value
response
tracking
frame
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN202111081287.1A
Other languages
Chinese (zh)
Other versions
CN113920159A (en)
Inventor
刘刚
张文波
曹紫绚
董猛
刘龙哲
田慧
权冰洁
Current Assignee
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Henan University of Science and Technology
Priority to CN202111081287.1A
Publication of CN113920159A
Application granted
Publication of CN113920159B
Active legal status
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10048 Infrared image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention provides a method that, building on a full convolution twin network, judges the target tracking state from the average peak correlation energy and the maximum peak of the depth-feature response map. When background clutter interference occurs, a local-contrast criterion is combined with the depth-feature response value to select the target; when occlusion occurs, the target position is predicted by Kalman filtering. The method adapts to complex and diverse infrared aerial scenes and achieves effective, stable, real-time tracking of small and medium infrared aerial targets.

Description

Infrared air small and medium target tracking method based on full convolution twin network
Technical Field
The invention belongs to the technical field of infrared aerial target tracking, and in particular relates to a method for tracking small infrared aerial targets based on a full convolution twin network.
Background
Aerial target tracking is one of the key technologies of infrared imaging guidance systems, and the requirements placed on it keep rising. Because the target is far away and the natural environment is complex and changeable, most targets in the field of view during the tracking stage occupy only a few pixels (small targets), their characteristic information is weak, background clutter interference is strong, and occlusion occurs, all of which severely hamper tracking. Therefore, in tracking small and medium infrared aerial targets, how to effectively overcome tracking failures caused by background clutter interference and occlusion while improving the accuracy and real-time performance of the tracking method is a technical problem that urgently needs to be solved by those skilled in the art.
At present, most infrared target tracking methods are conventional algorithms, in which targeted feature-extraction schemes are designed manually for different scenes. For complex infrared aerial scenes, however, it is difficult for conventional tracking algorithms to accommodate every situation. In recent years deep learning has developed rapidly, and researchers at home and abroad have widely applied deep features to target tracking, for example by incorporating them into traditional correlation-filter trackers. However, the back-propagation of a convolutional neural network is computationally expensive, so tracking algorithms that update network parameters online are slow and cannot meet real-time tracking requirements.
To address these problems, target tracking algorithms based on twin networks use similarity verification to convert tracking into a template-matching problem; their end-to-end trainability and real-time performance are strong, making them an important research direction in the field of target tracking. The target tracking algorithm based on the full convolution twin network (Fully-Convolutional Siamese Network, SiamFC) is a classical algorithm in this area: it cross-correlates the depth features of the region to be searched with those of the target template, uses the resulting response values to measure similarity, and selects the position with the maximum response value as the target center point, thereby achieving good tracking precision and speed.
Disclosure of Invention
Based on the above problems, the invention provides an infrared aerial small target tracking method based on a full convolution twin network, which aims to solve infrared small-target tracking in complex aerial scenes, in particular when the tracked target suffers background clutter interference or occlusion.
To achieve the above purpose, the technical scheme adopted by the invention is as follows. A method for tracking small and medium targets in infrared air based on a full convolution twin network comprises the following steps:
S1, inputting an image sequence into the full convolution twin network; the first frame of the sequence is marked as the target template z, and each subsequent frame provides a region to be searched x. Both pass through a parameter-sharing convolutional neural network φ that extracts depth features, yielding the target-template feature map φ(z) and the search-region feature map φ(x);
S2, using the target-template feature map φ(z) as a convolution kernel, performing a convolution (cross-correlation) operation over the search-region feature map φ(x) to obtain the feature response map M;
S3, evaluating and judging a target tracking state of the current frame by using the average peak correlation energy and the maximum peak of the current frame response diagram, and executing a step S4 if the current target tracking is judged to be in a normal state; if the target is judged to be interfered by the background clutter, executing a step S5; if the target is judged to be shielded, executing a step S6;
S4, in the normal tracking state, applying a Hamming window of the same size to the current frame-t response map M_t to suppress the boundary effect, and selecting the maximum peak point of the response map as the target point;
S5, when the target is interfered by background clutter, forming a candidate point set from the multiple peak points of the current-frame response map, computing a feature similarity score between each candidate target and the real target of the historical frames using the depth feature and the local contrast feature, and selecting the candidate point with the highest similarity as the current-frame target point;
S6, when the target is occluded, predicting the current-frame target position in the occluded state with a Kalman filter constructed from the target position information of the historical frames; once the target is judged to have left occlusion, normal tracking resumes and step S4 is executed.
Further, the step S3 includes:
S3.1, measuring the fluctuation of the response map with the average peak correlation energy (APCE) index, defined as:

$$\mathrm{APCE}=\frac{\left|F_{\max }-F_{\min }\right|^{2}}{\operatorname{mean}\left(\sum_{i, j}\left(F_{i, j}-F_{\min }\right)^{2}\right)}$$

wherein F_max and F_min are the maximum and minimum values in the response map, i and j are the abscissa and ordinate of the response map, and F_{i,j} is the response value at (i, j). Under normal tracking the response map fluctuates little and the APCE value is large; when the target is interfered by background clutter or occluded, the response map fluctuates severely and the APCE value drops sharply compared with normal tracking. A smaller APCE value indicates a less stable tracking state;
S3.2, defining λ_APCE and λ_Fmax as the ratios of the current-frame response map's APCE value and maximum peak to the corresponding averages over the historical frames, which quantify how much the APCE value and maximum peak of the current frame have changed:

$$\lambda_{\mathrm{APCE}}=\frac{\mathrm{APCE}_{t}}{\frac{1}{n_{1}} \sum_{i=t-n_{1}}^{t-1} \mathrm{APCE}_{i}}, \qquad \lambda_{F_{\max }}=\frac{F_{\max -t}}{\frac{1}{n_{1}} \sum_{i=t-n_{1}}^{t-1} F_{\max -i}}$$

wherein APCE_t and F_max-t are the response-map APCE value and maximum peak of the current frame, APCE_i and F_max-i are those of historical frame i, and n_1 is the number of reference historical frames; during tracking, the λ_APCE and λ_Fmax values are combined with the information of the historical-frame response maps to determine the current tracking state.
Further, in step S3.2 the λ_APCE and λ_Fmax values are combined with the information of the historical-frame response maps to judge the current tracking state as follows:
a. When the change rate λ_APCE of the current-frame response map's APCE value is larger than a certain threshold, the current tracking state is judged to be the normal tracking state; otherwise, step b is executed to judge the other tracking states;
b. If the maximum-peak change rate λ_Fmax of the current-frame response map is smaller than a certain threshold, the maxima of the n_4 consecutive response maps before the current frame decrease monotonically, and each decrease relative to the previous frame exceeds a certain threshold, the current tracking state is judged to be the occlusion state; otherwise it is judged to be the background clutter interference state.
Further, step S5 includes:
s5.1, defining a feature similarity score S j of each candidate target as follows:
Wherein D is a candidate target point set, and the number is n 3; j is a candidate target sequence number, and F j and C j are the response value and local contrast of the candidate target j of the current frame; f max-i and C max-i are peak values of maximum peaks and local contrasts in the historical frame response diagram, and n 2 is the reference historical frame number; beta and 1-beta are the weights occupied by the response value and the local contrast respectively; the feature similarity score S j constructed by the depth feature response value and the local contrast feature of the target is used for measuring the similarity between the candidate target j and the real target feature, and the smaller the S j value is, the closer the candidate target feature value is to the real target;
S5.2, searching for the multiple peak points of the current-frame response map with a maximum filter, and taking the top n_3 peaks as the candidate targets' center points;
S5.3, computing the local contrast of each candidate target using a window of the same size as the target bounding box;
S5.4, computing the feature similarity score S_j of each candidate target from its local contrast and response value, taking the candidate target point with the smallest S_j as the target point, and obtaining the position of the current-frame target center point through the position transformation.
Compared with the prior art, the invention has the following beneficial effects. In the infrared aerial small and medium target tracking method provided by the invention, depth features are extracted with the full convolution twin network to obtain a depth-feature response map. The current tracking state is judged from the average peak correlation energy and the variation of the maximum peak of the response map. Under normal tracking, the maximum peak point of the response map is selected as the target center point; when background clutter interference is detected, the depth-feature response value is combined with the local-contrast method to eliminate the clutter interference; and when occlusion is detected, Kalman filtering is used for position prediction. The proposed tracking method effectively handles complex background interference and occlusion of small and medium infrared targets, performs well in tracking them, and meets the real-time requirement of tracking.
Drawings
FIG. 1 is a schematic overall flow diagram of the tracking method of the present invention;
FIG. 2 is a structure diagram of the full convolution twin network of the present invention;
FIG. 3 is a success-rate plot of the method of the present invention (Our) and 9 comparison algorithms on the infrared aerial small-target test set;
FIG. 4 is a precision plot of the method of the present invention (Our) and 9 comparison algorithms on the infrared aerial small-target test set;
FIG. 5 is a success-rate plot of the method of the present invention (Our) and 9 comparison algorithms on infrared aerial small and medium target test data with the complex-background-interference attribute;
FIG. 6 is a precision plot of the method of the present invention (Our) and 9 comparison algorithms on infrared aerial small and medium target test data with the complex-background-interference attribute;
FIG. 7 is a success-rate plot of the method of the present invention (Our) and 9 comparison algorithms on infrared aerial small and medium target test data with the target-occlusion attribute;
FIG. 8 is a precision plot of the method of the present invention (Our) and 9 comparison algorithms on infrared aerial small and medium target test data with the target-occlusion attribute.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all, embodiments of the present invention, and all other embodiments obtained by those skilled in the art without making any inventive effort based on the embodiments of the present invention are within the scope of protection of the present invention.
The principle of the invention is as summarized above: depth features are extracted with the full convolution twin network to obtain a depth-feature response map; the current tracking state is judged from the average peak correlation energy and the maximum-peak variation of the response map; and, depending on the state, the maximum peak point is selected (normal tracking), the depth-feature response value is combined with local contrast (clutter interference), or Kalman filtering predicts the position (occlusion).
The invention discloses a method for tracking small and medium targets in infrared air based on a full convolution twin network, which is shown in a figure 1 and comprises the following steps:
S1, inputting the image sequence into the full convolution twin network; the first frame of the sequence is marked as the target template z, and each subsequent frame provides a region to be searched x. Both pass through the parameter-sharing convolutional neural network φ that extracts depth features, yielding the feature maps φ(z) and φ(x).
Specifically, the depth feature extraction network in step S1 is designed on the basis of AlexNet. It uses five convolutional layers; a max-pooling layer follows each of the first two convolutional layers, every convolutional layer except the last is followed by a ReLU, and the last three layers use grouped convolution. Each of these layers is followed by a batch normalization layer (Batch Normalization); the convolutional layers use no padding, and the total stride of the network is 8. In addition, in the offline training stage, the feature extraction network is first trained on the visible-light data set (ILSVRC) and then further trained on infrared aerial small and medium target image sequences, so as to better capture the depth features of small and medium infrared aerial targets. During training, a stochastic gradient descent algorithm optimizes the network parameters for 50 epochs, the learning rate is set to 10^-2, the training batch size is set to 8, and the best result of the last 10 epochs is taken.
S2, using the target-template feature map φ(z) as a convolution kernel, a convolution operation is performed over the search-region feature map φ(x) to obtain the feature response map M.
Specifically, in actual tracking, as shown in the full convolution twin network structure diagram of fig. 2, the target template and the region to be searched are respectively cropped and resized to z of 127×127×3 and x of 255×255×3. After both pass through the feature extraction network φ, a 6×6×128 feature map φ(z) and a 22×22×128 feature map φ(x) are obtained. The 6×6×128 feature map is used as a convolution kernel and convolved with the 22×22×128 feature map to obtain a 17×17 feature response map, which is bicubically interpolated to the final 272×272 feature response map.
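The shapes in this step can be checked with a small NumPy sketch of the cross-correlation; random arrays stand in for the real template and search-region features, and a plain sliding-window loop replaces an optimized convolution:

```python
import numpy as np

def cross_correlation(template_feat, search_feat):
    """Slide the template feature map over the search feature map.

    Shapes follow the sizes given in the text:
    template_feat: (6, 6, 128), search_feat: (22, 22, 128)
    -> response map of size (17, 17), since 22 - 6 + 1 = 17.
    """
    th, tw, _ = template_feat.shape
    sh, sw, _ = search_feat.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # response value = inner product of template and window
            out[i, j] = np.sum(search_feat[i:i+th, j:j+tw] * template_feat)
    return out

rng = np.random.default_rng(0)
z_feat = rng.standard_normal((6, 6, 128))    # stand-in for phi(z)
x_feat = rng.standard_normal((22, 22, 128))  # stand-in for phi(x)
M = cross_correlation(z_feat, x_feat)
print(M.shape)  # (17, 17)
```

The bicubic upsampling to 272×272 only refines peak localization and is omitted here.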
And S3, evaluating and judging the target tracking state of the current frame by using the average peak correlation energy and the maximum peak of the response diagram of the current frame. If it is determined that the current target tracking is in the normal state, step S4 is executed. If the target is judged to be interfered by the background clutter, executing a step S5; if it is determined that the target is blocked, step S6 is executed.
Specifically, step S3 includes:
S3.1, measuring the fluctuation of the response map with the average peak-to-correlation energy (APCE) index, defined as:

$$\mathrm{APCE}=\frac{\left|F_{\max }-F_{\min }\right|^{2}}{\operatorname{mean}\left(\sum_{i, j}\left(F_{i, j}-F_{\min }\right)^{2}\right)}$$

wherein F_max and F_min are the maximum and minimum values in the response map, i and j are the abscissa and ordinate of the response map, and F_{i,j} is the response value at (i, j). Under normal tracking the response map fluctuates little, visually presenting a 'single-peak' state, and the APCE value is large. When the target is interfered by background clutter or occluded, the response map fluctuates severely, visually presenting a 'multi-peak' state, and its APCE value drops sharply compared with normal tracking. Analyzing the fluctuation of the response map and computing its APCE value therefore effectively reflects the current tracking state.
The cause of a tracking failure is judged from the changing state of the maximum peak of the response map. When the target is interfered by background clutter, the maximum peak of the current-frame response map changes abruptly compared with the previous frame, while the maximum peaks before the current frame remained stable for some time. As the target goes from slight occlusion to complete occlusion, the maximum peak of the response map decreases gradually, reaching its minimum when the target is completely occluded. Thus, when the APCE value is small, the factors likely to cause tracking failure can be classified by additionally considering the changing state of the response map's maximum peak.
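A minimal sketch of the APCE index, with the mean taken over all response-map positions; it illustrates why a single sharp peak scores far higher than a multi-peak map:

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a 2-D response map."""
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)

# One sharp peak (normal tracking) vs. two peaks (clutter/occlusion).
unimodal = np.zeros((17, 17)); unimodal[8, 8] = 1.0
multimodal = np.zeros((17, 17)); multimodal[4, 4] = multimodal[12, 12] = 1.0
print(apce(unimodal), apce(multimodal))  # 289.0 144.5
```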
S3.2, the invention defines λ_APCE and λ_Fmax as the ratios of the current-frame response map's APCE value and maximum peak to the corresponding averages over the historical frames, quantifying how much they have changed:

$$\lambda_{\mathrm{APCE}}=\frac{\mathrm{APCE}_{t}}{\frac{1}{n_{1}} \sum_{i=t-n_{1}}^{t-1} \mathrm{APCE}_{i}}, \qquad \lambda_{F_{\max }}=\frac{F_{\max -t}}{\frac{1}{n_{1}} \sum_{i=t-n_{1}}^{t-1} F_{\max -i}}$$

wherein APCE_t and F_max-t are the response-map APCE value and maximum peak of the current frame (frame t), APCE_i and F_max-i are those of historical frame i, and n_1 is the number of reference historical frames, experimentally determined to be 10. During tracking, the λ_APCE and λ_Fmax values are combined with the information of the historical-frame response maps to determine the current tracking state.
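The two change ratios reduce to dividing the current value by the historical mean; a sketch keeping the last n1 = 10 frames in a bounded deque (the toy history values are illustrative):

```python
from collections import deque

N1 = 10  # number of reference historical frames (n1 in the text)

def change_ratios(apce_t, fmax_t, apce_hist, fmax_hist):
    """lambda_APCE and lambda_Fmax: current values over historical means."""
    lam_apce = apce_t / (sum(apce_hist) / len(apce_hist))
    lam_fmax = fmax_t / (sum(fmax_hist) / len(fmax_hist))
    return lam_apce, lam_fmax

apce_hist = deque([40.0, 44.0, 36.0], maxlen=N1)  # toy history
fmax_hist = deque([1.0, 1.2, 0.8], maxlen=N1)
print(change_ratios(20.0, 0.5, apce_hist, fmax_hist))
```

After each frame, appending the new APCE and maximum-peak values to the deques automatically discards entries older than n1 frames.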
In the invention, during tracking the λ_APCE and λ_Fmax values are combined with the information of the historical-frame response maps to judge the current tracking state; the specific implementation steps are as follows:
a. When the change rate λ_APCE of the current-frame response map's APCE value is larger than the threshold α_1, the current tracking state is judged to be the normal tracking state; otherwise, step b is executed to judge the other tracking states. α_1 was experimentally determined to be 0.55.
b. If the maximum-peak change rate λ_Fmax of the current-frame response map is smaller than the threshold α_2, the maxima of the n_4 consecutive response maps before the current frame decrease monotonically, and each decrease relative to the previous frame exceeds α_3, the current tracking state is judged to be the occlusion state; otherwise it is judged to be the background clutter interference state. Experimentally, α_2 is 0.68, α_3 is 0.08, and n_4 is 4.
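The two-step decision rule above can be sketched directly, using the thresholds reported in the text (0.55, 0.68, 0.08, n4 = 4); the function names and argument layout are illustrative:

```python
def classify_state(lam_apce, lam_fmax, recent_fmax,
                   a1=0.55, a2=0.68, a3=0.08):
    """Decision rule of steps a/b with the thresholds from the text.

    recent_fmax holds the maxima of the n4 = 4 consecutive response
    maps before the current frame, oldest first.
    """
    if lam_apce > a1:
        return "normal"
    # occlusion: peak value decayed gradually, by more than a3 per frame
    drops = [prev - cur for prev, cur in zip(recent_fmax, recent_fmax[1:])]
    gradually_decreasing = all(d > a3 for d in drops)
    if lam_fmax < a2 and gradually_decreasing:
        return "occlusion"
    return "clutter"

print(classify_state(0.9, 1.0, [1.0, 1.0, 1.0, 1.0]))     # normal
print(classify_state(0.3, 0.5, [1.0, 0.85, 0.70, 0.55]))  # occlusion
print(classify_state(0.3, 0.9, [1.0, 1.0, 1.0, 1.0]))     # clutter
```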
S4, in the normal tracking state, a Hamming window of the same size is applied to the current frame-t response map M_t to suppress the boundary effect, and the maximum peak point of the response map is selected as the target point.
Specifically, the maximum peak point of the response map is selected as the target point, its offset from the center of the response map is computed, and multiplying this offset by the total stride of the network gives the center position of the current-frame target.
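A sketch of step S4 on the pre-interpolation 17×17 map: window, take the argmax, and convert the offset from the map centre to an image-plane displacement. Treating one response-map cell as total_stride = 8 image pixels is an assumption consistent with the network stride given earlier:

```python
import numpy as np

def locate_peak(response, total_stride=8):
    """Windowed argmax, then peak offset from centre -> (dy, dx) pixels."""
    h, w = response.shape
    window = np.outer(np.hamming(h), np.hamming(w))  # 2-D Hamming window
    windowed = response * window
    i, j = np.unravel_index(np.argmax(windowed), windowed.shape)
    return (i - h // 2) * total_stride, (j - w // 2) * total_stride

r = np.zeros((17, 17)); r[10, 8] = 1.0
print(locate_peak(r))  # (16, 0): two cells below centre -> 16 px down
```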
And S5, when the target is interfered by background clutter, forming a candidate point set by using multi-peak points of the current frame response graph, calculating the feature similarity score of each candidate target and the real target of the historical frame by using the depth feature and the local contrast feature, and selecting the candidate point with the highest similarity as the current frame target point.
Specifically, step S5 includes:
S5.1, in step S5 the local contrast is defined as the ratio of the target's mean gray value to the mean gray value of the local neighborhood background:

$$C=\frac{\frac{1}{N_{\Omega}} \sum_{(i, j) \in \Omega} I(i, j)}{\frac{1}{N_{\Psi}} \sum_{(i, j) \in \Psi} I(i, j)}$$

where Ω denotes the target region, Ψ denotes the neighborhood of the target region, N_Ω and N_Ψ are the numbers of pixels in the respective regions, and I(i, j) is the gray value of the pixel at (i, j) in the original image.
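A minimal sketch of this contrast: mean grey inside the target box over mean grey of the ring around it. The ring width `margin` is an illustrative assumption, not a value from the text:

```python
import numpy as np

def local_contrast(image, box, margin=2):
    """Mean grey of the target region / mean grey of the surrounding ring.

    box = (top, left, height, width); the neighbourhood is the ring of
    `margin` pixels around the box (clipped at the image border).
    """
    t, l, h, w = box
    target = image[t:t + h, l:l + w]
    outer = image[max(t - margin, 0):t + h + margin,
                  max(l - margin, 0):l + w + margin]
    # background = outer block minus the target block
    bg_mean = (outer.sum() - target.sum()) / (outer.size - target.size)
    return target.mean() / bg_mean

img = np.ones((10, 10)); img[4:6, 4:6] = 4.0  # bright 2x2 target on grey
print(local_contrast(img, (4, 4, 2, 2)))  # 4.0
```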
In step S5, the feature similarity score S_j between each candidate target and the real target of the historical frames is constructed from the target's depth-feature response value and local contrast feature:

$$S_{j}=\beta\left|F_{j}-\frac{1}{n_{2}} \sum_{i=t-n_{2}}^{t-1} F_{\max -i}\right|+(1-\beta)\left|C_{j}-\frac{1}{n_{2}} \sum_{i=t-n_{2}}^{t-1} C_{\max -i}\right|, \quad j \in D$$

wherein D is the candidate target point set, of size n_3; j is the candidate target index, and F_j and C_j are the response value and local contrast of candidate target j in the current frame. F_max-i and C_max-i are the maximum peak and the local contrast at the maximum peak in the historical-frame response maps, and n_2 is the number of reference historical frames. β and 1−β are the weights of the response value and the local contrast, respectively. Experimentally, n_3 is 8, n_2 is 5, and β is 0.4. The feature similarity score S_j measures the similarity between candidate target j and the real target; the smaller S_j is, the closer the candidate target's feature values are to the real target's.
S5.2, the multiple peak points of the current-frame response map are found with a maximum filter, and the top n_3 peaks are taken as the candidate targets' center points.
S5.3, the local contrast of each candidate target is computed using a window of the same size as the target bounding box.
S5.4, the feature similarity score S_j of each candidate target is computed from its local contrast and response value; the candidate target point with the smallest S_j is taken as the target point, and the position of the current-frame target center point is obtained through the position transformation.
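Steps S5.1 and S5.2 can be sketched as follows. The peak finder is a plain-NumPy stand-in for a maximum filter (strict 3×3 local maxima, strongest first), and the weighted absolute-difference form of the score is an assumption consistent with "smaller S_j = closer to the real target":

```python
import numpy as np

def top_peaks(response, n3=8):
    """Strict local maxima of a response map, at most n3, strongest first."""
    h, w = response.shape
    peaks = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = response[i - 1:i + 2, j - 1:j + 2]
            # strictly greater than every neighbour in the 3x3 patch
            if response[i, j] > np.partition(patch.ravel(), -2)[-2]:
                peaks.append((response[i, j], i, j))
    peaks.sort(reverse=True)
    return [(i, j) for _, i, j in peaks[:n3]]

def similarity_score(f_j, c_j, fmax_hist, cmax_hist, beta=0.4):
    """S_j: weighted distance of a candidate's response value and local
    contrast from their historical means (smaller = more target-like)."""
    f_ref = sum(fmax_hist) / len(fmax_hist)
    c_ref = sum(cmax_hist) / len(cmax_hist)
    return beta * abs(f_j - f_ref) + (1 - beta) * abs(c_j - c_ref)

r = np.zeros((17, 17)); r[5, 5] = 1.0; r[10, 10] = 0.8
print(top_peaks(r))  # [(5, 5), (10, 10)]
```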
S6, when the target is occluded, the current-frame target position in the occluded state is predicted with a Kalman filter constructed from the target position information of the historical frames; once the target is judged to have left occlusion, normal tracking resumes and step S4 is executed.
Specifically, during target tracking a Kalman filter is initialized and then iteratively predicts and optimally estimates the motion state of the target from the subsequent tracking results; that is, when the target is occluded, its current-frame position can be predicted by Kalman filtering from the target position information of the historical frames. During occlusion, if the APCE change rate λ_APCE of the current frame is larger than the threshold α_1 and the maximum-value change rate λ_Fmax is larger than the threshold α_4, the target is judged to have left occlusion, and normal tracking resumes. α_1 was experimentally determined to be 0.55 and α_4 to be 0.88.
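A minimal constant-velocity Kalman filter over (x, y, vx, vy) sketches the occlusion-time predictor; the patent does not specify the motion model or noise levels, so the constant-velocity form and the q, r magnitudes are illustrative assumptions:

```python
import numpy as np

class ConstantVelocityKF:
    """Predict/update cycle for a 2-D position with constant velocity."""

    def __init__(self, x, y, q=1e-2, r=1.0):
        self.s = np.array([x, y, 0.0, 0.0])   # state: x, y, vx, vy
        self.P = np.eye(4)                    # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0     # x += vx, y += vy per frame
        self.H = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0]])
        self.Q, self.R = q * np.eye(4), r * np.eye(2)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]                     # predicted (x, y)

    def update(self, x, y):
        residual = np.array([x, y]) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ residual
        self.P = (np.eye(4) - K @ self.H) @ self.P

kf = ConstantVelocityKF(1.0, 2.0)
kf.s[2:] = [3.0, 4.0]   # pretend the velocity is already estimated
print(kf.predict())     # [4. 6.]
```

During occlusion only predict() is called each frame; once the target reappears, update() resumes with the tracker's measured positions.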
Steps S1 and S2 form the target depth-feature extraction process, S3 is the tracking-state judgment, and S4, S5 and S6 are the processing methods for normal tracking, background clutter interference and target occlusion, respectively; together they form the complete target tracking process. In actual tracking, the whole process is completed by repeating steps S1 to S3 followed by S4, S5 or S6 as appropriate.
The tracking effect of the invention (Our) is verified by a simulation experiment, which uses the public ground-to-air-background infrared dim-small aircraft target detection and tracking dataset and compares tracking performance against 9 classical tracking algorithms: the correlation-filter-based algorithms Staple and STRCF, the fused-feature correlation-filter algorithm ECO-HC, the online-fine-tuned deep learning algorithm MDNet, and the twin-network-based deep learning algorithms SiamFC, SiamRPN, DaSiamRPN, SiamDW and SiamFC++. The deep-learning-based algorithms are trained on the same training set.
The simulation experiment results refer to fig. 3 and 4, which are respectively a success rate diagram and an accuracy diagram of the tracking method and other 9 comparison algorithms on the test set. In fig. 3 and fig. 4, the uppermost curves are the tracking success rate curve and the tracking accuracy curve of the present invention, respectively, and it can be seen that the tracking method of the present invention is significantly better than other 9 algorithms in terms of tracking success rate and accuracy, and the tracking success rate and the tracking accuracy are ranked first.
To further analyze the performance of the tracking method under complex-background and occlusion conditions, sequences with the two attributes of complex background interference and target occlusion were selected from the test set, and the proposed tracking method was compared with the other 9 algorithms on each attribute. In fig. 5 and fig. 6 the uppermost curves are the tracking success-rate curve and the tracking precision curve of the tracking method under complex background interference; as shown in fig. 5 and fig. 6, its tracking performance ranks first under this condition. In fig. 7 and fig. 8 the uppermost curves are the tracking success-rate curve and the tracking precision curve of the invention under occlusion; as shown in fig. 7 and fig. 8, its tracking performance also ranks first under occlusion. In addition, the tracking method runs at 145 frames/s, meeting the real-time requirement.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (2)

1. A method for tracking infrared air small and medium targets based on a full convolution twin network, characterized by comprising the following steps:
S1, inputting an image sequence into the full convolution twin network, selecting the first frame of the sequence and marking it as the target template z, with each subsequent frame providing a region to be searched x; the first frame and the subsequent frames each pass through a convolutional neural network φ with shared parameters to extract depth features, obtaining the feature map φ(z) of the target template and the feature map φ(x) of the region to be searched;
S2, using the feature map φ(z) of the target template as a convolution kernel, performing a convolution operation on the feature map φ(x) of the region to be searched to obtain a feature response map M;
S3, evaluating and judging a target tracking state of the current frame by using the average peak correlation energy and the maximum peak of the current frame response diagram, and executing a step S4 if the current target tracking is judged to be in a normal state; if the target is judged to be interfered by the background clutter, executing a step S5; if the target is judged to be shielded, executing a step S6;
S4, in the normal tracking state, applying a Hamming window of the same size to the current frame-t response map M_t to suppress the boundary effect, and selecting the maximum peak point of the response map as the target point;
S5, when the target is interfered by background clutter, forming a candidate point set from the multiple peak points of the current frame response map, calculating the feature similarity score between each candidate target and the real target of the historical frames using the depth feature and the local contrast feature, and selecting the candidate point with the highest similarity as the current frame target point;
S6, when the target is occluded, predicting the target position of the current frame in the occluded state by Kalman filtering constructed from the target position information of the historical frames; if the target is judged to have left occlusion, normal tracking is resumed and step S4 is executed; the step S3 comprises the following steps:
S3.1, measuring the fluctuation of the response map by using the average peak correlation energy (APCE) index, where APCE is specifically defined as:

APCE = |F_max − F_min|² / mean_i,j((F_i,j − F_min)²)
wherein F_max and F_min are respectively the maximum and minimum values in the response map; i and j are the abscissa and ordinate of the response map, and F_i,j is the response value at (i, j); under normal tracking the response map fluctuates little and the APCE value is large; when the target is interfered by background clutter or occluded, the response map fluctuates sharply, and the APCE value drops greatly compared with that of normal tracking; a smaller APCE value indicates a less stable tracking state;
S3.2, defining λ_APCE and λ_Fmax as the ratios of the APCE value and the maximum peak of the current frame response map to the corresponding historical-frame averages, so as to quantify the degree of change of the current frame's APCE value and maximum peak, namely:

λ_APCE = APCE_t / ((1/n1)·Σ_{i=1..n1} APCE_i), λ_Fmax = F_max-t / ((1/n1)·Σ_{i=1..n1} F_max-i)
wherein APCE_t and F_max-t are the APCE value and maximum peak of the current frame response map, APCE_i and F_max-i are those of historical frame i, and n1 is the number of reference historical frames; during tracking, the λ_APCE and λ_Fmax values are combined with the historical-frame response-map information to judge the current tracking state; the step S5 comprises the following steps:
S5.1, defining the feature similarity score S_j of each candidate target as:

S_j = β·|F_j − F̄| / F̄ + (1 − β)·|C_j − C̄| / C̄, j ∈ D, where F̄ = (1/n2)·Σ_{i=1..n2} F_max-i and C̄ = (1/n2)·Σ_{i=1..n2} C_max-i;
wherein D is the candidate target point set, of size n3; j is the candidate target index, and F_j and C_j are the response value and local contrast of candidate target j in the current frame; F_max-i and C_max-i are the maximum peak and the peak local contrast in the historical frame response maps, and n2 is the number of reference historical frames; β and 1 − β are the weights of the response value and the local contrast, respectively; the feature similarity score S_j, constructed from the depth-feature response value and the local contrast feature of the target, measures the similarity between candidate target j and the real target: the smaller the value of S_j, the closer the candidate target's feature values are to those of the real target;
S5.2, searching for multiple peak points in the current frame response map using a maximum filter, and selecting the top n3 peaks as the center points of the candidate targets;
S5.3, calculating the local contrast of each candidate target using a window of the same size as the target frame;
S5.4, calculating the feature similarity score S_j of each candidate target from its local contrast and response value, taking the candidate point with the minimum S_j as the target point, and obtaining the position of the center point of the current frame target by position transformation.
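The quantities used in S3.1 and S5.1-S5.2 can be computed as in the following NumPy/SciPy sketch. The APCE formula is the standard one described in S3.1; the exact similarity-score formula and the weight β = 0.5 are assumptions reconstructed from the variable definitions in claim 1, and the filter window size is illustrative.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def apce(response):
    """Average peak correlation energy of a response map (S3.1):
    APCE = |Fmax - Fmin|^2 / mean((F_ij - Fmin)^2)."""
    f_max, f_min = response.max(), response.min()
    return abs(f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)

def candidate_peaks(response, n3=5, size=5):
    """Local maxima of the response map found with a maximum filter (S5.2);
    returns the n3 highest peaks as (row, col) coordinates."""
    peaks = (response == maximum_filter(response, size=size))
    coords = np.argwhere(peaks)
    order = np.argsort(response[peaks])[::-1]     # strongest peaks first
    return coords[order[:n3]]

def similarity_score(f_j, c_j, f_hist, c_hist, beta=0.5):
    """Assumed form of the feature similarity score S_j (S5.1): weighted
    relative deviation of the candidate's response value f_j and local
    contrast c_j from the historical-frame averages; smaller is closer
    to the real target."""
    f_bar, c_bar = np.mean(f_hist), np.mean(c_hist)
    return beta * abs(f_j - f_bar) / f_bar + (1 - beta) * abs(c_j - c_bar) / c_bar
```

A candidate whose response value and local contrast both match the historical averages scores 0, so selecting the minimum-S_j candidate (S5.4) picks the point most consistent with the real target's past appearance.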
2. The method for tracking infrared air small and medium targets based on the full convolution twin network according to claim 1, wherein in step S3.2, judging the current tracking state from the λ_APCE and λ_Fmax values combined with the historical-frame response-map information comprises the following steps:
a. when the change rate λ_APCE of the APCE value of the current frame response map is larger than a certain threshold, judging the current tracking state to be the normal tracking state; otherwise, executing step b to distinguish the other tracking states;
b. if the maximum-peak change rate λ_Fmax of the current frame response map is smaller than a certain threshold, and the maximum value of the response map has decreased monotonically over the n4 consecutive frames preceding the current frame, with each decrease relative to the previous frame exceeding a certain threshold, judging the current tracking state to be the occlusion state; otherwise, judging it to be the background-clutter interference state.
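The decision logic of steps a and b above can be sketched as follows. Only α1 = 0.55 is given in the description; the values of the other two thresholds (here called alpha2 and alpha3) and n4 are illustrative assumptions, as this excerpt does not disclose them.

```python
import numpy as np

def judge_state(apce_t, fmax_t, apce_hist, fmax_hist,
                alpha1=0.55, alpha2=0.5, alpha3=0.1, n4=3):
    """Classify the current tracking state from response-map statistics,
    following steps a/b of claim 2. alpha2, alpha3 and n4 are assumed values."""
    lambda_apce = apce_t / np.mean(apce_hist)   # APCE change rate
    lambda_fmax = fmax_t / np.mean(fmax_hist)   # maximum-peak change rate

    # a. large APCE change rate -> stable response map, normal tracking
    if lambda_apce > alpha1:
        return "normal"

    # b. small peak ratio AND the maximum response of the last n4 frames
    #    decreasing monotonically, each drop exceeding alpha3 -> occlusion
    recent = list(fmax_hist[-n4:]) + [fmax_t]
    steadily_falling = all(
        prev - cur > alpha3 for prev, cur in zip(recent, recent[1:])
    )
    if lambda_fmax < alpha2 and steadily_falling:
        return "occluded"

    return "clutter"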
CN202111081287.1A 2021-09-15 2021-09-15 Infrared air small and medium target tracking method based on full convolution twin network Active CN113920159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111081287.1A CN113920159B (en) 2021-09-15 2021-09-15 Infrared air small and medium target tracking method based on full convolution twin network


Publications (2)

Publication Number Publication Date
CN113920159A CN113920159A (en) 2022-01-11
CN113920159B (en) 2024-05-10

Family

ID=79235149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111081287.1A Active CN113920159B (en) 2021-09-15 2021-09-15 Infrared air small and medium target tracking method based on full convolution twin network

Country Status (1)

Country Link
CN (1) CN113920159B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630373B (en) * 2023-07-19 2023-09-22 江南大学 Infrared weak and small target tracking method based on style recalibration and improved twin network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110544269A (en) * 2019-08-06 2019-12-06 西安电子科技大学 twin network infrared target tracking method based on characteristic pyramid
CN110728697A (en) * 2019-09-30 2020-01-24 华中光电技术研究所(中国船舶重工集团有限公司第七一七研究所) Infrared dim target detection tracking method based on convolutional neural network
CN112069896A (en) * 2020-08-04 2020-12-11 河南科技大学 Video target tracking method based on twin network fusion multi-template features
WO2021035807A1 (en) * 2019-08-23 2021-03-04 深圳大学 Target tracking method and device fusing optical flow information and siamese framework
CN112581502A (en) * 2020-12-23 2021-03-30 北京环境特性研究所 Target tracking method based on twin network
KR20210096473A (en) * 2020-01-28 2021-08-05 인하대학교 산학협력단 Robust visual object tracking based on global and local search with confidence estimation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109272530B (en) * 2018-08-08 2020-07-21 北京航空航天大学 Target tracking method and device for space-based monitoring scene

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110544269A (en) * 2019-08-06 2019-12-06 西安电子科技大学 twin network infrared target tracking method based on characteristic pyramid
WO2021035807A1 (en) * 2019-08-23 2021-03-04 深圳大学 Target tracking method and device fusing optical flow information and siamese framework
CN110728697A (en) * 2019-09-30 2020-01-24 华中光电技术研究所(中国船舶重工集团有限公司第七一七研究所) Infrared dim target detection tracking method based on convolutional neural network
KR20210096473A (en) * 2020-01-28 2021-08-05 인하대학교 산학협력단 Robust visual object tracking based on global and local search with confidence estimation
CN112069896A (en) * 2020-08-04 2020-12-11 河南科技大学 Video target tracking method based on twin network fusion multi-template features
CN112581502A (en) * 2020-12-23 2021-03-30 北京环境特性研究所 Target tracking method based on twin network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An infrared pedestrian target tracking method based on video prediction; Liu Enhan; Zhang Rui; Zhao Shuo; Wang Ru; Journal of Harbin Institute of Technology; 2020-09-25 (10); full text *
Moving target tracking based on depth features and an anti-occlusion strategy; Huo Yuanlian; Li Ming; Cao Pengfei; Shi Ming; Journal of Northwest Normal University (Natural Science Edition); 2020-07-15 (04); full text *

Also Published As

Publication number Publication date
CN113920159A (en) 2022-01-11

Similar Documents

Publication Publication Date Title
CN111354017B (en) Target tracking method based on twin neural network and parallel attention module
CN108665481B (en) Self-adaptive anti-blocking infrared target tracking method based on multi-layer depth feature fusion
CN110929578B (en) Anti-shielding pedestrian detection method based on attention mechanism
CN112184752A (en) Video target tracking method based on pyramid convolution
CN111160407B (en) Deep learning target detection method and system
CN110084149B (en) Face verification method based on hard sample quadruple dynamic boundary loss function
CN107918772B (en) Target tracking method based on compressed sensing theory and gcForest
CN112837344B (en) Target tracking method for generating twin network based on condition countermeasure
CN109886128B (en) Face detection method under low resolution
CN109784358B (en) No-reference image quality evaluation method integrating artificial features and depth features
Wang et al. GKFC-CNN: Modified Gaussian kernel fuzzy C-means and convolutional neural network for apple segmentation and recognition
CN107944354B (en) Vehicle detection method based on deep learning
CN109359661B (en) Sentinel-1 radar image classification method based on convolutional neural network
CN110276784B (en) Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics
CN110555870A (en) DCF tracking confidence evaluation and classifier updating method based on neural network
CN114419413A (en) Method for constructing sensing field self-adaptive transformer substation insulator defect detection neural network
CN114463677A (en) Safety helmet wearing detection method based on global attention
CN113920159B (en) Infrared air small and medium target tracking method based on full convolution twin network
CN110991554B (en) Improved PCA (principal component analysis) -based deep network image classification method
CN116030396A (en) Accurate segmentation method for video structured extraction
CN109344720B (en) Emotional state detection method based on self-adaptive feature selection
CN112464982A (en) Target detection model, method and application based on improved SSD algorithm
CN113496159B (en) Multi-scale convolution and dynamic weight cost function smoke target segmentation method
CN115294424A (en) Sample data enhancement method based on generation countermeasure network
CN116453033A (en) Crowd density estimation method with high precision and low calculation amount in video monitoring scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant