CN114677554A - Statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort - Google Patents



Publication number
CN114677554A
Authority
CN
China
Prior art keywords
image
target
frame
infrared
model
Prior art date
Legal status
Pending
Application number
CN202210179351.8A
Other languages
Chinese (zh)
Inventor
杨文�
舒浩宇
陈培陪
顾凯峰
刘奉奉
田腾飞
Current Assignee
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date
Filing date
Publication date
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN202210179351.8A
Publication of CN114677554A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Abstract

The invention relates to a statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort, which comprises the following steps: step S1, acquiring an infrared small target image under a complex background by using an infrared imaging device, and acquiring an infrared small target image data set; step S2, preprocessing the infrared small target image data set to obtain a preprocessed image data set, and dividing the preprocessed image data set into a training set and a verification set; step S3, training a YOLOv5S model in a YOLOv5 algorithm, and acquiring a training set of a Deepsort model; step S4, inputting the training set of the Deepsort model into the Deepsort model for training, and constructing an infrared small target detection and tracking recognizer; and step S5, detecting and tracking the infrared small target in real time by using an infrared small target detection and tracking recognizer. The method can accurately and quickly detect the infrared small target under the complex background, and improves the robustness and the detection rate. Moreover, the invention can achieve the real-time tracking effect.

Description

Statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort
Technical Field
The invention relates to the field of target detection and tracking, in particular to a statistical filtering infrared small target detection and tracking method based on YOLOv5 and Deepsort.
Background
Currently, target detection technology is widely applied across many fields. Infrared small target detection has long been a hotspot in infrared image processing, and its study is of great significance to military early warning, pattern recognition, image processing, and other fields. In practical scenes, however, the target in an infrared image is imaged at long range and generally appears as a point. The infrared signal is also severely attenuated by the atmosphere during propagation, so the target usually exists as a Gaussian-distributed point target with no significant shape or texture information of its own. In addition, an infrared small target in a complex background is often submerged by noise and clutter, giving the infrared image containing it a low signal-to-noise ratio. These characteristics pose a significant challenge to infrared small target detection techniques.
A traditional target detection pipeline comprises region proposal, feature extraction, and classification, and suffers from two problems: first, the region selection strategy is untargeted and has high time complexity; second, robustness is poor. With the continuous development of deep learning in computer vision, deep learning methods have made major breakthroughs in target detection and currently fall into two classes. One is the two-stage algorithms based on a detection box plus a classifier, such as the R-CNN (region-based convolutional neural network) series; these achieve high accuracy, but their complex network structure makes detection slow and real-time operation hard to meet. The other is the regression-based one-stage algorithms, such as the YOLO (You Only Look Once) series, whose fast inference can satisfy real-time detection requirements. YOLOv5, the fifth generation of the YOLO series, has a lighter framework and faster inference, but it can still only identify and detect a target object, not track it. Furthermore, YOLOv5 is very prone to missed detections under poor environmental conditions.
The Deepsort (Deep Simple Online and Realtime Tracking) algorithm predicts and updates the target with Kalman filtering and, during cascade matching, uses the Hungarian algorithm to associate the predicted track boxes with the detection boxes, so the target can be tracked well. However, existing detection and tracking methods combining YOLOv5 with the Deepsort tracking algorithm are aimed at large targets, such as pedestrians. A detection and tracking method for infrared small targets in complex backgrounds therefore still needs to be developed.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort, which can accurately and quickly realize background separation, detect and track infrared small targets under a complex background, and simultaneously can improve robustness and detection rate.
The invention provides a statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort, which comprises the following steps:
step S1, acquiring an infrared small target image under a complex background by using an infrared imaging device, and acquiring an infrared small target image data set;
step S2, preprocessing the infrared small target image data set to obtain a preprocessed image data set, and dividing the preprocessed image data set into a training set and a verification set;
Step S3, training a YOLOv5S model in a YOLOv5 algorithm according to the training set and the verification set, and acquiring a training set of a Deepsort model;
step S4, inputting the training set of the Deepsort model into the Deepsort model for training, acquiring the weight parameters of the Deepsort model, and constructing an infrared small target detection and tracking recognizer;
and step S5, detecting and tracking the infrared small target in real time by using the identifier for detecting and tracking the infrared small target.
Further, the step S1 includes:
step S11, using infrared imaging equipment to take continuous frame shooting samples of one or more aircrafts in different complex backgrounds to obtain an initial image set;
step S12, marking a real target detection frame in each image of the initial image set, and constructing label information of the infrared small target, where the initial image set and the label information of the infrared small target together form an infrared small target image data set.
Further, the method for preprocessing the infrared small target image data set in step S2 includes:
step S21, carrying out data cleaning on the infrared small target image data set to obtain a cleaned image data set;
Step S22, randomly selecting a plurality of frame images from the cleaned image data set, and performing data enhancement on the selected plurality of frame images to obtain an enhanced image data set;
step S23, extracting a background image of each frame of image from the cleaned image data set, and mixing the extracted background images to obtain a mixed background image set;
step S24, noise points contained in each frame of image are extracted from the infrared small target image data set, and the extracted noise points are randomly pasted to the cleaned image data set, the enhanced image data set and the mixed background image set;
and step S25, merging the cleaned image data set, the enhanced image data set and the mixed background image set, to which the noise points are randomly pasted in the step S24, and obtaining a final preprocessed image data set.
Further, the data enhancement method in step S22 is a paste enhancement method or a Mosaic enhancement method.
Further, the step S3 includes:
step S31, initializing YOLOv5S model parameters, including: batch size, number of iterations, image resolution, intersection-over-union (IOU) threshold, and confidence threshold;
step S32, inputting the training set into a YOLOv5S model, and outputting a predicted target detection box after each iteration according to initialized parameters of the YOLOv5S model;
Step S33, IOU loss calculation is carried out on the predicted target detection frame after each iteration and the real target detection frame in the label information, and learning weight after each iteration is obtained;
step S34, aiming at the learning weight after each iteration, selecting the learning weight with the minimum test error on the verification set according to the verification set, and taking the learning weight with the minimum test error as the weight parameter of the YOLOv5S model;
and step S35, acquiring a target recognition candidate frame in each frame of image according to the weight parameters of the YOLOv5S model, wherein the target recognition candidate frame in each frame of image forms a training set of the Deepsort model.
Further, the batch size is set to 64, the number of iterations to 2000, the image resolution to 640 x 640, the intersection-over-union threshold to 0.4, and the confidence threshold to 0.6.
Further, the step S4 includes:
step S41, determining the real position of the infrared small target in each frame of image according to the training set of the Deepsort model, and determining the predicted position of the infrared small target in the k-th frame of image by Kalman filtering according to its real position in the (k-1)-th frame of image (k ≥ 2);
step S42, performing cascade matching on the predicted position of the infrared small target in the k frame image and the real position of the infrared small target in the k frame image by using a Hungarian algorithm to obtain a result of successful primary matching, a track which is not matched for the first time and a detection frame which is not matched for the first time;
Step S43, IOU matching is carried out on the track which is not matched for the first time and the detection frame which is not matched for the first time in the step S42, and a result which is matched for the second time successfully, the track which is not matched for the second time and the detection frame which is not matched for the second time are obtained;
step S44, updating Kalman filtering parameters according to the result of successful primary matching and the result of successful secondary matching;
step S45, assigning a new track and a new ID to each detection frame that remains unmatched, and extracting a feature set of the target object in the detection frame through a re-identification (ReID) network; meanwhile, judging whether each track that remains unmatched is in the confirmed state, retaining confirmed tracks that have gone unmatched for fewer than 30 frames, and repeating steps S41-S44.
Step S5 includes:
step S51, setting parameters in an identifier for detecting and tracking the infrared small target, loading the weight parameters of the YOLOv5S model to the YOLOv5S model, and loading the weight parameters of the Deepsort model to the Deepsort model;
step S52, acquiring a real-time image of the infrared small target by using infrared imaging equipment, and performing contrast enhancement processing on the acquired real-time image to acquire an enhanced real-time image;
step S53, carrying out pixel average statistics on the first 100 frames of the enhanced real-time image to obtain blind spots in the real-time image, and storing the position of each blind spot into a blind spot list;
Step S54, inputting the enhanced real-time image into the YOLOv5s model, and generating a real-time target candidate frame set with confidence greater than 0.01;
and step S55, inputting the real-time target candidate frame set into a Deepsort model, acquiring the filtered real-time target candidate frame set, eliminating target candidate frames containing the blind spot list in the filtered real-time target candidate frame set, generating a final target detection frame, and realizing real-time tracking of the infrared small target.
Further, the method for performing contrast enhancement processing on the acquired real-time image in step S52 includes: and acquiring the histogram distribution of each frame of image, and changing the histogram distribution of each frame of image into an approximately uniform distribution histogram.
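The contrast enhancement described in step S52 is standard histogram equalization. The following is a minimal pure-Python sketch of that mapping on a flat list of 8-bit gray values; a production pipeline would normally call OpenCV's cv2.equalizeHist on the image array instead.

```python
# Minimal histogram-equalization sketch approximating the contrast
# enhancement of step S52: remap 8-bit gray levels so their histogram
# becomes approximately uniform.

def equalize_histogram(pixels):
    """Map 8-bit gray values so their histogram is roughly uniform."""
    n = len(pixels)
    # Histogram over the 256 possible gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function (CDF).
    cdf = [0] * 256
    running = 0
    for level, count in enumerate(hist):
        running += count
        cdf[level] = running
    cdf_min = next(c for c in cdf if c > 0)
    # Standard equalization mapping: scale the CDF to [0, 255].
    def remap(p):
        if n == cdf_min:          # degenerate single-level image
            return p
        return round((cdf[p] - cdf_min) / (n - cdf_min) * 255)
    return [remap(p) for p in pixels]

# A low-contrast image crammed into [100, 103] spreads across [0, 255].
flat = [100, 100, 101, 101, 102, 102, 103, 103]
stretched = equalize_histogram(flat)
print(min(stretched), max(stretched))  # → 0 255
```

After equalization the gray levels occupy the full dynamic range, which makes a dim point target stand out from the background.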
According to the method, the YOLOv5 framework for target detection and the Deepsort framework for target tracking are combined, and preprocessing such as data cleaning and enhancement, together with background-removal post-processing, is designed for the complex background of the infrared small target, so that interference points and blind pixels in the background can be overcome, the infrared small target under a complex background can be detected accurately and quickly, and robustness and detection rate are improved. Moreover, the invention achieves a real-time tracking effect.
Drawings
Fig. 1 is a flow chart of a statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort according to the invention.
Fig. 2 is a schematic diagram of the label information of a labeled infrared small target.
Fig. 3 is a flowchart of step S4 in fig. 1.
Fig. 4(a)-4(h) are graphs of the performance indicators described above with a confidence threshold of 0.001.
FIG. 5(a) is a graph of a P-R plot over a test set of an identification detection model of the present invention; FIG. 5(b) is a graph of the variation of the F1 score at different confidence thresholds; FIG. 5(c) is a graph of the corresponding rate of accuracy change for different confidence thresholds; FIG. 5(d) is a graph of the change in recall for different confidence thresholds.
Fig. 6(a)-6(c) are final tracking prediction effect diagrams.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort, which includes the following steps:
and step S1, acquiring an infrared small target image under a complex background by using an infrared imaging device, and acquiring an infrared small target image data set. The complex background refers to the background of trees, cloudless sky, buildings, continuously changing weather, complex cloud layers, sea surfaces, sea and air.
Specifically, step S1 includes:
and step S11, using an infrared imaging device to carry out continuous frame shooting sampling on one or more aircrafts in different complex backgrounds to obtain an initial image set. The sampling frame rate may be 50ms, 100ms, or other suitable frame rate. In the shooting process of the infrared imaging device, a blurred image or a missing image can be acquired due to the influences of sensor delay, noise interference and the like. For these blurred images or missing images, a resampling process is required to obtain a clear and complete initial image set.
Step S12, marking a real target detection frame in each image of the initial image set, and constructing tag information of the infrared small target, where the initial image set and the tag information of the infrared small target together form an infrared small target image data set.
The real target detection frame is labeled manually, for example with a labeling tool, obtaining the coordinates (x, y) of the top-left corner of the target detection frame together with its height h and width w, as shown in fig. 2. The labeled x, y, h and w form the label information of the infrared small target.
Manual labeling can introduce human error, such as a frame that deviates severely from the real target or is slightly off due to misoperation. Therefore, to reduce human error as much as possible, enhance the information density of the data set, and improve the training precision of the model, a visualization check is used to judge whether each annotation is correct. The judgment process is as follows: convert the xml file generated by labeling into txt format and store its data in a list; read the corresponding image from the initial image set with an opencv tool; and compare the stored data with the read image to judge whether the target detection frame overlaps well with the region where the infrared small target is located, thereby obtaining the accuracy of the label information. When the overlapping area of the target detection frame and the target region occupies more than 90% of their total (union) area, the two are considered well overlapped, and this proportion is taken as the accuracy of the label information. If the accuracy is lower than 90%, the annotation is manually reset and the accuracy recalculated.
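The 90% overlap check above amounts to an intersection-over-union test between the labeled box and the region the target actually occupies. A minimal sketch, assuming boxes in the (x, y, w, h) top-left format used in the labeling step:

```python
# Sketch of the visual label check: flag annotations whose overlap-over-
# union with the true target region falls below the 90% threshold.
# Box format is (x, y, w, h) with (x, y) the top-left corner.

def box_iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def label_is_valid(label_box, target_region, threshold=0.9):
    """A label passes when overlap/union reaches the threshold."""
    return box_iou(label_box, target_region) >= threshold

print(label_is_valid((10, 10, 20, 20), (10, 10, 20, 20)))  # identical → True
print(label_is_valid((10, 10, 20, 20), (25, 25, 20, 20)))  # badly off → False
```

Labels failing the check would be sent back for manual relabeling, as described above.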
And step S2, preprocessing the infrared small target image data set to obtain a preprocessed image data set, and dividing the preprocessed image data set into a training set and a verification set.
The method for preprocessing the infrared small target image data set comprises the following steps:
and step S21, performing data cleaning on the infrared small target image data set to obtain a cleaned image data set.
Data cleaning refers to handling missing and abnormal values using statistical weighted averaging or machine learning. During image acquisition, some images may be abnormal or missing due to noise interference, so a cleaning process is required. In this embodiment, cleaning uses a statistical weighted-average method: detect whether any frame in the infrared small target image data set is missing or abnormal; if a frame is missing, fill it with the average of the frames before and after it; if a frame is abnormal (i.e., discontinuous with the frames before and after it), delete it and fill the gap with the average of the two neighboring frames.
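The averaging rule above can be sketched in a few lines. Frames are modeled here as flat lists of pixel values for brevity; the real implementation would operate on full image arrays.

```python
# Sketch of the statistical cleaning rule: a missing frame (None) is
# filled with the pixel-wise average of its two neighboring frames.

def clean_sequence(frames):
    cleaned = list(frames)
    for i, frame in enumerate(cleaned):
        if frame is None and 0 < i < len(cleaned) - 1:
            prev_f, next_f = cleaned[i - 1], cleaned[i + 1]
            # Pixel-wise average of the previous and next frames.
            cleaned[i] = [(p + q) / 2 for p, q in zip(prev_f, next_f)]
    return cleaned

seq = [[10, 20], None, [30, 40]]
print(clean_sequence(seq))  # → [[10, 20], [20.0, 30.0], [30, 40]]
```

An abnormal frame would first be set to None (deleted) and then filled by the same rule.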
And step S22, randomly selecting a plurality of frame images from the cleaned image data set, and performing data enhancement on the selected plurality of frame images to obtain an enhanced image data set.
Data enhancement refers to expanding the data set by random flipping, scaling, cropping, rotating, pasting, splicing, and similar operations on the original data. In this embodiment, the paste enhancement method or the Mosaic enhancement method is applied to the selected frames. Paste enhancement: after an image is scaled by a large factor of 0.1-2 times, the infrared small targets in one frame are pasted at random into another frame, increasing the number of infrared small targets and thereby strengthening detection from the perspective of target count. Mosaic enhancement: 4 frames are sampled at random and spliced by random scaling, cropping, and arrangement, so that the infrared small targets are evenly distributed and a single frame contains several of them, strengthening detection from the perspectives of both target distribution and count.
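The paste-enhancement idea can be illustrated on toy data. Below, frames are 2D lists of gray values; the patch box and paste position are illustrative, and a real pipeline would work on image arrays with the labels updated accordingly.

```python
import random

# Toy sketch of paste enhancement: copy a small target patch from one
# frame into a random location of another, increasing the target count.

def paste_target(src, src_box, dst, rng=random):
    x, y, w, h = src_box                       # patch location in the source
    patch = [row[x:x + w] for row in src[y:y + h]]
    dh, dw = len(dst), len(dst[0])
    px = rng.randrange(0, dw - w + 1)          # random paste position
    py = rng.randrange(0, dh - h + 1)
    out = [row[:] for row in dst]              # leave the original untouched
    for r in range(h):
        out[py + r][px:px + w] = patch[r]
    return out

src = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]      # one bright 1x1 target at (1, 1)
dst = [[5] * 3 for _ in range(3)]
augmented = paste_target(src, (1, 1, 1, 1), dst, random.Random(0))
print(any(255 in row for row in augmented))    # → True
```

Mosaic enhancement composes four such frames into one canvas in the same spirit.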
And step S23, extracting the background image of each frame from the cleaned image data set and mixing the extracted backgrounds to obtain a mixed background image set. The mixed background image set enhances the model's ability to recognize complex backgrounds and reduces the rate at which background objects are falsely detected as targets. The background is extracted by an image dilation-erosion method: the target object and the noise are eroded away, and the retained image is the background image.
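The dilation-erosion extraction amounts to a morphological opening: a min filter (erosion) wipes out small bright targets and noise, then a max filter (dilation) restores the remaining background structure. A minimal sketch on a 2D list with a 3x3 window (real code would use OpenCV morphology on arrays):

```python
# Background extraction by morphological opening: erosion (min filter)
# removes small bright targets, dilation (max filter) restores the
# surviving background.

def _filter3x3(img, reduce_fn):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = reduce_fn(vals)
    return out

def extract_background(img):
    return _filter3x3(_filter3x3(img, min), max)

# A single bright point target on a flat background disappears after opening.
img = [[10] * 5 for _ in range(5)]
img[2][2] = 200
bg = extract_background(img)
print(bg[2][2])  # → 10
```

The structuring-element size (here 3x3) must exceed the target size for the target to be removed; small infrared targets of 1-20 pixels fit this regime with modest windows.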
And step S24, the noise points contained in each frame are extracted from the infrared small target image data set and randomly pasted onto the cleaned image data set, the enhanced image data set, and the mixed background image set, so as to expand the data distribution of the noise. The noise points are cut out and extracted manually.
And step S25, merging the cleaned image data set, the enhanced image data set and the mixed background image set randomly pasted with the noise points in the step S24 to obtain a final preprocessed image data set.
After obtaining the preprocessed image data set, it is divided by random sampling into a training set and a verification set at a ratio of 8:2, i.e., 0.8 of the data forms the training set and 0.2 forms the verification set.
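The 8:2 random split can be sketched as follows; the seed is an illustrative choice for reproducibility, not part of the method.

```python
import random

# Minimal sketch of the 8:2 random train/verification split.

def split_dataset(samples, train_ratio=0.8, seed=42):
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)      # random sampling
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, val = split_dataset(range(100))
print(len(train), len(val))  # → 80 20
```

Every sample lands in exactly one of the two sets, so no image is used for both training and validation.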
And step S3, training a YOLOv5S model in the YOLOv5 algorithm according to the training set and the verification set divided in the step S2, and acquiring a training set of the Deepsort model. The YOLOv5s model is adopted by the invention to realize the detection effect more quickly.
The YOLOv5s model is a one-stage target detection network comprising a reference (backbone) network layer based on ResNet, a Neck network layer with an FPN + PAN structure, and an output layer whose result is post-processed by non-maximum suppression. The reference network layer outputs a feature mapping matrix. The Neck layer adopts the FPN + PAN structure to improve the diversity and robustness of features and strengthen the network's feature-fusion capability. FPN denotes a feature pyramid network, which extracts the strong semantic features of the image (i.e., the shape of the target object) by top-down upsampling; PAN denotes a pixel aggregation network, which extracts the strong localization features (i.e., the position of the target) with a bottom-up network. Fusing FPN with PAN aggregates shape and position features. The output layer adopts GIoU_Loss as the loss function of the bounding box and outputs the target detection result. After the detection result is obtained, non-maximum suppression is applied as post-processing to eliminate duplicate and stacked bounding boxes on the same target.
Specifically, step S3 includes:
step S31, initializing YOLOv5S model parameters, including: batch size (batch size), number of iterations (epoch), image resolution, cross-over ratio (IOU) threshold, and confidence threshold.
Because the image set is large, all images cannot be input together during model training without exhausting memory, so a batch method is adopted, inputting 64 frames at a time. The YOLOv5s network involves forward propagation and backward propagation; when all images have undergone one forward and one backward pass, this is called one iteration (epoch). Iterating performs gradient optimization to find weights that make the loss function converge. In this embodiment, the batch size is set to 64, the number of epochs to 2000, the image resolution to 640 x 640, the IOU threshold to 0.4, and the confidence threshold to 0.6.
And step S32, inputting the training set divided in the step S2 into a YOLOv5S model, and outputting a predicted target detection frame after each iteration according to initialized parameters of the YOLOv5S model. In the process of outputting the target detection frame at the output end of the YOLOv5s model, the detected target detection frame needs to be filtered according to the confidence, the target detection frame with the confidence smaller than the set confidence threshold of 0.6 is deleted, and the reserved target detection frame is the predicted target detection frame.
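The confidence filtering at the model output reduces to a simple threshold on each detection's score. A sketch, with detections as (x, y, w, h, confidence) tuples and the 0.6 threshold from the initialization step:

```python
# Sketch of the confidence filter at the YOLOv5s output: detections
# scoring below the 0.6 threshold are dropped; the survivors are the
# predicted target detection frames.

CONF_THRESHOLD = 0.6

def filter_by_confidence(detections, threshold=CONF_THRESHOLD):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d[4] >= threshold]

dets = [(10, 10, 4, 4, 0.95), (50, 50, 3, 3, 0.30), (70, 20, 5, 5, 0.61)]
kept = filter_by_confidence(dets)
print(len(kept))  # → 2
```

Note that step S54 later reuses the same mechanism with a far looser 0.01 threshold, deliberately passing low-confidence candidates on to the Deepsort stage for filtering.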
And step S33, IOU loss calculation is carried out on the predicted target detection frame after each iteration and the real target detection frame in the label information, and the learning weight after each iteration is obtained.
Step S34, for the learning weight after each iteration, according to the verification set divided in step S2, selecting the learning weight with the minimum test error (i.e., the highest accuracy) on the verification set, and using the learning weight corresponding to the minimum test error as the weight parameter of the YOLOv5S model. The method for acquiring the test error specifically comprises the following steps: and inputting the verification set into a YOLOv5s model with the learning weight obtained by the current iteration as a weight parameter to obtain a prediction box comprising labeling information and a confidence coefficient indicating whether a target object exists, and calculating the loss of the cross entropy of the IOU and the classification to obtain a test error.
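The model-selection rule of step S34 is an argmin over per-epoch validation errors. A sketch, where the weight sets are stood in for by epoch identifiers and the error values are hypothetical:

```python
# Sketch of step S34: keep the per-epoch weights whose validation (test)
# error is smallest; those become the YOLOv5s weight parameters.

def select_best_weights(epoch_errors):
    """epoch_errors: list of (weights_id, validation_error) pairs."""
    return min(epoch_errors, key=lambda pair: pair[1])[0]

history = [("epoch_1", 0.42), ("epoch_2", 0.31), ("epoch_3", 0.35)]
print(select_best_weights(history))  # → epoch_2
```

Selecting on validation error rather than training loss guards against overfitting the 2000-epoch run to the training set.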
And step S35, detecting the target object according to the weight parameters of the YOLOv5s model and acquiring the target recognition candidate frames in each frame of image; the candidate frames of all frames form the training set of the Deepsort model. Specifically, the training set divided in step S2 is input again into the YOLOv5s model with the determined weight parameters, and target recognition candidate frames are obtained under the set confidence threshold. For example, if there are two targets, there should be two corresponding real target frames; but because the image data set was enhanced as described above, the YOLOv5s model may output 10, 15, or some other number of anchor frames, and these surplus anchor frames are the target recognition candidate frames. A number of anchor frames greater than 2 indicates false or missed detections, so these frames are taken as the training set of the Deepsort model and input to it for further processing.
Step S4 is carried out, the training set of the Deepsort model is input into the Deepsort model for training, the weight parameters of the Deepsort model are obtained, and the recognizer for detecting and tracking the infrared small target is constructed.
Before inputting the training set of the Deepsort model into the Deepsort model for training, the network structure of the Deepsort model needs to be modified into a network structure capable of processing small target images with pixel widths of about 1-20, or the target recognition candidate frame needs to be preprocessed to a size which can be correctly distinguished by the Deepsort model.
And training by adopting a Deepsort model, aiming at filtering false detection frames and supplementing missed detection frames. Specifically, as shown in fig. 3, step S4 includes:
and step S41, determining the real position of the infrared small target in each frame of image according to the training set of the Deepsort model, and determining the predicted position of the infrared small target in the kth frame of image by adopting Kalman filtering according to the real position of the infrared small target in the kth frame of image (k is more than or equal to 2). Note that the actual position referred to in this step is a position framed by the target recognition candidate frame in each frame image, and is different from the position framed by the actual target detection frame in step S12. The true position is a determined value used to distinguish between the determined value and a predicted value subsequently using kalman filtering.
The Kalman filter is used to predict the track and to filter out false detection frames according to the predicted track information. For example, if the predicted state at the next moment is [2, 3, 4, 5], then a candidate frame carrying detection information [10, 11, 12, 13] is a false detection and needs to be filtered out. That is, coarse filtering is performed by the confidence threshold and fine filtering by the Kalman filter.
The Kalman filter state consists of eight components of the anchor frame: X = [x, y, r, h, vx, vy, vr, vh], where x and y are the abscissa and ordinate of the center of the anchor frame in the image, h is the height of the anchor frame, and r is the ratio of its height to its width; vx, vy, vr and vh are the respective rates of change (derivatives) of x, y, r and h, carrying the state information of the object from the previous frame. The Kalman filtering algorithm predicts the state at the next moment from the current state. The Kalman filter formulas are:
x' = A x
P' = A P Aᵀ + Q
y = z − H x'
K = P' Hᵀ (H P' Hᵀ + R)⁻¹
P = (I − K H) P'
wherein A is the system (state transition) matrix, whose 8 × 8 constant-velocity form is given in a figure of the original document (not reproduced here); P is the covariance matrix, measuring the error between the predicted value and the true value; Q is the process noise introduced at each state transition; K is the gain matrix, chosen to minimize P; R is the measurement noise present when the state is measured, generally Gaussian white noise; I is the identity matrix; and H is the measurement matrix.
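The predict half of the cycle (x' = A x, P' = A P Aᵀ + Q) can be sketched directly with the symbols above on the eight-dimensional constant-velocity state. Plain nested-list matrix math keeps it self-contained; the noise magnitudes are illustrative only, and the update step (which needs a matrix inverse for K) is omitted for brevity.

```python
# Toy sketch of the Kalman predict step on the 8-dimensional state
# [x, y, r, h, vx, vy, vr, vh], using the same symbols A, P, Q as above.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(r) for r in zip(*A)]

def eye(n, scale=1.0):
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

DIM = 8
# A: constant-velocity transition — each position component gains its velocity.
A = eye(DIM)
for i in range(4):
    A[i][i + 4] = 1.0

def predict(x, P, Q):
    x_pred = matmul(A, x)                                   # x' = A x
    P_pred = matadd(matmul(matmul(A, P), transpose(A)), Q)  # P' = A P Aᵀ + Q
    return x_pred, P_pred

# State: center (10, 20), ratio 1, height 5, velocities (2, -1, 0, 0).
state = [[10.0], [20.0], [1.0], [5.0], [2.0], [-1.0], [0.0], [0.0]]
P = eye(DIM, 1.0)       # initial covariance (illustrative)
Q = eye(DIM, 0.01)      # process noise (illustrative)
x_pred, P_pred = predict(state, P, Q)
print(x_pred[0][0], x_pred[1][0])  # → 12.0 19.0
```

The predicted center moves by exactly one velocity step, which is the behavior the cascade matching of step S42 relies on.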
And step S42, performing cascade matching between the predicted position of the infrared small target in the k-th frame image and its real position in the k-th frame image using the Hungarian algorithm, obtaining the results of successful first matching, the tracks not matched the first time and the detection frames not matched the first time. Specifically, a cost matrix based on the cosine distance is computed and detection frames whose cosine distance is too large are deleted; the Hungarian algorithm then uses the Mahalanobis distance between the Kalman-predicted track and the actual detection frame as the cost matrix, matches tracks with detection frames, and returns the matching result. Note that, to overcome interference and the delay of the Hungarian algorithm, the cascade matching of step S42 is repeated over a window of 30 frames; if a track is still unmatched after 30 frames, it is discarded.
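A minimal sketch of this gated assignment step, under the assumption that a track-detection cost matrix (e.g. of Mahalanobis distances) is already available; `scipy.optimize.linear_sum_assignment` implements the Hungarian algorithm, and the gate value follows the t(1) threshold quoted later in the text:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

GATE = 9.4877  # chi-square gating threshold from the text

def match(cost):
    """cost[i, j]: distance between track i and detection j.
    Returns (matches, unmatched_tracks, unmatched_detections)."""
    cost = cost.copy()
    cost[cost > GATE] = GATE + 1e-5              # gate infeasible pairs
    rows, cols = linear_sum_assignment(cost)     # Hungarian assignment
    matches, un_tracks, un_dets = [], [], []
    for r, c in zip(rows, cols):
        if cost[r, c] > GATE:                    # assigned but gated out
            un_tracks.append(r); un_dets.append(c)
        else:
            matches.append((r, c))
    un_tracks += [r for r in range(cost.shape[0]) if r not in rows]
    un_dets += [c for c in range(cost.shape[1]) if c not in cols]
    return matches, un_tracks, un_dets
```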
And step S43, performing IOU matching between the tracks not matched the first time and the detection frames not matched the first time in step S42, obtaining the results of successful second matching, the tracks not matched the second time and the detection frames not matched the second time. Specifically, for a track that has been matched in only one frame, the IOU distance between the track and each unmatched detection frame is computed and detection frames whose IOU distance is too large are deleted; the Hungarian algorithm then uses the IOU distance between the Kalman-predicted track and the actual detection frame as the cost matrix, matches tracks with detection frames, and returns the matching result. Note that the IOU matching of step S43 is performed between every pair of consecutive frames, i.e. if matching fails at the current moment owing to interference or the like, the content of the current frame is not matched again in the next frame.
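For reference, the IOU between two boxes can be computed as below, assuming boxes in [x1, y1, x2, y2] form; the IOU distance used for the matching above is then 1 − IOU:

```python
def iou(a, b):
    """Intersection over union of boxes a, b given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```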
And step S44, updating the parameters of Kalman filtering according to the results of the successful first and second matchings. Specifically, the Kalman filter formulas above are used to update the covariance and mean among the Kalman filter parameters: P is the covariance, and x̂, the predicted (expected) value of the trajectory, is the mean.
Step S45, assigning a new track and a new ID to each detection frame not matched the second time, and extracting the feature set of the target object in the detection frame through the ReID network; meanwhile, judging whether each track not matched the second time is in the determined state, retaining tracks in the determined state whose number of mismatches is less than 30 frames, and repeating steps S41-S44. Specifically, if a detection frame matches no existing tracking track, a new tracking track is initialized for it (initialized with the cosine similarity metric matrix, default value 0). A newly initialized track is in the undetermined state, and is converted to the determined state only after being matched successfully in three consecutive frames. A track in the determined state is deleted if it fails to match the target more than 30 consecutive times.
The Deepsort model matches moving targets with the Hungarian algorithm, identifies each target with a deep re-identification network, tracks it by its target identifier and stores the identifier information, so that a target lost for a period of time can be re-identified by its identifier when it reappears. The Hungarian algorithm solves the association between the detection results and the tracking predictions, using the Mahalanobis distance between the Kalman-predicted motion state of an existing moving target and the detection result to associate the motion information.
The Mahalanobis distance formula is as follows:

d⁽¹⁾(i, j) = (d_j − y_i)ᵀ S_i⁻¹ (d_j − y_i)

where d_j denotes the j-th detection position, y_i the position of the target predicted by the i-th tracker, and S_i the covariance matrix between the detected position and the average tracked position;
The Mahalanobis distance accounts for the uncertainty of the state measurement by computing the deviation between the detected position and the average tracked position in units of standard deviations. If the Mahalanobis distance of an association is less than a specified threshold t⁽¹⁾, the motion-state association is considered successful; the indicator function used is

b⁽¹⁾(i, j) = 𝟙[ d⁽¹⁾(i, j) ≤ t⁽¹⁾ ]

where a value of 1 indicates a successful association, and t⁽¹⁾ is taken as 9.4877.
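A brief sketch of this gating test: the squared Mahalanobis distance between a detection and a tracker prediction is compared against t(1) = 9.4877:

```python
import numpy as np

T1 = 9.4877  # gating threshold t(1) from the text

def mahalanobis_gate(d, y, S):
    """d: detection position, y: tracker prediction, S: covariance matrix.
    Returns (squared Mahalanobis distance, 1 if association admissible else 0)."""
    diff = d - y
    dist = float(diff.T @ np.linalg.inv(S) @ diff)
    return dist, int(dist <= T1)
```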
And step S5, detecting and tracking the infrared small target in real time by using an infrared small target detection and tracking recognizer.
Specifically, step S5 includes:
Step S51, setting the parameters in the recognizer for detecting and tracking the infrared small target, loading the weight parameters of the YOLOv5s model into the YOLOv5s model and the weight parameters of the Deepsort model into the Deepsort model, so as to obtain a serial model of the two. The confidence threshold of the YOLOv5s model is set to 0.01 and the intersection-over-union (IoU) threshold to 0.4.
And step S52, acquiring a real-time image of the infrared small target by using the infrared imaging equipment, and performing contrast enhancement processing on the acquired real-time image to acquire an enhanced real-time image so as to reduce the difficulty of feature identification of the infrared small target.
The method for carrying out contrast enhancement processing on the acquired real-time image comprises the following steps: and acquiring the histogram distribution of each frame of image, and changing the histogram distribution of each frame of image into an approximately uniform distribution histogram to enhance the contrast of the image.
The method of changing the histogram distribution into an approximately uniform one is: apply a monotone non-linear mapping transformation f to each pixel of the original image, i.e. D_B = f(D_A), keeping the total number of pixels unchanged:

f(D_A) = (L / A₀) ∫₀^{D_A} H_A(u) du

where H_A(D) is the histogram distribution of the original image and H_B(D) is the target uniform distribution, taken as H_B(D) = A₀ / L, with A₀ the total number of pixels and L = 256 the grey-level depth.
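The mapping f can be sketched with a discrete cumulative histogram as follows; this is a generic histogram-equalisation implementation, not necessarily the exact one used in the embodiment:

```python
import numpy as np

def equalize(img):
    """Histogram equalisation of an 8-bit grey image: D_B = f(D_A)."""
    L = 256
    hist = np.bincount(img.ravel(), minlength=L)      # H_A(D)
    cdf = np.cumsum(hist)                             # integral of H_A up to D
    a0 = img.size                                     # A0: total pixel count
    lut = np.round((L - 1) * cdf / a0).astype(np.uint8)
    return lut[img]                                   # apply mapping per pixel
```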
Step S53, performing per-pixel average statistics over the first 100 frames of the enhanced real-time image to obtain the blind spots in the real-time image, and storing the position of each blind spot in a blind-spot list. Specifically, the blind spots are obtained as follows: the pixel grey levels of the first 100 frames are averaged at each position; since the mean value of blind pixels is far higher than that of normal pixels, 0.7 to 0.9 times the blind-pixel mean is taken as the threshold, and pixels exceeding the threshold are regarded as blind spots.
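A possible sketch of this blind-spot statistic, with the 0.8 factor chosen from the suggested 0.7-0.9 range and the brightest temporal mean used as a stand-in for the blind-pixel mean:

```python
import numpy as np

def find_blind_spots(frames, factor=0.8):
    """frames: array of shape (>=100, H, W). Returns [(row, col), ...]."""
    mean = frames[:100].astype(np.float64).mean(axis=0)  # per-pixel temporal mean
    threshold = factor * mean.max()                      # ~0.8 x blind-pixel mean
    ys, xs = np.nonzero(mean > threshold)                # pixels above threshold
    return list(zip(ys.tolist(), xs.tolist()))           # blind-spot list
```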
And step S54, inputting the enhanced real-time image into a YOLOV5S model, and generating a real-time target candidate frame set with the confidence coefficient of more than 0.01.
And step S55, inputting the real-time target candidate frame set into the Deepsort model to obtain the filtered real-time target candidate frame set, removing from it any candidate frame that contains a blind spot from the blind-spot list, and generating the final target detection frame, thereby realizing real-time tracking of the infrared small target.
With the infrared small target being detected and tracked in real time, the precision, recall, F1 score, mAP and so on can be computed on the validation set. Figs. 4(a)-4(h) plot these performance indicators at a confidence threshold of 0.001: at this threshold the precision reaches roughly 0.94 and the recall 0.75; Fig. 4(a) shows the error between the label boxes and the prediction boxes. The jump in the curves arises because training was interrupted and the labels adjusted. The recall is relatively low because in part of the training set the target is occluded by the background or flooded by background light; such cases require a prediction algorithm and an algorithm separating the target from the background.
FIG. 5(a) is the P-R curve of the recognition and detection model of the present invention on the test set. The P-R curve reflects the performance of the model intuitively: the precision is plotted as the ordinate and the recall as the abscissa, and the larger the area under the P-R curve, the better the model. The figure shows that the model performs well.
Fig. 5(b) shows the variation of the F1 score at different confidence thresholds. Since precision and recall each reflect only one aspect of model quality and must be weighed together, the F1 score, the harmonic mean of precision and recall, is used: when the confidence threshold is small, F1 rises as the threshold increases; beyond a certain point, F1 falls as the threshold increases further.
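The F1 score referred to here is simply:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For example, with the precision 0.937 and recall 0.756 reported in Table 1, F1 is about 0.84.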
Fig. 5(c) shows the variation of precision at different confidence thresholds. The prediction precision essentially converges to 99% at a confidence of 0.1.
FIG. 5(d) is a graph of the change in recall for different confidence thresholds. As confidence increases, recall rates continue to decline.
The final tracking prediction results are shown in Figs. 6(a)-6(c), covering both single-target and multi-target cases; the number on each anchor frame is the ID of the tracked object.
In addition, the results at different confidence thresholds are shown in Table 1:

TABLE 1  Comparison at different confidence thresholds

Confidence threshold    Precision    Recall    mAP@0.5    mAP@[0.5:0.95]
0.001                   0.937        0.756     0.786      0.635
0.01                    0.937        0.756     0.771      0.625
0.1                     0.998        0.500     0.505      0.434
The invention combines a YOLOv5s model for target detection and a Deepsort model for target tracking, designs preprocessing such as data cleaning and enhancement and post-processing of background removal aiming at the complex background under the infrared small target scene, and can accurately and quickly detect and track the infrared small target under the complex background.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and various modifications may be made to the above-described embodiment of the present invention. All simple and equivalent changes and modifications made according to the claims and the content of the specification of the present invention are within the scope of the claims of the present invention. The invention has not been described in detail in order to avoid obscuring the invention.

Claims (9)

1. A statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort is characterized by comprising the following steps:
step S1, acquiring an infrared small target image under a complex background by using an infrared imaging device, and acquiring an infrared small target image data set;
step S2, preprocessing the infrared small target image data set to obtain a preprocessed image data set, and dividing the preprocessed image data set into a training set and a verification set;
step S3, training a YOLOv5S model in a YOLOv5 algorithm according to the training set and the verification set, and acquiring a training set of a Deepsort model;
step S4, inputting the training set of the Deepsort model into the Deepsort model for training, acquiring the weight parameters of the Deepsort model, and constructing an infrared small target detection and tracking recognizer;
And step S5, detecting and tracking the infrared small target in real time by using the recognizer for detecting and tracking the infrared small target.
2. The statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort of claim 1, wherein the step S1 comprises:
step S11, using infrared imaging equipment to take continuous frame shooting samples of one or more aircrafts under different complex backgrounds to obtain an initial image set;
step S12, marking a real target detection frame in each image of the initial image set, and constructing label information of the infrared small target, where the initial image set and the label information of the infrared small target together form an infrared small target image data set.
3. The method for detecting and tracking statistical filtering infrared small targets based on YOLOv5 and Deepsort according to claim 1, wherein the method for preprocessing the infrared small target image data set in step S2 comprises:
step S21, carrying out data cleaning on the infrared small target image data set to obtain a cleaned image data set;
step S22, randomly selecting a plurality of frame images from the cleaned image data set, and performing data enhancement on the selected plurality of frame images to obtain an enhanced image data set;
Step S23, extracting a background image of each frame of image from the cleaned image data set, and mixing the extracted background images to obtain a mixed background image set;
step S24, noise points contained in each frame of image are extracted from the infrared small target image data set, and the extracted noise points are randomly pasted to the cleaned image data set, the enhanced image data set and the mixed background image set;
and step S25, merging the cleaned image data set, the enhanced image data set and the mixed background image set, to which the noise points are randomly pasted in the step S24, and obtaining a final preprocessed image data set.
4. The statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort of claim 3, wherein the data enhancement method in step S22 is a paste enhancement method or a Mosaic enhancement method.
5. The statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort of claim 1, wherein the step S3 comprises:
step S31, initializing YOLOv5S model parameters, including: batch processing amount, iteration times, image resolution, intersection ratio threshold and confidence coefficient threshold;
Step S32, inputting the training set into a YOLOv5S model, and outputting a predicted target detection box after each iteration according to initialized parameters of the YOLOv5S model;
step S33, IOU loss calculation is carried out on the predicted target detection frame after each iteration and the real target detection frame in the label information, and the learning weight after each iteration is obtained;
step S34, aiming at the learning weight after each iteration, selecting the learning weight with the minimum test error on the verification set according to the verification set, and taking the learning weight with the minimum test error as the weight parameter of the YOLOv5S model;
and step S35, acquiring a target recognition candidate frame in each frame of image according to the weight parameters of the YOLOv5S model, wherein the target recognition candidate frame in each frame of image forms a training set of the Deepsort model.
6. The YOLOv5 and Deepsort based statistical filtering infrared small target detection and tracking method as claimed in claim 5, wherein the batch processing amount is set to 64, the number of iterations is set to 2000, the image resolution is set to 640 x 640, the cross-over ratio threshold is set to 0.4, and the confidence threshold is set to 0.6.
7. The method for detecting and tracking the small statistical filtering infrared target based on YOLOv5 and Deepsort of claim 1, wherein the step S4 comprises:
Step S41, determining the real position of the infrared small target in each frame of image according to the training set of the Deepsort model, and determining, by Kalman filtering, the predicted position of the infrared small target in the k-th frame image (k ≥ 2) from its real position in the (k−1)-th frame image;
step S42, cascade matching is carried out on the predicted position of the infrared small target in the k frame image and the real position of the infrared small target in the k frame image by adopting a Hungarian algorithm, and a result of successful primary matching, a track of primary unmatched objects and a detection frame of primary unmatched objects are obtained;
step S43, IOU matching is carried out on the track which is not matched for the first time and the detection frame which is not matched for the first time in the step S42, and a result which is matched for the second time successfully, the track which is not matched for the second time and the detection frame which is not matched for the second time are obtained;
step S44, updating Kalman filtering parameters according to the result of successful primary matching and the result of successful secondary matching;
step S45, assigning a new track and a new ID to each detection frame not matched the second time, and extracting the feature set of the target object in the detection frame through the ReID network; meanwhile, judging whether each track not matched the second time is in the determined state, retaining tracks in the determined state whose number of mismatches is less than 30 frames, and repeating the steps S41-S44.
8. The statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort of claim 1, wherein the step S5 comprises:
step S51, setting parameters in an identifier for detecting and tracking the infrared small target, loading the weight parameters of a YOLOv5S model to a YOLOv5S model, and loading the weight parameters of the Deepsort model to the Deepsort model;
step S52, acquiring a real-time image of the infrared small target by using infrared imaging equipment, and performing contrast enhancement processing on the acquired real-time image to acquire an enhanced real-time image;
step S53, carrying out pixel average statistics on the first 100 frames of the enhanced real-time image to obtain blind points in the real-time image, and storing the position of each blind point into a blind point list;
step S54, inputting the enhanced real-time image into a YOLOV5S model, and generating a real-time target candidate frame set with the confidence coefficient more than 0.01;
and step S55, inputting the real-time target candidate frame set into a Deepsort model, acquiring the filtered real-time target candidate frame set, eliminating target candidate frames containing the blind spot list in the filtered real-time target candidate frame set, generating a final target detection frame, and realizing real-time tracking of the infrared small target.
9. The statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort of claim 7, wherein the method for performing the contrast enhancement processing on the acquired real-time image in the step S52 comprises: acquiring the histogram distribution of each frame of image, and changing the histogram distribution of each frame of image into an approximately uniform distribution histogram.
CN202210179351.8A 2022-02-25 2022-02-25 Statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort Pending CN114677554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210179351.8A CN114677554A (en) 2022-02-25 2022-02-25 Statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort


Publications (1)

Publication Number Publication Date
CN114677554A true CN114677554A (en) 2022-06-28

Family

ID=82072408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210179351.8A Pending CN114677554A (en) 2022-02-25 2022-02-25 Statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort

Country Status (1)

Country Link
CN (1) CN114677554A (en)


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882101A (en) * 2022-07-11 2022-08-09 合肥工业大学 Sealed container leakage amount measuring method based on deep learning and image processing
WO2024032091A1 (en) * 2022-08-12 2024-02-15 亿航智能设备(广州)有限公司 Target tracking method and device, and computer-readable storage medium
CN115222775A (en) * 2022-09-15 2022-10-21 中国科学院长春光学精密机械与物理研究所 Weak and small target detection tracking device and detection tracking method thereof
CN115222775B (en) * 2022-09-15 2022-12-06 中国科学院长春光学精密机械与物理研究所 Weak and small target detection tracking device and detection tracking method thereof
CN115311470A (en) * 2022-09-28 2022-11-08 北京万龙精益科技有限公司 Infrared small target real-time detection and tracking method of adaptive block matching filtering
CN115311470B (en) * 2022-09-28 2023-01-24 北京万龙精益科技有限公司 Infrared small target real-time detection and tracking method of adaptive block matching filtering, system and device thereof and computer readable storage medium
CN116091552A (en) * 2023-04-04 2023-05-09 上海鉴智其迹科技有限公司 Target tracking method, device, equipment and storage medium based on deep SORT
CN116128883A (en) * 2023-04-19 2023-05-16 尚特杰电力科技有限公司 Photovoltaic panel quantity counting method and device, electronic equipment and storage medium
CN117333512A (en) * 2023-10-17 2024-01-02 大连理工大学 Aerial small target tracking method based on detection frame tracking
CN117095244B (en) * 2023-10-18 2024-01-05 华侨大学 Infrared target identification method, device, equipment and medium
CN117095244A (en) * 2023-10-18 2023-11-21 华侨大学 Infrared target identification method, device, equipment and medium
CN117636480A (en) * 2024-01-25 2024-03-01 中科方寸知微(南京)科技有限公司 Real-time human body detection method and system based on dynamic region detection and multi-target tracking
CN117636480B (en) * 2024-01-25 2024-04-12 中科方寸知微(南京)科技有限公司 Real-time human body detection method and system based on dynamic region detection and multi-target tracking
CN117710756A (en) * 2024-02-04 2024-03-15 成都数之联科技股份有限公司 Target detection and model training method, device, equipment and medium
CN117710756B (en) * 2024-02-04 2024-04-26 成都数之联科技股份有限公司 Target detection and model training method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN114677554A (en) Statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort
CN110119728B (en) Remote sensing image cloud detection method based on multi-scale fusion semantic segmentation network
CN109934121B (en) Orchard pedestrian detection method based on YOLOv3 algorithm
CN110929560B (en) Video semi-automatic target labeling method integrating target detection and tracking
CN109902677B (en) Vehicle detection method based on deep learning
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN114299417A (en) Multi-target tracking method based on radar-vision fusion
CN111027511B (en) Remote sensing image ship detection method based on region of interest block extraction
CN108197604A (en) Fast face positioning and tracing method based on embedded device
CN110310305B (en) Target tracking method and device based on BSSD detection and Kalman filtering
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN110544268B (en) Multi-target tracking method based on structured light and SiamMask network
CN101930611A (en) Multiple view face tracking
CN114972968A (en) Tray identification and pose estimation method based on multiple neural networks
CN110458019B (en) Water surface target detection method for eliminating reflection interference under scarce cognitive sample condition
KR101690050B1 (en) Intelligent video security system
CN114463724A (en) Lane extraction and recognition method based on machine vision
CN113989604A (en) Tire DOT information identification method based on end-to-end deep learning
CN112700469A (en) Visual target tracking method and device based on ECO algorithm and target detection
CN109978916B (en) Vibe moving target detection method based on gray level image feature matching
CN111797832A (en) Automatic generation method and system of image interesting region and image processing method
CN116343078A (en) Target tracking method, system and equipment based on video SAR
CN116740758A (en) Bird image recognition method and system for preventing misjudgment
CN115527050A (en) Image feature matching method, computer device and readable storage medium
CN114037737B (en) Neural network-based offshore submarine fish detection and tracking statistical method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination