CN112380997A - Model identification and undercarriage retraction and extension detection method based on deep learning - Google Patents


Info

Publication number
CN112380997A
Authority
CN
China
Prior art keywords
detection
target
yolov3
target tracking
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011277840.4A
Other languages
Chinese (zh)
Inventor
陈海峰
朱学伟
刘青
贾昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Joho Technology Co ltd
Original Assignee
Wuhan Joho Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Joho Technology Co ltd
Priority to CN202011277840.4A
Publication of CN112380997A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/467 Encoded features or binary features, e.g. local binary patterns [LBP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of flight safety, in particular to a deep-learning-based method for aircraft model identification and landing gear retraction/extension detection. A YOLOv3 target tracking thread and a KCF target tracking thread are designed; after the YOLOv3 thread detects the aircraft or the retraction/extension of the landing gear, the detected category information and retraction information are sent to the KCF thread. The KCF thread then performs target position detection on the position information supplied by the YOLOv3 thread, calculates the responses between samples, takes the detection box with the maximum response value as the target box, and acquires its confidence information. The two results are fused and compared: if the calculated position difference is within a set threshold, the mean of the two threads' position information and confidences is output. By exploiting the good tracking performance of KCF, the method reduces false detections of the YOLOv3 algorithm caused by sudden environmental changes and overcomes YOLOv3's excessive dependence on training samples.

Description

Model identification and undercarriage retraction and extension detection method based on deep learning
Technical Field
The invention relates to the technical field of flight safety, in particular to a model identification and undercarriage retraction detection method based on deep learning.
Background
Detection-based target tracking is a common target tracking approach: tracking over a video sequence is completed by detecting and identifying the target in each frame of the image. Laser & Infrared (1997, No. 03) discloses an artificial-intelligence monitoring system for landing gear retraction, in which pre-processed video signals are fed into an artificial neural network together with distance signals acquired by a laser range finder. Patent No. 201610554460.8 discloses an automatic identification system for aircraft landing gear retraction that achieves full-time, high-definition binocular observation of the aircraft landing area under multispectral conditions. Patent No. 201811313628.1 discloses a method for detecting the landing gear retraction state of multiple aircraft types from a ground-based view angle, which automatically determines the landing gear state through feature analysis and multi-frame comprehensive decision processing.
These comparison documents disclose technical schemes in which an artificial neural network identifies a target image. In the prior art, the deep-learning YOLOv3 algorithm performs well in target detection, but it places high requirements on the training samples prepared in advance: once a shot contains a target or background not covered by the training samples, YOLOv3 cannot detect the target, and tracking fails. In addition, target tracking algorithms suffer reduced accuracy under adverse effects such as illumination changes and deformation. For these reasons, a model identification and undercarriage retraction detection method based on deep learning is provided.
Disclosure of Invention
The invention aims to provide a model identification and undercarriage retraction detection method based on deep learning, so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme: the model identification and undercarriage retraction detection method based on deep learning comprises the following steps:
S1, designing a YOLOv3 target tracking thread and a KCF target tracking thread; after the YOLOv3 target tracking thread detects the aircraft or the retraction/extension of the landing gear, sending the detected category information and retraction information to the KCF target tracking thread;
S2, carrying out target position detection on the target position information detected by the YOLOv3 target tracking thread using the KCF target tracking thread, calculating the responses between samples, taking the detection box with the maximum response value as the target box, and acquiring its confidence information;
S3, fusing and comparing the information acquired by the KCF target tracking thread with the detection result of the YOLOv3 target tracking thread: if the calculated position difference is within a set threshold, outputting the mean of the two threads' position information and confidences; if the results differ greatly, not outputting the information and updating the KCF template.
Preferably, the network building and model training process of YOLOv3 is as follows:
S11, image scaling and segmentation: the input image is first divided into an S × S grid of equal-sized cells, which is then processed in two respects;
S12, bounding box prediction: in this step, YOLO gives two prediction boxes for each grid cell, anchored at the cell's center point with self-defined sizes; each cell predicts B bounding boxes, and each bounding box has four coordinates and a confidence, so the final prediction is an S × S × (B × 5 + C) tensor, where S is the number of grid divisions, B is the number of targets each cell is responsible for, and C is the number of classes;
S13, class probability map prediction: each cell is responsible for classification, and the predicted result is placed in the final S × S × (B × 5 + C) result;
S14, passing the image through a fully convolutional neural network: the Darknet-53 multi-scale classification model with four convolutional layers and two fully connected networks;
S15, setting the loss function as the sum of squares of the bounding box coordinate error, the IOU error and the classification error;
S16, obtaining the optimal box as the regression result through a non-maximum suppression algorithm;
S17, correcting the network parameters through multiple iterations.
Preferably, the target detection algorithm flow of the detector of the KCF target tracking thread is as follows:
S21, inputting a video and extracting a single frame;
S22, judging whether the image is the first frame; if so, initializing the target rectangle position and constructing training samples from the target position via a circulant matrix; if not, constructing detection samples at the target position by cyclic shifts;
S23, extracting HOG features of the image at the search rectangle, converting sample training into a ridge regression problem via the Fourier transform, performing the discrete Fourier transform, calculating the weight coefficients of the training samples, and updating the parameters; then judging whether there is further video input: if so, looping back to step S21, otherwise the target detection process is complete.
Preferably, the parameter updating process of step S23 is: first, HOG features are extracted from the detection sample and Fourier-transformed; second, the cross-correlation matrix of the detection samples is calculated; then, the response value of the detection sample is calculated, taken as the confidence, and the position information is updated; finally, whether the response value is greater than 0.75 is judged: if so, HOG features of the image at the search rectangle are extracted; otherwise, no parameter update is performed.
Preferably, YOLOv3 is accelerated in the response (inference) phase by an accelerator for the neural network response phase, shortening the response time.
Preferably, the model generated by the YOLOv3 algorithm is subjected to pruning ("branch reduction"): training, pruning and fine-tuning of the pruned model are performed in a loop. During pruning, the scaling factor γ in batch normalization is used as the importance factor, i.e. the smaller γ is, the less important the corresponding network layer is and the more safely it can be cut. To constrain the magnitude of γ, a λ-weighted regularization term on γ is added to the objective function, so that automatic pruning is achieved during training.
Compared with the prior art, the invention has the following beneficial effects: a deep learning model for aircraft model identification and landing gear retraction identification is constructed on the basis of the YOLOv3 detection algorithm; against YOLOv3's excessive dependence on training samples, a model/landing-gear detection system based on the KCF and YOLOv3 algorithms is adopted, exploiting KCF's good tracking performance to reduce false detections of the YOLOv3 algorithm caused by sudden environmental changes; meanwhile, the YOLOv3 model is pruned and compressed, increasing the response speed in engineering applications. Early tests verified that, under the same hardware conditions, the detection speed can be improved by 50%, the precision can be improved by about 4% compared with a conventional deep-learning target detection algorithm, and the mAP improves by about 2% over YOLOv2.
Drawings
FIG. 1 is a flow chart of a detection algorithm based on KCF and YOLOv3 in accordance with the present invention;
FIG. 2 is a plot of the LOSS and IOU curves against batch and iteration count for this training;
FIG. 3 is a bar graph of the effect of the choice of λ on γ;
FIG. 4 is a line graph of pruning proportion versus accuracy for the present invention;
FIG. 5 is an R-P graph for the aircraft type and the landing gear of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
The invention provides the following technical scheme. First, a YOLOv3 detector is designed; each time a target is detected, it is judged whether the target is the same as one held by an existing KCF detector; otherwise a new KCF detector is generated to track the target. The specific method is as follows: design a YOLOv3 target tracking thread and a KCF target tracking thread; after YOLOv3 detects the aircraft or the retraction/extension of the landing gear, send the detected category information and retraction information to the KCF target tracking thread; then use the KCF thread to perform target position detection on the target position information detected by YOLOv3, calculate the responses between samples, take the detection box with the maximum response value as the target box, and acquire its confidence information; finally, fuse and compare this with the detection result of the YOLOv3 thread: if the calculated position difference is within a certain threshold, output the mean of the two threads' position information and confidence values; if the results differ greatly, do not output the information and update the KCF template. A minimal sketch of this fusion rule is given below.
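As an illustration, the following Python sketch implements the fusion rule just described, under stated assumptions: the (x, y, w, h) box format, the Euclidean distance between box centers as the "position difference", the threshold value, and the KCF re-initialization call are all hypothetical, since the scheme specifies only the compare/average/update behavior.

import numpy as np

POS_THRESHOLD = 20.0  # pixels; hypothetical value for the "set threshold"

def fuse_results(yolo_box, yolo_conf, kcf_box, kcf_conf, kcf_tracker):
    # Fuse one YOLOv3 detection with the KCF result for the same target.
    yolo_box = np.asarray(yolo_box, dtype=float)  # assumed (x, y, w, h)
    kcf_box = np.asarray(kcf_box, dtype=float)
    # Position difference between the two threads' box centers.
    diff = np.linalg.norm(yolo_box[:2] - kcf_box[:2])
    if diff <= POS_THRESHOLD:
        # Agreement: output the mean position and the mean confidence.
        return (yolo_box + kcf_box) / 2.0, (yolo_conf + kcf_conf) / 2.0
    # Large disagreement: output nothing and update the KCF template from
    # the YOLOv3 detection (hypothetical re-initialization interface).
    kcf_tracker.reinit(yolo_box)
    return None, None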
In this scheme, targets such as the five aircraft types and their landing gear are detected on the premise of an improved YOLOv3 model. Compared with traditional target detection methods, this has advantages at the detection and recognition level of a single-frame image and obtains higher detection accuracy and speed. The KCF target tracking algorithm trains a target detector online during tracking, using the target and the target trajectory of the current frame; in the next frame, the detector examines the target at the motion-trajectory position predicted by the tracker, judges whether it is the target to be detected, and updates the original detector according to the detection result. Targets inside the track are typically recorded as positive samples and the remaining environment as negative samples. However, the KCF algorithm still has shortcomings in scale change, feature extraction, target loss and the like.
To address the shortcomings of the existing methods, this scheme combines the KCF algorithm with the YOLO algorithm, overcoming the adverse effects of illumination, deformation and the like on the target tracking algorithm and improving its accuracy, robustness and adaptability. The specific flow is shown in FIG. 1.
Implementation flow of YOLOv3 detection method
Data preparation and environment construction scheme
To recognize the two types of targets, a convolutional neural network is used to build the recognition model. This requires a large number of aircraft images (visible-light and infrared) and landing gear images; data collection and labeling of 20,000 target instances across 5 classes were completed as the training data set. The labeling tool LabelImg was used to annotate the aircraft and the landing gear in each image in turn, producing calibration files.
A VOC data set is made from the large number of collected samples. The training hardware platform of the experiments in this scheme is a 9700K CPU + Titan X GPU with CUDA and CUDNN; the test hardware platform is an Nvidia Jetson series development board; the software environment is Ubuntu 16.04 + OpenCV 2.4.9 + Python 3. All environments are built on CUDA 9.0 and CUDNN 7.0; the DarkNet framework is downloaded and built, the YOLOv3 network structure design is completed, and the training files and training environment are configured. A sketch of converting the LabelImg (VOC) annotations into YOLO's training format is given below.
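As a small illustration of the data preparation just described, the following Python sketch converts one LabelImg/VOC XML annotation into YOLO's "class cx cy w h" text format (normalized to [0, 1]); the class names are hypothetical, since the actual 5 class labels are not given in the text.

import xml.etree.ElementTree as ET

CLASSES = ["type_a", "type_b", "type_c", "type_d", "type_e"]  # hypothetical names for the 5 classes

def voc_to_yolo(xml_path):
    root = ET.parse(xml_path).getroot()
    w = float(root.findtext("size/width"))
    h = float(root.findtext("size/height"))
    lines = []
    for obj in root.findall("object"):
        cls = CLASSES.index(obj.findtext("name"))
        x1, y1, x2, y2 = (float(obj.findtext("bndbox/" + k))
                          for k in ("xmin", "ymin", "xmax", "ymax"))
        # YOLO format: normalized box center and box size.
        cx, cy = (x1 + x2) / 2.0 / w, (y1 + y2) / 2.0 / h
        bw, bh = (x2 - x1) / w, (y2 - y1) / h
        lines.append("%d %.6f %.6f %.6f %.6f" % (cls, cx, cy, bw, bh))
    return lines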
Network building and model training scheme
The network building and model training algorithm flow adopted by the scheme is as follows:
① Image scaling and segmentation: the input image is first divided into an S × S grid of equal-sized cells, which is subsequently processed in two respects.
② Bounding box prediction: in this step, YOLO gives two prediction boxes for each grid cell. The prediction boxes are anchored at the cell's center point, with self-defined sizes. Each cell predicts B bounding boxes, and each bounding box has four coordinates and a confidence, so the final prediction is an S × S × (B × 5 + C) tensor, where S is the number of grid divisions, B is the number of targets each cell is responsible for, and C is the number of classes.
③ Class probability map prediction: each cell is responsible for classification, and the predictions are likewise placed in the final S × S × (B × 5 + C) result. This means: each cell corresponds to B bounding boxes whose width and height may range over the whole image, and the bounding box of an object is sought with the cell as its center. Each bounding box corresponds to a score representing whether an object exists at that position and how accurate the localization is:
score = Pr(Object) × IOU(pred, truth)
Each cell also corresponds to C probability values; the class with the maximum probability P(Class | Object) is found, and the cell is considered to contain that object or a portion of that object.
④ Passing the image through the fully convolutional neural network: the Darknet-53 multi-scale classification model with four convolutional layers and two fully connected networks.
⑤ Setting the loss function as the sum of squares of the bounding box coordinate error, the IOU error and the classification error.
⑥ Obtaining the optimal box as the regression result through a non-maximum suppression algorithm.
⑦ Iteratively correcting the network parameters. A sketch of decoding the S × S × (B × 5 + C) output is given below.
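To make the S × S × (B × 5 + C) layout concrete, here is a small decoding sketch in Python; the tensor layout (B boxes of 5 values first, then C class scores per cell) and the S, B, C values are assumptions for illustration, since the text fixes only the overall shape.

import numpy as np

S, B, C = 7, 2, 5  # illustrative grid size, boxes per cell, class count

def decode(pred):
    # Decode an S x S x (B*5 + C) YOLO output tensor (assumed layout).
    pred = pred.reshape(S, S, B * 5 + C)
    boxes = pred[..., :B * 5].reshape(S, S, B, 5)  # x, y, w, h, confidence
    class_probs = pred[..., B * 5:]                # P(Class | Object) per cell
    # Final score per box and class: box confidence x class probability.
    scores = boxes[..., 4:5] * class_probs[:, :, None, :]
    row, col, b, cls = np.unravel_index(np.argmax(scores), scores.shape)
    return boxes[row, col, b, :4], cls, scores[row, col, b, cls]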
The labeled image files and data files are input into the built network and trained with the above algorithm. The initial learning rate is set to 0.0001; training visualization is implemented in code, and the IOU (intersection over union), the recall rate and the loss function curve are observed during training, as shown in FIG. 2. The loss curve shows the trend of the loss function during training: images were fed in groups of 64, with 8 images per training batch, for ten thousand iterations in total. The decline of the loss function essentially stabilizes by seven thousand iterations, with no obvious change between seven and ten thousand, so the model is regarded as converged at seven thousand iterations. The experiment was run in an environment with a Titan X GPU.
Design scheme of KCF tracking flow
To counter the YOLOv3 algorithm's excessive dependence on the data set, and to enhance the robustness, adaptability and accuracy of the model, this scheme adds a KCF detection/target-tracking thread for miss checking and position correction of the YOLOv3 detection algorithm. The KCF target detection algorithm flow in this scheme is as follows:
① Input the video and extract a single frame.
② Judge whether the image is the first frame; if so, initialize the target rectangle position and construct training samples from the target position via a circulant matrix; if not, construct detection samples at the target position by cyclic shifts.
③ Extract HOG features of the image at the search rectangle, convert sample training into a ridge regression problem via the Fourier transform, perform the discrete Fourier transform, calculate the weight coefficients of the training samples, and update the parameters; then judge whether there is further video input: if so, loop back to step ①, otherwise the target detection process is complete.
④ Construct detection samples at the target position by cyclic shifts.
⑤ Update the parameters: first, extract HOG features from the detection sample and apply the Fourier transform; second, calculate the cross-correlation matrix of the detection samples; then, calculate the response value of the detection sample, take it as the confidence, and update the position information; finally, judge whether the response value is greater than 0.75: if so, extract HOG features of the image at the search rectangle; otherwise, perform no parameter update. A minimal sketch of the KCF core used by this flow is given below.
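The following is a minimal sketch of the KCF core that the flow above relies on: Gaussian-kernel ridge regression trained and evaluated in the Fourier domain over cyclic shifts. HOG extraction is abstracted away (single-channel patches are used here), and the sigma and regularization values are assumptions.

import numpy as np

def gaussian_kernel(x1, x2, sigma=0.5):
    # Kernel correlation over all cyclic shifts, computed via the FFT.
    c = np.fft.ifft2(np.fft.fft2(x1) * np.conj(np.fft.fft2(x2))).real
    d = (x1 ** 2).sum() + (x2 ** 2).sum() - 2.0 * c
    return np.exp(-np.maximum(d, 0.0) / (sigma ** 2 * x1.size))

def train(x, y, lam=1e-4):
    # Ridge regression in the Fourier domain: alpha_f = y_f / (k_xx_f + lam).
    k = gaussian_kernel(x, x)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

def detect(alphaf, x, z):
    # Response map over all cyclic shifts of the detection sample z; the
    # maximum response gives the new target position and the confidence.
    k = gaussian_kernel(x, z)
    resp = np.fft.ifft2(alphaf * np.fft.fft2(k)).real
    pos = np.unravel_index(np.argmax(resp), resp.shape)
    return pos, resp.max()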
Optimization scheme of YOLOv3
TensorRT acceleration
Practical deployment of deep learning often faces the problem that the network model's response time is too long, so an inference-phase acceleration method is needed. TensorRT is an accelerator provided by NVIDIA for the neural network inference (response) phase. Compared with training, the model structure and parameters are fixed during inference, the batch size is generally small, and the precision requirement is lower than in training, so there is a large optimization space. This scheme adopts TensorRT, which optimizes in the following respects:
① Merging certain layers
Sometimes the dominant cost is not computation but memory reads and writes. TensorRT merges the operations of multiple layers into a single layer, which reduces kernel launches and memory access to a certain extent; operations such as convolution and activation are done in one pass. In addition, layers with the same input and the same convolution kernel size are merged into one layer, and repeated computation on the same input is eliminated using pre-allocated buffers and the like.
② Support for FP16 or INT8 data types
Training requires high numerical precision because of gradients and the like, but the inference phase can use lower-precision data types to accelerate computation and reduce the model size. A hedged sketch of enabling FP16 is given below.
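As an illustration of enabling reduced precision, here is a sketch using the TensorRT Python API of the 7.x era; the ONNX export of the YOLOv3 model and the file name are assumptions, and exact builder calls vary across TensorRT versions.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flags)
parser = trt.OnnxParser(network, logger)

with open("yolov3.onnx", "rb") as f:  # hypothetical ONNX export of the model
    parser.parse(f.read())

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28   # 256 MiB of builder scratch space
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)  # enable FP16 inference

engine = builder.build_engine(network, config)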
Automatic adjustment of kernel
TensorRT has optimization of a write algorithm level aiming at different hyper-parameters, for example, which algorithm is used for convolution operation can be determined according to the hyper-parameters such as the size of a convolution kernel and the input size.
Fourthly, dynamic tensor memory
TensorRT reduces the memory overhead through optimization, and improves the reloading of the memory.
Multiple parallel operations
For the case of CUDA support, parallel operations may be performed for multiple branches of the same input.
Model compression
The model generated by the YOLOv3 algorithm has 106 layers in total; its parameter count and network structure are very complex, which greatly affects the response speed in engineering applications. Experiments show that the original model reaches only 5 FPS on an Nvidia Jetson Nano. Redundant layers exist in the network, so the network structure needs to be pruned. The pruning ("branch reduction") flow is as follows:
① The scaling factor γ in batch normalization is used as the importance factor: the smaller γ is, the less important the corresponding network layer is, and it can be cut.
② A λ-weighted regularization term on γ is added to the objective function to constrain the magnitude of γ, so that automatic pruning is achieved during training, which conventional model compression does not achieve. The procedure has three parts: first, training; second, pruning; third, fine-tuning the pruned model; these are executed in a loop.
The specific operational details are as follows. λ is chosen experimentally, usually 0.00001 or 0.0001 as the case requires. After the γ values are obtained, a method similar to the energy-ratio criterion in PCA is used: the γ values of the current layer are summed, sorted from large to small, and the larger portion is kept; this scheme keeps about 70%. The effect of the choice of λ on γ is shown in FIG. 3. A sketch of the γ-sparsity training step is given below.
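A minimal sketch of the γ-sparsity training step, in the network-slimming style that the description matches, is given below; the λ value and the per-layer 70% keep-ratio follow the text, while the use of PyTorch and the helper names are assumptions.

import torch
import torch.nn as nn

LAM = 1e-4  # lambda, chosen experimentally (0.00001 or 0.0001 per the scheme)

def add_gamma_sparsity_grad(model, lam=LAM):
    # Add the subgradient of lam * sum(|gamma|) to every BN scale factor,
    # so gamma values of unimportant channels are driven toward zero.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.add_(lam * torch.sign(m.weight.detach()))

def channels_to_keep(bn, keep_ratio=0.7):
    # Keep the largest ~70% of gamma values in this layer, as in the scheme.
    gammas = bn.weight.detach().abs()
    k = max(1, int(keep_ratio * gammas.numel()))
    return torch.topk(gammas, k).indices

# Usage inside the training loop (sketch):
#   loss.backward()
#   add_gamma_sparsity_grad(model)  # inject the sparsity subgradient
#   optimizer.step()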
When λ = 0, the objective function does not penalize γ. When λ = 0.00001, more than 450 γ values are found to be close to 0 overall. When λ = 0.0001, the sparsity constraint on γ is stronger, and nearly 2000 γ values lie around 0.0.
Pruning percentage: the more is cut, the smaller the model, but cutting too much loses precision. These goals conflict, so this scheme ran experimental comparisons and found that precision drops sharply once pruning exceeds 80%, as shown in FIG. 4 (baseline; sparsity-trained; pruned; fine-tuned; test error; pruned channels). In application, accuracy and speed are traded off according to the engineering environment and requirements. This scheme adopts a 35% pruning optimization of the model, raising the speed to 30 fps.
Verification experiment results and analysis of the scheme and algorithms
The design of the dual-thread model is completed with the above algorithms. After training, the model is tested on images and videos; the test video has 31,925 frames, of which 18,723 contain targets, and the aircraft and the landing gear coexist in the video.
The detection results for the first video are visualized in code, and the confidence of each frame's detection result is given in Table 1. With the IOU threshold set to 0.7, R-P curves are drawn for the aircraft type and the landing gear respectively, as shown in FIG. 5. The algorithm is evaluated statistically by mAP; the experimental result indices are shown in Table 2, and a sketch of the standard AP computation is given after Table 2.
TABLE 1 Per-frame detection results and confidence statistics
(The contents of Table 1 appear as images in the original publication and are not reproduced here.)
Table 2 evaluation of the algorithm of this scheme
(The contents of Table 2 appear as images in the original publication and are not reproduced here.)
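mAP is the mean of the per-class average precision. As a reference for how the mAP in Table 2 can be obtained from the R-P curves of FIG. 5, the following sketches the standard PASCAL-VOC-style AP computation (area under the monotonic precision envelope); the recall/precision arrays are assumed to come from detections matched at the IOU threshold of 0.7.

import numpy as np

def average_precision(recall, precision):
    # Area under the monotonically decreasing envelope of the R-P curve,
    # with inputs sorted by descending detection confidence.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):   # enforce a decreasing envelope
        p[i] = max(p[i], p[i + 1])
    step = np.where(r[1:] != r[:-1])[0]   # indices where recall changes
    return float(np.sum((r[step + 1] - r[step]) * p[step + 1]))

# mAP over the aircraft-type and landing-gear classes (hypothetical curves):
# mAP = np.mean([average_precision(r_c, p_c) for r_c, p_c in per_class_curves])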
In terms of accuracy, the algorithm of this scheme is superior to traditional detection methods for aircraft type and landing gear recognition. Among deep-learning-based detection algorithms, it improves on a LeNet-5-based landing gear detection algorithm by 6.3% and on a method based on moving target detection with ResNet-v2 by 2.9%; at the same time, it lowers the false detection rate and the missed detection rate. In terms of running speed, the algorithm reaches 56 fps in the training environment; transplanted to an RTX 2060 GPU system and using model pruning and compression, it reaches 30 frames per second.
On the premise of the improved YOLOv3 model, the method detects targets such as the five aircraft types and their landing gear; compared with traditional target detection methods, it has advantages at the single-frame detection and recognition level and obtains higher detection accuracy and speed.
The prior-art KCF target tracking algorithm trains a target detector online during tracking, using the target and the target trajectory of the current frame; in the next frame, the detector examines the target at the motion-trajectory position predicted by the tracker, judges whether it is the target to be detected, and updates the original detector according to the detection result. Targets inside the track are typically recorded as positive samples and the remaining environment as negative samples. However, the KCF algorithm still has shortcomings in scale change, feature extraction, target loss and the like. This scheme combines the KCF and YOLO algorithms, overcoming the adverse effects of illumination, deformation and the like on the target tracking algorithm, improving its accuracy, robustness and adaptability, and achieving unexpected technical effects.
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change of the technical solution and its inventive concept, made by a person skilled in the art within the technical scope disclosed by the present invention, shall fall within the protection scope of the present invention.

Claims (6)

1. A model identification and undercarriage retraction detection method based on deep learning, characterized by comprising the following steps:
S1, designing a YOLOv3 target tracking thread and a KCF target tracking thread; after the YOLOv3 target tracking thread detects the aircraft or the retraction/extension of the landing gear, sending the detected category information and retraction information to the KCF target tracking thread;
S2, carrying out target position detection on the target position information detected by the YOLOv3 target tracking thread using the KCF target tracking thread, calculating the responses between samples, taking the detection box with the maximum response value as the target box, and acquiring its confidence information;
S3, fusing and comparing the information acquired by the KCF target tracking thread with the detection result of the YOLOv3 target tracking thread: if the calculated position difference is within a set threshold, outputting the mean of the two threads' position information and confidences; if the results differ greatly, not outputting the information and updating the KCF template.
2. The deep learning-based model identification and undercarriage retraction detection method according to claim 1, wherein the network building and model training process of YOLOv3 is as follows:
S11, image scaling and segmentation: the input image is first divided into an S × S grid of equal-sized cells, which is then processed in two respects;
S12, bounding box prediction: in this step, YOLO gives two prediction boxes for each grid cell, anchored at the cell's center point with self-defined sizes; each cell predicts B bounding boxes, and each bounding box has four coordinates and a confidence, so the final prediction is an S × S × (B × 5 + C) tensor, where S is the number of grid divisions, B is the number of targets each cell is responsible for, and C is the number of classes;
S13, class probability map prediction: each cell is responsible for classification, and the predicted result is placed in the final S × S × (B × 5 + C) result;
S14, passing the image through a fully convolutional neural network: the Darknet-53 multi-scale classification model with four convolutional layers and two fully connected networks;
S15, setting the loss function as the sum of squares of the bounding box coordinate error, the IOU error and the classification error;
S16, obtaining the optimal box as the regression result through a non-maximum suppression algorithm;
S17, correcting the network parameters through multiple iterations.
3. The deep learning-based model identification and undercarriage retraction detection method according to claim 1, wherein the target detection algorithm flow of the detector of the KCF target tracking thread is as follows:
S21, inputting a video and extracting a single frame;
S22, judging whether the image is the first frame; if so, initializing the target rectangle position and constructing training samples from the target position via a circulant matrix; if not, constructing detection samples at the target position by cyclic shifts;
S23, extracting HOG features of the image at the search rectangle, converting sample training into a ridge regression problem via the Fourier transform, performing the discrete Fourier transform, calculating the weight coefficients of the training samples, and updating the parameters; then judging whether there is further video input: if so, looping back to step S21, otherwise the target detection process is complete.
4. The deep learning-based model identification and undercarriage retraction detection method according to claim 3, wherein the parameter updating process of step S23 is: first, HOG features are extracted from the detection sample and Fourier-transformed; second, the cross-correlation matrix of the detection samples is calculated; then, the response value of the detection sample is calculated, taken as the confidence, and the position information is updated; finally, whether the response value is greater than 0.75 is judged: if so, HOG features of the image at the search rectangle are extracted; otherwise, no parameter update is performed.
5. The deep learning-based model identification and undercarriage retraction detection method according to claim 1, wherein YOLOv3 is accelerated in the response (inference) phase by an accelerator for the neural network response phase, shortening the response time.
6. The deep learning-based model identification and undercarriage retraction detection method according to claim 1, wherein the model generated by the YOLOv3 algorithm is subjected to pruning ("branch reduction"): training, pruning and fine-tuning of the pruned model are performed in a loop; during pruning, the scaling factor γ in batch normalization is used as the importance factor, i.e. the smaller γ is, the less important the corresponding network layer is and the more safely it can be cut; to constrain the magnitude of γ, a λ-weighted regularization term on γ is added to the objective function, so that automatic pruning is achieved during training.
CN202011277840.4A 2020-11-16 2020-11-16 Model identification and undercarriage retraction and extension detection method based on deep learning Pending CN112380997A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011277840.4A CN112380997A (en) 2020-11-16 2020-11-16 Model identification and undercarriage retraction and extension detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011277840.4A CN112380997A (en) 2020-11-16 2020-11-16 Model identification and undercarriage retraction and extension detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN112380997A (en) 2021-02-19

Family

ID=74584717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011277840.4A Pending CN112380997A (en) 2020-11-16 2020-11-16 Model identification and undercarriage retraction and extension detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN112380997A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147807A (en) * 2019-01-04 2019-08-20 上海海事大学 A kind of ship intelligent recognition tracking
CN110321853A (en) * 2019-07-05 2019-10-11 杭州巨骐信息科技股份有限公司 Distribution cable external force damage prevention system based on video intelligent detection
CN110706211A (en) * 2019-09-17 2020-01-17 中国矿业大学(北京) Convolutional neural network-based real-time detection method for railway roadbed disease radar map
CN110706266A (en) * 2019-12-11 2020-01-17 北京中星时代科技有限公司 Aerial target tracking method based on YOLOv3
CN111325342A (en) * 2020-02-19 2020-06-23 深圳中兴网信科技有限公司 Model compression method and device, target detection equipment and storage medium
CN111461291A (en) * 2020-03-13 2020-07-28 西安科技大学 Long-distance pipeline inspection method based on YO L Ov3 pruning network and deep learning defogging model
CN111832607A (en) * 2020-05-28 2020-10-27 东南大学 Bridge disease real-time detection method based on model pruning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113138382A (en) * 2021-04-27 2021-07-20 中国电子科技集团公司第二十八研究所 Fully-automatic approach landing monitoring method for civil and military airport
CN113138382B (en) * 2021-04-27 2021-11-02 中国电子科技集团公司第二十八研究所 Fully-automatic approach landing monitoring method for civil and military airport
CN114627339A (en) * 2021-11-09 2022-06-14 昆明物理研究所 Intelligent recognition and tracking method for border crossing personnel in dense jungle area and storage medium
CN114627339B (en) * 2021-11-09 2024-03-29 昆明物理研究所 Intelligent recognition tracking method and storage medium for cross border personnel in dense jungle area
CN114596335A (en) * 2022-03-01 2022-06-07 广东工业大学 Unmanned ship target detection tracking method and system
CN114596335B (en) * 2022-03-01 2023-10-31 广东工业大学 Unmanned ship target detection tracking method and system

Similar Documents

Publication Publication Date Title
KR102382693B1 (en) Learning method and learning device of pedestrian detector for robust surveillance based on image analysis by using gan and testing method and testing device using the same
Zhang et al. Identification of maize leaf diseases using improved deep convolutional neural networks
CN112380997A (en) Model identification and undercarriage retraction and extension detection method based on deep learning
CN109613002B (en) Glass defect detection method and device and storage medium
CN109117876A (en) A kind of dense small target deteection model building method, model and detection method
CN109241982A (en) Object detection method based on depth layer convolutional neural networks
CN106295613A (en) A kind of unmanned plane target localization method and system
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN111160407A (en) Deep learning target detection method and system
CN112417981B (en) Efficient recognition method for complex battlefield environment targets based on improved FasterR-CNN
CN116824413A (en) Aerial image target detection method based on multi-scale cavity convolution
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
CN115965862A (en) SAR ship target detection method based on mask network fusion image characteristics
CN115984543A (en) Target detection algorithm based on infrared and visible light images
Wang et al. Deep learning model for target detection in remote sensing images fusing multilevel features
CN115994900A (en) Unsupervised defect detection method and system based on transfer learning and storage medium
Chen et al. Object detection using deep learning: Single shot detector with a refined feature-fusion structure
CN117746077B (en) Chip defect detection method, device, equipment and storage medium
Mittel et al. Vision-based crack detection using transfer learning in metal forming processes
CN113177956A (en) Semantic segmentation method for unmanned aerial vehicle remote sensing image
Ouyang et al. Aerial target detection based on the improved YOLOv3 algorithm
Liu et al. Object detection algorithm based on lightweight YOLOv4 for UAV
Li et al. Research on textile defect detection based on improved cascade R-CNN
CN115018884B (en) Visible light infrared visual tracking method based on multi-strategy fusion tree
CN114240822A (en) Cotton cloth flaw detection method based on YOLOv3 and multi-scale feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210219)