CN111814662B - Visible light image airplane rapid detection method based on miniature convolutional neural network - Google Patents

Visible light image airplane rapid detection method based on miniature convolutional neural network

Info

Publication number
CN111814662B
CN111814662B (application CN202010646717.9A)
Authority
CN
China
Prior art keywords
neural network
miniature
convolutional neural
data set
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010646717.9A
Other languages
Chinese (zh)
Other versions
CN111814662A
Inventor
晏焕钱
李波
韦星星
王越
赖汝锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202010646717.9A
Publication of CN111814662A
Application granted
Publication of CN111814662B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24317 Piecewise classification, i.e. whereby each classification requires several discriminant rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visible light image airplane rapid detection method based on a miniature convolutional neural network, comprising the following steps: (1) count the aircraft scale information in the training data set to obtain a scale data set and calculate the final sliding-window size; (2) from the channel features of the training data set, calculate the λ_Ω corresponding to each given channel feature type Ω, completing the construction of a fast feature pyramid, and train a fast candidate-box generation algorithm with the Adaboost algorithm; (3) correct the parameters of the candidate-box generation algorithm with the linear search algorithm Search-δ; (4) re-judge each candidate region with a miniature convolutional neural network: if the network classifies the current region as true, the candidate box is considered to contain an airplane; otherwise the current candidate box is treated as background and discarded. The invention has the advantages of high detection speed, high precision, a small footprint for the whole algorithm model, and low requirements on the hardware of the operating platform.

Description

Visible light image airplane rapid detection method based on miniature convolutional neural network
Technical Field
The invention belongs to the technical field of digital image processing, and particularly relates to a visible light image airplane rapid detection method based on a miniature convolutional neural network.
Background
Target detection is one of the core problems of machine vision and one of the fastest-growing artificial intelligence techniques of recent years. Stated informally, the task is to find the targets to be identified in a given picture and give their locations. Target detection methods have been studied and developed for several decades and can be roughly divided into the VJ (Viola-Jones) era, in which hand-crafted features were combined with machine learning, and the deep learning era. The former mainly adopts dense sliding windows and judges whether a target exists in the current window, a process that involves extracting features of the current window and running a classifier on it, and is usually slow; the latter completes the detection task by learning to fit target positions and categories with complex network models and large amounts of data, and can be roughly divided into two branches, candidate-box-based deep learning methods and regression-prediction-based deep learning methods.
Target detection in remote sensing images involves locating regions of interest within the satellite-imaged area and identifying those regions. It differs from detection in natural images in that the targets are imaged from above, appear in many orientations, vary greatly in brightness, and sit in relatively complex background environments. Detecting airport aircraft in remote sensing images plays an important role in military reconnaissance and airport monitoring: a target detection algorithm can automatically mark and locate aircraft targets in an airport, an operation that is extremely useful for recording and describing the airport's subsequent state and that saves manpower and material resources. Although target detection in natural images has seen a series of breakthroughs, research on detecting aircraft in remote sensing images of airports is comparatively scarce, and the task still has unsolved problems, such as the high hardware requirements of practical algorithms and their insufficient handling of complex, changeable detection environments.
Therefore, to work under modest hardware environments while retaining high detection accuracy and high detection speed, and in light of analysis of current visible light airport aircraft detection algorithms for remote sensing images, what is urgently needed by those skilled in the art is an airport aircraft detection algorithm for visible light images with low hardware requirements, high detection accuracy and high detection speed.
Disclosure of Invention
The technical problem that this application will solve lies in:
(1) an efficient target detection method is provided and applied to detection of an airport airplane under visible light in a remote sensing image;
(2) a novel method for calculating the size of a sliding window is provided, the method obtains the length and width priori knowledge of an airplane by counting the proportional information of an airplane target in a remote sensing picture, and the size of the sliding window is calculated by adopting a square window. The sliding window calculated by the method can avoid the influence of human marking errors and shooting angles to a certain extent, and can effectively solve the problem of high missing rate caused by multi-directionality of the target angle;
(3) in order to avoid the defects of low precision, low speed, large calculated amount and the like of a candidate frame extraction algorithm at the early stage of target detection, a rapid candidate frame generation algorithm is realized by combining the characteristics of an aggregation channel and an Adaboost classification algorithm;
(4) a simple and efficient linear search algorithm is provided, which can quickly fine-tune the fast candidate-box generation algorithm and thereby improve its effectiveness;
(5) an efficient and lightweight convolutional neural network is designed for classifying target and background regions. The network model has few layers, few parameters and high classification accuracy, in contrast to most currently available network models, which suffer from large model capacity, low speed and heavy computation.
In order to achieve the above object, the present application adopts the following technical solutions:
a visible light image airplane rapid detection method based on a miniature convolution neural network comprises the following steps:
(1) Count the aircraft scale information in the training data set to obtain a scale data set; next, calculate the minimum side length S_min of the aircraft targets; finally, calculate the average length-width ratio R_target of the aircraft to obtain the final sliding-window size (S_min × R_target, S_min × R_target);
(2) Calculate three types of channel features for each visible light airport picture: the standard gradient magnitude channel feature, the gradient direction channel feature, and the LUV color channel feature; from the channel features of the training data set, estimate by least squares the λ_Ω corresponding to each given channel feature type Ω, completing the construction of the fast feature pyramid, and train the fast candidate-box generation algorithm with the Adaboost algorithm, where λ_Ω denotes the information-loss coefficient;
(3) Re-detect the training data set with the fast candidate-box generation algorithm under different parameters δ via the linear search algorithm Search-δ, and compute the corresponding detection precision and recall; obtain a high-precision, high-recall fast candidate-box generation model by adjusting the parameter δ;
(4) Use the miniature convolutional neural network to re-judge each candidate region: if the network classifies the current region as true, the candidate box is considered to contain an airplane; otherwise the current candidate box is treated as background and discarded.
Preferably, the data set is B = {(h_i, w_i) | i = 1, …, N}, where N denotes the number of aircraft targets, h_i denotes the number of pixels occupied by the length of the i-th aircraft, w_i denotes the number of pixels occupied by its width, and i denotes the aircraft index.
Preferably, the minimum side length of the aircraft targets is calculated as S_min = min{min(h), min(w)}, and the average length-width ratio of the aircraft as
R_target = (1/N) · Σ_{i=1}^{N} r_i.
Preferably, for the Adaboost algorithm the following decision rule is defined:
label(x) = 1 if SCLF_x > thr, 0 otherwise,
wherein x denotes the aggregated channel features of the current window, the threshold thr is used to judge whether the current region contains a target, and SCLF_x denotes the output value of the strong classifier SCLF on x, representing the probability that x is a target;
SCLF_x is composed of a series of weak classifiers and is expressed as
SCLF_x = Σ_{m=1}^{M} (weight_m + δ) · clf(x; θ_m),
wherein clf denotes a weak classifier built as a tree of depth 2; weight_m is the weight of each weak classifier; θ_m is the parameter of each weak classifier; and δ denotes the weight-correction coefficient of the weak classifiers, set to 0 during the training phase.
Preferably, the fast candidate-box generation algorithm in step (3) is fine-tuned as follows: the IoU value of the current candidate box is defined as IoU = (GB ∩ DB)/(GB ∪ DB), wherein GB denotes the ground-truth labeled target box and DB denotes a target box generated by the fast candidate-box algorithm; the precision and the recall are obtained indirectly through the IoU values; the index used to fine-tune the fast candidate-box generation algorithm is defined as
Fidx_i = γ × Recall_i + Precision_i,
wherein γ is a hyperparameter; Recall_i denotes the recall after i adjustments of the parameter δ, and Precision_i denotes the detection precision after i adjustments of the parameter δ.
Preferably, the miniature convolutional neural network comprises 4 convolutional layers, 2 mean-pooling layers and 2 fully connected layers; the input of the network is a picture of size 32 × 32 × 3; the network adopts the ReLU function as the activation function, with an activation following each convolution operation; the network is divided into a feature extraction part and a fully connected part, and the whole convolutional neural network is a binary classification model.
Preferably, the feature extraction part performs a convolution operation on the input image by using 18 convolution kernels of 5 × 5, performs a convolution operation by using 24 convolution kernels of 3 × 3 and performs a mean pooling operation by using kernels of 2 × 2, performs a convolution operation by using 32 convolution kernels of 3 × 3 and performs a mean pooling operation by using kernels of 2 × 2, and performs a convolution operation by using 32 convolution kernels of 3 × 3 again, thereby completing the feature extraction of the candidate region.
Preferably, the fully connected part is mainly used for classification judgment and has two layers in total; the first layer receives the 3 × 3 × 32 feature map obtained by the feature extraction part, outputs a vector of size 128, and is followed by a Dropout layer with probability P = 0.1 and a ReLU activation layer; the second layer takes the 128-dimensional vector as input and outputs a 2-dimensional vector.
Preferably, the miniature convolutional neural network adopts the cross-entropy function as the loss function of the network, defined as follows:
Loss = −(1/BZ) · Σ_{i=1}^{BZ} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ],
wherein BZ denotes the batch size, set to 64 in network training, y_i denotes the true label of the current candidate box, equal to 1 if it contains a target and 0 otherwise, and ŷ_i is the prediction of the network; the loss function is optimized with an Adam optimizer, iterating 50 times over the training data set with learning rate lr = 0.001 for the first 40 iterations and lr = 0.0003 for the last 10; when the miniature convolutional neural network is trained, part of the training data set comes from the ground-truth labels in the original training data set and part from the candidate boxes generated by running the fast candidate-box generation algorithm on the training data set; all pictures input to the network are normalized, with mean = [0.485, 0.456, 0.406] and standard deviation std = [0.229, 0.224, 0.225], and if the current input pixel is X, the normalized pixel is (X − mean)/std.
The invention has the beneficial effects that:
1. the detection speed is high.
2. The detection precision is high.
3. The whole algorithm model occupies less space and has low requirements on the hardware conditions of the operating platform.
4. The algorithm trains quickly and does not need hundreds of iterations.
5. The algorithm does not need large amounts of training data, which suits the typically small volume of remote sensing data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic diagram of the algorithm flow of the present invention.
FIG. 2 is a schematic diagram of a miniature convolutional neural network according to the present invention.
FIG. 3 is a schematic view of the sliding window of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a visible light image airplane rapid detection method based on a miniature convolutional neural network, and the method is shown in the attached figure 1, wherein a yellow frame represents a training process, a green frame represents a detection process, and a blue frame represents a detection result. The detection method comprises the following steps:
s1: calculation of the sliding window:
Because feature extraction at different scales is completed with the fast feature pyramid, and the pyramid is built by down-sampling, the initial sliding window is the window in which the smaller targets lie. To avoid the influence of manual labeling and shooting angles and to adapt to the multi-directionality of aircraft orientations, a square sliding window is adopted for aircraft detection. The sliding window size calculated this way is more representative and universal. The computation consists of the following three steps:
Firstly, count and sort the aircraft scales in the training data set to obtain the aircraft scale data set B = {(h_i, w_i) | i = 1, …, N}, where N denotes the number of aircraft, h_i denotes the number of pixels occupied by the length of the i-th aircraft, w_i the number of pixels occupied by its width, and i the aircraft index;
then, calculate the minimum side length of the aircraft:
S_min = min{min(h), min(w)}   (1)
secondly, count and sort the length-width ratio r_i of each aircraft:
r_i = max(h_i, w_i) / min(h_i, w_i)   (2)
so that the average length-width ratio R_target can be estimated:
R_target = (1/N) · Σ_{i=1}^{N} r_i   (3)
In summary, the size of the final sliding window is (S_min × R_target, S_min × R_target). Referring to fig. 3, the blue dots represent aircraft scale information and the red dots represent the calculated sliding-window scale information.
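To make step S1 concrete, the following Python sketch computes the sliding-window size from the labeled aircraft scales. The ratio r_i = max(h_i, w_i)/min(h_i, w_i) is an assumption, since formulas (2) and (3) survive in the source only as images.

```python
import numpy as np

def sliding_window_size(boxes):
    """Compute the square sliding-window size from aircraft scale statistics.

    boxes: array of shape (N, 2) with rows (h_i, w_i), the pixel length and
    width of each labeled aircraft in the training data set B.
    """
    h, w = boxes[:, 0], boxes[:, 1]
    s_min = min(h.min(), w.min())              # Eq. (1): minimum side length S_min
    r = np.maximum(h, w) / np.minimum(h, w)    # assumed form of the ratio r_i, Eq. (2)
    r_target = r.mean()                        # Eq. (3): average length-width ratio
    side = int(round(s_min * r_target))
    return (side, side)                        # square window (S_min*R_target, S_min*R_target)

# Example: three labeled aircraft of 40x30, 50x48 and 36x36 pixels.
print(sliding_window_size(np.array([[40, 30], [50, 48], [36, 36]])))
```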
S2: fast candidate box generation based on aggregated channel features
Most candidate-box generation algorithms are based on segmentation or clustering, and such algorithms suffer from low running speed, low precision, and high miss rates. In traditional target detection of the VJ era, because target sizes differ, sliding-window judgments must be made over a pyramid of many scales to guarantee a high detection rate. The features must be recomputed at every scale, so the whole detection process is inefficient. The fast feature pyramid method instead computes the feature maps at intermediate scales by interpolating from feature maps at a few computed scales. Combined with integral images, this operation allows the corresponding targets to be detected rapidly. The fast feature pyramid, combined with aggregated channel features and the Adaboost algorithm, completes target detection in real time, and this strategy is adopted here to realize a candidate-box generation algorithm with high recall and high reliability. The specific steps are as follows:
Aggregated channel features: a feature channel is a mapping of the input picture, which may be point-to-point or region-to-region, and the transformed picture constitutes a feature. For an input picture I, its channel features are C = Ω(I), and its aggregated channel features are obtained by concatenating and smoothing the feature channels in C. The channel features used in the algorithm comprise three classes: the standard gradient magnitude channel feature, the gradient direction channel feature (6 directions), and the LUV color channel features;
Fast feature pyramid: for an input picture I, the standard feature pyramid can be denoted C_s = Ω(R(I, s)), where s denotes the scale and the function R(I, s) denotes resampling picture I to scale s. The fast feature pyramid differs from the standard one in that only part of the scales are sampled (s' ∈ {1, 1/2, 1/4, …}), and the features at the other, intermediate scales are computed by linear interpolation:
C_s ≈ R(C_{s'}, s/s') · (s/s')^(−λ_Ω)   (4)
For a given channel feature type Ω, the corresponding λ_Ω can be estimated by least squares as follows:
λ_Ω = argmin_λ Σ_{k=1}^{N} ( log f_Ω(R(I_k, s)) − log f_Ω(I_k) + λ · log(s) )²   (5)
where N denotes the number of pictures and f_Ω is defined as:
f_Ω(I_s) = (1/(h_s · w_s)) · Σ_{i=1}^{h_s} Σ_{j=1}^{w_s} C_s(i, j)   (6)
where h_s × w_s denotes the picture dimensions at the current scale s and i and j index the pixel position, so the function f_Ω(I_s) is the mean value of C_s; combining the information-loss coefficient λ_Ω with formula (4) makes the feature maps constructed by linear interpolation more accurate.
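As an illustration of the least-squares fit in formulas (5) and (6), the sketch below estimates λ_Ω from the mean channel value at a few sampled scales. The `channel_fn` callable stands in for any of the three channel types, and the closed-form fit through the origin is one reasonable reading of formula (5), which is only partly legible in the source.

```python
import numpy as np
import cv2

def estimate_lambda(images, channel_fn, scales=(0.5, 0.25)):
    """Least-squares estimate of the information-loss coefficient lambda_Omega.

    channel_fn(img) -> 2-D channel map C for one channel type Omega.
    Uses f_Omega(I_s), the mean of C_s over all pixels (Eq. 6), and fits
    log(f_Omega(I_s) / f_Omega(I)) ~= -lambda * log(s) over all images.
    """
    xs, ys = [], []
    for img in images:
        mu_1 = channel_fn(img).mean()                  # f_Omega(I) at scale s = 1
        for s in scales:
            small = cv2.resize(img, None, fx=s, fy=s)  # R(I, s): resample to scale s
            mu_s = channel_fn(small).mean()            # f_Omega(I_s)
            xs.append(np.log(s))
            ys.append(np.log(mu_s / mu_1))
    xs, ys = np.asarray(xs), np.asarray(ys)
    return -float(xs @ ys) / float(xs @ xs)            # minimizes sum (y + lambda*x)^2
```

With λ_Ω estimated, a feature map at an intermediate scale s is approximated from the nearest computed scale s' via formula (4) instead of being recomputed.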
Candidate box generation: a soft-cascade Adaboost algorithm is used to complete the selection of candidate boxes, defined as follows:
label(x) = 1 if SCLF_x > thr, 0 otherwise   (7)
where x denotes the aggregated channel features of the current window and the threshold thr, usually set to 0, is used to judge whether the current region contains a target. SCLF_x denotes the output value of the strong classifier SCLF on x, which indicates the likelihood that a target exists in the current window. SCLF_x is composed of a series of weak classifiers and is expressed as
SCLF_x = Σ_{m=1}^{M} (weight_m + δ) · clf(x; θ_m),
where clf denotes a weak classifier built as a tree of depth 2, weight_m is the weight of each weak classifier, θ_m is the parameter of each weak classifier, and δ denotes the weight-correction coefficient of the weak classifiers, which is 0 in the training phase.
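The following sketch shows how the soft cascade of formula (7) might be evaluated on a window. The additive form (weight_m + δ) of the correction, the `predict` interface of the depth-2 trees, and the early-rejection threshold are all assumptions, since the strong-classifier formula appears in the source only as an image.

```python
def sclf(x, weak_clfs, weights, delta=0.0, thr=0.0, reject_at=-1.0):
    """Soft-cascade evaluation of the strong classifier SCLF on window features x.

    weak_clfs: depth-2 decision trees, each exposing predict(x) in {-1, +1};
    weights:   Adaboost weight of each weak classifier;
    delta:     weight-correction coefficient (0 while training), later tuned
               by the Search-delta algorithm;
    reject_at: running-score threshold below which the window is rejected early.
    """
    score = 0.0
    for clf, weight in zip(weak_clfs, weights):
        score += (weight + delta) * clf.predict(x)  # assumed form of the delta correction
        if score < reject_at:                       # soft cascade: drop clear background early
            return 0, score
    return (1 if score > thr else 0), score         # Eq. (7): 1 means "target present"
```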
S3: fine tuning of fast candidate box generation algorithms
The goal of the fast candidate-box generation algorithm is to generate candidate target regions with high recall and high accuracy, so the Adaboost parameters must be adjusted after training. Correcting the parameter δ on the one hand preserves the accuracy of the classification results and on the other hand raises the recall as far as possible. Usually, if the IoU value between a candidate box and any one of the manually labeled boxes is greater than 0.5, the candidate box is considered correct; the IoU value is defined as follows:
IoU=(GB∩DB)/(GB∪DB) (8)
wherein GB represents a target frame of a real label, and DB represents a target frame generated by a quick candidate frame. Thus, the Precision (Precision) and Recall (Recall) can be determined:
Precision = TP / (TP + FP),  Recall = TP / (TP + FN)   (9)
TP (true positive) means the candidate box judges that a target is present and the current region indeed contains a target. FP (false positive) means the candidate box judges that a target is present but the current region actually contains none. FN (false negative) means the candidate box judges that no target is present but the current region actually does contain one. By running the fast candidate-box generation algorithm on the training data set under different parameters δ, the corresponding detection precision and recall can be calculated, and a high-precision, high-recall fast candidate-box generation model is obtained by adjusting the parameter δ.
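The computations in formulas (8) and (9) are straightforward; a minimal Python sketch, assuming boxes are given as (x1, y1, x2, y2) tuples and ignoring one-to-one matching for brevity:

```python
def iou(gb, db):
    """Eq. (8): intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(gb[2], db[2]) - max(gb[0], db[0]))
    iy = max(0, min(gb[3], db[3]) - max(gb[1], db[1]))
    inter = ix * iy
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(gb) + area(db) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(gt_boxes, det_boxes, thr=0.5):
    """Eq. (9): a detection is a TP if it overlaps some ground-truth box with IoU > thr."""
    tp = sum(1 for d in det_boxes if any(iou(g, d) > thr for g in gt_boxes))
    fp = len(det_boxes) - tp
    fn = sum(1 for g in gt_boxes if not any(iou(g, d) > thr for d in det_boxes))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```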
Since the purpose of the fast candidate-box generation algorithm is to obtain high-precision, high-recall candidate boxes, and the recall directly affects the classification performance of the micro convolutional neural network in the later stage, the index used to fine-tune the fast candidate-box generation algorithm is defined as:
Fidx_i = γ × Recall_i + Precision_i   (11)
where γ is a hyperparameter that emphasizes the importance of the recall value; in addition, larger values of γ make the curve formed by the Fidx values smoother. The parameter δ is selected with the Search-δ algorithm, sketched below.
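The Search-δ listing is reproduced in the original only as an image, so the following Python sketch reconstructs the linear search from the surrounding description; `set_delta`, the `evaluate` callback, and the search range are assumed names, not the patent's actual interface.

```python
import numpy as np

def search_delta(detector, train_set, evaluate, gamma=2.0,
                 deltas=np.arange(-2.0, 2.0001, 0.05)):
    """Linear search over the weight-correction coefficient delta.

    For each candidate delta, re-run the fast candidate-box generator on the
    training set, score it with Fidx = gamma * recall + precision (Eq. 11),
    and keep the delta with the highest Fidx.

    evaluate(detector, train_set) -> (precision, recall), e.g. built on
    precision_recall() above.
    """
    best_delta, best_fidx = None, float("-inf")
    for delta in deltas:
        detector.set_delta(delta)                       # hypothetical detector API
        precision, recall = evaluate(detector, train_set)
        fidx = gamma * recall + precision
        if fidx > best_fidx:
            best_delta, best_fidx = float(delta), fidx
    return best_delta
```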
s4: design of miniature convolutional neural network
The micro convolutional neural network judges whether a target exists in a candidate region and comprises 4 convolutional layers, 2 average-pooling layers and 2 fully connected layers. The specific structure of the network model is shown in fig. 2: the model is divided into a feature extraction part and a classification part; the first row gives the coarse structure, and each block in the second row details the corresponding component of the first row. Each candidate region is resampled to a size of 32 × 32 × 3 and then input to the network. The network adopts the ReLU function as activation, with an activation following each convolution operation. The feature extraction part first applies 18 convolution kernels of 5 × 5 to the input picture, then 24 convolution kernels of 3 × 3 followed by 2 × 2 mean pooling, then 32 convolution kernels of 3 × 3 followed by 2 × 2 mean pooling, and finally 32 convolution kernels of 3 × 3 again, completing the feature extraction of the candidate region. The fully connected part is mainly used for classification judgment and has two layers in total. The first layer receives the 3 × 3 × 32 feature map obtained by the feature extraction part, outputs a vector of size 128, and is followed by a Dropout layer with probability P = 0.1 and a ReLU activation layer. The second layer takes the 128-dimensional vector as input and outputs a 2-dimensional vector; since the micro convolutional neural network only has to recognize target regions, a two-way judgment suffices.
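Interpreting the description with unpadded (valid) convolutions reproduces the stated sizes: 32 → 28 → 26 → 13 → 11 → 5 → 3, so the first fully connected layer sees a 3 × 3 × 32 map (288 values). A PyTorch sketch under that assumption:

```python
import torch.nn as nn

class MicroCNN(nn.Module):
    """Sketch of the micro CNN: 4 conv layers, 2 average-pooling layers,
    2 fully connected layers; input 32x32x3, output 2 classes.
    Unpadded convolutions are assumed, making the last feature map 3x3x32."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 18, kernel_size=5), nn.ReLU(),   # 32 -> 28
            nn.Conv2d(18, 24, kernel_size=3), nn.ReLU(),  # 28 -> 26
            nn.AvgPool2d(2),                              # 26 -> 13
            nn.Conv2d(24, 32, kernel_size=3), nn.ReLU(),  # 13 -> 11
            nn.AvgPool2d(2),                              # 11 -> 5
            nn.Conv2d(32, 32, kernel_size=3), nn.ReLU(),  # 5 -> 3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                  # 3 * 3 * 32 = 288 features
            nn.Linear(288, 128),           # first fully connected layer
            nn.Dropout(p=0.1), nn.ReLU(),  # Dropout P = 0.1, then activation
            nn.Linear(128, 2),             # second layer: target vs. background
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```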
The miniature convolutional neural network adopts a cross entropy function as a loss function of the network, which is defined as follows:
Loss = −(1/BZ) · Σ_{i=1}^{BZ} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ]
where BZ denotes the batch size, set to 64 in network training, y_i denotes the true label of the current candidate box, equal to 1 if it contains an airplane and 0 otherwise, and ŷ_i is the prediction of the network. The loss function is optimized with an Adam optimizer, iterating 50 times over the training data set with learning rate lr = 0.001 for the first 40 iterations and lr = 0.0003 for the last 10. When the miniature convolutional neural network is trained, part of the training data set comes from the ground-truth labels in the original training data set and part from the candidate boxes generated by running the fast candidate-box generation algorithm on the training data set. All pictures input to the network are normalized, with mean = [0.485, 0.456, 0.406] and standard deviation std = [0.229, 0.224, 0.225]; if the current input pixel is X, the normalized pixel is (X − mean)/std. In addition, because the aircraft targets are multi-directional and the amount of training data is small, the current data are flipped vertically and horizontally to enlarge the data volume and thereby improve the classification performance of the network.
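A sketch of the training recipe just described (Adam, 50 passes with the learning-rate drop at pass 40, batch size 64, normalization and flips); the `train_loader` and the use of `torchvision.transforms` are assumptions. `nn.CrossEntropyLoss` over the 2-way output is PyTorch's standard equivalent of the two-class cross entropy above.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    T.RandomHorizontalFlip(),                 # left-right flip augmentation
    T.RandomVerticalFlip(),                   # up-down flip augmentation
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),   # (X - mean) / std, per channel
])

model = MicroCNN()                            # the network sketched above
criterion = nn.CrossEntropyLoss()             # cross-entropy over the 2 classes
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(50):                       # 50 passes over the training data set
    if epoch == 40:                           # lr = 0.0003 for the last 10 passes
        for group in optimizer.param_groups:
            group["lr"] = 0.0003
    for images, labels in train_loader:       # assumed DataLoader, batch_size=64
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```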
S5: detection of aircraft in visible light airport based on fast candidate frame and miniature convolutional neural network
Firstly, count the target scales in the training data set and calculate the sliding-window size with the scheme of the first step; then train the fast candidate-box generation model with this sliding window and the scheme of the second step, during which the λ_Ω corresponding to the different channel feature types Ω are estimated, and record the trained fast candidate-box generation model. Then correct the parameter δ on the training data set so that the current detection model generates candidate boxes of high reliability and high recall, and replace the stored candidate-box generation model with the corrected one.
Next, train the miniature convolutional neural network, where part of the training data set comes from the ground-truth labels in the original training data set and part from the candidate boxes generated by running the fast candidate-box generation algorithm on the training data set; all pictures input to the network are normalized. In addition, because the targets are multi-directional and the training data volume is small, the current data are flipped vertically and horizontally to enlarge the data volume and improve the classification performance of the network.
Finally, for an input remote sensing picture, generate candidate regions with the fast candidate-box generation algorithm and re-judge these regions with the miniature convolutional neural network: when the network classifies a region as true, a target exists in the current region; otherwise the current candidate box is discarded as background.
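Putting the pieces together, a sketch of this detection pass; `generate_candidates` stands for the trained fast candidate-box generator and is a hypothetical name, as is the convention that class 1 means "aircraft".

```python
import cv2
import torch

def detect_aircraft(image, generate_candidates, model, transform):
    """Run the two-stage detector on one remote sensing picture.

    generate_candidates(image) -> list of (x1, y1, x2, y2) candidate boxes;
    model is the trained MicroCNN; transform is the normalization pipeline above.
    """
    model.eval()
    detections = []
    with torch.no_grad():
        for (x1, y1, x2, y2) in generate_candidates(image):
            patch = cv2.resize(image[y1:y2, x1:x2], (32, 32))  # resample region to 32x32x3
            logits = model(transform(patch).unsqueeze(0))
            if logits.argmax(dim=1).item() == 1:               # network judged region true
                detections.append((x1, y1, x2, y2))            # keep: aircraft present
    return detections                                          # everything else is background
```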
The method is mainly aimed at detecting airport aircraft in visible light imagery at a resolution of 2 to 5 meters, but it is also suitable for target detection in visible light remote sensing images at other resolutions; for airports at other resolutions, one only needs to recalculate the sliding-window size, retrain the fast candidate-box generation model and the micro convolutional neural network model, and fine-tune the candidate-box generation model.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A visible light image airplane rapid detection method based on a miniature convolutional neural network is characterized by comprising the following steps:
(1) count the aircraft scale information in the training data set to obtain a scale data set; next, calculate the minimum side length S_min of the aircraft targets; finally, calculate the average length-width ratio R_target of the aircraft to obtain the final sliding-window size (S_min × R_target, S_min × R_target);
(2) calculate three types of channel features for each visible light airport picture: the standard gradient magnitude channel feature, the gradient direction channel feature, and the LUV color channel feature; from the channel features of the training data set, estimate by least squares the λ_Ω corresponding to each given channel feature type Ω, completing the construction of the fast feature pyramid, and train the fast candidate-box generation algorithm with the Adaboost algorithm, where λ_Ω denotes the information-loss coefficient;
(3) re-detect the training data set with the fast candidate-box generation algorithm under different parameters δ via the linear search algorithm Search-δ, and compute the corresponding detection precision and recall; obtain a high-precision, high-recall fast candidate-box generation model by adjusting the parameter δ;
(4) use the miniature convolutional neural network to re-judge each candidate region: if the network classifies the current region as true, the candidate box is considered to contain an airplane; otherwise the current candidate box is treated as background and discarded.
2. The method for rapidly detecting the visible light image airplane based on the miniature convolutional neural network as claimed in claim 1, wherein the data set is B = {(h_i, w_i) | i = 1, …, N}, where N denotes the number of aircraft targets, h_i denotes the number of pixels occupied by the length of the i-th aircraft, and w_i denotes the number of pixels occupied by the width of the i-th aircraft.
3. The method for rapidly detecting the visible light image airplane based on the miniature convolutional neural network as claimed in claim 2, wherein the minimum side length of the aircraft targets is calculated as S_min = min{min(h), min(w)}, and the average length-width ratio of the aircraft as
R_target = (1/N) · Σ_{i=1}^{N} r_i.
4. The method for rapidly detecting the visible light image airplane based on the miniature convolutional neural network as claimed in claim 1, wherein for the Adaboost algorithm the following decision rule is defined:
label(x) = 1 if SCLF_x > thr, 0 otherwise,
wherein x denotes the aggregated channel features of the current window, the threshold thr is used to judge whether the current region contains a target, and SCLF_x denotes the output value of the strong classifier SCLF on x, representing the probability that x is a target;
SCLF_x is composed of a series of weak classifiers and is expressed as
SCLF_x = Σ_{m=1}^{M} (weight_m + δ) · clf(x; θ_m),
wherein clf denotes a weak classifier built as a tree of depth 2; weight_m is the weight of each weak classifier; θ_m is the parameter of each weak classifier; and δ denotes the weight-correction coefficient of the weak classifiers, which is set to 0 during the training phase.
5. The method for rapidly detecting the visible light image airplane based on the miniature convolutional neural network as claimed in claim 1, wherein the fast candidate-box generation algorithm in step (3) is fine-tuned as follows: the IoU value of the current candidate box is defined as IoU = (GB ∩ DB)/(GB ∪ DB), wherein GB denotes the ground-truth labeled target box and DB denotes a target box generated by the fast candidate-box algorithm; the precision and the recall are obtained indirectly through the IoU values; the index used to fine-tune the fast candidate-box generation algorithm is defined as
Fidx_i = γ × Recall_i + Precision_i,
wherein γ is a hyperparameter; Recall_i denotes the recall after i adjustments of the parameter δ, and Precision_i denotes the detection precision after i adjustments of the parameter δ.
6. The method for rapidly detecting the visible light image airplane based on the miniature convolutional neural network as claimed in claim 1, wherein the miniature convolutional neural network has 4 convolutional layers, 2 mean-pooling layers and 2 fully connected layers; the input of the network is a picture of size 32 × 32 × 3; the network adopts the ReLU function as the activation function, with an activation following each convolution operation; the network is divided into a feature extraction part and a fully connected part, and the whole convolutional neural network is a binary classification model.
7. The method as claimed in claim 6, wherein the feature extraction part performs a convolution operation on the input image by using 18 convolution kernels with 5 × 5, performs a convolution operation by using 24 convolution kernels with 3 × 3 and performs a mean pooling operation by using kernels with 2 × 2, performs a convolution operation by using 32 convolution kernels with 3 × 3 and performs a mean pooling operation by using kernels with 2 × 2, and performs a convolution operation by using 32 convolution kernels with 3 × 3 again, thereby completing the feature extraction of the candidate region.
8. The method for rapidly detecting the visible light image airplane based on the miniature convolutional neural network as claimed in claim 7, wherein the fully connected part is mainly used for classification judgment and has two layers; the first layer receives the 3 × 3 × 32 feature map obtained by the feature extraction part, outputs a vector of size 128, and is followed by a Dropout layer with probability P = 0.1 and a ReLU activation layer; the second layer takes the 128-dimensional vector as input and outputs a 2-dimensional vector.
9. The method for rapidly detecting the visible light image airplane based on the miniature convolutional neural network as claimed in claim 8, wherein the miniature convolutional neural network adopts the cross-entropy function as the loss function of the network, defined as follows:
Loss = −(1/BZ) · Σ_{i=1}^{BZ} [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ],
wherein BZ denotes the batch size, set to 64 in network training, y_i denotes the true label of the current candidate box, equal to 1 if it contains a target and 0 otherwise, and ŷ_i is the prediction of the network; the loss function is optimized with an Adam optimizer, iterating 50 times over the training data set with learning rate lr = 0.001 for the first 40 iterations and lr = 0.0003 for the last 10; when the miniature convolutional neural network is trained, part of the training data set comes from the ground-truth labels in the original training data set and part from the candidate boxes generated by running the fast candidate-box generation algorithm on the training data set; all pictures input to the network are normalized, with mean = [0.485, 0.456, 0.406] and standard deviation std = [0.229, 0.224, 0.225], and if the current input pixel is X, the normalized pixel is (X − mean)/std.
CN202010646717.9A 2020-07-07 2020-07-07 Visible light image airplane rapid detection method based on miniature convolutional neural network Active CN111814662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010646717.9A CN111814662B (en) 2020-07-07 2020-07-07 Visible light image airplane rapid detection method based on miniature convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010646717.9A CN111814662B (en) 2020-07-07 2020-07-07 Visible light image airplane rapid detection method based on miniature convolutional neural network

Publications (2)

Publication Number Publication Date
CN111814662A 2020-10-23
CN111814662B 2022-06-24

Family

ID=72842813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010646717.9A Active CN111814662B (en) 2020-07-07 2020-07-07 Visible light image airplane rapid detection method based on miniature convolutional neural network

Country Status (1)

Country Link
CN (1) CN111814662B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116882461B (en) * 2023-09-01 2023-11-21 北京航空航天大学 Neural network evaluation optimization method and system based on neuron plasticity

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975929A (en) * 2016-05-04 2016-09-28 北京大学深圳研究生院 Fast pedestrian detection method based on aggregated channel features
CN108460341A (en) * 2018-02-05 2018-08-28 西安电子科技大学 Remote sensing image object detection method based on integrated depth convolutional network
CN109919108A (en) * 2019-03-11 2019-06-21 西安电子科技大学 Remote sensing images fast target detection method based on depth Hash auxiliary network
CN110543837A (en) * 2019-08-16 2019-12-06 北京航空航天大学 visible light airport airplane detection method based on potential target point

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3029606A3 (en) * 2014-11-14 2016-09-14 Thomson Licensing Method and apparatus for image classification with joint feature adaptation and classifier learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975929A (en) * 2016-05-04 2016-09-28 北京大学深圳研究生院 Fast pedestrian detection method based on aggregated channel features
CN108460341A (en) * 2018-02-05 2018-08-28 西安电子科技大学 Remote sensing image object detection method based on integrated depth convolutional network
CN109919108A (en) * 2019-03-11 2019-06-21 西安电子科技大学 Remote sensing images fast target detection method based on depth Hash auxiliary network
CN110543837A (en) * 2019-08-16 2019-12-06 北京航空航天大学 visible light airport airplane detection method based on potential target point

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shaoqing Ren et al.; "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks"; Computer Vision and Pattern Recognition; 2016-01-06; full text *

Also Published As

Publication number Publication date
CN111814662A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
CN111126472B (en) SSD (solid State disk) -based improved target detection method
CN109241913B (en) Ship detection method and system combining significance detection and deep learning
US9824294B2 (en) Saliency information acquisition device and saliency information acquisition method
CN108596053B (en) Vehicle detection method and system based on SSD and vehicle posture classification
CN109800692B (en) Visual SLAM loop detection method based on pre-training convolutional neural network
CN106023257A (en) Target tracking method based on rotor UAV platform
CN106960195A (en) A kind of people counting method and device based on deep learning
JP7263216B2 (en) Object Shape Regression Using Wasserstein Distance
CN111160407B (en) Deep learning target detection method and system
CN110443279B (en) Unmanned aerial vehicle image vehicle detection method based on lightweight neural network
CN111539422B (en) Flight target cooperative identification method based on fast RCNN
JP2019016114A (en) Image processing device, learning device, focus controlling device, exposure controlling device, image processing method, learning method and program
CN107944437B (en) A kind of Face detection method based on neural network and integral image
CN108734200B (en) Human target visual detection method and device based on BING (building information network) features
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN109919246A (en) Pedestrian's recognition methods again based on self-adaptive features cluster and multiple risks fusion
CN111199245A (en) Rape pest identification method
CN111814662B (en) Visible light image airplane rapid detection method based on miniature convolutional neural network
CN114973014A (en) Airplane target fine-grained detection method and system based on multi-network cascade
CN109919215B (en) Target detection method for improving characteristic pyramid network based on clustering algorithm
CN114140485A (en) Method and system for generating cutting track of main root of panax notoginseng
CN109344758B (en) Face recognition method based on improved local binary pattern
Lou et al. Research on edge detection method based on improved HED network
CN117689995A (en) Unknown spacecraft level detection method based on monocular image
CN110348311B (en) Deep learning-based road intersection identification system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant