CN113780462B - Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof - Google Patents
- Publication number
- CN113780462B (application CN202111119764.9A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- feature
- vehicle detection
- network
- aerial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/214—Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Neural networks; Combinations of networks
- G06N3/084—Learning methods; Backpropagation, e.g. using gradient descent
- Y02T10/40—Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The invention discloses a vehicle detection network establishment method based on unmanned aerial vehicle (UAV) aerial images and an application thereof, belonging to the field of vehicle detection. The method comprises: establishing a vehicle detection network and training it on an aerial-photography dataset, where each training sample in the dataset is an aerial image annotated with vehicle positions and categories. The vehicle detection network is a deep-learning neural network that takes an image as input, predicts the position and category of each vehicle in the input image, and outputs a prediction confidence. The training loss function is L_total = L_loc + L_cls + L_disc, where L_loc is the regression loss and L_cls is the classification loss; L_disc is the inter-class discriminative loss, which characterizes the distribution of training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller L_disc. After training, establishment of the vehicle detection network is complete. The invention can establish a more accurate vehicle detection network and improve the accuracy of vehicle detection.
Description
Technical Field
The invention belongs to the field of vehicle detection, and particularly relates to a vehicle detection network establishment method based on unmanned aerial vehicle aerial images and application thereof.
Background
Vehicle detection in unmanned aerial vehicle (UAV) aerial images is an important and difficult branch of object detection. It refers to acquiring RGB images by UAV aerial photography and predicting the positions and categories of the vehicles in those images. Compared with traffic images captured by traditional fixed cameras, UAV aerial images offer wider viewing angles and varying shooting heights, which introduce problems such as complex and varied backgrounds, large variation in vehicle scale, and uneven distribution of vehicle categories; detecting vehicles quickly and correctly in such images is therefore a challenging task.
The essence of vehicle detection is to extract discriminative features and complete the tasks of vehicle classification and regression. Compared with traditional object detection methods, deep-learning-based methods have clear advantages in feature extraction and in classification and regression, so most existing UAV aerial-image vehicle detection methods are improvements on general-purpose detection algorithms.
The classification task of object detection requires the extracted features to contain more high-level semantic information, while the regression task requires the features to contain more position and detail information; these two requirements are difficult to satisfy on the same feature map. In a feature extraction network, shallow features have higher resolution and contain rich position and detail information, making them more suitable for detecting small targets; however, their semantic level is low and their noise is high, so they are ill-suited to classification and can cause many false detections. Deep features are more abstract and carry stronger semantic information, making them more suitable for classification; but because of their larger receptive field, their resolution is lower, giving them poor perception of detail and making them unsuitable for localizing small targets. Therefore, fusing deep and shallow features to construct a feature pyramid can effectively enhance the high-level semantic information of shallow features and improve detection accuracy, especially for small targets. In existing pyramid-based feature fusion, the fusion is mainly simple equal-weight addition or channel concatenation; the contribution of features from different layers is not considered during fusion, so the network underuses the features and fails to express the feature information effectively, which affects detection accuracy.
In terms of loss-function design, current methods constrain the optimization of network parameters using only the classification loss, which represents the target class, and the regression loss, which localizes the target. As a result, the class-discrimination capability of the features extracted by the network is limited, which affects the accuracy of vehicle detection; the detection accuracy of vehicle detection from UAV aerial images therefore needs further improvement.
Disclosure of Invention
In view of the defects and improvement needs of the prior art, the invention provides a vehicle detection network establishment method based on unmanned aerial vehicle aerial images and an application thereof, aiming to establish a more accurate vehicle detection network and improve the accuracy of vehicle detection.
In order to achieve the above object, according to one aspect of the present invention, there is provided a vehicle detection network establishment method based on an aerial image of an unmanned aerial vehicle, including:
establishing a vehicle detection network to be trained, and training it using an aerial-photography dataset, wherein each training sample in the dataset is an aerial image annotated with vehicle positions and categories; the vehicle detection network is a deep-learning neural network model that takes an image as input, predicts the position and category of each vehicle in the input image, and outputs a prediction confidence;
during training, the loss function used to compute the loss is L_total = L_loc + L_cls + L_disc, wherein L_loc is the regression loss, representing the difference between the predicted and true vehicle positions; L_cls is the classification loss, representing the difference between the predicted and true vehicle categories; and L_disc is the inter-class discriminative loss, representing the distribution of training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller L_disc;
after training is finished, the establishment of the vehicle detection network is completed.
Further, L_disc = L_var + L_dist;
wherein L_var is a variance term representing the degree of feature aggregation of same-class training samples: the higher the feature aggregation of same-class training samples, the smaller the value of L_var; and L_dist is a distance term representing the degree of feature dispersion between different-class training samples: the higher the feature dispersion between different-class training samples, the smaller the value of L_dist.
Further,

L_var = (1/C) · Σ_{c=1}^{C} (1/N_c) · Σ_{i=1}^{N_c} [ ||μ_c − x_i|| − δ_v ]_+²

wherein C represents the total number of vehicle categories in the aerial dataset, N_c represents the total number of training samples of the c-th category in the aerial dataset, μ_c represents the mean of the feature vectors of the aerial images of the c-th category, and x_i represents the feature vector of the i-th aerial image; δ_v is a preset threshold, δ_v > 0; and [x]_+ = max(0, x).
Further,

L_dist = (1/(C(C−1))) · Σ_{c_A=1}^{C} Σ_{c_B≠c_A} [ δ_d − ||μ_{c_A} − μ_{c_B}|| ]_+²

wherein C represents the total number of vehicle categories in the aerial dataset; c_A and c_B represent two different vehicle categories, and μ_{c_A} and μ_{c_B} respectively represent the means of the feature vectors of the aerial images of the two categories; δ_d is a preset threshold, δ_d > 0; and [x]_+ = max(0, x).
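As a concrete illustration, the variance term (pulling same-class features toward their class mean) and distance term (pushing different class means apart) described above can be sketched in a few lines of NumPy. This is a sketch under stated assumptions, not the patent's implementation: the function name `discriminative_loss` and the default threshold values δ_v = 0.5 and δ_d = 1.5 are ours, and plain Euclidean distance is assumed.

```python
import numpy as np

def discriminative_loss(features, labels, delta_v=0.5, delta_d=1.5):
    """Sketch of L_disc = L_var + L_dist for (N, D) features and (N,) labels."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    C = len(classes)
    means = {c: features[labels == c].mean(axis=0) for c in classes}

    # Variance term: penalize same-class features farther than delta_v from the mean.
    l_var = 0.0
    for c in classes:
        d = np.linalg.norm(features[labels == c] - means[c], axis=1)
        l_var += np.mean(np.maximum(d - delta_v, 0.0) ** 2)
    l_var /= C

    # Distance term: penalize pairs of class means closer than delta_d.
    l_dist = 0.0
    if C > 1:
        for ca in classes:
            for cb in classes:
                if ca != cb:
                    d = np.linalg.norm(means[ca] - means[cb])
                    l_dist += np.maximum(delta_d - d, 0.0) ** 2
        l_dist /= C * (C - 1)

    return l_var + l_dist
```

With tight, well-separated clusters both hinge terms vanish, matching the stated behavior that concentrated same-class and dispersed different-class distributions give a small L_disc.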
Further,

L_cls = −α_t · W_t · (1 − P_t)^γ · log(P_t)

wherein α_t is a hyperparameter balancing positive and negative samples, derived from a base weight α; P_t denotes the confidence the network assigns to the true class, with p being the prediction confidence of the corresponding category; γ is a preset weight coefficient; and W_t is built from w_class, the weight of misclassified training samples of the corresponding class, which depends through a hyperparameter β on p_class, the probability of occurrence of training samples of that class among all training samples.
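The classification loss above (a focal-style loss with class-frequency weighting) can be sketched for the binary case as follows. The exact forms of α_t, P_t, and w_class are not recoverable from the text, so this sketch assumes the standard focal-loss definitions of α_t and P_t and one plausible class weight, w_class = (1 − p_class)^β; the function name and defaults are ours.

```python
import numpy as np

def weighted_focal_loss(p, y, p_class, alpha=0.25, gamma=2.0, beta=2.0):
    """Sketch of L_cls = -alpha_t * W_t * (1 - P_t)^gamma * log(P_t).

    p: predicted confidence for the positive class; y in {0, 1};
    p_class: frequency of each sample's class among all training samples.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    p_t = np.where(y == 1, p, 1.0 - p)            # confidence assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # Assumed class weight: rare classes (small p_class) get a larger weight.
    w_class = (1.0 - np.asarray(p_class, dtype=float)) ** beta
    # W_t applies the class weight only to misclassified samples.
    w_t = np.where(p_t < 0.5, w_class, 1.0)
    return float(np.mean(-alpha_t * w_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

Under this weighting, a misclassified sample from a rare class incurs a larger loss than the same misclassification in a common class, matching the stated intent of W_t.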
Further, the vehicle detection network includes: a feature extraction backbone network, a feature fusion network, a classification sub-network and a regression sub-network;
the feature extraction backbone network is used for extracting three features with different scales of an input image, and the features are marked as C3, C4 and C5 in sequence according to the scales from large to small;
the feature fusion network comprises a semantic feature extraction module and a feature pyramid weighted fusion module, wherein the semantic feature extraction module performs further feature extraction on feature C5 to obtain feature C6, and further feature extraction on feature C6 to obtain feature C7; the feature pyramid weighted fusion module is a five-layer feature pyramid network whose output features are denoted P3 to P7 in sequence from bottom to top, wherein feature P7 is the result of a convolution operation on feature C7, and feature PM is the result of a convolution operation on the weighted fusion of the upper-layer feature P(M+1) and feature CM;
the classification sub-network is used for predicting the type of the vehicle in the input image according to the characteristics P3-P7 output by the characteristic fusion network and outputting prediction confidence;
the regression sub-network is used for predicting the position of the vehicle in the input image according to the characteristics P3-P7 output by the characteristic fusion network;
wherein M is a positive integer, and M is more than or equal to 3 and less than or equal to 6.
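The five pyramid levels P3 to P7 described above correspond to successively smaller feature maps; a minimal sketch of the resulting sizes, assuming (as the embodiment section states for P3 through P7) that level PM is downsampled 2^M times relative to the input. The function name and signature are ours.

```python
def pyramid_shapes(h, w, levels=range(3, 8)):
    """Feature-map sizes for pyramid levels P3..P7, assuming stride 2**M at level M."""
    return {f"P{m}": (h // 2 ** m, w // 2 ** m) for m in levels}
```

For a 1024 x 2048 input this gives P3 at 128 x 256 down to P7 at 8 x 16, illustrating why the pyramid spans vehicles of widely different scales.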
Further, during training, the weight coefficients used in the weighted fusion of feature P(M+1) and feature CM to obtain feature PM are dynamically adjusted.
According to another aspect of the present invention, there is provided a vehicle detection method based on an aerial image of an unmanned aerial vehicle, including:
inputting an aerial image to be detected into a vehicle detection network established by the above vehicle detection network establishment method based on unmanned aerial vehicle aerial images, so that the vehicle detection network predicts the positions and categories of the vehicles together with the prediction confidences;
and drawing a prediction result output by the vehicle detection network in the aerial image to finish vehicle detection.
Further, the vehicle detection method based on unmanned aerial vehicle aerial images provided by the invention further comprises, before the prediction results output by the vehicle detection network are drawn in the aerial image:

removing redundant prediction boxes from the prediction results;

a prediction box being a detection box determined by the position information.
According to yet another aspect of the present invention, there is provided a computer-readable storage medium comprising a stored computer program; when the computer program is executed by a processor, the device on which the computer-readable storage medium resides is controlled to execute the vehicle detection network establishment method based on unmanned aerial vehicle aerial images and/or the vehicle detection method based on unmanned aerial vehicle aerial images described above.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
(1) When training the vehicle detection network based on UAV aerial images, the loss function used by the invention contains not only a regression loss term and a classification loss term but also an introduced inter-class discriminative loss term: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class samples, the smaller this loss. Introducing the inter-class discriminative loss therefore lets network training reshape the distribution of training samples in feature space and provides additional constraints for optimizing the network parameters, concentrating the feature distribution of same-class training samples and pushing apart the feature distributions of different classes. The features the network extracts for different categories finally become more discriminative, which effectively reduces missed detections, false detections, and misclassification under the complex backgrounds and the varied vehicle categories, forms, and viewing angles found in UAV aerial images.
(2) In the loss function used when training the vehicle detection network based on UAV aerial images, which comprises a regression loss term, a classification loss term, and an inter-class discriminative loss term, the classification loss penalizes wrongly predicted classes on the basis of the focal loss while introducing a parameter W_t that assigns different weights according to the different numbers of samples in different classes. It increases the weight of misclassified samples from rare classes, so that network training pays more attention to the misclassification of minority classes, further improving the accuracy of vehicle detection.
(3) The vehicle detection network established by the invention performs further feature extraction on top of the features extracted by the feature extraction backbone to obtain deeper features, which deepens the network and extracts higher-level semantic features while reducing network parameters. By constructing a five-layer feature pyramid network and fusing deep and shallow convolutional features from the top of the pyramid downwards, the high-level semantic information of shallow features is effectively enhanced, improving detection accuracy, especially for small targets, and enabling detection of vehicles at various scales. Meanwhile, the original fusion scheme is replaced by adaptive weighted fusion of multi-layer features, so that the network extracts features purposefully according to the demands of the final regression and classification tasks. Based on this network structure, the invention further improves the accuracy of vehicle detection in UAV aerial images.
(4) In the vehicle detection network established by the invention, the weight coefficients used for weighted feature fusion are learnable parameters, so the network can autonomously learn the importance of each input feature during training and dynamically adjust the weight coefficients, further improving the accuracy of vehicle detection.
Drawings
FIG. 1 is a schematic diagram of a vehicle detection network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a feature pyramid weighted fusion module according to an embodiment of the present invention;
fig. 3 is a flowchart of a vehicle detection method based on an aerial image of an unmanned aerial vehicle according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
In the present invention, the terms "first," "second," and the like in the description and in the drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In order to improve the accuracy of vehicle detection based on UAV aerial images, the invention provides a vehicle detection network establishment method based on UAV aerial images and an application thereof. The overall idea is as follows: by introducing an inter-class discriminative loss into the loss function, network training is used to change the distribution of training samples in feature space, providing additional constraints for network parameter optimization, so that the features extracted for different classes are more discriminative, the network's predictions are more accurate, and the accuracy of vehicle detection is improved; on this basis, the network structure is further improved to realize weighted fusion of features carrying higher-level semantic information, with weight coefficients that can be dynamically adjusted during training, further improving detection accuracy.
The following are examples.
Example 1:
a vehicle detection network establishment method based on unmanned aerial vehicle aerial images comprises the following steps:
and establishing a vehicle detection network to be trained, training the vehicle detection network by using the aerial photo data set training, and completing the establishment of the vehicle detection network after the training is finished.
Each training sample in the aerial-photography dataset is an aerial image annotated with vehicle positions and categories. Optionally, in this embodiment, the dataset used is the public UAV aerial dataset UAVDT, which contains 50 video sequences of different scenes totaling more than 40,000 frames, with vehicles divided into three categories: car, truck, and bus. In other embodiments of the invention, other datasets may be used, or a corresponding dataset may be built.
In this embodiment, the vehicle detection network is a deep-learning neural network model whose input is a three-channel UAV aerial image; it predicts the positions and categories of vehicles in the input image and outputs prediction confidences. The structure of the network is shown in Fig. 1; it comprises three parts: a feature extraction backbone, a feature fusion network based on a feature pyramid weighted fusion module, and classification and regression sub-networks. Wherein:
the feature extraction backbone network is used for extracting three features with different scales of an input image, and the features are marked as C3, C4 and C5 in sequence according to the scales from large to small; as an optional implementation manner, in this embodiment, a res net50 is used as a feature extraction backbone network, features are extracted by using five convolution blocks (Conv 1-Conv 5 in turn from bottom to top), a C3 layer feature map based on three downsampling of an original image is obtained by Conv3, a C4 layer feature map based on four downsampling of the original image is obtained by Conv4, and a C5 layer feature map based on five downsampling of the original image is obtained by Conv 5; in other embodiments of the invention, other feature extraction networks may be used as the feature extraction backbone network in the vehicle detection network;
the feature fusion network comprises a semantic feature extraction module and a feature pyramid weighting fusion module; the semantic feature extraction module is used for further feature extraction of the feature C5 to obtain a feature C6, and further feature extraction of the feature C6 to obtain a feature C7, specifically, the embodiment performs one 3*3 convolution on the basis of the C5 layer feature map to obtain a C6 layer feature map, and performs one 3*3 convolution on the basis of the C6 layer feature map to obtain a C7 layer feature map; the feature pyramid weighted fusion module is a 5-layer feature pyramid network, 5 layers of features outputted by the feature pyramid weighted fusion module are sequentially marked as P3-P7 from bottom to top, wherein the feature P7 is a result of convolution operation of the feature C7, and the feature PM is a result of convolution operation of the upper layer feature P (M+1) and the feature CM;
the classification sub-network is used for predicting the type of the vehicle in the input image according to the characteristics P3-P7 output by the characteristic fusion network and outputting prediction confidence;
the regression sub-network is used for predicting the position of the vehicle in the input image according to the characteristics P3-P7 output by the characteristic fusion network;
wherein M is a positive integer and 3 ≤ M ≤ 6. A main purpose of obtaining the C6 and C7 layers with 3×3 convolutions in the network established by this embodiment is to deepen the network and extract higher-level semantic information while reducing the number of network parameters. In the feature pyramid weighted fusion module, a five-layer feature pyramid is constructed starting from the P7 effective feature layer, and deep and shallow convolutional features are fused from the top of the pyramid downwards; the original fusion scheme is replaced by adaptive weighted fusion of multi-layer features, and the weight coefficients of features at different layers are adjusted by a back-propagation learning strategy, so the network extracts features purposefully according to the demands of the final regression and classification tasks. In the prediction stage, the five effective feature layers are fed separately into the classification and regression sub-networks, and the results from all effective feature layers are combined into the final prediction. P3, P4, P5, P6, and P7 are obtained by downsampling the original image 3, 4, 5, 6, and 7 times respectively, and the corresponding receptive fields are 8×8, 16×16, 32×32, 64×64, and 128×128 respectively, so vehicles at various scales can be detected;
in addition, since convolutional features at different layers contribute differently to the final regression and classification tasks, this embodiment adopts weighted fusion when fusing deep and shallow convolutional features, introducing learnable weight coefficients so that the network autonomously learns the importance of each input feature. The feature fusion process in the five-layer pyramid is further described below. As shown in Fig. 2, taking the fusion of the P4 and C3 layers as an example, the weight coefficient of the P4 layer after upsampling is w_31, and the weight coefficient of the C3 layer, after its number of feature channels is compressed to 256 by a 3×3 convolution, is w_32; the P4 and C3 layers are multiplied by their corresponding weight coefficients and then passed through a 3×3 convolution block to obtain the final effective feature layer P3. In Fig. 2, w_M1 and w_M2 denote the weight coefficients of the (M+1)-th layer feature P(M+1) and the M-th layer feature CM when they are fused in the five-layer pyramid; for example, in the fusion shown in Fig. 2, w_31 is the weight coefficient of feature P4 and w_32 is the weight coefficient of feature C3. Based on this fusion process, the output of each effective feature layer in the five-layer pyramid can be expressed as:
P_7 = Conv(C_7)

P_M = Conv((w_M1 · Up(P_(M+1)) + w_M2 · Conv(C_M)) / (w_M1 + w_M2 + ε)),  M = 3, 4, 5, 6

where Up(·) denotes upsampling to the resolution of C_M;
in the above, w_ij (i = 3, 4, …, 7; j = 1, 2) represents the weight of the j-th input feature of the i-th layer, measuring the importance of each input feature; ε is a small value that prevents the denominator from being 0; optionally, in this embodiment ε = 0.0001;
in this embodiment, the weight coefficients w_ij of the feature-weighted fusion are learnable parameters, so during training the network can autonomously learn the importance of each input feature and dynamically adjust the weight coefficients, further improving the detection precision of vehicle detection.
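To make the normalized weighted-fusion arithmetic above concrete, the following NumPy sketch computes a single fused layer; the scalars w1 and w2 stand in for the learnable coefficients w_M1 and w_M2 (in the network itself they are trainable parameters updated by back-propagation, which is omitted here), and the trailing 3×3 convolution block is likewise left out:

```python
import numpy as np

def weighted_fuse(p_up, c_lateral, w1, w2, eps=1e-4):
    """Normalized weighted fusion of an upsampled deep feature map (p_up)
    and a lateral shallow feature map (c_lateral). w1, w2 play the role of
    the learnable coefficients w_M1, w_M2; eps avoids a zero denominator."""
    w1, w2 = max(w1, 0.0), max(w2, 0.0)  # weights kept non-negative
    return (w1 * p_up + w2 * c_lateral) / (w1 + w2 + eps)
```

With equal weights the fusion reduces to (almost exactly) the plain average of the two inputs; as training shifts w1 and w2, the output leans toward whichever input feature is more useful to the regression and classification tasks.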
In order to make the features of different categories extracted by the network more distinguishable, so that the detection network is better suited to the complex scenes encountered under unmanned aerial vehicles, the invention adds an inter-class discriminable loss on top of the regression loss and the improved classification loss, evaluating the difference between the model's predicted values and the true values with three losses: regression loss, classification loss and inter-class discriminable loss. Specifically, in this embodiment the loss function used during training is: L_total = L_loc + L_cls + L_disc, where L_loc is the regression loss, representing the difference between the predicted and true vehicle positions; L_cls is the classification loss, representing the difference between the predicted and true vehicle categories; and L_disc is the inter-class discriminable loss, representing the distribution of the training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller L_disc. Each loss is described in detail below:
a. The first term is the regression loss, representing the difference between the predicted value f(x_i) (comprising the horizontal and vertical coordinates of the center of the prediction frame and the width and height of the prediction frame) and the true value y_i; its mathematical expression is:
where i ∈ {x, y, w, h} indexes the errors between the prediction frame and the real frame in the horizontal and vertical coordinates of the center point and in the width and height of the rectangular frame;
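The regression expression itself is not reproduced in this text (the original formula image is absent). A common concrete choice for summing the four coordinate error terms is the smooth-L1 loss; the sketch below illustrates that conventional form and is not necessarily the exact expression used in the patent:

```python
import numpy as np

def smooth_l1_reg_loss(pred, target, beta=1.0):
    """Smooth-L1 loss summed over the (x, y, w, h) error terms:
    quadratic for small errors, linear for large ones."""
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    per_term = np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta)
    return per_term.sum()
```

The quadratic region keeps gradients small near the target, while the linear region limits the influence of outlier boxes.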
b. The second term is the classification loss, which evaluates vehicle-class prediction errors. When training with a large number of samples, the original focal-loss classification loss pays little attention to misclassified samples of minority classes, which are therefore easily ignored. This embodiment accordingly penalizes wrongly predicted classes on top of the focal loss, assigning different weights according to the number of samples in each class, so that misclassified minority-class samples receive larger weights and network training pays more attention to them. The mathematical expression is as follows:
L_cls = −α_t · W_t · (1 − P_t)^γ · log(P_t)
where α is a hyperparameter and α_t adjusts the weight of positive and negative samples during training; p denotes the confidence with which the network predicts the corresponding class, and P_t equals p for a positive sample and 1 − p otherwise; γ adjusts the weight of samples of different detection difficulty, reducing the loss contributed by easily classified samples; W_t takes the per-class value w_class, the weight of each misclassified sample; p_class = num_class / total_num denotes the probability that targets of the corresponding class occur in the overall training set, where num_class is the number of samples of that class, total_num the total number of samples, and β a hyperparameter. From the formula for w_class it follows that, in the case of a class prediction error, the fewer training samples the class has, the larger the weight coefficient W_t assigned to them, which achieves the purpose of increasing the weight of misclassified minority-class samples;
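The re-weighted focal loss described above can be sketched per sample as follows. The exact formula for w_class is not reproduced in the text (the formula image is absent), so the rarity weighting W_t = (1 − p_class)^β used below is an illustrative assumption that merely matches the stated behavior, namely that rarer classes receive larger weights:

```python
import numpy as np

def reweighted_focal_loss(p, is_positive, num_class, total_num,
                          alpha=0.25, gamma=2.0, beta=1.0):
    """L_cls = -alpha_t * W_t * (1 - P_t)^gamma * log(P_t) for one sample.
    p is the confidence predicted for the ground-truth class."""
    p_t = p if is_positive else 1.0 - p
    alpha_t = alpha if is_positive else 1.0 - alpha
    p_class = num_class / total_num           # class frequency in the training set
    w_t = (1.0 - p_class) ** beta             # assumed form of the rarity weight
    return -alpha_t * w_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

Under this assumed form, the UAVDT car class (about 92% of samples) gets w_t ≈ 0.08 while a class with 1% of the samples gets w_t ≈ 0.99, so minority-class errors dominate the loss as the text intends.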
c. The third term is the inter-class discriminable loss. UAV aerial images often have complex backgrounds, and the vehicles in a scene differ in type, form, angle and so on; more discriminative features can effectively reduce false detections, missed detections and misclassifications during detection. The inter-class discriminable loss is therefore added on top of the regression and classification losses: network training changes the distribution of the training samples in feature space and constrains the optimization of the network parameters, so that the feature distribution of same-class samples becomes more concentrated and the feature distributions of different-class samples move further apart. The mathematical expression of the inter-class discriminable loss is:
L_disc = L_var + L_dist
This loss consists of two parts: a variance term and a distance term. L_var is the variance term, representing the feature aggregation degree of same-class training samples: the higher the aggregation degree, the smaller the value of L_var. L_dist is the distance term, representing the feature dispersion degree of different-class training samples: the higher the dispersion degree, the smaller the value of L_dist;
The variance term L_var is specifically calculated as:

L_var = (1/C) · Σ_{c=1}^{C} (1/N_c) · Σ_{i=1}^{N_c} [ ‖μ_c − x_i‖ − δ_v ]_+
where C denotes the total number of vehicle categories in the aerial dataset, N_c the total number of training samples of the c-th category in the aerial dataset, μ_c the mean feature vector of the aerial images of the c-th category, and x_i the feature vector of the i-th aerial image; δ_v is a preset threshold with δ_v > 0; [x]_+ = max(0, x), i.e. when ‖μ_c − x_i‖ is less than δ_v the loss is 0, and when ‖μ_c − x_i‖ is greater than δ_v a loss is incurred, achieving the goal of aggregating same-class features. Optionally, in this embodiment δ_v = 0.6;
The distance term L_dist is specifically calculated as:

L_dist = (1 / (C·(C−1))) · Σ_{c_A ≠ c_B} [ δ_d − ‖μ_{c_A} − μ_{c_B}‖ ]_+
where C denotes the total number of vehicle categories in the aerial dataset; c_A and c_B denote two different vehicle categories, and μ_{c_A} and μ_{c_B} the mean feature vectors of the aerial images of the two categories; δ_d is a preset threshold with δ_d > 0; [x]_+ = max(0, x), i.e. when the distance ‖μ_{c_A} − μ_{c_B}‖ between the class centers of different classes is greater than δ_d the loss is 0, and when it is less than δ_d a loss is incurred, driving samples of different classes apart; ultimately the feature distance between vehicles of different classes is pushed beyond δ_d. Optionally, in this embodiment δ_d = 3.6.
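The two hinge-style terms can be sketched together as follows. The averaging conventions over classes and class pairs are assumptions (the original formula images are absent), but the thresholded behavior matches the text: samples farther than δ_v from their class mean, and class means closer than δ_d, incur loss:

```python
import numpy as np

def discriminative_loss(features, labels, delta_v=0.6, delta_d=3.6):
    """L_disc = L_var + L_dist for a batch of feature vectors.
    features: (N, D) array; labels: length-N array of class ids."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}

    # variance term: penalize samples farther than delta_v from their class mean
    l_var = 0.0
    for c in classes:
        d = np.linalg.norm(features[labels == c] - means[c], axis=1)
        l_var += np.maximum(d - delta_v, 0.0).mean()
    l_var /= len(classes)

    # distance term: penalize class-mean pairs closer than delta_d
    l_dist, pairs = 0.0, 0
    for i, ca in enumerate(classes):
        for cb in classes[i + 1:]:
            gap = np.linalg.norm(means[ca] - means[cb])
            l_dist += max(delta_d - gap, 0.0)
            pairs += 1
    if pairs:
        l_dist /= pairs
    return l_var + l_dist
```

Two tight, well-separated clusters yield zero loss; the loss only activates to pull stray samples inward or push crowded class centers apart.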
Embodiment 2:
A vehicle detection method based on unmanned aerial vehicle aerial images, as shown in FIG. 3, comprises the following steps:
inputting the aerial image to be detected into a vehicle detection network established by the unmanned aerial vehicle aerial-image-based vehicle detection network establishment method provided in Embodiment 1 above, so that the vehicle detection network predicts the position and category of the vehicle together with the prediction confidence; the input image may be a frame from a video captured by an aerial camera; to avoid abnormal and extreme values affecting the results, this embodiment may also preprocess the aerial image before inputting it into the vehicle detection network; optionally, the preprocessing specifically comprises normalization and standardization, where normalization divides the input image data by 255 and standardization subtracts the mean from the normalized input image and divides by the standard deviation;
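The two preprocessing steps can be written directly; the per-channel ImageNet mean and standard deviation used below are common placeholder values, not values taken from the patent:

```python
import numpy as np

def preprocess(img,
               mean=(0.485, 0.456, 0.406),   # placeholder per-channel mean
               std=(0.229, 0.224, 0.225)):   # placeholder per-channel std
    """Normalization (divide by 255) followed by standardization
    (subtract mean, divide by std) of an H x W x 3 uint8 image."""
    x = img.astype(np.float32) / 255.0
    return (x - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)
```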
drawing the prediction results output by the vehicle detection network on the aerial image to complete vehicle detection; in order to display the vehicle detection results clearly in the input aerial image, this embodiment post-processes the detection results before drawing them, the post-processing including the removal of redundant prediction frames from the prediction results; optionally, in this embodiment the redundant prediction frames are removed by the Non-Maximum Suppression (NMS) method;
the vehicle position information output by the vehicle detection network may be expressed as (x_1, y_1, x_2, y_2), where (x_1, y_1) are the upper-left corner coordinates of the predicted vehicle rectangle and (x_2, y_2) the lower-right corner coordinates; the vehicle category information specifies whether the input aerial image contains a vehicle and, if so, its type, e.g. car or bus; the prediction confidence measures the reliability of the detection result; the prediction frame is the detection frame determined by the position information.
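A minimal version of the NMS post-processing over (x_1, y_1, x_2, y_2) boxes, as used here to drop redundant prediction frames (the IoU threshold of 0.5 is a typical default, not a value stated in the patent):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and discard remaining boxes that overlap it above iou_thresh.
    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of the kept box with each remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep
```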
Embodiment 3:
a computer readable storage medium comprising a stored computer program; when the computer program is executed by the processor, the device where the computer readable storage medium is located is controlled to execute the vehicle detection network establishment method based on the aerial image of the unmanned aerial vehicle provided in the above embodiment 1, and/or the vehicle detection method based on the aerial image of the unmanned aerial vehicle provided in the above embodiment 2.
In order to better illustrate the effect of the unmanned aerial vehicle aerial-image-based vehicle detection network, the model was tested both qualitatively and quantitatively on the UAVDT dataset. Since the class distribution in this dataset is severely unbalanced, with car-class vehicles accounting for 92% of the entire dataset, the invention follows the convention of the paper "The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking" (Du D, Qi Y, Yu H, et al. Springer, Cham, 2018) and merges these three classes into a single car class for testing.
Qualitative analysis: six images were selected from different video sequences of the UAVDT dataset, differing in UAV flying height, angle and weather; the scenes of the six selected images are: (a) low-altitude, side-view, daytime; (b) low-altitude, overhead, night; (c) mid-altitude, side-view, daytime; (d) mid-altitude, front-view, night; (e) high-altitude, front-view, daytime; and (f) high-altitude, overhead, night. Overly complex scenes are not illustrated here. The detection results show that the proposed vehicle detection network correctly detects vehicles in the different aerial scenes while maintaining high detection confidence, rather than being effective only for aerial images taken at a specific height, angle or weather, which verifies the generalization capability and robustness of the algorithm.
Quantitative analysis: as shown in Table 1, to illustrate the effect of the proposed vehicle detection network more intuitively, the 20 test sequences defined in the UAVDT dataset were used for testing, and the results were evaluated with mean Average Precision (mAP). According to the experimental results, compared with current advanced UAV aerial-image vehicle detection methods such as UAV-Net, LSN, GANet, NDFT, SpotNet and D Det, the proposed vehicle detection method achieves the best effect, detecting as many vehicle targets as possible while maintaining accuracy.
Table 1. Comparative analysis of different unmanned aerial vehicle aerial-image vehicle detection methods
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (9)
1. A vehicle detection network establishment method based on unmanned aerial vehicle aerial images, characterized by comprising the following steps:
establishing a vehicle detection network to be trained, and training the vehicle detection network using an aerial dataset; the training samples in the aerial dataset are aerial images annotated with vehicle positions and categories; the vehicle detection network is a deep-learning neural network that takes an image as input and is used to predict the position and category of the vehicle in the input image and to output a prediction confidence;
in the training process, the loss function used to calculate the loss is: L_total = L_loc + L_cls + L_disc, wherein L_loc is the regression loss, representing the difference between the predicted value and the true value of the vehicle position; L_cls is the classification loss, representing the difference between the predicted value and the true value of the vehicle category; and L_disc is the inter-class discriminable loss, representing the distribution of the training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller the inter-class discriminable loss L_disc;
after training is finished, the establishment of the vehicle detection network is completed;
wherein the classification loss L_cls is:

L_cls = −α_t · W_t · (1 − P_t)^γ · log(P_t)

wherein α is a hyperparameter; p represents the prediction confidence of the corresponding category; γ is a preset weight coefficient; w_class represents the weight of misclassified training samples of the corresponding class; p_class represents the probability of occurrence of training samples of the corresponding class among all training samples; and β is a hyperparameter.
2. The method for establishing a vehicle detection network based on unmanned aerial vehicle aerial images according to claim 1, wherein L_disc = L_var + L_dist;
wherein L_var is a variance term representing the feature aggregation degree of same-class training samples: the higher the aggregation degree of the same-class training samples, the smaller the value of the variance term L_var; and L_dist is a distance term representing the feature dispersion degree of different-class training samples: the higher the dispersion degree of the different-class training samples, the smaller the value of the distance term L_dist.
3. A method for establishing a vehicle detection network based on aerial images of an unmanned aerial vehicle as claimed in claim 2,
wherein C represents the total number of vehicle categories in the aerial dataset, N_c the total number of training samples of the c-th category in the aerial dataset, μ_c the mean feature vector of the aerial images of the c-th category, and x_i the feature vector of the i-th aerial image; δ_v is a preset threshold, δ_v > 0; [x]_+ = max(0, x).
4. A method for establishing a vehicle detection network based on aerial images of an unmanned aerial vehicle as claimed in claim 2,
wherein C represents the total number of vehicle categories in the aerial dataset; c_A and c_B represent two different vehicle categories, and μ_{c_A} and μ_{c_B} the mean feature vectors of the aerial images corresponding to the two categories; δ_d is a preset threshold, δ_d > 0; [x]_+ = max(0, x).
5. The unmanned aerial vehicle aerial image-based vehicle detection network establishment method according to any one of claims 1 to 4, wherein the vehicle detection network comprises: a feature extraction backbone network, a feature fusion network, a classification sub-network and a regression sub-network;
the feature extraction backbone network is used to extract three features of different scales from the input image, denoted, from large scale to small, C_3, C_4 and C_5;
the feature fusion network comprises a semantic feature extraction module and a feature pyramid weighted fusion module; the semantic feature extraction module further extracts features from feature C_5 to obtain feature C_6, and from feature C_6 to obtain feature C_7; the feature pyramid weighted fusion module is a 5-layer feature pyramid network whose 5 output feature layers are denoted, from bottom to top, P_3 to P_7, wherein feature P_7 is the result of a convolution operation on feature C_7, and feature P_M is the result of a convolution operation on the weighted fusion of its upper-layer feature P_(M+1) and feature C_M;
the classification sub-network is used to predict the category of the vehicle in the input image from the features P_3 to P_7 output by the feature fusion network, and to output the prediction confidence;
the regression sub-network is used to predict the position of the vehicle in the input image from the features P_3 to P_7 output by the feature fusion network;
wherein M is a positive integer and 3 ≤ M ≤ 6.
6. The method for establishing a vehicle detection network based on aerial images of claim 5, wherein during training the weight coefficients used in the weighted fusion of feature P_(M+1) and feature C_M into feature P_M are dynamically adjusted.
7. A vehicle detection method based on unmanned aerial vehicle aerial images, characterized by comprising the following steps:
inputting the aerial image to be detected into a vehicle detection network established by the unmanned aerial vehicle aerial-image-based vehicle detection network establishment method according to any one of claims 1 to 6, so that the vehicle detection network predicts the position and category of the vehicle together with the prediction confidence;
and drawing a prediction result output by the vehicle detection network in the aerial image to finish vehicle detection.
8. The unmanned aerial vehicle aerial-image-based vehicle detection method of claim 7, further comprising, before drawing the prediction results output by the vehicle detection network on the aerial image:
removing redundant prediction frames in the prediction result;
wherein the prediction frame is a detection frame determined by the position information.
9. A computer readable storage medium comprising a stored computer program; when the computer program is executed by a processor, the device where the computer readable storage medium is located is controlled to execute the vehicle detection network building method based on the aerial image of the unmanned aerial vehicle according to any one of claims 1 to 6 and/or the vehicle detection method based on the aerial image of the unmanned aerial vehicle according to any one of claims 7 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111119764.9A CN113780462B (en) | 2021-09-24 | 2021-09-24 | Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113780462A CN113780462A (en) | 2021-12-10 |
CN113780462B true CN113780462B (en) | 2024-03-19 |
Family
ID=78853022
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111119764.9A Active CN113780462B (en) | 2021-09-24 | 2021-09-24 | Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113780462B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114374951B (en) * | 2022-01-12 | 2024-04-30 | 重庆邮电大学 | Dynamic pre-deployment method for multiple unmanned aerial vehicles |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717387A (en) * | 2019-09-02 | 2020-01-21 | 东南大学 | Real-time vehicle detection method based on unmanned aerial vehicle platform |
WO2020181685A1 (en) * | 2019-03-12 | 2020-09-17 | 南京邮电大学 | Vehicle-mounted video target detection method based on deep learning |
CN111814584A (en) * | 2020-06-18 | 2020-10-23 | 北京交通大学 | Vehicle weight identification method under multi-view-angle environment based on multi-center measurement loss |
CN112613552A (en) * | 2020-12-18 | 2021-04-06 | 北京工业大学 | Convolutional neural network emotion image classification method combining emotion category attention loss |
Also Published As
Publication number | Publication date |
---|---|
CN113780462A (en) | 2021-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111723748B (en) | Infrared remote sensing image ship detection method | |
CN111460968B (en) | Unmanned aerial vehicle identification and tracking method and device based on video | |
CN112200045B (en) | Remote sensing image target detection model establishment method based on context enhancement and application | |
CN107273832B (en) | License plate recognition method and system based on integral channel characteristics and convolutional neural network | |
CN111079640A (en) | Vehicle type identification method and system based on automatic amplification sample | |
CN111126278A (en) | Target detection model optimization and acceleration method for few-category scene | |
CN115661044A (en) | Multi-source fusion-based substation power equipment fault detection method | |
CN112287896A (en) | Unmanned aerial vehicle aerial image target detection method and system based on deep learning | |
CN106778540A (en) | Parking detection is accurately based on the parking event detecting method of background double layer | |
CN114781514A (en) | Floater target detection method and system integrating attention mechanism | |
CN115661720A (en) | Target tracking and identifying method and system for shielded vehicle | |
CN111274964B (en) | Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle | |
CN113989487A (en) | Fault defect detection method and system for live-action scheduling | |
CN113780462B (en) | Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof | |
CN111160100A (en) | Lightweight depth model aerial photography vehicle detection method based on sample generation | |
CN116363532A (en) | Unmanned aerial vehicle image traffic target detection method based on attention mechanism and re-parameterization | |
CN117152601A (en) | Underwater target detection method and system based on dynamic perception area routing | |
CN111274986A (en) | Dish identification and classification method based on image analysis | |
CN115700808A (en) | Dual-mode unmanned aerial vehicle identification method for adaptively fusing visible light and infrared images | |
CN114821356B (en) | Optical remote sensing target detection method for accurate positioning | |
CN115861595A (en) | Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning | |
CN116309270A (en) | Binocular image-based transmission line typical defect identification method | |
CN115984723A (en) | Road damage detection method, system, device, storage medium and computer equipment | |
CN114067186B (en) | Pedestrian detection method and device, electronic equipment and storage medium | |
CN112069997B (en) | Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||