CN113780462B - Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof - Google Patents
- Publication number
- CN113780462B (application CN202111119764.9A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- feature
- vehicle detection
- network
- aerial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/214—Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Neural networks; Combinations of networks
- G06N3/084—Learning methods; Backpropagation, e.g. using gradient descent
- Y02T10/40—Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The invention discloses a vehicle detection network establishment method based on unmanned aerial vehicle (UAV) aerial images and an application thereof, belonging to the field of vehicle detection. The method comprises: establishing a vehicle detection network and training it on an aerial-photography dataset, where each training sample in the dataset is an aerial image annotated with vehicle positions and categories. The vehicle detection network is a deep-learning neural network that takes an image as input, predicts the position and category of each vehicle in the input image, and outputs a prediction confidence. The training loss function is L_total = L_loc + L_cls + L_disc, where L_loc is the regression loss and L_cls is the classification loss; L_disc is the inter-class discriminative loss, which characterizes the distribution of training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller L_disc. After training, establishment of the vehicle detection network is complete. The invention can establish a more accurate vehicle detection network and improve the accuracy of vehicle detection.
Description
Technical Field
The invention belongs to the field of vehicle detection, and particularly relates to a vehicle detection network establishment method based on unmanned aerial vehicle aerial images and application thereof.
Background
Vehicle detection in unmanned aerial vehicle (UAV) aerial images is an important and difficult branch of object detection. It refers to acquiring RGB images by UAV aerial photography and predicting the positions and categories of the vehicles in those images. Compared with traffic images captured by traditional fixed cameras, UAV aerial images offer wider viewing angles and varying shooting heights, which introduce problems such as complex and varied backgrounds, large variation in vehicle scale, and uneven distribution of vehicle categories; detecting vehicles quickly and correctly in such images is therefore a challenging task.
The essence of vehicle detection is to extract discriminative features and complete the tasks of vehicle classification and regression. Compared with traditional object detection methods, deep-learning-based methods have clear advantages in feature extraction and in classification and regression, so most existing UAV aerial-image vehicle detection methods are improvements on general-purpose detection algorithms.
The classification task of object detection requires the extracted features to contain more high-level semantic information, while the regression task requires the features to contain more position and detail information; these two requirements are difficult to satisfy on the same feature map. In a feature extraction network, shallow features have higher resolution and contain rich position and detail information, making them more suitable for detecting small targets; however, their semantic level is low and their noise is high, so they are ill-suited to classification and can cause many false detections. Deep features are more abstract and carry stronger semantic information, making them more suitable for classification; but because of their larger receptive field, their resolution is lower, giving them poor perception of detail and making them unsuitable for localizing small targets. Therefore, fusing deep and shallow features to construct a feature pyramid can effectively enhance the high-level semantic information of shallow features and improve detection accuracy, especially for small targets. In existing pyramid-based feature fusion, the fusion is mainly simple equal-weight addition or channel concatenation; the contribution of features from different layers is not considered during fusion, so the network underuses the features and fails to express the feature information effectively, which affects detection accuracy.
In terms of loss-function design, current methods constrain the optimization of network parameters using only the classification loss, which represents the target class, and the regression loss, which localizes the target. As a result, the class-discrimination capability of the features extracted by the network is limited, which affects the accuracy of vehicle detection; the detection accuracy of vehicle detection from UAV aerial images therefore needs further improvement.
Disclosure of Invention
In view of the defects and improvement needs of the prior art, the invention provides a vehicle detection network establishment method based on unmanned aerial vehicle aerial images and an application thereof, aiming to establish a more accurate vehicle detection network and improve the accuracy of vehicle detection.
In order to achieve the above object, according to one aspect of the present invention, there is provided a vehicle detection network establishment method based on an aerial image of an unmanned aerial vehicle, including:
establishing a vehicle detection network to be trained, and training it using an aerial-photography dataset, wherein each training sample in the dataset is an aerial image annotated with vehicle positions and categories; the vehicle detection network is a deep-learning neural network model that takes an image as input, predicts the position and category of each vehicle in the input image, and outputs a prediction confidence;
during training, the loss function used to compute the loss is L_total = L_loc + L_cls + L_disc, wherein L_loc is the regression loss, representing the difference between the predicted and true vehicle positions; L_cls is the classification loss, representing the difference between the predicted and true vehicle categories; and L_disc is the inter-class discriminative loss, representing the distribution of training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller L_disc;
after training is finished, the establishment of the vehicle detection network is completed.
Further, L_disc = L_var + L_dist;
wherein L_var is a variance term representing the degree of feature aggregation of same-class training samples: the higher the feature aggregation of same-class training samples, the smaller the value of L_var; and L_dist is a distance term representing the degree of feature dispersion between different-class training samples: the higher the feature dispersion between different-class training samples, the smaller the value of L_dist.
Further,

L_var = (1/C) · Σ_{c=1}^{C} (1/N_c) · Σ_{i=1}^{N_c} [ ||μ_c − x_i|| − δ_v ]_+²

wherein C represents the total number of vehicle categories in the aerial dataset, N_c represents the total number of training samples of the c-th category in the aerial dataset, μ_c represents the mean of the feature vectors of the aerial images of the c-th category, and x_i represents the feature vector of the i-th aerial image; δ_v is a preset threshold, δ_v > 0; and [x]_+ = max(0, x).
Further,

L_dist = (1/(C(C−1))) · Σ_{c_A=1}^{C} Σ_{c_B≠c_A} [ δ_d − ||μ_{c_A} − μ_{c_B}|| ]_+²

wherein C represents the total number of vehicle categories in the aerial dataset; c_A and c_B represent two different vehicle categories, and μ_{c_A} and μ_{c_B} respectively represent the means of the feature vectors of the aerial images of the two categories; δ_d is a preset threshold, δ_d > 0; and [x]_+ = max(0, x).
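As a concrete illustration, the variance term (pulling same-class features toward their class mean) and distance term (pushing different class means apart) described above can be sketched in a few lines of NumPy. This is a sketch under stated assumptions, not the patent's implementation: the function name `discriminative_loss` and the default threshold values δ_v = 0.5 and δ_d = 1.5 are ours, and plain Euclidean distance is assumed.

```python
import numpy as np

def discriminative_loss(features, labels, delta_v=0.5, delta_d=1.5):
    """Sketch of L_disc = L_var + L_dist for (N, D) features and (N,) labels."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    C = len(classes)
    means = {c: features[labels == c].mean(axis=0) for c in classes}

    # Variance term: penalize same-class features farther than delta_v from the mean.
    l_var = 0.0
    for c in classes:
        d = np.linalg.norm(features[labels == c] - means[c], axis=1)
        l_var += np.mean(np.maximum(d - delta_v, 0.0) ** 2)
    l_var /= C

    # Distance term: penalize pairs of class means closer than delta_d.
    l_dist = 0.0
    if C > 1:
        for ca in classes:
            for cb in classes:
                if ca != cb:
                    d = np.linalg.norm(means[ca] - means[cb])
                    l_dist += np.maximum(delta_d - d, 0.0) ** 2
        l_dist /= C * (C - 1)

    return l_var + l_dist
```

With tight, well-separated clusters both hinge terms vanish, matching the stated behavior that concentrated same-class and dispersed different-class distributions give a small L_disc.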
Further,

L_cls = −α_t · W_t · (1 − P_t)^γ · log(P_t)

wherein α_t is a hyperparameter balancing positive and negative samples, derived from a base weight α; P_t denotes the confidence the network assigns to the true class, with p being the prediction confidence of the corresponding category; γ is a preset weight coefficient; and W_t is built from w_class, the weight of misclassified training samples of the corresponding class, which depends through a hyperparameter β on p_class, the probability of occurrence of training samples of that class among all training samples.
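The classification loss above (a focal-style loss with class-frequency weighting) can be sketched for the binary case as follows. The exact forms of α_t, P_t, and w_class are not recoverable from the text, so this sketch assumes the standard focal-loss definitions of α_t and P_t and one plausible class weight, w_class = (1 − p_class)^β; the function name and defaults are ours.

```python
import numpy as np

def weighted_focal_loss(p, y, p_class, alpha=0.25, gamma=2.0, beta=2.0):
    """Sketch of L_cls = -alpha_t * W_t * (1 - P_t)^gamma * log(P_t).

    p: predicted confidence for the positive class; y in {0, 1};
    p_class: frequency of each sample's class among all training samples.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    p_t = np.where(y == 1, p, 1.0 - p)            # confidence assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # Assumed class weight: rare classes (small p_class) get a larger weight.
    w_class = (1.0 - np.asarray(p_class, dtype=float)) ** beta
    # W_t applies the class weight only to misclassified samples.
    w_t = np.where(p_t < 0.5, w_class, 1.0)
    return float(np.mean(-alpha_t * w_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

Under this weighting, a misclassified sample from a rare class incurs a larger loss than the same misclassification in a common class, matching the stated intent of W_t.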
Further, the vehicle detection network includes: a feature extraction backbone network, a feature fusion network, a classification sub-network and a regression sub-network;
the feature extraction backbone network is used for extracting three features with different scales of an input image, and the features are marked as C3, C4 and C5 in sequence according to the scales from large to small;
the feature fusion network comprises a semantic feature extraction module and a feature pyramid weighted fusion module, wherein the semantic feature extraction module performs further feature extraction on feature C5 to obtain feature C6, and further feature extraction on feature C6 to obtain feature C7; the feature pyramid weighted fusion module is a five-layer feature pyramid network whose output features are denoted P3 to P7 in sequence from bottom to top, wherein feature P7 is the result of a convolution operation on feature C7, and feature PM is the result of a convolution operation on the weighted fusion of the upper-layer feature P(M+1) and feature CM;
the classification sub-network is used for predicting the type of the vehicle in the input image according to the characteristics P3-P7 output by the characteristic fusion network and outputting prediction confidence;
the regression sub-network is used for predicting the position of the vehicle in the input image according to the characteristics P3-P7 output by the characteristic fusion network;
wherein M is a positive integer, and M is more than or equal to 3 and less than or equal to 6.
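The five pyramid levels P3 to P7 described above correspond to successively smaller feature maps; a minimal sketch of the resulting sizes, assuming (as the embodiment section states for P3 through P7) that level PM is downsampled 2^M times relative to the input. The function name and signature are ours.

```python
def pyramid_shapes(h, w, levels=range(3, 8)):
    """Feature-map sizes for pyramid levels P3..P7, assuming stride 2**M at level M."""
    return {f"P{m}": (h // 2 ** m, w // 2 ** m) for m in levels}
```

For a 1024 x 2048 input this gives P3 at 128 x 256 down to P7 at 8 x 16, illustrating why the pyramid spans vehicles of widely different scales.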
Further, during training, the weight coefficients used in the weighted fusion of feature P(M+1) and feature CM to obtain feature PM are dynamically adjusted.
According to another aspect of the present invention, there is provided a vehicle detection method based on an aerial image of an unmanned aerial vehicle, including:
inputting an aerial image to be detected into a vehicle detection network established by the above vehicle detection network establishment method based on unmanned aerial vehicle aerial images, so that the vehicle detection network predicts the positions and categories of the vehicles together with the prediction confidences;
and drawing a prediction result output by the vehicle detection network in the aerial image to finish vehicle detection.
Further, the vehicle detection method based on unmanned aerial vehicle aerial images provided by the invention further comprises, before the prediction results output by the vehicle detection network are drawn in the aerial image:

removing redundant prediction boxes from the prediction results;

a prediction box being a detection box determined by the position information.
According to yet another aspect of the present invention, there is provided a computer-readable storage medium comprising a stored computer program; when the computer program is executed by a processor, the device on which the computer-readable storage medium resides is controlled to execute the vehicle detection network establishment method based on unmanned aerial vehicle aerial images and/or the vehicle detection method based on unmanned aerial vehicle aerial images described above.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
(1) When training the vehicle detection network based on UAV aerial images, the loss function used by the invention contains not only a regression loss term and a classification loss term but also an introduced inter-class discriminative loss term: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class samples, the smaller this loss. Introducing the inter-class discriminative loss therefore lets network training reshape the distribution of training samples in feature space and provides additional constraints for optimizing the network parameters, concentrating the feature distribution of same-class training samples and pushing apart the feature distributions of different classes. The features the network extracts for different categories finally become more discriminative, which effectively reduces missed detections, false detections, and misclassification under the complex backgrounds and the varied vehicle categories, forms, and viewing angles found in UAV aerial images.
(2) In the loss function used when training the vehicle detection network based on UAV aerial images, which comprises a regression loss term, a classification loss term, and an inter-class discriminative loss term, the classification loss penalizes wrongly predicted classes on the basis of the focal loss while introducing a parameter W_t that assigns different weights according to the different numbers of samples in different classes. It increases the weight of misclassified samples from rare classes, so that network training pays more attention to the misclassification of minority classes, further improving the accuracy of vehicle detection.
(3) The vehicle detection network established by the invention performs further feature extraction on top of the features extracted by the feature extraction backbone to obtain deeper features, which deepens the network and extracts higher-level semantic features while reducing network parameters. By constructing a five-layer feature pyramid network and fusing deep and shallow convolutional features from the top of the pyramid downwards, the high-level semantic information of shallow features is effectively enhanced, improving detection accuracy, especially for small targets, and enabling detection of vehicles at various scales. Meanwhile, the original fusion scheme is replaced by adaptive weighted fusion of multi-layer features, so that the network extracts features purposefully according to the demands of the final regression and classification tasks. Based on this network structure, the invention further improves the accuracy of vehicle detection in UAV aerial images.
(4) In the vehicle detection network established by the invention, the weight coefficients used for weighted feature fusion are learnable parameters, so the network can autonomously learn the importance of each input feature during training and dynamically adjust the weight coefficients, further improving the accuracy of vehicle detection.
Drawings
FIG. 1 is a schematic diagram of a vehicle detection network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a feature pyramid weighted fusion module according to an embodiment of the present invention;
fig. 3 is a flowchart of a vehicle detection method based on an aerial image of an unmanned aerial vehicle according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
In the present invention, the terms "first," "second," and the like in the description and in the drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In order to improve the accuracy of vehicle detection based on UAV aerial images, the invention provides a vehicle detection network establishment method based on UAV aerial images and an application thereof. The overall idea is as follows: by introducing an inter-class discriminative loss into the loss function, network training is used to change the distribution of training samples in feature space, providing additional constraints for network parameter optimization, so that the features extracted for different classes are more discriminative, the network's predictions are more accurate, and the accuracy of vehicle detection is improved; on this basis, the network structure is further improved to realize weighted fusion of features carrying higher-level semantic information, with weight coefficients that can be dynamically adjusted during training, further improving detection accuracy.
The following are examples.
Example 1:
a vehicle detection network establishment method based on unmanned aerial vehicle aerial images comprises the following steps:
and establishing a vehicle detection network to be trained, training the vehicle detection network by using the aerial photo data set training, and completing the establishment of the vehicle detection network after the training is finished.
Each training sample in the aerial-photography dataset is an aerial image annotated with vehicle positions and categories. Optionally, in this embodiment, the dataset used is the public UAV aerial dataset UAVDT, which contains 50 video sequences of different scenes totaling more than 40,000 frames, with vehicles divided into three categories: car, truck, and bus. In other embodiments of the invention, other datasets may be used, or a corresponding dataset may be built.
In this embodiment, the vehicle detection network is a deep-learning neural network model whose input is a three-channel UAV aerial image; it predicts the positions and categories of vehicles in the input image and outputs prediction confidences. The structure of the network is shown in Fig. 1; it comprises three parts: a feature extraction backbone, a feature fusion network based on a feature pyramid weighted fusion module, and classification and regression sub-networks. Wherein:
the feature extraction backbone network is used for extracting three features with different scales of an input image, and the features are marked as C3, C4 and C5 in sequence according to the scales from large to small; as an optional implementation manner, in this embodiment, a res net50 is used as a feature extraction backbone network, features are extracted by using five convolution blocks (Conv 1-Conv 5 in turn from bottom to top), a C3 layer feature map based on three downsampling of an original image is obtained by Conv3, a C4 layer feature map based on four downsampling of the original image is obtained by Conv4, and a C5 layer feature map based on five downsampling of the original image is obtained by Conv 5; in other embodiments of the invention, other feature extraction networks may be used as the feature extraction backbone network in the vehicle detection network;
the feature fusion network comprises a semantic feature extraction module and a feature pyramid weighting fusion module; the semantic feature extraction module is used for further feature extraction of the feature C5 to obtain a feature C6, and further feature extraction of the feature C6 to obtain a feature C7, specifically, the embodiment performs one 3*3 convolution on the basis of the C5 layer feature map to obtain a C6 layer feature map, and performs one 3*3 convolution on the basis of the C6 layer feature map to obtain a C7 layer feature map; the feature pyramid weighted fusion module is a 5-layer feature pyramid network, 5 layers of features outputted by the feature pyramid weighted fusion module are sequentially marked as P3-P7 from bottom to top, wherein the feature P7 is a result of convolution operation of the feature C7, and the feature PM is a result of convolution operation of the upper layer feature P (M+1) and the feature CM;
the classification sub-network is used for predicting the type of the vehicle in the input image according to the characteristics P3-P7 output by the characteristic fusion network and outputting prediction confidence;
the regression sub-network is used for predicting the position of the vehicle in the input image according to the characteristics P3-P7 output by the characteristic fusion network;
wherein M is a positive integer and 3 ≤ M ≤ 6. A main purpose of obtaining the C6 and C7 layers with 3×3 convolutions in the network established by this embodiment is to deepen the network and extract higher-level semantic information while reducing the number of network parameters. In the feature pyramid weighted fusion module, a five-layer feature pyramid is constructed starting from the P7 effective feature layer, and deep and shallow convolutional features are fused from the top of the pyramid downwards; the original fusion scheme is replaced by adaptive weighted fusion of multi-layer features, and the weight coefficients of features at different layers are adjusted by a back-propagation learning strategy, so the network extracts features purposefully according to the demands of the final regression and classification tasks. In the prediction stage, the five effective feature layers are fed separately into the classification and regression sub-networks, and the results from all effective feature layers are combined into the final prediction. P3, P4, P5, P6, and P7 are obtained by downsampling the original image 3, 4, 5, 6, and 7 times respectively, and the corresponding receptive fields are 8×8, 16×16, 32×32, 64×64, and 128×128 respectively, so vehicles at various scales can be detected;
in addition, since convolutional features at different layers contribute differently to the final regression and classification tasks, this embodiment adopts weighted fusion when fusing deep and shallow convolutional features, introducing learnable weight coefficients so that the network autonomously learns the importance of each input feature. The feature fusion process in the five-layer pyramid is further described below. As shown in Fig. 2, taking the fusion of the P4 and C3 layers as an example, the weight coefficient of the P4 layer after upsampling is w_31, and the weight coefficient of the C3 layer, after its number of feature channels is compressed to 256 by a 3×3 convolution, is w_32; the P4 and C3 layers are multiplied by their corresponding weight coefficients and then passed through a 3×3 convolution block to obtain the final effective feature layer P3. In Fig. 2, w_M1 and w_M2 denote the weight coefficients of the (M+1)-th layer feature P(M+1) and the M-th layer feature CM when they are fused in the five-layer pyramid; for example, in the fusion shown in Fig. 2, w_31 is the weight coefficient of feature P4 and w_32 is the weight coefficient of feature C3. Based on this fusion process, the output of each effective feature layer in the five-layer pyramid can be expressed as:
P_7 = Conv(C_7)

P_M = Conv((w_M1 · Up(P_(M+1)) + w_M2 · Conv(C_M)) / (w_M1 + w_M2 + ε)),  M = 3, 4, 5, 6

where Up(·) denotes upsampling to the resolution of C_M;
in the above, w_ij (i = 3, 4, …, 7; j = 1, 2) represents the weight of the j-th input feature of the i-th layer, measuring the importance of each input feature; ε is a small value that prevents the denominator from being 0; optionally, in this embodiment ε = 0.0001;
in this embodiment, the weight coefficients w_ij of the feature-weighted fusion are learnable parameters, so during training the network can autonomously learn the importance of each input feature and dynamically adjust the weight coefficients, further improving the detection precision of vehicle detection.
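To make the normalized weighted-fusion arithmetic above concrete, the following NumPy sketch computes a single fused layer; the scalars w1 and w2 stand in for the learnable coefficients w_M1 and w_M2 (in the network itself they are trainable parameters updated by back-propagation, which is omitted here), and the trailing 3×3 convolution block is likewise left out:

```python
import numpy as np

def weighted_fuse(p_up, c_lateral, w1, w2, eps=1e-4):
    """Normalized weighted fusion of an upsampled deep feature map (p_up)
    and a lateral shallow feature map (c_lateral). w1, w2 play the role of
    the learnable coefficients w_M1, w_M2; eps avoids a zero denominator."""
    w1, w2 = max(w1, 0.0), max(w2, 0.0)  # weights kept non-negative
    return (w1 * p_up + w2 * c_lateral) / (w1 + w2 + eps)
```

With equal weights the fusion reduces to (almost exactly) the plain average of the two inputs; as training shifts w1 and w2, the output leans toward whichever input feature is more useful to the regression and classification tasks.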
In order to make the features of different categories extracted by the network more distinguishable, so that the detection network is better suited to the complex scenes encountered under unmanned aerial vehicles, the invention adds an inter-class discriminable loss on top of the regression loss and the improved classification loss, evaluating the difference between the model's predicted values and the true values with three losses: regression loss, classification loss and inter-class discriminable loss. Specifically, in this embodiment the loss function used during training is: L_total = L_loc + L_cls + L_disc, where L_loc is the regression loss, representing the difference between the predicted and true vehicle positions; L_cls is the classification loss, representing the difference between the predicted and true vehicle categories; and L_disc is the inter-class discriminable loss, representing the distribution of the training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller L_disc. Each loss is described in detail below:
a. The first term is the regression loss, representing the difference between the predicted value f(x_i) (comprising the horizontal and vertical coordinates of the center of the prediction frame and the width and height of the prediction frame) and the true value y_i; its mathematical expression is:
where i ∈ {x, y, w, h} indexes the errors between the prediction frame and the real frame in the horizontal and vertical coordinates of the center point and in the width and height of the rectangular frame;
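The regression expression itself is not reproduced in this text (the original formula image is absent). A common concrete choice for summing the four coordinate error terms is the smooth-L1 loss; the sketch below illustrates that conventional form and is not necessarily the exact expression used in the patent:

```python
import numpy as np

def smooth_l1_reg_loss(pred, target, beta=1.0):
    """Smooth-L1 loss summed over the (x, y, w, h) error terms:
    quadratic for small errors, linear for large ones."""
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    per_term = np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta)
    return per_term.sum()
```

The quadratic region keeps gradients small near the target, while the linear region limits the influence of outlier boxes.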
b. The second term is the classification loss, which evaluates vehicle-class prediction errors. When training with a large number of samples, the original focal-loss classification loss pays little attention to misclassified samples of minority classes, which are therefore easily ignored. This embodiment accordingly penalizes wrongly predicted classes on top of the focal loss, assigning different weights according to the number of samples in each class, so that misclassified minority-class samples receive larger weights and network training pays more attention to them. The mathematical expression is as follows:
L_cls = −α_t · W_t · (1 − P_t)^γ · log(P_t)
where α is a hyperparameter and α_t adjusts the weight of positive and negative samples during training; p denotes the confidence with which the network predicts the corresponding class, and P_t equals p for a positive sample and 1 − p otherwise; γ adjusts the weight of samples of different detection difficulty, reducing the loss contributed by easily classified samples; W_t takes the per-class value w_class, the weight of each misclassified sample; p_class = num_class / total_num denotes the probability that targets of the corresponding class occur in the overall training set, where num_class is the number of samples of that class, total_num the total number of samples, and β a hyperparameter. From the formula for w_class it follows that, in the case of a class prediction error, the fewer training samples the class has, the larger the weight coefficient W_t assigned to them, which achieves the purpose of increasing the weight of misclassified minority-class samples;
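The re-weighted focal loss described above can be sketched per sample as follows. The exact formula for w_class is not reproduced in the text (the formula image is absent), so the rarity weighting W_t = (1 − p_class)^β used below is an illustrative assumption that merely matches the stated behavior, namely that rarer classes receive larger weights:

```python
import numpy as np

def reweighted_focal_loss(p, is_positive, num_class, total_num,
                          alpha=0.25, gamma=2.0, beta=1.0):
    """L_cls = -alpha_t * W_t * (1 - P_t)^gamma * log(P_t) for one sample.
    p is the confidence predicted for the ground-truth class."""
    p_t = p if is_positive else 1.0 - p
    alpha_t = alpha if is_positive else 1.0 - alpha
    p_class = num_class / total_num           # class frequency in the training set
    w_t = (1.0 - p_class) ** beta             # assumed form of the rarity weight
    return -alpha_t * w_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

Under this assumed form, the UAVDT car class (about 92% of samples) gets w_t ≈ 0.08 while a class with 1% of the samples gets w_t ≈ 0.99, so minority-class errors dominate the loss as the text intends.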
c. The third term is the inter-class discriminable loss. UAV aerial images often have complex backgrounds, and the vehicles in a scene differ in type, form, angle and so on; more discriminative features can effectively reduce false detections, missed detections and misclassifications during detection. The inter-class discriminable loss is therefore added on top of the regression and classification losses: network training changes the distribution of the training samples in feature space and constrains the optimization of the network parameters, so that the feature distribution of same-class samples becomes more concentrated and the feature distributions of different-class samples move further apart. The mathematical expression of the inter-class discriminable loss is:
L_disc = L_var + L_dist
This loss consists of two parts: a variance term and a distance term. L_var is the variance term, representing the feature aggregation degree of same-class training samples: the higher the aggregation degree, the smaller the value of L_var. L_dist is the distance term, representing the feature dispersion degree of different-class training samples: the higher the dispersion degree, the smaller the value of L_dist;
The variance term L_var is specifically calculated as:

L_var = (1/C) · Σ_{c=1}^{C} (1/N_c) · Σ_{i=1}^{N_c} [ ‖μ_c − x_i‖ − δ_v ]_+
where C denotes the total number of vehicle categories in the aerial dataset, N_c the total number of training samples of the c-th category in the aerial dataset, μ_c the mean feature vector of the aerial images of the c-th category, and x_i the feature vector of the i-th aerial image; δ_v is a preset threshold with δ_v > 0; [x]_+ = max(0, x), i.e. when ‖μ_c − x_i‖ is less than δ_v the loss is 0, and when ‖μ_c − x_i‖ is greater than δ_v a loss is incurred, achieving the goal of aggregating same-class features. Optionally, in this embodiment δ_v = 0.6;
The distance term L_dist is specifically calculated as:

L_dist = (1 / (C·(C−1))) · Σ_{c_A ≠ c_B} [ δ_d − ‖μ_{c_A} − μ_{c_B}‖ ]_+
where C denotes the total number of vehicle categories in the aerial dataset; c_A and c_B denote two different vehicle categories, and μ_{c_A} and μ_{c_B} the mean feature vectors of the aerial images of the two categories; δ_d is a preset threshold with δ_d > 0; [x]_+ = max(0, x), i.e. when the distance ‖μ_{c_A} − μ_{c_B}‖ between the class centers of different classes is greater than δ_d the loss is 0, and when it is less than δ_d a loss is incurred, driving samples of different classes apart; ultimately the feature distance between vehicles of different classes is pushed beyond δ_d. Optionally, in this embodiment δ_d = 3.6.
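The two hinge-style terms can be sketched together as follows. The averaging conventions over classes and class pairs are assumptions (the original formula images are absent), but the thresholded behavior matches the text: samples farther than δ_v from their class mean, and class means closer than δ_d, incur loss:

```python
import numpy as np

def discriminative_loss(features, labels, delta_v=0.6, delta_d=3.6):
    """L_disc = L_var + L_dist for a batch of feature vectors.
    features: (N, D) array; labels: length-N array of class ids."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}

    # variance term: penalize samples farther than delta_v from their class mean
    l_var = 0.0
    for c in classes:
        d = np.linalg.norm(features[labels == c] - means[c], axis=1)
        l_var += np.maximum(d - delta_v, 0.0).mean()
    l_var /= len(classes)

    # distance term: penalize class-mean pairs closer than delta_d
    l_dist, pairs = 0.0, 0
    for i, ca in enumerate(classes):
        for cb in classes[i + 1:]:
            gap = np.linalg.norm(means[ca] - means[cb])
            l_dist += max(delta_d - gap, 0.0)
            pairs += 1
    if pairs:
        l_dist /= pairs
    return l_var + l_dist
```

Two tight, well-separated clusters yield zero loss; the loss only activates to pull stray samples inward or push crowded class centers apart.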
Embodiment 2:
A vehicle detection method based on unmanned aerial vehicle aerial images, as shown in FIG. 3, comprises the following steps:
inputting the aerial image to be detected into a vehicle detection network established by the unmanned aerial vehicle aerial-image-based vehicle detection network establishment method provided in Embodiment 1 above, so that the vehicle detection network predicts the position and category of the vehicle together with the prediction confidence; the input image may be a frame from a video captured by an aerial camera; to avoid abnormal and extreme values affecting the results, this embodiment may also preprocess the aerial image before inputting it into the vehicle detection network; optionally, the preprocessing specifically comprises normalization and standardization, where normalization divides the input image data by 255 and standardization subtracts the mean from the normalized input image and divides by the standard deviation;
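The two preprocessing steps can be written directly; the per-channel ImageNet mean and standard deviation used below are common placeholder values, not values taken from the patent:

```python
import numpy as np

def preprocess(img,
               mean=(0.485, 0.456, 0.406),   # placeholder per-channel mean
               std=(0.229, 0.224, 0.225)):   # placeholder per-channel std
    """Normalization (divide by 255) followed by standardization
    (subtract mean, divide by std) of an H x W x 3 uint8 image."""
    x = img.astype(np.float32) / 255.0
    return (x - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)
```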
drawing the prediction results output by the vehicle detection network on the aerial image to complete vehicle detection; in order to display the vehicle detection results clearly in the input aerial image, this embodiment post-processes the detection results before drawing them, the post-processing including the removal of redundant prediction frames from the prediction results; optionally, in this embodiment the redundant prediction frames are removed by the Non-Maximum Suppression (NMS) method;
the vehicle position information output by the vehicle detection network may be expressed as (x_1, y_1, x_2, y_2), where (x_1, y_1) are the upper-left corner coordinates of the predicted vehicle rectangle and (x_2, y_2) the lower-right corner coordinates; the vehicle category information specifies whether the input aerial image contains a vehicle and, if so, its type, e.g. car or bus; the prediction confidence measures the reliability of the detection result; the prediction frame is the detection frame determined by the position information.
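A minimal version of the NMS post-processing over (x_1, y_1, x_2, y_2) boxes, as used here to drop redundant prediction frames (the IoU threshold of 0.5 is a typical default, not a value stated in the patent):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and discard remaining boxes that overlap it above iou_thresh.
    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of the kept box with each remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep
```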
Embodiment 3:
a computer readable storage medium comprising a stored computer program; when the computer program is executed by the processor, the device where the computer readable storage medium is located is controlled to execute the vehicle detection network establishment method based on the aerial image of the unmanned aerial vehicle provided in the above embodiment 1, and/or the vehicle detection method based on the aerial image of the unmanned aerial vehicle provided in the above embodiment 2.
In order to better illustrate the effect of the unmanned aerial vehicle aerial-image-based vehicle detection network, the model was tested both qualitatively and quantitatively on the UAVDT dataset. Since the class distribution in this dataset is severely unbalanced, with car-class vehicles accounting for 92% of the entire dataset, the invention follows the convention of the paper "The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking" (Du D, Qi Y, Yu H, et al. Springer, Cham, 2018) and merges these three classes into a single car class for testing.
Qualitative analysis: six images were selected from different video sequences of the UAVDT dataset, differing in UAV flying height, angle and weather; the scenes of the six selected images are: (a) low-altitude, side-view, daytime; (b) low-altitude, overhead, night; (c) mid-altitude, side-view, daytime; (d) mid-altitude, front-view, night; (e) high-altitude, front-view, daytime; and (f) high-altitude, overhead, night. Overly complex scenes are not illustrated here. The detection results show that the proposed vehicle detection network correctly detects vehicles in the different aerial scenes while maintaining high detection confidence, rather than being effective only for aerial images taken at a specific height, angle or weather, which verifies the generalization capability and robustness of the algorithm.
Quantitative analysis: as shown in Table 1, to illustrate the effect of the proposed vehicle detection network more intuitively, the 20 test sequences defined in the UAVDT dataset were used for testing, and the results were evaluated with mean Average Precision (mAP). According to the experimental results, compared with current advanced UAV aerial-image vehicle detection methods such as UAV-Net, LSN, GANet, NDFT, SpotNet and D Det, the proposed vehicle detection method achieves the best effect, detecting as many vehicle targets as possible while maintaining accuracy.
Table 1. Comparative analysis of different unmanned aerial vehicle aerial-image vehicle detection methods
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (9)
1. A vehicle detection network establishment method based on unmanned aerial vehicle aerial images, characterized by comprising the following steps:
establishing a vehicle detection network to be trained, and training the vehicle detection network using an aerial dataset; the training samples in the aerial dataset are aerial images annotated with vehicle positions and categories; the vehicle detection network is a deep-learning neural network that takes an image as input and is used to predict the position and category of the vehicle in the input image and to output a prediction confidence;
in the training process, the loss function used to calculate the loss is: L_total = L_loc + L_cls + L_disc, wherein L_loc is the regression loss, representing the difference between the predicted value and the true value of the vehicle position; L_cls is the classification loss, representing the difference between the predicted value and the true value of the vehicle category; and L_disc is the inter-class discriminable loss, representing the distribution of the training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller the inter-class discriminable loss L_disc;
after training is finished, the establishment of the vehicle detection network is completed;
wherein the classification loss L_cls is:

L_cls = −α_t · W_t · (1 − P_t)^γ · log(P_t)

wherein α is a hyperparameter; p represents the prediction confidence of the corresponding category; γ is a preset weight coefficient; w_class represents the weight of misclassified training samples of the corresponding class; p_class represents the probability of occurrence of training samples of the corresponding class among all training samples; and β is a hyperparameter.
2. The method for establishing a vehicle detection network based on unmanned aerial vehicle aerial images according to claim 1, wherein L_disc = L_var + L_dist;
wherein L_var is a variance term representing the feature aggregation degree of same-class training samples: the higher the aggregation degree of the same-class training samples, the smaller the value of the variance term L_var; and L_dist is a distance term representing the feature dispersion degree of different-class training samples: the higher the dispersion degree of the different-class training samples, the smaller the value of the distance term L_dist.
3. A method for establishing a vehicle detection network based on aerial images of an unmanned aerial vehicle as claimed in claim 2,
wherein C represents the total number of vehicle categories in the aerial dataset, N_c the total number of training samples of the c-th category in the aerial dataset, μ_c the mean feature vector of the aerial images of the c-th category, and x_i the feature vector of the i-th aerial image; δ_v is a preset threshold, δ_v > 0; [x]_+ = max(0, x).
4. A method for establishing a vehicle detection network based on aerial images of an unmanned aerial vehicle as claimed in claim 2,
wherein C represents the total number of vehicle categories in the aerial dataset; c_A and c_B represent two different vehicle categories, and μ_{c_A} and μ_{c_B} the mean feature vectors of the aerial images corresponding to the two categories; δ_d is a preset threshold, δ_d > 0; [x]_+ = max(0, x).
5. The unmanned aerial vehicle aerial image-based vehicle detection network establishment method according to any one of claims 1 to 4, wherein the vehicle detection network comprises: a feature extraction backbone network, a feature fusion network, a classification sub-network and a regression sub-network;
the feature extraction backbone network is used to extract three features of different scales from the input image, denoted, from large scale to small, C_3, C_4 and C_5;
the feature fusion network comprises a semantic feature extraction module and a feature pyramid weighted fusion module; the semantic feature extraction module further extracts features from feature C_5 to obtain feature C_6, and from feature C_6 to obtain feature C_7; the feature pyramid weighted fusion module is a 5-layer feature pyramid network whose 5 output feature layers are denoted, from bottom to top, P_3 to P_7, wherein feature P_7 is the result of a convolution operation on feature C_7, and feature P_M is the result of a convolution operation on the weighted fusion of its upper-layer feature P_(M+1) and feature C_M;
the classification sub-network is used to predict the category of the vehicle in the input image from the features P_3 to P_7 output by the feature fusion network, and to output the prediction confidence;
the regression sub-network is used to predict the position of the vehicle in the input image from the features P_3 to P_7 output by the feature fusion network;
wherein M is a positive integer and 3 ≤ M ≤ 6.
6. The method for establishing a vehicle detection network based on aerial images of claim 5, wherein during training the weight coefficients used in the weighted fusion of feature P_(M+1) and feature C_M into feature P_M are dynamically adjusted.
7. A vehicle detection method based on unmanned aerial vehicle aerial images, characterized by comprising the following steps:
inputting the aerial image to be detected into a vehicle detection network established by the unmanned aerial vehicle aerial-image-based vehicle detection network establishment method according to any one of claims 1 to 6, so that the vehicle detection network predicts the position and category of the vehicle together with the prediction confidence;
and drawing a prediction result output by the vehicle detection network in the aerial image to finish vehicle detection.
8. The unmanned aerial vehicle aerial-image-based vehicle detection method of claim 7, further comprising, before drawing the prediction results output by the vehicle detection network on the aerial image:
removing redundant prediction frames in the prediction result;
wherein the prediction frame is a detection frame determined by the position information.
9. A computer readable storage medium comprising a stored computer program; when the computer program is executed by a processor, the device where the computer readable storage medium is located is controlled to execute the vehicle detection network building method based on the aerial image of the unmanned aerial vehicle according to any one of claims 1 to 6 and/or the vehicle detection method based on the aerial image of the unmanned aerial vehicle according to any one of claims 7 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111119764.9A CN113780462B (en) | 2021-09-24 | 2021-09-24 | Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113780462A CN113780462A (en) | 2021-12-10 |
CN113780462B true CN113780462B (en) | 2024-03-19 |
Family
ID=78853022
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111119764.9A Active CN113780462B (en) | 2021-09-24 | 2021-09-24 | Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113780462B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114374951B (en) * | 2022-01-12 | 2024-04-30 | 重庆邮电大学 | Dynamic pre-deployment method for multiple unmanned aerial vehicles |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717387A (en) * | 2019-09-02 | 2020-01-21 | 东南大学 | Real-time vehicle detection method based on unmanned aerial vehicle platform |
WO2020181685A1 (en) * | 2019-03-12 | 2020-09-17 | 南京邮电大学 | Vehicle-mounted video target detection method based on deep learning |
CN111814584A (en) * | 2020-06-18 | 2020-10-23 | 北京交通大学 | Vehicle weight identification method under multi-view-angle environment based on multi-center measurement loss |
CN112613552A (en) * | 2020-12-18 | 2021-04-06 | 北京工业大学 | Convolutional neural network emotion image classification method combining emotion category attention loss |
Also Published As
Publication number | Publication date |
---|---|
CN113780462A (en) | 2021-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111723748B (en) | Infrared remote sensing image ship detection method | |
CN111460968B (en) | Unmanned aerial vehicle identification and tracking method and device based on video | |
CN112200045B (en) | Remote sensing image target detection model establishment method based on context enhancement and application | |
CN107273832B (en) | License plate recognition method and system based on integral channel characteristics and convolutional neural network | |
CN111079640A (en) | Vehicle type identification method and system based on automatic amplification sample | |
CN111126278A (en) | Target detection model optimization and acceleration method for few-category scene | |
CN115661044A (en) | Multi-source fusion-based substation power equipment fault detection method | |
CN112287896A (en) | Unmanned aerial vehicle aerial image target detection method and system based on deep learning | |
CN106778540A (en) | Parking detection is accurately based on the parking event detecting method of background double layer | |
CN114781514A (en) | Floater target detection method and system integrating attention mechanism | |
CN115661720A (en) | Target tracking and identifying method and system for shielded vehicle | |
CN111274964B (en) | Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle | |
CN113989487A (en) | Fault defect detection method and system for live-action scheduling | |
CN113780462B (en) | Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof | |
CN111160100A (en) | Lightweight depth model aerial photography vehicle detection method based on sample generation | |
CN116363532A (en) | Unmanned aerial vehicle image traffic target detection method based on attention mechanism and re-parameterization | |
CN117152601A (en) | Underwater target detection method and system based on dynamic perception area routing | |
CN111274986A (en) | Dish identification and classification method based on image analysis | |
CN115700808A (en) | Dual-mode unmanned aerial vehicle identification method for adaptively fusing visible light and infrared images | |
CN114821356B (en) | Optical remote sensing target detection method for accurate positioning | |
CN115861595A (en) | Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning | |
CN116309270A (en) | Binocular image-based transmission line typical defect identification method | |
CN115984723A (en) | Road damage detection method, system, device, storage medium and computer equipment | |
CN114067186B (en) | Pedestrian detection method and device, electronic equipment and storage medium | |
CN112069997B (en) | Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||