CN113780462B - Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof


Info

Publication number
CN113780462B
Authority
CN
China
Prior art keywords
vehicle
feature
vehicle detection
network
aerial
Prior art date
Legal status
Active
Application number
CN202111119764.9A
Other languages
Chinese (zh)
Other versions
CN113780462A (en)
Inventor
许毅平
田岩
李若男
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202111119764.9A
Publication of CN113780462A
Application granted
Publication of CN113780462B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses a vehicle detection network establishment method based on unmanned aerial vehicle aerial images and application thereof, belonging to the field of vehicle detection, and comprising: establishing a vehicle detection network and training it on an aerial-photography dataset, where each training sample is an aerial image annotated with vehicle positions and categories. The vehicle detection network is a deep-learning neural network that takes an image as input, predicts the positions and categories of vehicles in the input image, and outputs prediction confidences. The training loss function is L_total = L_loc + L_cls + L_disc, where L_loc is the regression loss and L_cls is the classification loss. L_disc is the inter-class discriminable loss, which characterizes the distribution of training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller L_disc. After training ends, establishment of the vehicle detection network is complete. The invention can establish a more accurate vehicle detection network and improve the accuracy of vehicle detection.

Description

Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof
Technical Field
The invention belongs to the field of vehicle detection, and particularly relates to a vehicle detection network establishment method based on unmanned aerial vehicle aerial images and application thereof.
Background
Vehicle feature extraction and detection in unmanned aerial vehicle aerial images is an important and difficult branch of the field. Unmanned aerial vehicle aerial-image vehicle detection means acquiring RGB images by unmanned aerial vehicle aerial photography and predicting the positions and categories of the vehicles in those images. Compared with traffic images captured by traditional fixed cameras, unmanned aerial vehicle aerial images have wider monitoring viewing angles and varying shooting heights, which brings problems such as complex and diverse backgrounds, large variations in vehicle scale and uneven distribution of vehicle categories, making fast and correct detection of vehicles in the images a challenging task.
The essence of vehicle detection is to extract discriminative features and complete the tasks of vehicle classification and regression. Compared with traditional target detection methods, deep-learning-based methods have clear advantages in feature extraction and in classification and regression, so most existing unmanned aerial vehicle aerial-image vehicle detection methods are improvements on general-purpose detection algorithms.
The classification task of target detection requires that the extracted features contain high-level semantic information, while the regression task requires that the features contain position and detail information; these two requirements are hard to satisfy simultaneously on the same feature map. In a feature extraction network, shallow features have higher resolution and contain rich position and detail information, making them better suited to detecting small targets; but their semantic level is low and they are noisy, so they are poorly suited to target classification and can cause many false detections. Deep features are more abstract and carry stronger semantic information, making them better suited to classification; but their larger receptive fields mean lower feature resolution and poor perception of detail, so they are unsuitable for localizing small targets. Fusing deep and shallow features into a feature pyramid therefore effectively enhances the high-level information of the shallow features and improves detection accuracy, especially for small targets. In pyramid-based feature fusion, however, existing methods mainly use simple equal-weight addition or channel concatenation and ignore the different contributions of features from different layers; the network thus under-utilizes the features and expresses the feature information insufficiently, which harms detection accuracy. In loss-function design, current methods constrain network parameter optimization only with a classification loss representing the target category and a regression loss for localizing the target, which limits the class-discriminating power of the features the network extracts and affects vehicle detection accuracy. The accuracy of vehicle detection in unmanned aerial vehicle aerial images therefore needs further improvement.
Disclosure of Invention
Aiming at the above defects and improvement needs of the prior art, the invention provides a vehicle detection network establishment method based on unmanned aerial vehicle aerial images and application thereof, with the aim of establishing a more accurate vehicle detection network and improving the accuracy of vehicle detection.
In order to achieve the above object, according to one aspect of the present invention, there is provided a vehicle detection network establishment method based on an aerial image of an unmanned aerial vehicle, including:
establishing a vehicle detection network to be trained, and training the vehicle detection network on an aerial-photography dataset; each training sample in the aerial dataset is an aerial image annotated with vehicle positions and categories; the vehicle detection network is a deep-learning neural network model that takes an image as input, predicts the positions and categories of vehicles in the input image, and outputs prediction confidences;
during training, the loss function is: L_total = L_loc + L_cls + L_disc, where L_loc is the regression loss, representing the difference between the predicted and true vehicle positions; L_cls is the classification loss, representing the difference between the predicted and true vehicle categories; and L_disc is the inter-class discriminable loss, characterizing the distribution of training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller L_disc;
after training is finished, the establishment of the vehicle detection network is completed.
Further, L_disc = L_var + L_dist,
where L_var is a variance term characterizing the feature-aggregation degree of same-class training samples: the more aggregated the features of same-class training samples, the smaller the value of L_var; and L_dist is a distance term characterizing the feature-dispersion degree of different-class training samples: the more dispersed the features of different-class training samples, the smaller the value of L_dist.
Further,
L_var = (1/C) · Σ_{c=1}^{C} (1/N_c) · Σ_{i=1}^{N_c} [ ||μ_c - x_i|| - δ_v ]_+
where C represents the total number of vehicle categories in the aerial dataset, N_c represents the total number of training samples of the c-th category, μ_c represents the mean feature vector of the aerial images of the c-th category, and x_i represents the feature vector of the i-th aerial image; δ_v is a preset threshold, δ_v > 0; [x]_+ = max(0, x).
Further,
L_dist = (1/(C·(C-1))) · Σ_{c_A ≠ c_B} [ δ_d - ||μ_{c_A} - μ_{c_B}|| ]_+
where C represents the total number of vehicle categories in the aerial dataset; c_A and c_B represent two different vehicle categories, and μ_{c_A} and μ_{c_B} respectively represent the mean feature vectors of the aerial images of the two categories; δ_d is a preset threshold, δ_d > 0; [x]_+ = max(0, x).
Further, the classification loss is
L_cls = -α_t · W_t · (1 - P_t)^γ · log(P_t)
where α_t = a for positive samples and 1 - a for negative samples, a being a hyperparameter; P_t = p for positive samples and 1 - p for negative samples, p representing the prediction confidence of the corresponding category; γ is a preset weight coefficient; W_t assigns a weight w_class to misclassified training samples of the corresponding category, where w_class is derived from p_class, the probability of the corresponding category's training samples occurring among all training samples, and from the hyperparameter β.
Further, the vehicle detection network includes: a feature extraction backbone network, a feature fusion network, a classification sub-network and a regression sub-network;
the feature extraction backbone network extracts three features of different scales from the input image, denoted C3, C4 and C5 in order of scale from large to small;
the feature fusion network comprises a semantic feature extraction module and a feature pyramid weighted fusion module; the semantic feature extraction module performs further feature extraction on the feature C5 to obtain a feature C6, and on the feature C6 to obtain a feature C7; the feature pyramid weighted fusion module is a 5-layer feature pyramid network whose 5 output features are denoted P3 to P7 from bottom to top, where the feature P7 is the result of a convolution operation on the feature C7, and the feature P_M is the result of weighted fusion of its upper-layer feature P_(M+1) with the feature C_M followed by a convolution operation;
the classification sub-network predicts the categories of vehicles in the input image from the features P3 to P7 output by the feature fusion network and outputs prediction confidences;
the regression sub-network predicts the positions of vehicles in the input image from the features P3 to P7 output by the feature fusion network;
wherein M is a positive integer and 3 ≤ M ≤ 6.
Further, during training, the weight coefficients used in the weighted fusion of the feature P_(M+1) and the feature C_M into the feature P_M are dynamically adjusted.
According to another aspect of the present invention, there is provided a vehicle detection method based on an aerial image of an unmanned aerial vehicle, including:
inputting an aerial image to be detected into a vehicle detection network established by the above vehicle detection network establishment method based on unmanned aerial vehicle aerial images, so that the vehicle detection network predicts the positions and categories of vehicles together with the corresponding prediction confidences;
and drawing a prediction result output by the vehicle detection network in the aerial image to finish vehicle detection.
Further, before the prediction results output by the vehicle detection network are drawn in the aerial image, the vehicle detection method based on unmanned aerial vehicle aerial images provided by the invention further comprises:
removing redundant prediction boxes from the prediction results;
a prediction box being a detection box determined by the position information.
According to yet another aspect of the present invention, there is provided a computer readable storage medium comprising a stored computer program; when the computer program is executed by the processor, the equipment where the computer readable storage medium is located is controlled to execute the vehicle detection network establishment method based on the unmanned aerial vehicle aerial image and/or the vehicle detection method based on the unmanned aerial vehicle aerial image.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
(1) When training the vehicle detection network based on unmanned aerial vehicle aerial images, the loss function used by the invention introduces an inter-class discriminable loss term alongside the regression and classification loss terms. Since this term is smaller the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class samples, introducing it lets network training reshape the distribution of training samples in feature space and provides additional constraints for parameter optimization: features of same-category training samples are pulled together while features of different-category samples are pushed apart. The features the network extracts for different categories thus become more discriminative, which effectively reduces false detections, missed detections and misclassifications under the complex backgrounds and the varied vehicle categories, shapes and viewing angles found in unmanned aerial vehicle aerial images.
(2) When training the vehicle detection network based on unmanned aerial vehicle aerial images, the loss function used by the invention comprises a regression loss term, a classification loss term and an inter-class discriminable loss term. The classification loss penalizes wrongly predicted categories on the basis of the focal loss, and the introduced parameter W_t assigns different weights according to the number of samples in each category, increasing the weight of misclassified samples from minority categories so that network training pays more attention to them and further improving the accuracy of vehicle detection.
(3) The vehicle detection network established by the invention performs further feature extraction on top of the features extracted by the backbone, obtaining deeper features while reducing network parameters, deepening the network and extracting higher-level semantic features. By constructing a five-layer feature pyramid and fusing deep and shallow convolutional features from the top of the pyramid downward, the high-level feature information of the shallow features is effectively enhanced, improving detection accuracy, especially for small targets, and enabling detection of vehicles at multiple scales. Meanwhile, the original fusion scheme is replaced by adaptive weighted fusion of the multi-layer features, so the network can extract features purposefully according to the requirements of the final regression and classification tasks. Based on this network structure, the invention further improves the accuracy of vehicle detection in unmanned aerial vehicle aerial images.
(4) In the vehicle detection network established by the invention, the weight coefficients used for feature weighted fusion are learnable parameters, so the network autonomously learns the importance of each input feature during training and dynamically adjusts the weight coefficients, further improving detection accuracy.
Drawings
FIG. 1 is a schematic diagram of a vehicle detection network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a feature pyramid weighted fusion module according to an embodiment of the present invention;
fig. 3 is a flowchart of a vehicle detection method based on an aerial image of an unmanned aerial vehicle according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and embodiments, in order to make the objects, technical solutions and advantages of the present invention clearer. It should be understood that the specific embodiments described here are for illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
In the present invention, the terms "first," "second," and the like in the description and in the drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In order to improve the accuracy of vehicle detection based on unmanned aerial vehicle aerial images, the invention provides a vehicle detection network establishment method based on unmanned aerial vehicle aerial images and application thereof. The overall idea is as follows: by introducing an inter-class discriminable loss into the loss function, network training reshapes the distribution of training samples in feature space and provides additional constraints for network parameter optimization, so that the features extracted for different categories are more discriminative, the network's predictions are more precise, and detection accuracy improves. On this basis, the network structure is further improved to realize weighted fusion of higher-level semantic features, with weight coefficients dynamically adjusted during training, further improving the accuracy of vehicle detection.
The following are examples.
Example 1:
a vehicle detection network establishment method based on unmanned aerial vehicle aerial images comprises the following steps:
establishing a vehicle detection network to be trained, training it on an aerial-photography dataset, and completing the establishment of the vehicle detection network after training is finished.
Each training sample in the aerial dataset is an aerial image annotated with vehicle positions and categories. Optionally, this embodiment uses the public unmanned aerial vehicle aerial dataset UAVDT, which contains 50 video sequences of different scenes totaling about 40,000 frames, with vehicles divided into three categories: car, truck and bus. Other embodiments of the invention may use other datasets or build their own.
In this embodiment, the vehicle detection network is a deep-learning neural network model whose input is a three-channel unmanned aerial vehicle aerial image; it predicts the positions and categories of vehicles in the input image and outputs prediction confidences. The structure of the vehicle detection network is shown in FIG. 1; it comprises three parts: a feature extraction backbone network, a feature fusion network based on a feature pyramid weighted fusion module, and classification and regression sub-networks. Wherein:
the feature extraction backbone network extracts three features of different scales from the input image, denoted C3, C4 and C5 in order of scale from large to small. As an optional implementation, this embodiment uses ResNet50 as the feature extraction backbone, extracting features with five convolution blocks (Conv1 to Conv5 from bottom to top): Conv3 yields the C3 feature map after three downsamplings of the original image, Conv4 yields the C4 feature map after four downsamplings, and Conv5 yields the C5 feature map after five downsamplings. Other embodiments of the invention may use other feature extraction networks as the backbone of the vehicle detection network;
the feature fusion network comprises a semantic feature extraction module and a feature pyramid weighted fusion module. The semantic feature extraction module performs further feature extraction on the feature C5 to obtain a feature C6, and on the feature C6 to obtain a feature C7; specifically, this embodiment applies one 3×3 convolution to the C5 feature map to obtain the C6 feature map, and one 3×3 convolution to the C6 feature map to obtain the C7 feature map. The feature pyramid weighted fusion module is a 5-layer feature pyramid network whose 5 output features are denoted P3 to P7 from bottom to top, where the feature P7 is the result of a convolution operation on the feature C7, and the feature P_M is the result of weighted fusion of its upper-layer feature P_(M+1) with the feature C_M followed by a convolution operation;
the classification sub-network predicts the categories of vehicles in the input image from the features P3 to P7 output by the feature fusion network and outputs prediction confidences;
the regression sub-network predicts the positions of vehicles in the input image from the features P3 to P7 output by the feature fusion network;
wherein M is a positive integer and 3 ≤ M ≤ 6. In the vehicle detection network established by this embodiment, obtaining the C6 and C7 layers through 3×3 convolutions deepens the network to extract higher-level semantic information while reducing the number of network parameters. In the feature pyramid weighted fusion module, a five-layer feature pyramid is constructed starting from the P7 effective feature layer, and deep and shallow convolutional features are fused from the top of the pyramid downward; the original fusion scheme is replaced by adaptive weighted fusion of the multi-layer features, with the weight coefficients of different layers adjusted autonomously by backpropagation so that the network can extract features purposefully according to the requirements of the final regression and classification tasks. In the prediction stage, the five effective feature layers are fed into the classification and regression sub-networks respectively, and the results from all layers are combined into the final prediction. P3, P4, P5, P6 and P7 are obtained by 3, 4, 5, 6 and 7 downsamplings of the original image, respectively, with corresponding receptive fields of 8×8, 16×16, 32×32, 64×64 and 128×128, enabling detection of vehicles at multiple scales;
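For illustration, the following is a minimal PyTorch sketch of the backbone plus the semantic feature extraction module described above; it is not code from the patent. The torchvision ResNet50 layer names, the 256-channel width of the extra levels, the stride-2 convolutions for C6 and C7, and the ReLU between them are assumptions consistent with common practice.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class BackboneWithC6C7(nn.Module):
    """Sketch: ResNet50 backbone yielding C3-C5, plus 3x3 convs for C6/C7."""
    def __init__(self):
        super().__init__()
        net = resnet50(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.conv2 = net.layer1            # stride 4
        self.conv3 = net.layer2            # stride 8  -> C3 (512 ch)
        self.conv4 = net.layer3            # stride 16 -> C4 (1024 ch)
        self.conv5 = net.layer4            # stride 32 -> C5 (2048 ch)
        # semantic feature extraction: one 3x3 conv per extra level
        # (stride 2 assumed so that C6/C7 sit at strides 64/128)
        self.c6 = nn.Conv2d(2048, 256, kernel_size=3, stride=2, padding=1)
        self.c7 = nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        x = self.stem(x)
        x = self.conv2(x)
        c3 = self.conv3(x)
        c4 = self.conv4(c3)
        c5 = self.conv5(c4)
        c6 = self.c6(c5)
        c7 = self.c7(torch.relu(c6))       # ReLU between C6 and C7 is assumed
        return c3, c4, c5, c6, c7
```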
In addition, since convolutional features from different layers contribute differently to the final regression and classification tasks, this embodiment fuses the deep and shallow convolutional features by weighting, introducing learnable weight coefficients so the network autonomously learns the importance of each input feature. The feature fusion process in the five-layer pyramid is further described below. As shown in FIG. 2, taking the fusion of the P4 and C3 layers as an example: the weight coefficient of the P4 layer after upsampling is w_31, and the weight coefficient of the C3 layer after its channel count is compressed to 256 by a 3×3 convolution is w_32; the P4 and C3 layers are multiplied by their corresponding weight coefficients and then passed through a 3×3 convolution block to obtain the final effective feature layer P3. In FIG. 2, w_M1 and w_M2 denote the weight coefficients of the (M+1)-th layer feature P_(M+1) and the M-th layer feature C_M when they are fused; for example, in the fusion shown in FIG. 2, w_31 is the weight coefficient of feature P4 and w_32 that of feature C3. Based on this fusion process, the output of each effective feature layer in the five-layer pyramid can be expressed as:
P7 = Conv(C7)
P_i = Conv( (w_i1 · Up(P_(i+1)) + w_i2 · C_i') / (w_i1 + w_i2 + ε) ),  i = 3, 4, 5, 6
where Up denotes upsampling, C_i' denotes the feature C_i after channel adjustment, w_ij (i = 3, 4, …, 7; j = 1, 2) denotes the weight of the j-th input feature of the i-th layer and measures the importance of each input feature, and ε is a small value that keeps the denominator from being 0; optionally, in this embodiment ε is 0.0001;
In this embodiment, the weight coefficients w_ij of the feature weighted fusion are learnable parameters, so the network autonomously learns the importance of each input feature during training and dynamically adjusts the weight coefficients, further improving the accuracy of vehicle detection.
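For illustration, a minimal PyTorch sketch of one weighted-fusion step follows; it is not code from the patent. The ReLU that keeps the raw weights non-negative and the nearest-neighbour upsampling are assumptions; the embodiment only states that the coefficients are learnable and that ε avoids a zero denominator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Sketch: P_M = Conv((w1 * Up(P_{M+1}) + w2 * C_M') / (w1 + w2 + eps))."""
    def __init__(self, in_channels, channels=256, eps=1e-4):
        super().__init__()
        self.eps = eps
        self.lateral = nn.Conv2d(in_channels, channels, 3, padding=1)  # compress C_M to 256
        self.out_conv = nn.Conv2d(channels, channels, 3, padding=1)    # final 3x3 conv block
        self.w = nn.Parameter(torch.ones(2))                           # w_M1, w_M2, learned

    def forward(self, p_up, c_m):
        p_up = F.interpolate(p_up, size=c_m.shape[-2:], mode="nearest")  # upsample P_{M+1}
        c_m = self.lateral(c_m)
        w = F.relu(self.w)                       # keep weights non-negative (assumed)
        fused = (w[0] * p_up + w[1] * c_m) / (w.sum() + self.eps)
        return self.out_conv(fused)
```

Applied for M = 6, 5, 4, 3 in turn after P7 = Conv(C7), this yields the five effective feature layers P3 to P7.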
In order to make the features extracted for different categories more discriminative and to make the detection network better suited to complex scenes under unmanned aerial vehicles, the invention adds an inter-class discriminable loss on top of the regression loss and an improved classification loss, and evaluates the difference between model predictions and ground truth with these three losses. Specifically, in this embodiment, the loss function during training is: L_total = L_loc + L_cls + L_disc, where L_loc is the regression loss, representing the difference between the predicted and true vehicle positions; L_cls is the classification loss, representing the difference between the predicted and true vehicle categories; and L_disc is the inter-class discriminable loss, characterizing the distribution of training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller L_disc. Each loss is described in detail below:
a. The first term is the regression loss, representing the difference between the predicted value f(x_i) (comprising the center coordinates of the prediction box and its width and height) and the true value y_i. It accumulates, over i ∈ {x, y, w, h}, the errors between the predicted and ground-truth box center coordinates and between the predicted and ground-truth box width and height;
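The exact functional form of L_loc is not reproduced above; the following sketch assumes a smooth-L1 form, a common choice consistent with the per-coordinate errors just described.

```python
import torch
import torch.nn.functional as F

def regression_loss(pred_boxes: torch.Tensor, true_boxes: torch.Tensor) -> torch.Tensor:
    # pred_boxes, true_boxes: (N, 4) tensors of matched (x, y, w, h) values
    # smooth-L1 over the four coordinates is an assumption, not the patent's formula
    return F.smooth_l1_loss(pred_boxes, true_boxes, reduction="mean")
```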
b. The second term is the classification loss, which evaluates vehicle-category prediction errors. When training with a large number of samples, the original focal loss pays little attention to misclassified samples from minority categories and easily ignores them. This embodiment therefore penalizes wrongly predicted categories on the basis of the focal loss, assigning different weights according to the number of samples in each category so that misclassified minority-category samples are weighted more heavily and network training attends to them. The mathematical expression is:
L_cls = -α_t · W_t · (1 - P_t)^γ · log(P_t)
where α_t = a for positive samples and 1 - a for negative samples, a being a hyperparameter through which the weights of positive and negative samples during training are adjusted; P_t = p for positive samples and 1 - p for negative samples, p being the confidence the network assigns to the predicted category; γ adjusts the weights of samples of differing detection difficulty, reducing the loss of easily classified samples; W_t takes the value w_class for misclassified samples, where w_class is computed from p_class and the hyperparameter β, p_class = num_class / total_num being the probability of the corresponding category's targets appearing in the overall training set, num_class the number of samples of that category and total_num the total number of samples. From the calculation of w_class, when a category is predicted wrongly, the fewer the training samples of that category, the larger the assigned weight coefficient W_t, which achieves the goal of increasing the weight of misclassified minority-category samples;
c. The third term is the inter-class discriminable loss. Given the complex backgrounds of unmanned aerial vehicle aerial images and the varied categories, shapes and viewing angles of vehicles in a scene, more discriminative features effectively reduce false detections, missed detections and misclassifications. The inter-class discriminable loss is therefore added on top of the regression and classification losses, using network training to reshape the distribution of training samples in feature space and constrain the optimization of network parameters, so that the features of same-class samples are more concentrated and the features of different-class samples are further apart. Its mathematical expression is:
L_disc = L_var + L_dist
This loss consists of two parts: a variance term and a distance term. L_var is the variance term, characterizing the feature-aggregation degree of same-class training samples: the more aggregated the features of same-class training samples, the smaller the value of L_var. L_dist is the distance term, characterizing the feature-dispersion degree of different-class training samples: the more dispersed the features of different-class training samples, the smaller the value of L_dist.
The variance term L_var is computed as:
L_var = (1/C) · Σ_{c=1}^{C} (1/N_c) · Σ_{i=1}^{N_c} [ ||μ_c - x_i|| - δ_v ]_+
where C represents the total number of vehicle categories in the aerial dataset, N_c the total number of training samples of the c-th category, μ_c the mean feature vector of the aerial images of the c-th category, and x_i the feature vector of the i-th aerial image; δ_v is a preset threshold, δ_v > 0; [x]_+ = max(0, x), i.e. when ||μ_c - x_i|| is smaller than δ_v the loss is 0, and when ||μ_c - x_i|| is larger than δ_v the loss is counted, achieving the aim of aggregating same-class features. Optionally, in this embodiment δ_v is 0.6.
The distance term L_dist is computed as:
L_dist = (1/(C·(C-1))) · Σ_{c_A ≠ c_B} [ δ_d - ||μ_{c_A} - μ_{c_B}|| ]_+
where C represents the total number of vehicle categories in the aerial dataset; c_A and c_B represent two different vehicle categories, and μ_{c_A} and μ_{c_B} respectively represent the mean feature vectors of the aerial images of the two categories; δ_d is a preset threshold, δ_d > 0; [x]_+ = max(0, x), i.e. when the distance between the class centers of different categories ||μ_{c_A} - μ_{c_B}|| is greater than δ_d the loss is 0, and when it is less than δ_d the loss is counted, so that samples of different categories are pushed apart; the feature distance between vehicles of different categories is ultimately kept beyond δ_d. Optionally, in this embodiment δ_d is 3.6.
Example 2:
a vehicle detection method based on unmanned aerial vehicle aerial image, as shown in figure 3, comprises the following steps:
inputting an aerial image to be detected into the vehicle detection network established by the vehicle detection network establishment method based on unmanned aerial vehicle aerial images provided in Embodiment 1, so that the network predicts vehicle positions, categories and prediction confidences. The input image may be one frame of a video captured by an aerial camera. To avoid the influence of outliers and extreme values on the results, this embodiment may also preprocess the unmanned aerial vehicle aerial image before feeding it to the vehicle detection network. Optionally, the preprocessing in this embodiment comprises normalization and standardization: normalization divides the input image data by 255, and standardization subtracts the mean from the normalized input image and divides by the standard deviation;
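For illustration, a sketch of this preprocessing follows; the specific per-channel mean and standard-deviation values are assumptions (the embodiment does not list them).

```python
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    # image: (H, W, 3) uint8 RGB aerial frame
    mean = np.array([0.485, 0.456, 0.406])   # assumed per-channel statistics
    std = np.array([0.229, 0.224, 0.225])
    x = image.astype(np.float32) / 255.0     # normalization
    return (x - mean) / std                  # standardization
```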
drawing the prediction results output by the vehicle detection network in the aerial image to complete vehicle detection. In order to display the vehicle detection results clearly in the input aerial image, this embodiment post-processes the results before drawing them; the post-processing includes removing redundant prediction boxes from the results. Optionally, this embodiment removes redundant prediction boxes by non-maximum suppression (NMS);
the vehicle position information output from the vehicle detection network may be expressed as (x) 1 ,y 1 ,x 2 ,y 2 ),(x 1 ,y 1 ) Representing the upper left corner coordinates of a predicted rectangular box of the vehicle, (x) 2 ,y 2 ) Representing the lower right corner coordinates of the vehicle prediction rectangular frame; the vehicle type information is specifically whether the input aerial image contains a vehicle, and in the case of containing the vehicle, the vehicle type information can be a car (car), a bus (bus) or the like; prediction confidence for measuring checkingMeasuring the reliability of the result; the prediction frame is a detection frame determined by the position information.
Example 3:
a computer readable storage medium comprising a stored computer program; when the computer program is executed by the processor, the device where the computer readable storage medium is located is controlled to execute the vehicle detection network establishment method based on the aerial image of the unmanned aerial vehicle provided in the above embodiment 1, and/or the vehicle detection method based on the aerial image of the unmanned aerial vehicle provided in the above embodiment 2.
In order to better illustrate the effect of the vehicle detection network based on unmanned aerial vehicle aerial images, the model was tested both qualitatively and quantitatively on the UAVDT dataset. Since the class distribution in the dataset is severely unbalanced, with car-class vehicles making up 92% of the entire dataset, the tests merge the three classes into a single car class, following the convention of the paper "The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking" (Du D, Qi Y, Yu H, et al. Springer, Cham, 2018).
Qualitative analysis: six images were selected from different video sequences of the UAVDT dataset, differing in aerial height, angle and weather. The scenes of the six images are: (a) low-altitude, side-view, daytime; (b) low-altitude, overhead, night; (c) mid-altitude, side-view, daytime; (d) mid-altitude, front-view, night; (e) high-altitude, front-view, daytime; and (f) high-altitude, overhead, night. Overly complex scenes are not illustrated here. The detection results show that the proposed vehicle detection network can correctly detect vehicles across these different aerial scenes while maintaining high detection confidence, rather than being effective only for specific shooting heights, angles or weather, verifying the generalization ability and robustness of the algorithm.
Quantitative analysis: as shown in Table 1, to illustrate the effect of the proposed vehicle detection network more intuitively, tests were run on the 20 test sequences of the UAVDT dataset and evaluated with mean average precision (mAP). According to the experimental results, compared with current advanced unmanned aerial vehicle aerial-image vehicle detection methods (UAV-Net, LSN, GANet, NDFT, SpotNet, D Det), the proposed method performs best, detecting the most vehicle targets while maintaining accuracy.
Table 1 Comparative analysis of different unmanned aerial vehicle aerial-image vehicle detection methods (table not reproduced here)
Those skilled in the art will readily appreciate that the foregoing is merely a preferred embodiment of the invention and is not intended to limit it; any modifications, equivalents, improvements or alternatives within the spirit and principles of the invention are intended to fall within its scope.

Claims (9)

1. A vehicle detection network establishment method based on unmanned aerial vehicle aerial images, characterized by comprising:
establishing a vehicle detection network to be trained, and training the vehicle detection network on an aerial-photography dataset; each training sample in the aerial dataset is an aerial image annotated with vehicle positions and categories; the vehicle detection network is a deep-learning neural network that takes an image as input, predicts the positions and categories of vehicles in the input image, and outputs prediction confidences;
during training, the loss function is: L_total = L_loc + L_cls + L_disc, wherein L_loc is a regression loss representing the difference between the predicted and true vehicle positions; L_cls is a classification loss representing the difference between the predicted and true vehicle categories; and L_disc is an inter-class discriminable loss characterizing the distribution of training samples in feature space: the more concentrated the feature distribution of same-class training samples and the more dispersed the feature distribution of different-class training samples, the smaller L_disc;
after training ends, establishment of the vehicle detection network is complete;
wherein the classification loss L_cls is:
L_cls = -α_t · W_t · (1 - P_t)^γ · log(P_t)
wherein α_t = a for positive samples and 1 - a for negative samples, a being a hyperparameter; P_t = p for positive samples and 1 - p for negative samples, p representing the prediction confidence of the corresponding category; γ is a preset weight coefficient; W_t assigns a weight w_class to misclassified training samples of the corresponding category, where w_class is derived from p_class, the probability of the corresponding category's training samples occurring among all training samples, and from the hyperparameter β.
2. The method for establishing a vehicle detection network based on unmanned aerial vehicle aerial images according to claim 1, wherein L_disc = L_var + L_dist,
wherein L_var is a variance term characterizing the feature-aggregation degree of same-class training samples: the more aggregated the features of same-class training samples, the smaller the value of L_var; and L_dist is a distance term characterizing the feature-dispersion degree of different-class training samples: the more dispersed the features of different-class training samples, the smaller the value of L_dist.
3. The method for establishing a vehicle detection network based on unmanned aerial vehicle aerial images according to claim 2, wherein
L_var = (1/C) · Σ_{c=1}^{C} (1/N_c) · Σ_{i=1}^{N_c} [ ||μ_c - x_i|| - δ_v ]_+
wherein C represents the total number of vehicle categories in the aerial dataset, N_c represents the total number of training samples of the c-th category, μ_c represents the mean feature vector of the aerial images of the c-th category, and x_i represents the feature vector of the i-th aerial image; δ_v is a preset threshold, δ_v > 0; [x]_+ = max(0, x).
4. The method for establishing a vehicle detection network based on unmanned aerial vehicle aerial images according to claim 2, wherein
L_dist = (1/(C·(C-1))) · Σ_{c_A ≠ c_B} [ δ_d - ||μ_{c_A} - μ_{c_B}|| ]_+
wherein C represents the total number of vehicle categories in the aerial dataset; c_A and c_B represent two different vehicle categories, and μ_{c_A} and μ_{c_B} respectively represent the mean feature vectors of the aerial images of the two categories; δ_d is a preset threshold, δ_d > 0; [x]_+ = max(0, x).
5. The method for establishing a vehicle detection network based on unmanned aerial vehicle aerial images according to any one of claims 1 to 4, wherein the vehicle detection network comprises: a feature extraction backbone network, a feature fusion network, a classification sub-network and a regression sub-network;
the feature extraction backbone network extracts three features of different scales from the input image, denoted C3, C4 and C5 in order of scale from large to small;
the feature fusion network comprises a semantic feature extraction module and a feature pyramid weighted fusion module; the semantic feature extraction module performs further feature extraction on the feature C5 to obtain a feature C6, and on the feature C6 to obtain a feature C7; the feature pyramid weighted fusion module is a 5-layer feature pyramid network whose 5 output features are denoted P3 to P7 from bottom to top, wherein the feature P7 is the result of a convolution operation on the feature C7, and the feature P_M is the result of weighted fusion of its upper-layer feature P_(M+1) with the feature C_M followed by a convolution operation;
the classification sub-network predicts the categories of vehicles in the input image from the features P3 to P7 output by the feature fusion network and outputs prediction confidences;
the regression sub-network predicts the positions of vehicles in the input image from the features P3 to P7 output by the feature fusion network;
wherein M is a positive integer and 3 ≤ M ≤ 6.
6. The method for establishing a vehicle detection network based on unmanned aerial vehicle aerial images according to claim 5, wherein, during training, the weight coefficients used in the weighted fusion of the feature P_(M+1) and the feature C_M into the feature P_M are dynamically adjusted.
7. A vehicle detection method based on unmanned aerial vehicle aerial images, characterized by comprising:
inputting an aerial image to be detected into a vehicle detection network established by the method for establishing a vehicle detection network based on unmanned aerial vehicle aerial images according to any one of claims 1 to 6, so that the vehicle detection network predicts the positions and categories of vehicles together with the corresponding prediction confidences;
and drawing a prediction result output by the vehicle detection network in the aerial image to finish vehicle detection.
8. The vehicle detection method based on unmanned aerial vehicle aerial images according to claim 7, further comprising, before the prediction results output by the vehicle detection network are drawn in the aerial image:
removing redundant prediction boxes from the prediction results;
wherein a prediction box is a detection box determined by the position information.
9. A computer-readable storage medium comprising a stored computer program, wherein, when the computer program is executed by a processor, a device on which the computer-readable storage medium resides is controlled to execute the method for establishing a vehicle detection network based on unmanned aerial vehicle aerial images according to any one of claims 1 to 6 and/or the vehicle detection method based on unmanned aerial vehicle aerial images according to claim 7 or 8.
CN202111119764.9A 2021-09-24 2021-09-24 Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof Active CN113780462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111119764.9A CN113780462B (en) 2021-09-24 2021-09-24 Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof

Publications (2)

Publication Number Publication Date
CN113780462A CN113780462A (en) 2021-12-10
CN113780462B (en) 2024-03-19

Family

ID=78853022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111119764.9A Active CN113780462B (en) 2021-09-24 2021-09-24 Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof

Country Status (1)

Country Link
CN (1) CN113780462B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114374951B (en) * 2022-01-12 2024-04-30 重庆邮电大学 Dynamic pre-deployment method for multiple unmanned aerial vehicles

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN110717387A (en) * 2019-09-02 2020-01-21 东南大学 Real-time vehicle detection method based on unmanned aerial vehicle platform
CN111814584A (en) * 2020-06-18 2020-10-23 北京交通大学 Vehicle weight identification method under multi-view-angle environment based on multi-center measurement loss
CN112613552A (en) * 2020-12-18 2021-04-06 北京工业大学 Convolutional neural network emotion image classification method combining emotion category attention loss

Also Published As

Publication number Publication date
CN113780462A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
CN111723748B (en) Infrared remote sensing image ship detection method
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
CN112200045B (en) Remote sensing image target detection model establishment method based on context enhancement and application
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
CN111079640A (en) Vehicle type identification method and system based on automatic amplification sample
CN111126278A (en) Target detection model optimization and acceleration method for few-category scene
CN115661044A (en) Multi-source fusion-based substation power equipment fault detection method
CN112287896A (en) Unmanned aerial vehicle aerial image target detection method and system based on deep learning
CN106778540A (en) Parking detection is accurately based on the parking event detecting method of background double layer
CN114781514A (en) Floater target detection method and system integrating attention mechanism
CN115661720A (en) Target tracking and identifying method and system for shielded vehicle
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN113989487A (en) Fault defect detection method and system for live-action scheduling
CN113780462B (en) Vehicle detection network establishment method based on unmanned aerial vehicle aerial image and application thereof
CN111160100A (en) Lightweight depth model aerial photography vehicle detection method based on sample generation
CN116363532A (en) Unmanned aerial vehicle image traffic target detection method based on attention mechanism and re-parameterization
CN117152601A (en) Underwater target detection method and system based on dynamic perception area routing
CN111274986A (en) Dish identification and classification method based on image analysis
CN115700808A (en) Dual-mode unmanned aerial vehicle identification method for adaptively fusing visible light and infrared images
CN114821356B (en) Optical remote sensing target detection method for accurate positioning
CN115861595A (en) Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning
CN116309270A (en) Binocular image-based transmission line typical defect identification method
CN115984723A (en) Road damage detection method, system, device, storage medium and computer equipment
CN114067186B (en) Pedestrian detection method and device, electronic equipment and storage medium
CN112069997B (en) Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant