CN116416534A - Unmanned aerial vehicle spare area identification method facing protection target - Google Patents
- Publication number: CN116416534A
- Application number: CN202310139757.8A
- Authority: CN (China)
- Prior art keywords: attention, module, feature map, feature, global feature
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/17 — Terrestrial scenes taken from planes or by drones
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/70 — Labelling scene content, e.g. deriving syntactic or semantic representations
- Y02T10/40 — Engine management systems
Abstract
The invention relates to a protection-target-oriented unmanned aerial vehicle spare landing area identification method, which comprises the following steps: collecting historical aerial image data of the unmanned aerial vehicle, screening the data and labeling it pixel by pixel to form an aerial image dataset; inputting the aerial image dataset into a target recognition network to obtain context features. The target recognition network comprises a multi-layer semantic segmentation model and a unified attention fusion module connected to it: after the aerial image dataset is input into the semantic segmentation model, the global feature maps obtained from some of its layers are fed into the unified attention fusion module to produce a context feature map. The context feature map is then input into a semantic segmentation head and a target detection head respectively, and their outputs are fused into the identification result. The invention detects pedestrians and vehicles in the spare landing area by means of semantic segmentation in computer vision, so that the unmanned aerial vehicle can land safely in the spare landing area.
Description
Technical Field
The invention relates to the technical field of semantic segmentation in computer vision, and in particular to a protection-target-oriented unmanned aerial vehicle spare landing area identification method.
Background
Semantic segmentation of unmanned aerial vehicle aerial images applies semantic segmentation technology to aerial photography so that the unmanned aerial vehicle gains intelligent perception of the targets in a scene. For aerial images the scene is extremely complex: the spare landing areas to be identified include horizontal roofs, horizontal floors, horizontal grasslands and the like. Pedestrians and vehicles are detected in each candidate area, and an area can be judged a usable spare landing area only if no pedestrians or vehicles are present in it.
Disclosure of Invention
The invention aims to detect pedestrians and vehicles in a spare landing area through semantic segmentation in computer vision, so as to ensure that an unmanned aerial vehicle can land safely in the spare landing area, and provides a protection-target-oriented unmanned aerial vehicle spare landing area identification method.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
A protection-target-oriented unmanned aerial vehicle spare landing area identification method comprises the following steps:
step 1, collecting historical aerial image data of the unmanned aerial vehicle, screening the data and labeling it pixel by pixel to form an aerial image dataset;
step 2, inputting the aerial image dataset into a target recognition network to obtain context features; the target recognition network comprises a multi-layer semantic segmentation model and a unified attention fusion module connected to it, and after the aerial image dataset is input into the semantic segmentation model, the global feature maps obtained from some of its layers are fed into the unified attention fusion module to produce a context feature map;
and step 3, inputting the context feature map into a semantic segmentation head and a target detection head respectively, and fusing their outputs into the identification result.
Compared with the prior art, the invention has the beneficial effects that:
The invention uses semantic segmentation to segment and identify the spare landing area of the unmanned aerial vehicle. Because semantic segmentation is a pixel-level image understanding method, identification of the spare landing area is more accurate and more efficient, and the STDC-BiSeNet network model is a leading technique in the current field of real-time semantic segmentation, which reflects the soundness and generality of the method.
The invention identifies pedestrians and vehicles in the spare landing area with good recognition performance, thereby safeguarding the life and property of pedestrians on the ground.
According to the invention, the STDC-BiSeNet backbone network is shared between semantic segmentation and target detection, so the parameter count of the overall task model is reduced, the whole model becomes lightweight, and rapid deployment of the model is facilitated.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope; other related drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a feature attention weighting module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a full convolution module according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a process of the unified attention fusion module according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Also, in the description of the present invention, the terms "first," "second," and the like are used merely to distinguish one from another, and are not to be construed as indicating or implying a relative importance or implying any actual such relationship or order between such entities or operations. In addition, the terms "connected," "coupled," and the like may be used to denote a direct connection between elements, or an indirect connection via other elements.
Example 1:
The invention is realized by the following technical scheme. As shown in fig. 1, the protection-target-oriented unmanned aerial vehicle spare landing area identification method comprises the following steps:
In order to give the unmanned aerial vehicle better generalization when executing the target recognition task, this embodiment collects a large amount of historical aerial image or video data across different scenes, time periods and areas, then screens the data and labels it pixel by pixel. The labelme image annotation tool is used for labeling, and the labeled data is converted into the VOC dataset format to form the aerial image dataset.
Most current mainstream semantic segmentation models adopt an encoder-decoder structure: the encoder performs feature extraction, enriching the feature maps with semantic information while their resolution is gradually reduced, and the decoder takes the encoded features as input and decodes the final segmentation prediction. This basic framework has several problems: the semantic segmentation task requires detail information in addition to semantic information, yet a model often loses a large amount of detail during the repeated convolution and pooling operations, and this process also tends to inflate the model's parameters.
In order to achieve real-time performance in the semantic segmentation task, this scheme adopts the STDC-BiSeNet semantic segmentation model, whose network is simple, whose parameter count is small, and which is very lightweight while offering good segmentation performance. Built on an unmanned aerial vehicle platform, it can reliably identify spare landing areas, forced landing areas and protection targets.
Referring to fig. 1, the semantic segmentation model includes 5 layers, which are a first full convolution module, a second full convolution module, a first feature attention weight module, a second feature attention weight module, and a third feature attention weight module that are sequentially connected.
The aerial image dataset, at a scale of 224×224×3, is input into the first full convolution module, which outputs a first feature map at a scale of 112×112×32 to the second full convolution module; the second full convolution module in turn outputs a second feature map at a scale of 56×56×64 to the first feature attention weight module.
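As a sketch of this resolution flow (the stride-2 downsampling at every stage is an assumption inferred from the halving of the spatial scale; the channel widths 32, 64, 256, 512 and 1024 are taken from the scales quoted in the description), the backbone shapes can be traced in a few lines:

```python
def conv_out_size(h, w, stride=2):
    """Spatial size after a stride-2 convolution with 'same' padding."""
    return h // stride, w // stride

def trace_backbone(h=224, w=224, channels=(32, 64, 256, 512, 1024)):
    """Trace (H, W, C) through the five downsampling stages described above."""
    shapes = []
    for c in channels:
        h, w = conv_out_size(h, w)
        shapes.append((h, w, c))
    return shapes
```

Running the trace reproduces the five stage scales quoted in the description: 112×112×32, 56×56×64, 28×28×256, 14×14×512 and 7×7×1024.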
After processing by the first feature attention weight module, a first global feature map F_low1 at a scale of 28×28×256 is output to the second feature attention weight module; after processing by the second feature attention weight module, a second global feature map F_low2 at a scale of 14×14×512 is output to the third feature attention weight module; after processing by the third feature attention weight module, a third global feature map F_low3 at a scale of 7×7×1024 is output to the global pooling layer.
The first, second and third feature attention weight modules of the semantic segmentation model also output the first global feature map F_low1, the second global feature map F_low2 and the third global feature map F_low3, respectively, to the unified attention fusion module.
The first full convolution module and the second full convolution module have the same structure, and referring to fig. 3, the first full convolution module and the second full convolution module each include a convolution layer, a normalization layer, and an activation layer that are sequentially connected.
The first, second and third feature attention weight modules share the same structure; referring to fig. 2, each comprises a global pooling layer, together with a first attention convolution layer, a second attention convolution layer, a third attention convolution layer, a fourth attention convolution layer and a Concat layer connected in sequence. The convolution kernel size of the first attention convolution layer is 1×1, and the convolution kernel sizes of the second, third and fourth attention convolution layers are all 3×3.
With continued reference to fig. 2, the aerial image dataset passes through the first and second full convolution modules to obtain a low-level feature map F_0. F_0 passes through the first attention convolution layer to obtain a first global feature subgraph F_1; F_1 passes through the second attention convolution layer to obtain a second global feature subgraph F_2; F_2 passes through the third attention convolution layer to obtain a third global feature subgraph F_3; and F_3 passes through the fourth attention convolution layer to obtain a fourth global feature subgraph F_4. After F_1 passes through the global pooling layer with kernel size 3×3, it is fused with the second, third and fourth global feature subgraphs F_2, F_3 and F_4 through the Concat layer into a global feature map F_low.
It is easy to understand that the first feature attention weight module outputs the first global feature map F_low1, the second outputs the second global feature map F_low2, and the third outputs the third global feature map F_low3.
With continued reference to fig. 1, the unified attention fusion module is further connected to a pyramid pooling module, and the pyramid pooling module is used to increase receptive fields when extracting the context feature map.
The pyramid pooling module processes the third global feature map F_low3 output by the third feature attention weight module to obtain a third high-level global feature map F_high3; the third global feature map F_low3 and the third high-level global feature map F_high3 are then input together into the unified attention fusion module to obtain a third context feature map F_out3.
The third context feature map F_out3, serving as the second high-level global feature map F_high2, and the second global feature map F_low2 output by the second feature attention weight module are input together into the unified attention fusion module to obtain a second context feature map F_out2.
The second context feature map F_out2, serving as the first high-level global feature map F_high1, and the first global feature map F_low1 output by the first feature attention weight module are input together into the unified attention fusion module to obtain a first context feature map F_out1.
Referring to FIG. 4, the processing of the third global feature map F_low3 and the third high-level global feature map F_high3 by the unified attention fusion module is taken as an example. The pyramid pooling module first processes F_low3 to obtain F_high3; F_low3 and F_high3 are then input together into the unified attention fusion module to obtain the third context feature map F_out3:
The third high-level global feature map F_high3 is upsampled to form F_up3:
F_up3 = Upsample(F_high3)
F_up3 and the third global feature map F_low3 are fed together into the channels of the attention mechanism, which produces the weights a and 1-a:
(a, 1-a) = Attention(F_up3, F_low3)
where a is the weight of F_up3 and 1-a is the weight of F_low3.
F_up3 and F_low3 are then multiplied by their respective weights and summed to obtain the third context feature map F_out3:
F_out3 = F_up3 * a + F_low3 * (1-a).
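A minimal pure-Python sketch of this weighted fusion step follows; the `attention_weight` stand-in is an assumption (the real attention mechanism learns the weight from both inputs, whereas here it is simply a sigmoid of their mean), and feature maps are flattened to lists for brevity:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attention_weight(f_up, f_low):
    """Stand-in for Attention(F_up, F_low): one scalar weight a in (0, 1)
    derived from the mean of both inputs. The real module learns this."""
    mean = sum(f_up + f_low) / (len(f_up) + len(f_low))
    return sigmoid(mean)

def unified_attention_fusion(f_up, f_low):
    """F_out = F_up * a + F_low * (1 - a), applied element-wise."""
    a = attention_weight(f_up, f_low)
    return [u * a + l * (1.0 - a) for u, l in zip(f_up, f_low)]
```

Because the two weights sum to 1, the fused output always lies between the two inputs element-wise, so the module interpolates between low-level detail and high-level semantics rather than replacing one with the other.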
It is easy to understand that the second context feature map F_out2 and the first context feature map F_out1 are obtained in the same manner, which is not repeated here.
On the other hand, in fig. 2 the low-level feature map F_0 input into the first attention convolution layer has M channels. The first global feature subgraph F_1 obtained after the first attention convolution layer has M/2 channels; the second global feature subgraph F_2 obtained after further convolution in the second attention convolution layer has M/4 channels; and the third and fourth attention convolution layers each output M/8 channels. The first, second, third and fourth global feature subgraphs F_1, F_2, F_3 and F_4 are then spliced and fused by skip connection. The feature map passed from the pyramid pooling module to the unified attention fusion module requires upsampling; since the channel count grows while the spatial extent of the features shrinks along the backbone, this design balances the computational cost.
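The channel arithmetic of that skip-connected splice can be checked with a short sketch (assuming M is divisible by 8, per the halving pattern described above): splitting into M/2 + M/4 + M/8 + M/8 channels and concatenating recovers exactly M.

```python
def faw_channel_flow(m):
    """Channel counts of the four global feature subgraphs F1..F4 for an
    M-channel input, following the M/2, M/4, M/8, M/8 halving pattern."""
    assert m % 8 == 0, "M is assumed divisible by 8"
    channels = [m // 2, m // 4, m // 8, m // 8]
    # The Concat layer stacks the subgraphs along the channel axis,
    # so the fused map F_low recovers the original M channels.
    return channels, sum(channels)
```

For example, with M = 256 the subgraphs carry 128, 64, 32 and 32 channels, and the concatenation again yields 256.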
In order to strengthen the feature extraction of the target recognition network and give it multi-scale context capability, this scheme introduces the unified attention fusion module: the global feature maps output by the first, second and third feature attention weight modules are passed into the unified attention fusion module and fused uniformly, making full use of the spatial and inter-channel relationships of the input features, which is a key factor in improving segmentation accuracy.
In summary, the semantic segmentation model, the unified attention fusion module and the pyramid pooling module are connected into the target recognition network, which reduces the computational load relative to the traditional BiSeNet model while improving the model's computational efficiency. The network adopts skip-layer connections throughout, and the introduced unified attention fusion and pyramid pooling modules enlarge the receptive field of the target recognition network and fuse the context features.
Step 3, inputting the context feature map into a semantic segmentation head and a target detection head respectively, and fusing their outputs into the identification result.
The first context feature map F_out1, as the final context feature output by the target recognition network, is input into the prediction part, which comprises two parallel branches: a semantic segmentation head and a target detection head. After the context features pass through the semantic segmentation head and the target detection head, the results are rendered onto the image and the outputs are fused into the identification result.
The loss functions widely used in most current semantic segmentation methods are the Dice Loss function and the cross entropy function. The Dice Loss derives from the Dice coefficient, a metric of set similarity generally used to compute the similarity between two samples. For a single pixel, the Dice Loss function is:
L_dice = 1 - (2·p·y + ε) / (p + y + ε)
where p is the pixel's ground-truth value, taking the value 0 or 1; y is the pixel's predicted value after sigmoid or softmax, taking a value in (0, 1); and ε is a smoothing coefficient whose role is to prevent the denominator from being 0, and which also smooths the loss and its gradient. Here ε = 1.
For multiple pixels, the Dice Loss function is:
L_dice = 1 - (2·Σ_i p_i y_i + ε) / (Σ_i p_i + Σ_i y_i + ε)
However, this scheme finds that the excess of negative samples during model training leads to inconsistent results between training and testing, and to relatively poor convergence. The scheme therefore, guided by experiments, improves the denominator of the Dice Loss function into a sum-of-squares form, which converges better. The improved Dice Loss function is:
L_dice' = 1 - (2·Σ_i p_i y_i + ε) / (Σ_i p_i² + Σ_i y_i² + ε)
however, in the training task of the semantic segmentation model, there is a phenomenon that the number of simple negative samples is too large, and the model cannot distinguish between the positive samples and the difficult negative samples due to the too large number of simple samples. To solve this problem, the present solution continuously adjusts each during the training of the modelThe weight of the sample was determined using (1-y i ) As a weight for each sample. For a simple sample, because the model can easily fit y i Pushing to 1, so the weight of the training device becomes smaller gradually in the training process, and the modified Dice Loss function is finally as follows:
where n represents the total number of samples in the aerial image dataset and i indexes its i-th sample; p_i represents the ground-truth pixel value of the i-th sample, taking the value 0 or 1; y_i represents the predicted pixel value of the i-th sample, taking a value in (0, 1); and ε represents the smoothing coefficient.
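A minimal pure-Python sketch of this weighted Dice Loss follows; the exact placement of the (1 - y_i) weights inside the fraction is an assumption read from the description (the patent's own formula images are not reproduced in this text):

```python
def weighted_dice_loss(p, y, eps=1.0):
    """Improved Dice Loss: sum-of-squares denominator, each term weighted
    by (1 - y_i) so easy samples fade out as y_i approaches 1.
    p: ground-truth values in {0, 1}; y: predictions in (0, 1)."""
    num = sum((1 - yi) * 2 * pi * yi for pi, yi in zip(p, y))
    den = sum((1 - yi) * (pi ** 2 + yi ** 2) for pi, yi in zip(p, y))
    return 1.0 - (num + eps) / (den + eps)
```

With eps = 1 the loss is exactly 0 when predictions match the 0/1 ground truth, and grows toward 1 as the overlap between prediction and ground truth shrinks.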
The cross entropy function mainly measures the difference between the predicted distribution Y and the true distribution P of the same random variable X. The cross entropy function is:
L_ce = -Σ_x P(x) log Y(x)
to solve the problem of class imbalance, the same usage (1-y i ) As the weight of each sample, the improved cross entropy function formula is:
Because the invention performs small-target semantic segmentation and target recognition of the ground from high altitude, gradient saturation may occur under extreme conditions during model training. The improved Dice Loss function and cross entropy function are therefore combined, and the combined total loss function is:
L = L_dice' + L_ce'
in summary, the semantic segmentation and target detection technology is implemented in the semantic segmentation model, the backbone network used is STDC-BiSeNet, the parameter number of the total model is greatly optimized, and the model has the advantage of light weight by using the same backbone network, so that the rapid deployment of the model is facilitated. The model is deployed into TX2 for testing, semantic segmentation MPA (average pixel precision) reaches 90%, target detection MAP (average precision) reaches 96.8%, and fps reaches 59, which shows that the model has high-efficiency segmentation performance and real-time performance in the unmanned aerial vehicle aerial data set established in the invention.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (7)
1. A protection-target-oriented unmanned aerial vehicle spare landing area identification method, characterized by comprising the following steps:
step 1, collecting historical aerial image data of the unmanned aerial vehicle, screening the data and labeling it pixel by pixel to form an aerial image dataset;
step 2, inputting the aerial image dataset into a target recognition network to obtain context features; the target recognition network comprises a multi-layer semantic segmentation model and a unified attention fusion module connected to it, and after the aerial image dataset is input into the semantic segmentation model, the global feature maps obtained from some of its layers are fed into the unified attention fusion module to produce a context feature map;
and 3, respectively inputting the context feature map into a semantic segmentation head and a target detection head, and fusing the output results of the semantic segmentation head and the target detection head into identification results.
2. The protection-target-oriented unmanned aerial vehicle spare landing area identification method according to claim 1, characterized in that: each layer of the semantic segmentation model comprises a first full convolution module, a second full convolution module, a first feature attention weight module, a second feature attention weight module and a third feature attention weight module which are connected in sequence;
the aerial image dataset, at a scale of 224×224×3, is input into the first full convolution module, which outputs a first feature map at a scale of 112×112×32 to the second full convolution module; the second full convolution module outputs a second feature map at a scale of 56×56×64 to the first feature attention weight module;
after processing by the first feature attention weight module, a first global feature map F_low1 at a scale of 28×28×256 is output to the second feature attention weight module; after processing by the second feature attention weight module, a second global feature map F_low2 at a scale of 14×14×512 is output to the third feature attention weight module; after processing by the third feature attention weight module, a third global feature map F_low3 at a scale of 7×7×1024 is output to the global pooling layer;
the first, second and third feature attention weight modules of the semantic segmentation model output the first global feature map F_low1, the second global feature map F_low2 and the third global feature map F_low3, respectively, to the unified attention fusion module.
3. The protection-target-oriented unmanned aerial vehicle spare landing area identification method according to claim 2, characterized in that: the first full convolution module and the second full convolution module each comprise a convolution layer, a normalization layer and an activation layer which are connected in sequence.
4. The protection-target-oriented unmanned aerial vehicle spare area identification method according to claim 2, wherein the first, second and third feature attention weight modules each comprise a global pooling layer, and a first attention convolution layer, a second attention convolution layer, a third attention convolution layer, a fourth attention convolution layer and a Concat layer which are connected in sequence;
the convolution kernel size of the first attention convolution layer is 1×1, and the convolution kernel sizes of the second, third and fourth attention convolution layers are all 3×3;
the input to the first attention convolution layer is a low-level feature map F_0; F_0 passes through the first attention convolution layer to give a first global feature subgraph F_1; F_1 passes through the second attention convolution layer to give a second global feature subgraph F_2; F_2 passes through the third attention convolution layer to give a third global feature subgraph F_3; F_3 passes through the fourth attention convolution layer to give a fourth global feature subgraph F_4;
the first global feature subgraph F_1, after passing through the global pooling layer with kernel size 3×3, is fused through the Concat layer with the second global feature subgraph F_2, the third global feature subgraph F_3 and the fourth global feature subgraph F_4 into a global feature map F_low.
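A feature attention weight module of this shape can be sketched as follows. The kernel sizes and the data flow F_0 → F_1 → F_2 → F_3 → F_4 come from claim 4; the stride-2 first convolution (to realize the spatial halving of claim 2), the choice of average pooling for the 3×3 "global pooling layer", and the per-branch channel count are assumptions.

```python
import torch
import torch.nn as nn

class FeatureAttentionWeight(nn.Module):
    """Sketch of the feature attention weight module of claim 4: four chained
    attention convolutions (1x1, 3x3, 3x3, 3x3); the pooled F_1 is concatenated
    with F_2, F_3 and F_4 into F_low."""
    def __init__(self, cin, c):
        super().__init__()
        self.conv1 = nn.Conv2d(cin, c, 1, stride=2)        # 1x1 -> F_1 (assumed stride 2)
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)         # 3x3 -> F_2
        self.conv3 = nn.Conv2d(c, c, 3, padding=1)         # 3x3 -> F_3
        self.conv4 = nn.Conv2d(c, c, 3, padding=1)         # 3x3 -> F_4
        self.pool = nn.AvgPool2d(3, stride=1, padding=1)   # 3x3 pooling on F_1

    def forward(self, f0):
        f1 = self.conv1(f0)
        f2 = self.conv2(f1)
        f3 = self.conv3(f2)
        f4 = self.conv4(f3)
        # Concat-layer fusion of pooled F_1 with F_2, F_3 and F_4
        return torch.cat([self.pool(f1), f2, f3, f4], dim=1)
```

With cin=64 and c=64 this reproduces the 56×56×64 → 28×28×256 transition of claim 2, since the four concatenated branches contribute 4c channels.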
5. The protection-target-oriented unmanned aerial vehicle spare area identification method according to claim 2, wherein the unified attention fusion module is further connected with a pyramid pooling module, which enlarges the receptive field when the context feature maps are extracted;
the pyramid pooling module processes the third global feature map F_low3 output by the third feature attention weight module to obtain a third high-level global feature map F_high3; the third global feature map F_low3 and the third high-level global feature map F_high3 are input together into the unified attention fusion module to obtain a third context feature map F_out3;
the third context feature map F_out3, serving as the second high-level global feature map F_high2, is input into the unified attention fusion module together with the second global feature map F_low2 output by the second feature attention weight module to obtain a second context feature map F_out2;
the second context feature map F_out2, serving as the first high-level global feature map F_high1, is input into the unified attention fusion module together with the first global feature map F_low1 output by the first feature attention weight module to obtain a first context feature map F_out1.
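A receptive-field-enlarging pyramid pooling module of this kind can be sketched as follows. The claim fixes only the module's purpose; the PSPNet-style bin sizes (1, 2, 3, 6), the 1×1 projections, and the bilinear upsampling are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Sketch of a pyramid pooling module: pool the input at several bin sizes,
    project, upsample back, and concatenate with the input to widen the
    effective receptive field."""
    def __init__(self, cin, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b), nn.Conv2d(cin, cin // len(bins), 1))
            for b in bins
        )
        # input (cin) + len(bins) branches of cin/len(bins) each = 2*cin channels
        self.project = nn.Conv2d(cin * 2, cin, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x] + [
            F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
            for stage in self.stages
        ]
        return self.project(torch.cat(feats, dim=1))
```

Applied to F_low3 (7×7×1024), the output keeps the same scale and so can serve directly as F_high3 in the cascade of claim 5.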
6. The protection-target-oriented unmanned aerial vehicle spare area identification method according to claim 5, wherein inputting the global feature map F_low and the high-level global feature map F_high together into the unified attention fusion module to obtain the context feature map F_out comprises the following steps:
upsampling the high-level global feature map F_high to obtain F_up:

F_up = Upsample(F_high)

inputting F_up and the global feature map F_low together into the channel attention mechanism to produce the weights a and 1-a:

(a, 1-a) = Attention(F_up, F_low)

wherein a is the weight of F_up and 1-a is the weight of F_low;

multiplying F_up and F_low by their respective weights and summing to obtain the context feature map F_out:

F_out = F_up*a + F_low*(1-a).
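The three steps above can be sketched as one module. The upsample-weight-blend structure follows claim 6; the internals of Attention(·) are not fixed by the claim, so the channel attention here (global pooling over the concatenated inputs, a 1×1 convolution and a sigmoid) and the assumption that F_low and F_high have equal channel counts are both illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedAttentionFusion(nn.Module):
    """Sketch of the unified attention fusion module of claim 6."""
    def __init__(self, c):
        super().__init__()
        # Assumed channel attention producing the per-channel weight a in (0, 1)
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * c, c, 1),
            nn.Sigmoid(),
        )

    def forward(self, f_low, f_high):
        # F_up = Upsample(F_high)
        f_up = F.interpolate(f_high, size=f_low.shape[2:], mode="bilinear",
                             align_corners=False)
        # (a, 1-a) = Attention(F_up, F_low)
        a = self.attn(torch.cat([f_up, f_low], dim=1))
        # F_out = F_up*a + F_low*(1-a)
        return f_up * a + f_low * (1 - a)
```

In the cascade of claim 5 the module is applied three times, each time taking the previous context feature map as the new F_high; a real implementation would also project F_high to the channel count of F_low where they differ.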
7. The protection-target-oriented unmanned aerial vehicle spare area identification method according to claim 1, wherein the loss function of the target recognition network is as follows:

the Dice Loss function:

L_dice = 1 - (2·Σ_{i=1..n} p_i·y_i + ε) / (Σ_{i=1..n} p_i + Σ_{i=1..n} y_i + ε)

the cross entropy function:

L_ce = -(1/n)·Σ_{i=1..n} [p_i·log(y_i) + (1-p_i)·log(1-y_i)]

wherein n represents the total number of samples in the aerial photographing data set, and i represents the i-th sample of the aerial photographing data set; p_i represents the pixel point true value of the i-th sample, taking the value 0 or 1; y_i represents the pixel point predicted value of the i-th sample, taking a value in (0, 1); ε represents the smoothing coefficient;

the overall loss function of the target recognition network is:

L = L_dice + L_ce.
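The two loss terms and their sum can be written directly from the definitions of n, p_i, y_i and ε. This is a sketch assuming the standard smoothed Dice formulation in which ε appears in both numerator and denominator; the claim names the smoothing coefficient without fixing its exact placement.

```python
import numpy as np

def dice_loss(p, y, eps=1.0):
    """Dice loss with smoothing coefficient eps; p holds the 0/1 ground-truth
    pixel values, y holds the predicted values in (0, 1)."""
    p = np.asarray(p, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    return 1.0 - (2.0 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)

def cross_entropy_loss(p, y):
    """Binary cross entropy averaged over the n samples."""
    p = np.asarray(p, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    return float(-np.mean(p * np.log(y) + (1.0 - p) * np.log(1.0 - y)))

def total_loss(p, y, eps=1.0):
    """Overall loss of claim 7: L = L_dice + L_ce."""
    return dice_loss(p, y, eps) + cross_entropy_loss(p, y)
```

For a perfect prediction the Dice term vanishes, while the cross entropy term penalizes under-confident predictions even when the Dice overlap is already high, which is why the two are summed.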
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310139757.8A CN116416534A (en) | 2023-02-21 | 2023-02-21 | Unmanned aerial vehicle spare area identification method facing protection target |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116416534A true CN116416534A (en) | 2023-07-11 |
Family
ID=87057277
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310139757.8A Pending CN116416534A (en) | 2023-02-21 | 2023-02-21 | Unmanned aerial vehicle spare area identification method facing protection target |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116416534A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117593716A (en) * | 2023-12-07 | 2024-02-23 | 山东大学 | Lane line identification method and system based on unmanned aerial vehicle inspection image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||