CN116416534A - Unmanned aerial vehicle spare area identification method facing protection target - Google Patents

Unmanned aerial vehicle spare area identification method facing protection target

Info

Publication number
CN116416534A
Authority
CN
China
Prior art keywords
attention
module
feature map
feature
global feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310139757.8A
Other languages
Chinese (zh)
Inventor
屈若锟
刘晔璐
陈忠辉
谭锦涛
李诚龙
江波
黄龙杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Xunyi Network Technology Co ltd
Civil Aviation Flight University of China
Original Assignee
Hangzhou Xunyi Network Technology Co ltd
Civil Aviation Flight University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Xunyi Network Technology Co ltd, Civil Aviation Flight University of China filed Critical Hangzhou Xunyi Network Technology Co ltd
Priority to CN202310139757.8A priority Critical patent/CN116416534A/en
Publication of CN116416534A publication Critical patent/CN116416534A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a protection-target-oriented unmanned aerial vehicle spare area identification method, which comprises the following steps: collecting historical aerial image data of the unmanned aerial vehicle, screening the historical aerial image data and labeling it pixel by pixel to form an aerial image data set; inputting the aerial data set into a target recognition network to obtain context features, wherein the target recognition network comprises a multi-layer semantic segmentation model and a unified attention fusion module connected with the semantic segmentation model, and after the aerial data set is input into the semantic segmentation model, the global feature maps obtained from some of its layers are input into the unified attention fusion module to obtain a context feature map; and inputting the context feature map into a semantic segmentation head and a target detection head respectively, and fusing the output results of the semantic segmentation head and the target detection head into a recognition result. The invention detects pedestrians and vehicles in the spare landing area through semantic segmentation technology in computer vision, so as to ensure that the unmanned aerial vehicle can land safely in the spare landing area.

Description

Unmanned aerial vehicle spare area identification method facing protection target
Technical Field
The invention relates to the technical field of semantic segmentation in computer vision, in particular to a protection target-oriented unmanned aerial vehicle spare area recognition method.
Background
Semantic segmentation of unmanned aerial vehicle aerial images applies semantic segmentation technology to aerial photography so that the unmanned aerial vehicle gains intelligent perception of scene targets. For semantic segmentation of unmanned aerial vehicle aerial images, the aerial scenes faced are extremely complex; the spare areas to be identified include horizontal roofs, horizontal ground, horizontal grassland and the like, pedestrians and vehicles in each spare area are detected, and an area can be judged to be a usable spare area only if no pedestrians or vehicles are present in the scene.
Disclosure of Invention
The invention aims to detect pedestrians and vehicles in a spare landing area through a semantic segmentation technology in computer vision so as to ensure that an unmanned aerial vehicle can safely land to the spare landing area, and provides a protection target-oriented unmanned aerial vehicle spare landing area identification method.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
a protection target-oriented unmanned aerial vehicle standby area identification method comprises the following steps:
step 1, collecting historical aerial image data of an unmanned aerial vehicle, screening the historical aerial image data and labeling pixel points by pixel points to form an aerial image data set;
step 2, inputting the aerial photographing data set into a target recognition network to obtain context characteristics; the target recognition network comprises a plurality of layers of semantic segmentation models and a unified attention fusion module connected with the semantic segmentation models, wherein after the aerial data set is input into the semantic segmentation models, the obtained global feature map of a part of layers is input into the unified attention fusion module, and a context feature map is obtained;
and 3, respectively inputting the context feature map into a semantic segmentation head and a target detection head, and fusing the output results of the semantic segmentation head and the target detection head into identification results.
Compared with the prior art, the invention has the beneficial effects that:
according to the invention, the semantic segmentation technology is utilized to segment and identify the preparation descending region of the unmanned aerial vehicle, and as the semantic segmentation is a pixel-level image understanding method, the preparation descending region identification is more accurate and more efficient, and the STDC-BiSeNet network model is in a leading technology in the current real-time semantic segmentation field, so that the scientificity and popularization of the method are embodied.
The invention identifies pedestrians and vehicles in the spare landing area with a good recognition effect, thereby safeguarding the life and property of pedestrians on the ground.
The invention shares the STDC-BiSeNet backbone network between semantic segmentation and target detection, so that the parameter count of the whole task model is reduced, the whole model is made lighter, and rapid deployment of the model is facilitated.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a feature attention weighting module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a full convolution module according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a process of the unified attention fusion module according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Also, in the description of the present invention, the terms "first," "second," and the like are used merely to distinguish one from another, and are not to be construed as indicating or implying a relative importance or implying any actual such relationship or order between such entities or operations. In addition, the terms "connected," "coupled," and the like may be used to denote a direct connection between elements, or an indirect connection via other elements.
Example 1:
The invention is realized by the following technical scheme. As shown in FIG. 1, the protection-target-oriented unmanned aerial vehicle spare area identification method comprises the following steps:
Step 1, collecting historical aerial image data of the unmanned aerial vehicle, screening the historical aerial image data and labeling it pixel by pixel to form an aerial image data set.
In order to enable the unmanned aerial vehicle to show better generalization capability when executing the target recognition task, this embodiment collects a large amount of historical aerial image data or video data from different scenes, different time periods and different areas, screens the data and labels it pixel by pixel. The labelme image annotation tool is used for labeling, and the labeled data is converted into the VOC data set format to form the aerial image data set.
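To make step 1 concrete, the following Python sketch rasterizes one labelme JSON annotation into a pixel-wise class mask of the kind used in a VOC-style segmentation data set. It is only a minimal illustration under assumed names: the LABEL_MAP class ids, the file names and the output folder are hypothetical and are not specified by the patent.

import json
import numpy as np
from PIL import Image, ImageDraw

# Hypothetical class-id assignment; the real label set is defined by the annotators.
LABEL_MAP = {"background": 0, "roof": 1, "ground": 2, "grass": 3, "pedestrian": 4, "vehicle": 5}

def labelme_json_to_mask(json_path):
    """Rasterize labelme polygon annotations into a single-channel class mask."""
    with open(json_path) as f:
        ann = json.load(f)
    w, h = ann["imageWidth"], ann["imageHeight"]
    mask = Image.new("L", (w, h), 0)                      # 0 = background
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:                           # each shape: {"label": ..., "points": [[x, y], ...]}
        cls = LABEL_MAP.get(shape["label"], 0)
        draw.polygon([tuple(p) for p in shape["points"]], fill=cls)
    return np.array(mask, dtype=np.uint8)

# Hypothetical file names; one PNG mask per image, VOC SegmentationClass style.
mask = labelme_json_to_mask("sample_0001.json")
Image.fromarray(mask).save("SegmentationClass/sample_0001.png")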
Step 2, inputting the aerial data set into a target recognition network to obtain context features; the target recognition network comprises a multi-layer semantic segmentation model and a unified attention fusion module connected with the semantic segmentation model, wherein after the aerial data set is input into the semantic segmentation model, the global feature maps obtained from some of its layers are input into the unified attention fusion module to obtain the context feature map.
Most current mainstream semantic segmentation models adopt an encoder-decoder structure: the encoder part performs feature extraction, so that the feature map becomes rich in semantic information while its resolution is gradually reduced, and the decoder part takes the features encoded by the encoder as input and decodes the final segmentation prediction. This basic framework has several problems: the semantic segmentation task requires detail information in addition to semantic information, yet a model often loses a large amount of detail in the successive convolution and pooling operations, and this process also tends to make the model parameters large.
In order to achieve real-time performance in the semantic segmentation task, the scheme adopts the STDC-BiSeNet semantic segmentation model, which has a simple network structure, a small number of parameters, a very light weight and good segmentation performance. Built on an unmanned aerial vehicle platform, it can well realize the identification of spare landing areas, forced landing areas and protection targets.
Referring to fig. 1, the semantic segmentation model includes 5 layers, which are a first full convolution module, a second full convolution module, a first feature attention weight module, a second feature attention weight module, and a third feature attention weight module that are sequentially connected.
The aerial data set with a scale of 224×224×3 is input into the first full convolution module; after being processed by the first full convolution module, a first feature map with a scale of 112×112×32 is output to the second full convolution module; after being processed by the second full convolution module, a second feature map with a scale of 56×56×64 is output to the first feature attention weight module.
After being processed by the first feature attention weight module, a first global feature map F_low1 with a scale of 28×28×256 is output to the second feature attention weight module; after being processed by the second feature attention weight module, a second global feature map F_low2 with a scale of 14×14×512 is output to the third feature attention weight module; after being processed by the third feature attention weight module, a third global feature map F_low3 with a scale of 7×7×1024 is output to the global pooling layer.
The first feature attention weight module, the second feature attention weight module and the third feature attention weight module of the semantic segmentation model respectively output the first global feature map F_low1, the second global feature map F_low2 and the third global feature map F_low3 to the unified attention fusion module.
The first full convolution module and the second full convolution module have the same structure, and referring to fig. 3, the first full convolution module and the second full convolution module each include a convolution layer, a normalization layer, and an activation layer that are sequentially connected.
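A minimal PyTorch sketch of such a full convolution module is given below; the 3×3 kernel and the stride of 2 are assumptions made so that the module halves the spatial size as in the scales listed above, since the patent only fixes the layer order (convolution, normalization, activation).

import torch.nn as nn

class FullConvModule(nn.Module):
    """Convolution -> batch normalization -> ReLU; stride 2 halves the spatial size
    (e.g. 224x224x3 -> 112x112x32), matching the scales listed above."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))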
The first feature attention weight module, the second feature attention weight module and the third feature attention weight module have the same structure. Referring to FIG. 2, each of them includes a global pooling layer, and a first attention convolution layer, a second attention convolution layer, a third attention convolution layer, a fourth attention convolution layer and a Concat layer which are sequentially connected. The convolution kernel size of the first attention convolution layer is 1×1, the convolution kernel size of the second attention convolution layer is 3×3, the convolution kernel size of the third attention convolution layer is 3×3, and the convolution kernel size of the fourth attention convolution layer is 3×3.
With continued reference to FIG. 2, the aerial data set passes through the first full convolution module and the second full convolution module to obtain a low-level feature map F_0; the low-level feature map F_0 passes through the first attention convolution layer to obtain a first global feature subgraph F_1; the first global feature subgraph F_1 passes through the second attention convolution layer to obtain a second global feature subgraph F_2; the second global feature subgraph F_2 passes through the third attention convolution layer to obtain a third global feature subgraph F_3; the third global feature subgraph F_3 passes through the fourth attention convolution layer to obtain a fourth global feature subgraph F_4. After the first global feature subgraph F_1 passes through a global pooling layer with a kernel size of 3×3, it is fused with the second global feature subgraph F_2, the third global feature subgraph F_3 and the fourth global feature subgraph F_4 into a global feature map F_low through the Concat layer.
It is easy to understand that the first feature attention weight module outputs the first global feature map F_low1, the second feature attention weight module outputs the second global feature map F_low2, and the third feature attention weight module outputs the third global feature map F_low3.
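Interpreting FIG. 2 and the channel description given later in this embodiment (M, M/2, M/4, M/8), a feature attention weight module could be sketched in PyTorch roughly as follows. The stride, the pooling parameters and the treatment of M as the block's output width are assumptions, not details fixed by the patent.

import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, k=3, s=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, s, k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class FeatureAttentionWeightModule(nn.Module):
    """STDC-style block: a 1x1 conv followed by three 3x3 convs; F_1 is pooled and
    concatenated with F_2, F_3 and F_4, so the output has out_ch (= M) channels."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.conv1 = conv_bn_relu(in_ch, out_ch // 2, k=1)             # F_1: M/2 channels
        self.conv2 = conv_bn_relu(out_ch // 2, out_ch // 4, s=stride)  # F_2: M/4 channels, downsamples
        self.conv3 = conv_bn_relu(out_ch // 4, out_ch // 8)            # F_3: M/8 channels
        self.conv4 = conv_bn_relu(out_ch // 8, out_ch // 8)            # F_4: M/8 channels
        self.pool = nn.AvgPool2d(kernel_size=3, stride=stride, padding=1)  # 3x3 pooling on the F_1 branch

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        f3 = self.conv3(f2)
        f4 = self.conv4(f3)
        f1 = self.pool(f1)                          # align F_1 spatially with F_2..F_4
        return torch.cat([f1, f2, f3, f4], dim=1)   # skip-connection splice into F_low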
With continued reference to fig. 1, the unified attention fusion module is further connected to a pyramid pooling module, and the pyramid pooling module is used to increase receptive fields when extracting the context feature map.
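A pyramid pooling module of this kind can be sketched as follows; the pooling scales (1, 2, 3, 6) and the projection convolution are assumptions borrowed from common PSPNet-style implementations, as the patent does not specify them.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """Pool the input at several scales, project each branch, upsample back and
    concatenate with the input, which enlarges the receptive field."""
    def __init__(self, in_ch, out_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        branch_ch = in_ch // len(bins)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),
                nn.Conv2d(in_ch, branch_ch, 1, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])
        self.project = nn.Sequential(
            nn.Conv2d(in_ch + branch_ch * len(bins), out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x]
        for branch in self.branches:
            feats.append(F.interpolate(branch(x), size=(h, w),
                                       mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))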
The pyramid pooling module processes the third global feature map F_low3 output by the third feature attention weight module to obtain a third high-level global feature map F_high3; the third global feature map F_low3 and the third high-level global feature map F_high3 are input together into the unified attention fusion module to obtain a third context feature map F_out3.
The third context feature map F_out3 serves as the second high-level global feature map F_high2 and, together with the second global feature map F_low2 output by the second feature attention weight module, is input into the unified attention fusion module to obtain a second context feature map F_out2.
The second context feature map F_out2 serves as the first high-level global feature map F_high1 and, together with the first global feature map F_low1 output by the first feature attention weight module, is input into the unified attention fusion module to obtain a first context feature map F_out1.
Referring to FIG. 4, the processing of the third global feature map F_low3 and the third high-level global feature map F_high3 by the unified attention fusion module is taken as an example: the pyramid pooling module first processes the third global feature map F_low3 to obtain the third high-level global feature map F_high3, and then F_low3 and F_high3 are input together into the unified attention fusion module to obtain the third context feature map F_out3.
The third high-level global feature map F_high3 is upsampled to form F_up3:
F_up3 = Upsample(F_high3)
F_up3 and the third global feature map F_low3 are input together into the attention mechanism, whose channels produce the weights a and 1-a:
(a, 1-a) = Attention(F_up3, F_low3)
where a is the weight of F_up3 and 1-a is the weight of F_low3;
then F_up3 and F_low3 are multiplied by their respective weights and summed to obtain the third context feature map F_out3:
F_out3 = F_up3*a + F_low3*(1-a).
It is easy to understand that the second context feature map F_out2 and the first context feature map F_out1 are obtained in the same manner, which is not repeated here.
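The fusion just described can be sketched in PyTorch as follows. The internal structure of the attention mechanism that produces the weight a is not detailed in the patent, so a simple spatial attention over mean/max channel statistics (in the spirit of PP-LiteSeg-style unified attention fusion) is assumed here, together with a 1×1 convolution to align channel counts.

import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedAttentionFusion(nn.Module):
    """F_up = Upsample(F_high); (a, 1-a) = Attention(F_up, F_low);
    F_out = F_up * a + F_low * (1 - a)."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        self.align = nn.Conv2d(high_ch, low_ch, 1, bias=False)   # assumed: match channel counts
        self.attn = nn.Sequential(                               # assumed spatial attention design
            nn.Conv2d(4, 1, kernel_size=3, padding=1, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, f_low, f_high):
        f_up = F.interpolate(self.align(f_high), size=f_low.shape[2:],
                             mode="bilinear", align_corners=False)
        stats = torch.cat([f_up.mean(1, keepdim=True), torch.amax(f_up, 1, keepdim=True),
                           f_low.mean(1, keepdim=True), torch.amax(f_low, 1, keepdim=True)], dim=1)
        a = self.attn(stats)                        # weight a in (0, 1)
        return f_up * a + f_low * (1 - a)           # context feature map F_out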
On the other hand, in FIG. 2 the low-level feature map F_0 input to the first attention convolution layer has M channels; the first global feature subgraph F_1 obtained after the first attention convolution layer has M/2 channels; the second global feature subgraph F_2 obtained after further downsampling convolution and the second attention convolution layer has M/4 channels; the outputs of the third attention convolution layer and the fourth attention convolution layer then have M/8 channels; the first global feature subgraph F_1, the second global feature subgraph F_2, the third global feature subgraph F_3 and the fourth global feature subgraph F_4 are then spliced and fused through skip connections. The feature map output by the pyramid pooling module to the unified attention fusion module needs upsampling; the number of channels increases continuously while the spatial size of the features decreases, which balances the computational cost.
In order to enhance the feature extraction of the target recognition network and give it multi-scale context capability, the scheme introduces the unified attention fusion module: the global feature maps output by the first feature attention weight module, the second feature attention weight module and the third feature attention weight module are passed into the unified attention fusion module for unified fusion, which fully exploits the spatial and channel relationships of the input features, a key factor in improving segmentation accuracy.
In summary, the target recognition network connects the semantic segmentation model, the unified attention fusion module and the pyramid pooling module, reducing the computational load relative to the traditional BiSeNet model while improving the computational efficiency of the model. Skip connections are adopted throughout, and the unified attention fusion module and the pyramid pooling module enlarge the receptive field of the target recognition network and fuse the context features.
Step 3, inputting the context feature map into a semantic segmentation head and a target detection head respectively, and fusing the output results of the semantic segmentation head and the target detection head into the recognition result.
The first context feature map F_out1, as the final context feature output by the target recognition network, is input into the prediction part. The prediction part comprises two parallel branches, a semantic segmentation head and a target detection head; after the context features pass through the semantic segmentation head and the target detection head, the results are rendered on the image and the outputs are fused into the recognition result.
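A minimal sketch of this prediction part is given below; the head widths, the number of classes and the simple per-location box regression format of the detection head are assumptions, since the patent only states that the two heads run in parallel on the shared context features and that their outputs are fused.

import torch.nn as nn
import torch.nn.functional as F

class PredictionHeads(nn.Module):
    """Parallel semantic segmentation head and target detection head on the shared
    context features output by the target recognition network."""
    def __init__(self, in_ch, num_seg_classes, num_det_classes):
        super().__init__()
        self.seg_head = nn.Sequential(               # per-pixel class logits
            nn.Conv2d(in_ch, in_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, num_seg_classes, 1),
        )
        self.det_head = nn.Conv2d(in_ch, num_det_classes + 4, 1)   # class scores + box offsets per location

    def forward(self, context, image_size):
        seg = F.interpolate(self.seg_head(context), size=image_size,
                            mode="bilinear", align_corners=False)
        det = self.det_head(context)
        return seg, det                               # fused downstream into the recognition result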
The loss functions widely used in most current semantic segmentation methods are the Dice Loss function and the cross entropy function. For a single pixel, the Dice Loss function is derived from the Dice coefficient, a metric of set similarity that is generally used to calculate the similarity between two samples. The Dice Loss function is as follows:
L_dice = 1 - (2*p*y + ε) / (p + y + ε)
where p is the pixel ground-truth value, taking the values 0 or 1; y is the pixel predicted value after sigmoid or softmax, taking values in (0, 1); ε is a smoothing coefficient whose function is to prevent the denominator from being 0, and it also smooths the loss and gradient; here ε = 1.
For multiple pixels, the Dice Loss function formula is as follows:
L_dice = 1 - (2*Σ_i p_i*y_i + ε) / (Σ_i p_i + Σ_i y_i + ε)
however, the solution finds that the negative samples are too many in the model training process, which results in inconsistent results during training and testing, and the convergence effect is relatively poor during model training. Therefore, the method improves the denominator of the Dice Loss function into the form of the square sum through experiments, better convergence can be realized, and the improved Dice Loss function is as follows:
L_dice = 1 - (2*Σ_i p_i*y_i + ε) / (Σ_i p_i² + Σ_i y_i² + ε)
however, in the training task of the semantic segmentation model, there is a phenomenon that the number of simple negative samples is too large, and the model cannot distinguish between the positive samples and the difficult negative samples due to the too large number of simple samples. To solve this problem, the present solution continuously adjusts each during the training of the modelThe weight of the sample was determined using (1-y i ) As a weight for each sample. For a simple sample, because the model can easily fit y i Pushing to 1, so the weight of the training device becomes smaller gradually in the training process, and the modified Dice Loss function is finally as follows:
L_dice' = 1 - (2*Σ_i (1 - y_i)*p_i*y_i + ε) / (Σ_i p_i² + Σ_i y_i² + ε)
where n represents the total number of samples in the aerial data set and i the i-th sample; p_i represents the pixel ground-truth value of the i-th sample, taking the values 0 or 1; y_i represents the pixel predicted value of the i-th sample, taking values in (0, 1); and ε represents the smoothing coefficient.
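Under the formulas above, the modified Dice Loss could be sketched in PyTorch as follows; placing the (1 - y_i) weight on the intersection term is a reconstruction, since the original gives the formula only as an image.

import torch

def modified_dice_loss(y_pred, p_true, eps=1.0):
    """Dice loss with a squared-sum denominator and (1 - y) down-weighting of easy samples.

    y_pred: predicted probabilities after sigmoid/softmax, values in (0, 1)
    p_true: ground-truth labels, values 0 or 1
    """
    y = y_pred.flatten(1)                   # one row per sample
    p = p_true.flatten(1).float()
    weight = 1.0 - y                        # easy samples (y close to 1) get a small weight
    intersection = (weight * p * y).sum(dim=1)
    denom = (p * p).sum(dim=1) + (y * y).sum(dim=1)   # squared-sum denominator
    return (1.0 - (2.0 * intersection + eps) / (denom + eps)).mean()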
The cross entropy function is mainly used for measuring the difference between the predicted distribution Y and the real distribution P of the same random variable X, and the cross entropy function formula is as follows:
L_ce = -(1/n) * Σ_i [p_i*log(y_i) + (1 - p_i)*log(1 - y_i)]
To solve the problem of class imbalance, (1 - y_i) is likewise used as the weight of each sample, and the improved cross entropy function formula is:
L_ce' = -(1/n) * Σ_i (1 - y_i)*[p_i*log(y_i) + (1 - p_i)*log(1 - y_i)]
Because the invention performs small-target semantic segmentation and target recognition of the ground from high altitude, gradient saturation may occur under extreme conditions during model training, so the improved Dice Loss function and the improved cross entropy function are combined. The combined total loss function is as follows:
L = L_dice' + L_ce'
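On the same assumptions, a sketch of the weighted cross entropy and of the combined total loss (reusing modified_dice_loss from the previous sketch) is:

import torch

def weighted_bce_loss(y_pred, p_true, eps=1e-7):
    """Binary cross entropy with the same (1 - y) weighting of easy samples."""
    y = y_pred.clamp(eps, 1.0 - eps)
    p = p_true.float()
    ce = -(p * torch.log(y) + (1.0 - p) * torch.log(1.0 - y))
    return ((1.0 - y) * ce).mean()

def total_loss(y_pred, p_true):
    """L = L_dice' + L_ce', combining the two improved terms."""
    return modified_dice_loss(y_pred, p_true) + weighted_bce_loss(y_pred, p_true)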
in summary, the semantic segmentation and target detection technology is implemented in the semantic segmentation model, the backbone network used is STDC-BiSeNet, the parameter number of the total model is greatly optimized, and the model has the advantage of light weight by using the same backbone network, so that the rapid deployment of the model is facilitated. The model is deployed into TX2 for testing, semantic segmentation MPA (average pixel precision) reaches 90%, target detection MAP (average precision) reaches 96.8%, and fps reaches 59, which shows that the model has high-efficiency segmentation performance and real-time performance in the unmanned aerial vehicle aerial data set established in the invention.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A protection-target-oriented unmanned aerial vehicle spare area identification method, characterized by comprising the following steps:
step 1, collecting historical aerial image data of an unmanned aerial vehicle, screening the historical aerial image data and labeling it pixel by pixel to form an aerial image data set;
step 2, inputting the aerial data set into a target recognition network to obtain context features; the target recognition network comprises a multi-layer semantic segmentation model and a unified attention fusion module connected with the semantic segmentation model, wherein after the aerial data set is input into the semantic segmentation model, the global feature maps obtained from some of its layers are input into the unified attention fusion module to obtain a context feature map;
and step 3, inputting the context feature map into a semantic segmentation head and a target detection head respectively, and fusing the output results of the semantic segmentation head and the target detection head into a recognition result.
2. The protection-target-oriented unmanned aerial vehicle spare area identification method according to claim 1, characterized in that: the multi-layer semantic segmentation model comprises a first full convolution module, a second full convolution module, a first feature attention weight module, a second feature attention weight module and a third feature attention weight module which are connected in sequence;
the aerial photographing data set with the scale of 224 x 3 is input into a first full convolution module, and a first feature map with the scale of 112 x 32 is output to a second full convolution module after being processed by the first full convolution module; after being processed by the second full convolution module, the second feature map with the scale of 56 x 64 is output to the first feature attention weight module;
the first global feature map F with the scale of 28 x 256 is output to the second feature attention weight module after being processed by the first feature attention weight module low1 The method comprises the steps of carrying out a first treatment on the surface of the The second global feature map F with the scale of 14 x 512 is output to a third feature attention weight module after being processed by the second feature attention weight module low2 The method comprises the steps of carrying out a first treatment on the surface of the Outputting a third global feature map F with the scale of 7 x 1024 to the global pooling layer after being processed by a third feature attention weight module low3
The first feature attention weight module, the second feature attention weight module and the third feature attention weight module of the semantic segmentation model output a first global feature map F to the unified attention fusion module respectively low1 Second global feature map F low2 Third global feature map F low3
3. The protection-target-oriented unmanned aerial vehicle spare area identification method according to claim 2, characterized in that: the first full convolution module and the second full convolution module each comprise a convolution layer, a normalization layer and an activation layer which are sequentially connected.
4. The protection-target-oriented unmanned aerial vehicle spare area identification method according to claim 2, characterized in that: the first feature attention weight module, the second feature attention weight module and the third feature attention weight module each comprise a global pooling layer, and a first attention convolution layer, a second attention convolution layer, a third attention convolution layer, a fourth attention convolution layer and a Concat layer which are sequentially connected;
the convolution kernel size of the first attention convolution layer is 1×1, the convolution kernel size of the second attention convolution layer is 3×3, the convolution kernel size of the third attention convolution layer is 3×3, and the convolution kernel size of the fourth attention convolution layer is 3×3;
the input to the first attention convolution layer is a low-level feature map F_0; the low-level feature map F_0 passes through the first attention convolution layer to obtain a first global feature subgraph F_1; the first global feature subgraph F_1 passes through the second attention convolution layer to obtain a second global feature subgraph F_2; the second global feature subgraph F_2 passes through the third attention convolution layer to obtain a third global feature subgraph F_3; the third global feature subgraph F_3 passes through the fourth attention convolution layer to obtain a fourth global feature subgraph F_4;
after the first global feature subgraph F_1 passes through the global pooling layer with a kernel size of 3×3, it is fused with the second global feature subgraph F_2, the third global feature subgraph F_3 and the fourth global feature subgraph F_4 into a global feature map F_low through the Concat layer.
5. The protection-target-oriented unmanned aerial vehicle spare area identification method according to claim 2, characterized in that: the unified attention fusion module is further connected with a pyramid pooling module, and the pyramid pooling module is used to enlarge the receptive field when the context feature map is extracted;
the pyramid pooling module processes the third global feature map F_low3 output by the third feature attention weight module to obtain a third high-level global feature map F_high3; the third global feature map F_low3 and the third high-level global feature map F_high3 are input together into the unified attention fusion module to obtain a third context feature map F_out3;
the third context feature map F_out3 serves as the second high-level global feature map F_high2 and, together with the second global feature map F_low2 output by the second feature attention weight module, is input into the unified attention fusion module to obtain a second context feature map F_out2;
the second context feature map F_out2 serves as the first high-level global feature map F_high1 and, together with the first global feature map F_low1 output by the first feature attention weight module, is input into the unified attention fusion module to obtain a first context feature map F_out1.
6. The protection-target-oriented unmanned aerial vehicle spare area identification method according to claim 5, characterized in that: inputting a global feature map F_low and a high-level global feature map F_high together into the unified attention fusion module to obtain a context feature map F_out comprises the steps of:
upsampling the high-level global feature map F_high to form F_up:
F_up = Upsample(F_high)
inputting F_up and the global feature map F_low together into the attention mechanism, whose channels produce the weights a and 1-a:
(a, 1-a) = Attention(F_up, F_low)
where a is the weight of F_up and 1-a is the weight of F_low;
and then multiplying F_up and F_low by their respective weights and summing to obtain the context feature map F_out:
F_out = F_up*a + F_low*(1-a).
7. The protection-target-oriented unmanned aerial vehicle spare area identification method according to claim 1, characterized in that: the loss functions of the target recognition network are as follows:
the Dice Loss function:
L_dice' = 1 - (2*Σ_i (1 - y_i)*p_i*y_i + ε) / (Σ_i p_i² + Σ_i y_i² + ε)
cross entropy function:
L_ce' = -(1/n) * Σ_i (1 - y_i)*[p_i*log(y_i) + (1 - p_i)*log(1 - y_i)]
where n represents the total number of samples in the aerial data set and i the i-th sample; p_i represents the pixel ground-truth value of the i-th sample, taking the values 0 or 1; y_i represents the pixel predicted value of the i-th sample, taking values in (0, 1); and ε represents the smoothing coefficient;
the overall loss function of the object recognition network is:
L = L_dice' + L_ce'.
CN202310139757.8A 2023-02-21 2023-02-21 Unmanned aerial vehicle spare area identification method facing protection target Pending CN116416534A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310139757.8A CN116416534A (en) 2023-02-21 2023-02-21 Unmanned aerial vehicle spare area identification method facing protection target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310139757.8A CN116416534A (en) 2023-02-21 2023-02-21 Unmanned aerial vehicle spare area identification method facing protection target

Publications (1)

Publication Number Publication Date
CN116416534A true CN116416534A (en) 2023-07-11

Family

ID=87057277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310139757.8A Pending CN116416534A (en) 2023-02-21 2023-02-21 Unmanned aerial vehicle spare area identification method facing protection target

Country Status (1)

Country Link
CN (1) CN116416534A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593716A (en) * 2023-12-07 2024-02-23 山东大学 Lane line identification method and system based on unmanned aerial vehicle inspection image


Similar Documents

Publication Publication Date Title
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
EP3101594A1 (en) Saliency information acquisition device and saliency information acquisition method
CN108090472B (en) Pedestrian re-identification method and system based on multi-channel consistency characteristics
CN107092884B (en) Rapid coarse-fine cascade pedestrian detection method
CN110246148B (en) Multi-modal significance detection method for depth information fusion and attention learning
CN107025440A (en) A kind of remote sensing images method for extracting roads based on new convolutional neural networks
CN110889398B (en) Multi-modal image visibility detection method based on similarity network
CN111242127A (en) Vehicle detection method with granularity level multi-scale characteristics based on asymmetric convolution
CN110751209B (en) Intelligent typhoon intensity determination method integrating depth image classification and retrieval
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN114742799B (en) Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
CN107967474A (en) A kind of sea-surface target conspicuousness detection method based on convolutional neural networks
CN110490155B (en) Method for detecting unmanned aerial vehicle in no-fly airspace
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN112784756B (en) Human body identification tracking method
CN111738114B (en) Vehicle target detection method based on anchor-free accurate sampling remote sensing image
CN113780132A (en) Lane line detection method based on convolutional neural network
CN107545281B (en) Single harmful gas infrared image classification and identification method based on deep learning
CN116416534A (en) Unmanned aerial vehicle spare area identification method facing protection target
CN111027440B (en) Crowd abnormal behavior detection device and detection method based on neural network
CN113011308A (en) Pedestrian detection method introducing attention mechanism
CN115115973A (en) Weak and small target detection method based on multiple receptive fields and depth characteristics
CN115526852A (en) Molten pool and splash monitoring method in selective laser melting process based on target detection and application
CN116342894A (en) GIS infrared feature recognition system and method based on improved YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination