CN114943835A - Real-time semantic segmentation method for aerial images of ice slush unmanned aerial vehicle in yellow river - Google Patents
- Publication number
- CN114943835A (application number CN202210415977.4A)
- Authority
- CN
- China
- Prior art keywords
- module
- feature map
- output
- channels
- convolution
- Legal status
- Granted (the status listed is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- G06V20/17—Terrestrial scenes taken from planes or by drones
Abstract
The invention discloses a real-time semantic segmentation method for aerial images of Yellow River ice captured by an unmanned aerial vehicle (UAV). A Yellow River ice semantic segmentation data set is constructed from the collected UAV aerial ice images, comprising the images and their label data; the segmentation network FastICENet is then trained on this data set to obtain the final semantic segmentation model. Even when the ice floes in an image vary in size and shape, the detection results remain accurate; at accuracy comparable to other semantic segmentation networks, the segmentation speed of the proposed network far exceeds theirs.
Description
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a real-time semantic segmentation method for an aerial image.
Background
Semantic segmentation is an important field of computer vision: it classifies an image at the pixel level, i.e. it marks the object class to which each pixel belongs, with the goal of predicting a class label for every pixel in the image. River ice condition monitoring is of great significance for river management and the shipping industry, and accurate ice segmentation is one of the most important techniques in ice condition monitoring research. Lightweight semantic segmentation is especially important here: the input image must be analyzed quickly so that auxiliary systems can interact with the external environment in time. Specifically, an input Yellow River ice image must be segmented rapidly and accurately so that the ice condition in the river can be monitored in real time and early warnings issued promptly. It is therefore necessary to design an accurate, real-time, lightweight semantic segmentation network.
Early ice semantic segmentation methods mainly addressed the poor accuracy of existing ice detection methods. For example, one approach constructs a segmentation network comprising a shallow branch and a deep branch, adds a channel attention module to the deep branch and a position attention module to the shallow branch, and uses a fusion module to fuse the two branches. The training data are fed into the network in batches, and the network is trained with a cross-entropy loss and the RMSprop optimizer; finally, a test image is input and evaluated with the trained model. Such a method can selectively perform multi-level, multi-scale feature fusion, captures context information through the attention mechanism, obtains higher-resolution feature maps, and achieves better segmentation results. However, its segmentation speed is slow: the network cannot run in real time on low-power devices, and it is difficult to meet the practical deployment requirements of Yellow River ice segmentation.
Disclosure of Invention
In order to overcome the shortcomings of the prior art, the invention provides a real-time semantic segmentation method for UAV aerial images of Yellow River ice. A Yellow River ice semantic segmentation data set is constructed from the collected UAV aerial ice images, comprising the images and their label data; the segmentation network FastICENet is then trained on this data set to obtain the final semantic segmentation model. Even when the ice floes in an image vary in size and shape, the detection results remain accurate; at accuracy comparable to other semantic segmentation networks, the segmentation speed of the proposed network far exceeds theirs.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: constructing a yellow river ice semantic segmentation data set according to the collected unmanned aerial vehicle aerial ice image, wherein the data set comprises the yellow river unmanned aerial vehicle aerial ice image and label data; dividing a data set into a training set, a verification set and a test set;
step 2: constructing a semantic segmentation model FastICENet;
the semantic segmentation model FastICENet comprises a shallow detail branch, a deep semantic branch and a fusion upsampling module; the shallow detail branch is used for extracting low-level detail information of the slush image, the deep semantic branch is used for extracting deep semantic information of the slush image, and finally the deep semantic branch and the shallow detail branch are fused and sampled by the fusion upsampling module to obtain a semantic segmentation result with the same size as the original image;
step 2-1: the shallow detail branch is specifically as follows: an input image of size h×w, where h and w are the image height and width, passes sequentially through the first, second and third convolution modules; after these three convolution modules the feature map resolution is h/8×w/8;
step 2-2: the deep semantic branch is specifically as follows:
step 2-2-1: an input image of size h×w passes sequentially through the first, second and third downsampling modules, yielding a feature map of resolution h/8×w/8;
step 2-2-2: the feature map obtained in step 2-2-1 is input into the first dense connection module based on the phantom feature map; the resolution of the output feature map remains h/8×w/8;
step 2-2-3: the feature map obtained in step 2-2-2 is input into the fourth downsampling module; the resolution of the output feature map is h/16×w/16;
step 2-2-4: the feature map obtained in step 2-2-3 is input into the second dense connection module based on the phantom feature map, and the output feature map is fed separately into the first attention refinement module and the mean pooling module; their outputs are stacked along the channel dimension, and the resulting feature map is the output of step 2-2-4;
step 2-2-5: the feature map obtained in step 2-2-4 is passed through the first upsampling module; the size of the output feature map is h/8×w/8;
step 2-2-6: the outputs of step 2-2-2 and step 2-2-5 are jointly input into the second attention module; the resolution of the output feature map is h/8×w/8;
step 2-3: the fusion upsampling module is specifically as follows: the outputs of the shallow detail branch and the deep semantic branch are jointly input into the feature fusion module, producing an output feature map of size h/8×w/8; the output of the feature fusion module is restored to the original size h×w by the second upsampling module, and the segmentation result is predicted;
step 3: training the semantic segmentation model FastICENet using the training set and the verification set to obtain the final semantic segmentation model, and testing its performance on the test set.
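The patent does not specify the loss function or optimizer used to train FastICENet (cross-entropy and RMSprop are mentioned only for an earlier method in the background). As a hedged illustration only, a minimal PyTorch epoch loop with pixel-wise cross-entropy over the three classes (ice, water, river bank) might look like:

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """Run one epoch; pixel-wise cross-entropy is an assumed choice of loss."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    running = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(images)            # (B, 3, H, W) class scores
        loss = criterion(logits, labels)  # labels: (B, H, W), dtype long
        loss.backward()
        optimizer.step()
        running += loss.item()
    return running / max(len(loader), 1)
```

The returned value is the average batch loss, which can be monitored on the verification set for model selection.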
Preferably, the step 1 specifically comprises:
step 1-1: collecting UAV aerial Yellow River ice images over multiple periods and regions;
step 1-2: the collected images are cropped to 1600×640, and each image is manually labeled pixel by pixel with three classes: ice, water and river bank;
step 1-3: the Yellow River ice images and their class labels obtained in step 1-2 are divided into a training set, a verification set and a test set in a 3:1:1 ratio.
Preferably, the convolution kernel size of the first convolution module is 7×7 with stride 2 and padding 3, followed by batch normalization and ReLU; the second and third convolution modules use 3×3 kernels with stride 2 and padding 1, each likewise followed by batch normalization and ReLU.
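A minimal PyTorch sketch of the shallow detail branch as described (one 7×7 stride-2 module followed by two 3×3 stride-2 modules, each with batch normalization and ReLU). The channel width of 64 is an assumption, since the patent does not state the branch's channel counts:

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, k, stride, pad):
    """Convolution module: conv -> batch normalization -> ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=stride, padding=pad),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class ShallowDetailBranch(nn.Module):
    """Three stride-2 convolution modules: overall 8x spatial reduction."""
    def __init__(self, in_ch=3, ch=64):  # ch=64 is an illustrative width
        super().__init__()
        self.block1 = conv_bn_relu(in_ch, ch, 7, 2, 3)  # h/2 x w/2
        self.block2 = conv_bn_relu(ch, ch, 3, 2, 1)     # h/4 x w/4
        self.block3 = conv_bn_relu(ch, ch, 3, 2, 1)     # h/8 x w/8
    def forward(self, x):
        return self.block3(self.block2(self.block1(x)))
```

With an h×w input, the output resolution is h/8×w/8, matching step 2-1.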
Preferably, the first down-sampling module, the second down-sampling module, the third down-sampling module and the fourth down-sampling module all adopt the following structures:
the number of input channels, the number of output channels and the number of convolution-layer output channels of the feature map in a downsampling module are denoted Win, Wout and Wconv respectively;
in a downsampling module, when Wout > Win, the input feature map first passes in parallel through a convolution layer with 3×3 kernels and a 2×2 max pooling layer, both with stride 2; the convolution layer outputs Wconv = Wout − Win channels and the max pooling layer outputs Win channels; the two outputs are then stacked along the channel dimension, batch normalized, and activated by ReLU, realizing 2× downsampling of the feature map;
when Wout ≤ Win, the input feature map passes only through a convolution layer with 3×3 kernels and stride 2, followed by batch normalization and ReLU activation, realizing 2× downsampling by convolution alone;
the first downsampling module has 3 input channels and 15 output channels; the second has 15 input channels and 30 output channels; the third has 30 input channels and 60 output channels; the fourth has 160 input channels and 160 output channels.
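The two cases of the downsampling module map directly onto a short PyTorch sketch; `DownsampleModule` is an illustrative name, and the behavior follows the description above (parallel conv/max-pool branches when Wout > Win, a single strided convolution otherwise):

```python
import torch
import torch.nn as nn

class DownsampleModule(nn.Module):
    """2x downsampling: conv + max-pool branches concatenated when the
    output width exceeds the input width, else one strided convolution."""
    def __init__(self, w_in, w_out):
        super().__init__()
        self.expand = w_out > w_in
        conv_out = w_out - w_in if self.expand else w_out
        self.conv = nn.Conv2d(w_in, conv_out, 3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(2, stride=2) if self.expand else None
        self.bn = nn.BatchNorm2d(w_out)
        self.act = nn.ReLU(inplace=True)
    def forward(self, x):
        y = self.conv(x)
        if self.expand:
            # channels: (w_out - w_in) from conv + w_in from pooling = w_out
            y = torch.cat([y, self.pool(x)], dim=1)
        return self.act(self.bn(y))
```

With `DownsampleModule(3, 15)` the module takes the expanding path; with `DownsampleModule(160, 160)` (the fourth module) it reduces spatial size by a strided convolution alone.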
Preferably, the first and second dense connection modules based on the phantom feature map share the same structure, defined as follows:
Define the phantom module: a single convolution generates m original feature maps Y′ ∈ R^(h′×w′×m):
Y′ = X ∗ f′
where Y′ is the output of the convolution layer, X is its input, f′ ∈ R^(c×k×k×m) is the convolution kernel, and m ≤ n, where n is the number of feature maps actually required at this point in the network model;
a series of linear operations is then applied to each original feature map in Y′ to generate s phantom feature maps:
y_ij = Φ_i,j(y′_i), i = 1, …, m, j = 1, …, s
where y′_i is the i-th original feature map in Y′ and Φ_i,j is the j-th linear operation, generating the phantom feature map y_ij;
the linear operations thus yield n = m·s feature maps Y = [y_11, y_12, …, y_ij, …, y_ms] as the output data of the phantom module; finally, the original feature maps and the phantom feature maps are stacked along the channel dimension, and the stacked result is the output of the phantom module;
a dense connection is used over the successive phantom modules: the input to each phantom module is the channel-wise stack of the dense connection module's original input feature map and the output feature maps of all preceding phantom modules;
the first dense connection module based on the phantom feature map has 60 input channels and 160 output channels and uses 5 densely connected phantom modules;
the second dense connection module based on the phantom feature map has 160 input channels and 320 output channels and uses 8 densely connected phantom modules;
each of the 13 phantom modules adds 10 channels through its convolution layer and 10 channels through the linear operations, so the output of each phantom module has 20 more channels than its input.
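A hedged PyTorch sketch of a phantom (ghost) module and the densely connected block built from it, matching the channel arithmetic above (each module contributes 10 convolution channels plus 10 cheap channels, so 60 + 5·20 = 160 and 160 + 8·20 = 320). Using a 3×3 depthwise convolution as the cheap linear operation follows the detailed description's choice of depthwise convolution; its kernel size is an assumption:

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """m primary maps from a 1x1 conv, m phantom maps from a depthwise
    convolution (the cheap linear operation), concatenated: 2m channels."""
    def __init__(self, in_ch, m=10):
        super().__init__()
        self.primary = nn.Conv2d(in_ch, m, 1)
        self.cheap = nn.Conv2d(m, m, 3, padding=1, groups=m)  # depthwise
    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class GhostDenseBlock(nn.Module):
    """Dense connectivity: each phantom module sees the block input plus
    all earlier phantom outputs; each module adds 20 channels."""
    def __init__(self, in_ch, n_modules):
        super().__init__()
        self.mods = nn.ModuleList(
            [GhostModule(in_ch + 20 * i) for i in range(n_modules)])
    def forward(self, x):
        feats = [x]
        for mod in self.mods:
            feats.append(mod(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)
```

`GhostDenseBlock(60, 5)` and `GhostDenseBlock(160, 8)` reproduce the two modules' stated channel counts.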
Preferably, the first attention module and the second attention module are implemented as follows: the input feature map is passed sequentially through global average pooling, a 1×1 convolution and batch normalization; a sigmoid then yields a channel attention vector, which is multiplied element-wise with the input feature map, and the product is added to the input feature map to obtain the channel-weighted feature map.
Preferably, the first and second upsampling modules share the same structure, implemented as follows: assume the input feature map has size h1×w1×C, where h1 and w1 are its height and width and C is its number of channels; the input feature map is passed through a convolution layer of N 1×1 kernels, producing a new feature map of size h1×w1×N; the new feature map is then reshaped into an output feature map of size rh1×rw1×(N/r²), where r is the upsampling factor. The first upsampling module uses an upsampling factor of 2, and the second upsampling module uses an upsampling factor of 8.
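The conv-then-reshape upsampler described above is, under the assumption that the reshape distributes N = r²·C′ channels over an r-times larger spatial grid, equivalent to sub-pixel (pixel-shuffle) upsampling; a sketch:

```python
import torch
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """1x1 conv to out_ch * r * r channels, then pixel-shuffle reshape
    to r-times spatial resolution (conv-then-reshape upsampling)."""
    def __init__(self, in_ch, out_ch, r):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch * r * r, 1)
        self.shuffle = nn.PixelShuffle(r)  # (B, C*r*r, H, W) -> (B, C, rH, rW)
    def forward(self, x):
        return self.shuffle(self.conv(x))
```

With r = 2 this matches the first upsampling module; with r = 8 it restores the h/8×w/8 fused map to full resolution, as the second module does.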
Preferably, the structure of the feature fusion module is as follows: first, the feature maps output by the shallow detail branch and the deep semantic branch are stacked along the channel dimension and passed through a 1×1 convolution with stride 1, followed by batch normalization and a ReLU activation; second, the output of the first step is globally pooled, passed through a 1×1 convolution with stride 1 and a ReLU activation, then through another 1×1 convolution with stride 1 and a sigmoid activation, and the result is multiplied element-wise with the output of the first step; third, the product from the second step is added to the output of the first step, and the sum is the output of the feature fusion module.
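The three steps of the feature fusion module can be sketched as follows; the channel counts in the usage example are illustrative, since the branch widths are not fixed by this paragraph:

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Concat + 1x1 conv fusion, then squeeze-excitation style channel
    reweighting with a residual addition, per the three steps above."""
    def __init__(self, ch_detail, ch_semantic, out_ch):
        super().__init__()
        self.fuse = nn.Sequential(                      # step one
            nn.Conv2d(ch_detail + ch_semantic, out_ch, 1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.Sequential(                      # step two
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 1),
            nn.Sigmoid(),
        )
    def forward(self, detail, semantic):
        fused = self.fuse(torch.cat([detail, semantic], dim=1))
        return fused * self.attn(fused) + fused         # step three
```

Both branch outputs must share the h/8×w/8 spatial size before fusion, as guaranteed by steps 2-1 and 2-2.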
The invention has the following beneficial effects:
1) the invention provides a double-branch lightweight semantic segmentation network which is used for real-time semantic segmentation of the ice of the yellow river;
2) even when the ice floes in the image to be segmented vary in size and shape, the detection results remain accurate;
3) at accuracy comparable to other semantic segmentation networks, the segmentation speed of the proposed network far exceeds theirs.
Drawings
FIG. 1 is a diagram of a semantic segmentation model architecture of the present invention.
Fig. 2 is a block diagram of a downsampling module of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention aims to provide a real-time semantic segmentation method for Yellow River ice that improves the accuracy of the segmentation model through a dual-branch structure and overcomes slow segmentation speed by adopting lightweight modules.
A real-time semantic segmentation method for UAV aerial images of Yellow River ice comprises the following steps:
step 1: constructing a yellow river ice semantic segmentation data set according to the collected unmanned aerial vehicle aerial ice image, wherein the data set comprises the yellow river unmanned aerial vehicle aerial ice image and label data;
step 1-1: collecting UAV aerial Yellow River ice images over multiple periods and regions, and selecting clear, well-illuminated images from them;
step 1-2: the collected images are cropped to 1600×640, and each image is manually labeled pixel by pixel with three classes: ice, water and river bank;
step 1-3: the Yellow River ice images and their class labels obtained in step 1-2 are divided into a training set, a validation set and a test set in a 3:1:1 ratio.
Step 2: constructing a semantic segmentation model FastICENet;
the semantic segmentation model FastICENet comprises a shallow detail branch, a deep semantic branch and a fusion upsampling module; the shallow detail branch is used for extracting low-level detail information and texture information of the slush image, the deep semantic branch is used for extracting deep semantic information of the slush image, and finally the deep semantic branch and the shallow detail branch are fused and sampled by a fusion upsampling module to obtain a semantic segmentation result with the same size as the original image;
step 2-1: the shallow detail branch is specifically as follows: an input image of size h×w, where h and w are the image height and width, passes sequentially through the first, second and third convolution modules; after these three convolution modules the feature map resolution is h/8×w/8;
step 2-2: the deep semantic branch is specifically as follows:
step 2-2-1: an input image of size h×w passes sequentially through the first, second and third downsampling modules, yielding a feature map of resolution h/8×w/8;
step 2-2-2: the feature map obtained in step 2-2-1 is input into the first dense connection module based on the phantom feature map; the resolution of the output feature map remains h/8×w/8;
step 2-2-3: the feature map obtained in step 2-2-2 is input into the fourth downsampling module; the resolution of the output feature map is h/16×w/16;
step 2-2-4: the feature map obtained in step 2-2-3 is input into the second dense connection module based on the phantom feature map, and the output feature map is fed separately into the first attention refinement module and the mean pooling module; their outputs are stacked along the channel dimension, and the resulting feature map is the output of step 2-2-4;
step 2-2-5: the feature map obtained in step 2-2-4 is passed through the first upsampling module; the size of the output feature map is h/8×w/8;
step 2-2-6: the outputs of step 2-2-2 and step 2-2-5 are jointly input into the second attention module; the resolution of the output feature map is h/8×w/8;
step 2-3: the fusion upsampling module is specifically as follows: the outputs of the shallow detail branch and the deep semantic branch are jointly input into the feature fusion module, producing an output feature map of size h/8×w/8; the output of the feature fusion module is restored to the original size h×w by the second upsampling module, and the segmentation result is predicted;
step 3: training the semantic segmentation model FastICENet using the training set and the verification set to obtain the final semantic segmentation model, and testing its performance on the test set.
Preferably, the convolution kernel size of the first convolution module is 7×7 with stride 2 and padding 3, followed by batch normalization and ReLU; the second and third convolution modules use 3×3 kernels with stride 2 and padding 1, each likewise followed by batch normalization and ReLU.
Preferably, the first down-sampling module, the second down-sampling module, the third down-sampling module and the fourth down-sampling module all adopt the following structures:
the number of input channels, the number of output channels and the number of convolution-layer output channels of the feature map in a downsampling module are denoted Win, Wout and Wconv respectively;
in a downsampling module, when Wout > Win, the input feature map first passes in parallel through a convolution layer with 3×3 kernels and a 2×2 max pooling layer, both with stride 2; the convolution layer outputs Wconv = Wout − Win channels and the max pooling layer outputs Win channels; the two outputs are then stacked along the channel dimension, batch normalized, and activated by ReLU, realizing 2× downsampling of the feature map;
when Wout ≤ Win, the input feature map passes only through a convolution layer with 3×3 kernels and stride 2, followed by batch normalization and ReLU activation, realizing 2× downsampling by convolution alone;
the first downsampling module has 3 input channels and 15 output channels; the second has 15 input channels and 30 output channels; the third has 30 input channels and 60 output channels; the fourth has 160 input channels and 160 output channels.
Preferably, the first and second dense connection modules based on the phantom feature map share the same structure, defined as follows:
Define the phantom module: a single convolution generates m original feature maps Y′ ∈ R^(h′×w′×m):
Y′ = X ∗ f′
where Y′ is the output of the convolution layer, X is its input, f′ ∈ R^(c×k×k×m) is the convolution kernel, and m ≤ n, where n is the number of feature maps actually required in the network model; the bias term is omitted for simplicity. The hyperparameters, i.e. kernel size, stride and padding, are the same as in an ordinary convolution, so that the spatial size (h′ and w′) of the output feature map is preserved.
A series of linear operations is then applied to each original feature map in Y′ to generate s phantom feature maps:
y_ij = Φ_i,j(y′_i), i = 1, …, m, j = 1, …, s
where y′_i is the i-th original feature map in Y′ and Φ_i,j is the j-th linear operation, generating the phantom feature map y_ij;
the linear operations yield n = m·s feature maps Y = [y_11, y_12, …, y_ij, …, y_ms] as the output data of the phantom module. In the invention, the convolution layer uses 1×1 kernels, and the linear operation Φ is a depthwise convolution applied to the original feature maps to generate the phantom feature maps; finally, the original feature maps and the phantom feature maps are stacked along the channel dimension, and the stacked result is the output of the phantom module;
a dense connection is used over the successive phantom modules: the input to each phantom module is the channel-wise stack of the dense connection module's original input feature map and the output feature maps of all preceding phantom modules;
the first dense connection module based on the phantom feature map has 60 input channels and 160 output channels and uses 5 densely connected phantom modules;
the second dense connection module based on the phantom feature map has 160 input channels and 320 output channels and uses 8 densely connected phantom modules;
each of the 13 phantom modules adds 10 channels through its convolution layer and 10 channels through the linear operations, so the output of each phantom module has 20 more channels than its input.
Preferably, the first attention module and the second attention module are implemented as follows: the input feature map is passed sequentially through global average pooling, a 1×1 convolution and batch normalization; a sigmoid then yields a channel attention vector, which is multiplied element-wise with the input feature map, and the product is added to the input feature map to obtain the channel-weighted feature map.
Preferably, the first and second upsampling modules share the same structure, implemented as follows: assume the input feature map has size h1×w1×C, where h1 and w1 are its height and width and C is its number of channels; the input feature map is passed through a convolution layer of N 1×1 kernels, producing a new feature map of size h1×w1×N; the new feature map is then reshaped into an output feature map of size rh1×rw1×(N/r²), where r is the upsampling factor. The first upsampling module uses an upsampling factor of 2, and the second upsampling module uses an upsampling factor of 8.
Preferably, the structure of the feature fusion module is as follows: first, the feature maps output by the shallow detail branch and the deep semantic branch are stacked along the channel dimension and passed through a 1×1 convolution with stride 1, followed by batch normalization and a ReLU activation; second, the output of the first step is globally pooled, passed through a 1×1 convolution with stride 1 and a ReLU activation, then through another 1×1 convolution with stride 1 and a sigmoid activation, and the result is multiplied element-wise with the output of the first step; third, the product from the second step is added to the output of the first step, and the sum is the output of the feature fusion module.
The specific embodiment is as follows:
To verify the effectiveness of the method, it is compared with four existing deep learning methods; Table 1 shows the performance (accuracy and speed) of the method of the invention and of the other deep learning based methods.
TABLE 1 comparison of the method of the present invention with four other deep learning methods
As can be seen from Table 1, while the accuracy (mIoU) of the proposed method is comparable to that of the other four methods, its speed greatly exceeds theirs, reaching 94.840 FPS.
Claims (8)
1. A real-time semantic segmentation method for Yellow River ice unmanned aerial vehicle aerial images, characterized by comprising the following steps:
step 1: constructing a yellow river ice semantic segmentation data set according to the collected unmanned aerial vehicle aerial ice image, wherein the data set comprises the yellow river unmanned aerial vehicle aerial ice image and label data; dividing a data set into a training set, a verification set and a test set;
step 2: constructing a semantic segmentation model FastICENet;
the semantic segmentation model FastICENet comprises a shallow detail branch, a deep semantic branch and a fusion upsampling module; the shallow detail branch extracts low-level detail information from the ice image, the deep semantic branch extracts deep semantic information, and the fusion upsampling module finally fuses and upsamples the two branches to obtain a semantic segmentation result of the same size as the original image;
step 2-1: the shallow detail branch is specifically as follows: an input image of size h × w, where h and w are the height and width of the image respectively, passes sequentially through convolution module I, convolution module II and convolution module III; after the three convolution modules the resolution of the feature map is h/8 × w/8;
step 2-2: the deep semantic branch is specifically as follows:
step 2-2-1: an input image of size h × w passes sequentially through the first, second and third down-sampling modules; after the three down-sampling modules the resolution of the obtained feature map is h/8 × w/8;
step 2-2-2: inputting the feature map obtained in the step 2-2-1 into a dense connection module I based on the phantom feature map, wherein the resolution of the output feature map is still h/8 xw/8;
step 2-2-3: inputting the feature map obtained in the step 2-2-2 into a fourth down-sampling module, wherein the resolution of the output feature map is h/16 multiplied by w/16;
step 2-2-4: inputting the feature map obtained in step 2-2-3 into the second dense connection module based on the phantom feature map, and feeding its output feature map into the first attention module and an average pooling module respectively; the outputs of the first attention module and of the average pooling module are stacked along the channel dimension, and the resulting feature map is taken as the output of step 2-2-4;
step 2-2-5: passing the feature map obtained in step 2-2-4 through the first up-sampling module; the size of the output feature map is h/8 × w/8;
step 2-2-6: the outputs of the step 2-2-2 and the step 2-2-5 are jointly input into a second attention module, and the resolution of an output feature map is h/8 xw/8;
step 2-3: the fusion upsampling module is specifically as follows: the outputs of the shallow detail branch and the deep semantic branch are jointly input into the feature fusion module, and the size of the output feature map is h/8 × w/8; the output of the feature fusion module is restored to the original size h × w by the second up-sampling module, and the segmentation result is predicted;
step 3: training the semantic segmentation model FastICENet using the training set and the verification set to obtain the final semantic segmentation model, and testing the performance of the final model using the test set.
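As a quick sanity check, the feature-map resolutions stated in claim 1 can be traced with a short script (a sketch; the 1600 × 640 crop size is taken from claim 2):

```python
# Shape walkthrough of FastICENet as described in claim 1, tracking
# only spatial resolution (channel counts follow claims 4 and 5).
h, w = 640, 1600  # the crop size from claim 2

# Shallow detail branch: three stride-2 convolution modules -> 1/8
detail = (h // 8, w // 8)

# Deep semantic branch: three stride-2 down-sampling modules -> 1/8,
# a fourth down-sampling module -> 1/16, then x2 up-sampling -> 1/8
semantic = (h // 16 * 2, w // 16 * 2)

# Fusion happens at 1/8 resolution, then x8 up-sampling restores
# the full input size
assert detail == semantic == (h // 8, w // 8)
output = (detail[0] * 8, detail[1] * 8)
print(output)  # (640, 1600)
```

The two branches meet at the same 1/8 resolution by construction, which is what allows the feature fusion module to stack them channel-wise.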
2. The real-time semantic segmentation method for the aerial images of the yellow river ice unmanned aerial vehicle according to claim 1, wherein the step 1 specifically comprises:
step 1-1: collecting multi-period and multi-region aerial yellow river ice images of an unmanned aerial vehicle;
step 1-2: cropping the collected images to a size of 1600 × 640, and manually labeling each image pixel by pixel with three classes: ice, water and river bank;
step 1-3: dividing the Yellow River ice images and their classification labels obtained in step 1-2 into a training set, a validation set and a test set at a ratio of 3:1:1.
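Step 1-3 can be sketched as follows (a minimal sketch; the shuffling and fixed seed are assumptions not stated in the claim, and the function name is illustrative):

```python
import random

def split_dataset(samples, ratios=(3, 1, 1), seed=0):
    """Split a sequence of samples 3:1:1 into train/val/test."""
    items = list(samples)
    # Shuffle reproducibly so the split is random but repeatable
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(500))
print(len(train), len(val), len(test))  # 300 100 100
```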
3. The real-time semantic segmentation method for the aerial images of the Yellow River ice unmanned aerial vehicle according to claim 1, characterized in that the convolution kernel of convolution module I is of size 7 × 7 with stride 2 and padding 3, followed by the combination of batch normalization and ReLU; the convolution kernels of convolution module II and convolution module III are of size 3 × 3 with stride 2 and padding 1, likewise followed by the combination of batch normalization and ReLU.
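Each of these kernel/stride/padding choices halves the spatial size, which can be checked with the standard convolution output-size formula:

```python
def conv_out(size, kernel, stride, padding):
    # Standard convolution output-size formula
    return (size + 2 * padding - kernel) // stride + 1

# Convolution module I: 7x7 kernel, stride 2, padding 3 -> halves the size
assert conv_out(1600, 7, 2, 3) == 800
# Convolution modules II and III: 3x3 kernel, stride 2, padding 1
assert conv_out(800, 3, 2, 1) == 400
assert conv_out(400, 3, 2, 1) == 200  # 1600 -> 1/8, as stated in step 2-1
print(conv_out(1600, 7, 2, 3))  # 800
```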
4. The real-time semantic segmentation method for the aerial image of the yellow river ice unmanned aerial vehicle as claimed in claim 1, wherein the down-sampling module I, the down-sampling module II, the down-sampling module III and the down-sampling module IV all adopt the following structures:
the number of input channels of the feature map, the number of output channels, and the number of output channels of the convolution layer in the down-sampling module are denoted Win, Wout and Wconv, respectively;
in the down-sampling module, when Wout is greater than Win, the input feature map first passes in parallel through a convolution layer with 3 × 3 kernels and a 2 × 2 max pooling layer, both with stride 2; the number of output channels of the convolution layer is Wconv = Wout − Win, and the number of output channels of the max pooling layer is Win; the outputs of the convolution layer and the max pooling layer are then channel-stacked, batch-normalized and ReLU-activated, achieving 2× down-sampling of the feature map;
in the down-sampling module, when Wout is less than Win, the input feature map passes only through a convolution layer with 3 × 3 kernels and stride 2, followed by batch normalization and ReLU activation, achieving 2× down-sampling by convolution alone;
the first down-sampling module has an input feature map with 3 channels and an output feature map with 15 channels; the second down-sampling module has 15 input channels and 30 output channels; the third down-sampling module has 30 input channels and 60 output channels; the fourth down-sampling module has 160 input channels and 160 output channels.
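The channel bookkeeping of this down-sampling module can be sketched as follows (a sketch of the counting logic only, not of the convolutions; the claim leaves the Wout = Win case unstated, and the convolution-only path is assumed here for the fourth module):

```python
def downsample_channels(w_in, w_out):
    """Channel counts for the down-sampling module described above."""
    if w_out > w_in:
        # Parallel 3x3 stride-2 conv (w_out - w_in channels) and
        # 2x2 stride-2 max pool (w_in channels), channel-stacked
        w_conv = w_out - w_in
        return {'conv': w_conv, 'pool': w_in, 'out': w_conv + w_in}
    # Otherwise a single 3x3 stride-2 convolution maps w_in -> w_out
    return {'conv': w_out, 'pool': 0, 'out': w_out}

# The four modules from the claim
for w_in, w_out in [(3, 15), (15, 30), (30, 60), (160, 160)]:
    assert downsample_channels(w_in, w_out)['out'] == w_out
print(downsample_channels(3, 15))  # {'conv': 12, 'pool': 3, 'out': 15}
```

Reusing the pooled input channels in the stacked output is what keeps the convolution branch small when the channel count grows, a common trick in real-time segmentation backbones.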
5. The real-time semantic segmentation method for aerial images of the yellow river ice slush unmanned aerial vehicle according to claim 1, wherein the first dense connection module based on the phantom feature diagram and the second dense connection module based on the phantom feature diagram have the same structure and are defined as follows:
defining a phantom module: m original feature maps Y′ ∈ R^(h′×w′×m) are first generated by one ordinary convolution:

Y′ = X ∗ f′

where Y′ is the feature map output by the convolution layer, X is the convolution input, ∗ is the convolution operation, and f′ ∈ R^(c×k×k×m) is the convolution kernel; m ≤ n, where n is the number of feature maps actually required in the network model;
a series of cheap linear operations is then applied to each original feature map in Y′ to generate s phantom feature maps per original map:

y_ij = Φ_(i,j)(y′_i),  i = 1, …, m,  j = 1, …, s

where y′_i is the i-th original feature map in Y′ and Φ_(i,j) is the j-th linear operation, which generates the phantom feature map y_ij; the linear operations thus yield n = m · s feature maps Y = [y_11, y_12, …, y_ms] as the output data of the phantom module; finally the original feature maps and the phantom feature maps are channel-stacked, and the stacked result is taken as the output of the phantom module;
a plurality of phantom modules are connected in a dense manner, i.e. the input of each phantom module is the channel-stack of the input feature map of the dense connection module and the output feature maps of all preceding phantom modules;
the first dense connection module based on the phantom feature map has an input feature map with 60 channels and an output feature map with 160 channels, and uses 5 densely connected phantom modules;
the second dense connection module based on the phantom feature map has an input feature map with 160 channels and an output feature map with 320 channels, and uses 8 densely connected phantom modules;
the 13 phantom modules are added with 10 channels through the convolution layer and 10 channels through linear operation, so that the number of output channels of each phantom module is increased by 20 channels relative to the input channels of the phantom module.
6. The real-time semantic segmentation method for the aerial images of the Yellow River ice unmanned aerial vehicle according to claim 1, wherein the first attention module and the second attention module are implemented as follows: the input feature map is passed sequentially through global average pooling, a 1 × 1 convolution and batch normalization, and finally through a sigmoid to obtain a channel attention vector; the channel attention vector is then multiplied element-wise with the input feature map, and the product is added to the input feature map to obtain a channel-weighted feature map.
7. The real-time semantic segmentation method for the aerial images of the Yellow River ice unmanned aerial vehicle according to claim 1, wherein the first up-sampling module and the second up-sampling module have the same structure, implemented as follows: assume the input feature map has size H × W × C, where H and W are the height and width of the feature map and C is its number of channels; the input feature map is passed through a convolution layer with N convolution kernels of size 1 × 1, producing a new feature map of size H × W × N; the new feature map is then reshaped into an output feature map of size rH × rW × N/r², where r is the up-sampling factor; the first up-sampling module uses an up-sampling factor of 2, and the second up-sampling module uses an up-sampling factor of 8.
8. The real-time semantic segmentation method for the aerial images of the Yellow River ice unmanned aerial vehicle according to claim 1, wherein the feature fusion module has the following structure: first, the feature fusion module channel-stacks the feature maps output by the shallow detail branch and the deep semantic branch, applies a convolution with stride 1 and kernel size 1 × 1, then batch normalization and a ReLU activation; second, the output of the first step is globally pooled, passed through a convolution with stride 1 and kernel size 1 × 1 and a ReLU activation, then through another convolution with stride 1 and kernel size 1 × 1 and a sigmoid activation, and the result is multiplied element-wise with the output of the first step; third, the product from the second step is added to the output of the first step, and the sum is the output of the feature fusion module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210415977.4A CN114943835B (en) | 2022-04-20 | 2022-04-20 | Real-time semantic segmentation method for yellow river ice unmanned aerial vehicle aerial image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114943835A true CN114943835A (en) | 2022-08-26 |
CN114943835B CN114943835B (en) | 2024-03-12 |
Family
ID=82908048
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210415977.4A Active CN114943835B (en) | 2022-04-20 | 2022-04-20 | Real-time semantic segmentation method for yellow river ice unmanned aerial vehicle aerial image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114943835B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710863A (en) * | 2018-05-24 | 2018-10-26 | 东北大学 | Unmanned plane Scene Semantics dividing method based on deep learning and system |
CN111160311A (en) * | 2020-01-02 | 2020-05-15 | 西北工业大学 | Yellow river ice semantic segmentation method based on multi-attention machine system double-flow fusion network |
WO2020101448A1 (en) * | 2018-08-28 | 2020-05-22 | Samsung Electronics Co., Ltd. | Method and apparatus for image segmentation |
CN111259898A (en) * | 2020-01-08 | 2020-06-09 | 西安电子科技大学 | Crop segmentation method based on unmanned aerial vehicle aerial image |
WO2020215236A1 (en) * | 2019-04-24 | 2020-10-29 | 哈尔滨工业大学(深圳) | Image semantic segmentation method and system |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
CN113361373A (en) * | 2021-06-02 | 2021-09-07 | 武汉理工大学 | Real-time semantic segmentation method for aerial image in agricultural scene |
CN113658189A (en) * | 2021-09-01 | 2021-11-16 | 北京航空航天大学 | Cross-scale feature fusion real-time semantic segmentation method and system |
Non-Patent Citations (3)
Title |
---|
Li Shuai; Guo Yanyan; Wei Xia: "Semantic segmentation of remote sensing images by downsampling-based feature fusion", Journal of Test and Measurement Technology, no. 04, 31 December 2020 (2020-12-31), pages 61 - 67 *
Xiong Wei; Cai Mi; Lyu Yafei; Pei Jiazheng: "Sea-land semantic segmentation method for remote sensing images based on neural networks", Computer Engineering and Applications, no. 15, 31 December 2020 (2020-12-31), pages 227 - 233 *
Qing Chen; Yu Jing; Xiao Chuangbai; Duan Juan: "Research progress on image semantic segmentation based on deep convolutional neural networks", Journal of Image and Graphics, no. 06, 16 June 2020 (2020-06-16), pages 32 - 33 *
Also Published As
Publication number | Publication date |
---|---|
CN114943835B (en) | 2024-03-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||