CN114943835A - Real-time semantic segmentation method for UAV aerial images of Yellow River ice - Google Patents

Real-time semantic segmentation method for UAV aerial images of Yellow River ice

Info

Publication number
CN114943835A
Authority
CN
China
Prior art keywords
module
feature map
output
channels
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210415977.4A
Other languages
Chinese (zh)
Other versions
CN114943835B (en)
Inventor
张秀伟 (Zhang Xiuwei)
张艳宁 (Zhang Yanning)
赵梓旭 (Zhao Zixu)
尹翰林 (Yin Hanlin)
邢颖慧 (Xing Yinghui)
王康威 (Wang Kangwei)
刘启兴 (Liu Qixing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202210415977.4A priority Critical patent/CN114943835B/en
Publication of CN114943835A publication Critical patent/CN114943835A/en
Application granted granted Critical
Publication of CN114943835B publication Critical patent/CN114943835B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/20 — Image preprocessing
    • G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 — Fusion of extracted features
    • G06V10/82 — Arrangements for image or video recognition or understanding using neural networks
    • G06V20/00 — Scenes; scene-specific elements
    • G06V20/10 — Terrestrial scenes
    • G06V20/17 — Terrestrial scenes taken from planes or by drones

Abstract

The invention discloses a real-time semantic segmentation method for UAV aerial images of Yellow River ice. A Yellow River ice semantic segmentation data set, comprising the UAV aerial ice images and label data, is constructed from the collected images, and the segmentation network FastICENet is trained on this data set to obtain the final semantic segmentation model. The detection results remain accurate even when the ice in the images varies in size and shape, and while the accuracy of the network is comparable to that of other semantic segmentation networks, its segmentation speed far exceeds theirs.

Description

Real-time semantic segmentation method for UAV aerial images of Yellow River ice
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a real-time semantic segmentation method for an aerial image.
Background
Semantic segmentation is an important field in computer vision: it classifies an image at the pixel level, i.e. it marks the object class to which each pixel of the image belongs, with the goal of predicting the class label of every pixel. Monitoring river ice conditions is important for river management and the shipping industry, and accurate ice segmentation is one of the most important techniques in ice condition monitoring. Lightweight semantic segmentation is especially important here: the input image must be analyzed quickly so that the system can interact with the external environment in time. Concretely, the input Yellow River ice image must be segmented rapidly and accurately so that the ice condition in the river can be monitored in real time and early warnings can be issued promptly. It is therefore necessary to design an accurate, real-time, lightweight semantic segmentation network.
Early ice semantic segmentation methods mainly addressed the poor accuracy of existing ice detection methods. One example constructs a segmentation network comprising a shallow branch and a deep branch, adds a channel attention module to the deep branch and a position attention module to the shallow branch, and fuses the two branches with a fusion module. The training data are fed into the network in batches, and the network is trained with a cross-entropy loss and an RMSprop optimizer; finally a test image is input and segmented with the trained model. Such a method can selectively perform multi-level, multi-scale feature fusion, captures context information with an attention mechanism, obtains feature maps of higher resolution and achieves a better segmentation effect. However, its segmentation speed is slow: the network cannot run in real time on low-power devices, which makes it difficult to meet the practical deployment requirements of Yellow River ice segmentation.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a real-time semantic segmentation method for UAV aerial images of Yellow River ice. A Yellow River ice semantic segmentation data set, comprising the UAV aerial ice images and label data, is constructed from the collected images, and the segmentation network FastICENet is trained on this data set to obtain the final semantic segmentation model. The detection results remain accurate even when the ice in the images varies in size and shape, and while the accuracy of the network is comparable to that of other semantic segmentation networks, its segmentation speed far exceeds theirs.
The technical solution adopted by the invention comprises the following steps:
step 1: construct a Yellow River ice semantic segmentation data set from the collected UAV aerial ice images; the data set comprises the UAV aerial images of Yellow River ice and the label data; divide the data set into a training set, a validation set and a test set;
step 2: construct the semantic segmentation model FastICENet;
the semantic segmentation model FastICENet comprises a shallow detail branch, a deep semantic branch and a fusion upsampling module; the shallow detail branch extracts low-level detail information of the ice image, the deep semantic branch extracts deep semantic information of the ice image, and finally the fusion upsampling module fuses the two branches and upsamples the result to obtain a semantic segmentation result of the same size as the original image;
step 2-1: the shallow detail branch: an input image of size h × w, where h and w are the image height and width, passes sequentially through convolution module one, convolution module two and convolution module three; after the three convolution modules the feature map resolution is h/8 × w/8;
step 2-2: the deep semantic branch:
step 2-2-1: an input image of size h × w passes sequentially through down-sampling module one, down-sampling module two and down-sampling module three, yielding a feature map of resolution h/8 × w/8;
step 2-2-2: the feature map obtained in step 2-2-1 is input into dense connection module one based on the phantom feature map; the resolution of the output feature map remains h/8 × w/8;
step 2-2-3: the feature map obtained in step 2-2-2 is input into down-sampling module four; the resolution of the output feature map is h/16 × w/16;
step 2-2-4: the feature map obtained in step 2-2-3 is input into dense connection module two based on the phantom feature map, and the output feature map is fed separately into attention refinement module one and an average pooling module; the outputs of the two are stacked along the channel dimension, and the resulting feature map is the output of step 2-2-4;
step 2-2-5: the feature map obtained in step 2-2-4 passes through up-sampling module one; the output feature map size is h/8 × w/8;
step 2-2-6: the outputs of step 2-2-2 and step 2-2-5 are jointly input into attention refinement module two; the resolution of the output feature map is h/8 × w/8;
step 2-3: the fusion upsampling module: the outputs of the shallow detail branch and the deep semantic branch are jointly input into a feature fusion module; the output feature map size is h/8 × w/8; the output of the feature fusion module is restored to the original size h × w by up-sampling module two, and the segmentation result is predicted;
step 3: train the semantic segmentation model FastICENet with the training set and the validation set to obtain the final semantic segmentation model, and test the performance of the final model with the test set.
Preferably, step 1 specifically comprises:
step 1-1: collecting UAV aerial images of Yellow River ice over multiple periods and regions;
step 1-2: cropping the collected images to 1600 × 640 and manually labelling every pixel of each image with one of three classes: ice, water and river bank;
step 1-3: dividing the Yellow River ice images and their class labels obtained in step 1-2 into a training set, a validation set and a test set at a ratio of 3:1:1.
Preferably, the convolution kernel of convolution module one is 7 × 7 with stride 2 and padding 3, followed by batch normalization and ReLU; convolution modules two and three use 3 × 3 kernels with stride 2 and padding 1, each likewise followed by batch normalization and ReLU.
Preferably, down-sampling module one, down-sampling module two, down-sampling module three and down-sampling module four all adopt the following structure:
let Win, Wout and Wconv denote the number of input channels of the feature map, the number of output channels, and the number of convolution-layer output channels in the down-sampling module;
when Wout > Win, the input feature map first passes in parallel through a convolution layer with a 3 × 3 kernel and a 2 × 2 max pooling layer, both with stride 2; the number of channels output by the convolution layer is Wconv = Wout − Win, and the max pooling layer outputs Win channels; the outputs of the convolution layer and the max pooling layer are then stacked along the channel dimension, batch normalized and activated by ReLU, realizing 2× down-sampling of the feature map;
when Wout < Win, the input feature map passes only through a convolution layer with a 3 × 3 kernel and stride 2, followed by batch normalization and ReLU activation, realizing 2× down-sampling by convolution alone;
down-sampling module one has 3 input channels and 15 output channels; down-sampling module two has 15 input channels and 30 output channels; down-sampling module three has 30 input channels and 60 output channels; down-sampling module four has 160 input channels and 160 output channels.
Preferably, dense connection module one based on the phantom feature map and dense connection module two based on the phantom feature map have the same structure, defined as follows:
define the phantom module: an ordinary convolution first generates m original feature maps Y′ ∈ R^(h′×w′×m):
Y′ = X * f′
where Y′ is the feature map output by the convolution layer, X is the convolution-layer input, * denotes convolution, f′ ∈ R^(c×k×k×m) is the convolution kernel, m ≤ n, and n is the number of feature maps actually required in the network model;
a series of linear operations is then applied to each original feature map in Y′ to generate s phantom feature maps:
y_ij = Φ_i,j(y′_i), i = 1, …, m, j = 1, …, s
where y′_i is the i-th original feature map in Y′ and Φ_i,j is the j-th linear operation, which generates the j-th phantom feature map y_ij;
the linear operations yield n = m·s feature maps Y = [y_11, y_12, …, y_ij, …, y_ms] as the output data of the phantom module; finally the original feature maps and the phantom feature maps are stacked along the channel dimension, and the stacking result is the output of the phantom module;
a plurality of phantom modules are densely connected, i.e. the input of each phantom module is the channel-wise stack of the input feature map of the dense connection module and the output feature maps of all preceding phantom modules;
dense connection module one based on the phantom feature map has 60 input channels and 160 output channels and uses 5 densely connected phantom modules;
dense connection module two based on the phantom feature map has 160 input channels and 320 output channels and uses 8 densely connected phantom modules;
each of the 13 phantom modules adds 10 channels through its convolution layer and another 10 channels through the linear operations, so the output of each phantom module has 20 more channels than its input.
Preferably, attention refinement module one and attention refinement module two are implemented as follows: the input feature map undergoes global average pooling, 1 × 1 convolution and batch normalization in sequence, and a channel attention vector is finally obtained through a sigmoid; the vector is multiplied element-wise with the input feature map, and the product is added to the input feature map to obtain the channel-weighted feature map.
Preferably, up-sampling module one and up-sampling module two have the same structure, implemented as follows: assume the input feature map has size h₀ × w₀ × C, where h₀ and w₀ are the feature map height and width and C is the number of channels; the input feature map is passed through a convolution layer with N convolution kernels of size 1 × 1, producing a new feature map of size h₀ × w₀ × N; the new feature map is then reshaped to an output feature map of size r·h₀ × r·w₀ × N/r², where r is the up-sampling factor; up-sampling module one uses an up-sampling factor of 2 and up-sampling module two uses an up-sampling factor of 8.
Preferably, the feature fusion module has the following structure: first, the feature maps output by the shallow detail branch and the deep semantic branch are stacked along the channel dimension and passed through a 1 × 1 convolution with stride 1, followed by batch normalization and a ReLU activation function; second, the output of the first step is globally pooled, passed through a 1 × 1 convolution with stride 1 and a ReLU activation, then through another 1 × 1 convolution with stride 1 and a sigmoid activation, and the result is multiplied element-wise with the output of the first step; third, the product from the second step is added to the output of the first step, and the sum is the output of the feature fusion module.
The invention has the following beneficial effects:
1) the invention provides a dual-branch lightweight semantic segmentation network for real-time semantic segmentation of Yellow River ice;
2) even when the ice in the image to be segmented varies in size and shape, the detection results remain accurate;
3) with accuracy comparable to other semantic segmentation networks, the segmentation speed of the network far exceeds that of other semantic segmentation networks.
Drawings
FIG. 1 is a diagram of a semantic segmentation model architecture of the present invention.
Fig. 2 is a block diagram of a downsampling module of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention aims to provide a real-time semantic segmentation method for Yellow River ice that improves the accuracy of the segmentation model through a dual-branch structure and addresses slow segmentation speed by adopting lightweight modules.
A real-time semantic segmentation method for UAV aerial images of Yellow River ice comprises the following steps:
Step 1: construct a Yellow River ice semantic segmentation data set from the collected UAV aerial ice images; the data set comprises the UAV aerial images of Yellow River ice and the label data;
step 1-1: collect UAV aerial images of Yellow River ice over multiple periods and regions, and select clear, well-lit images from them;
step 1-2: crop the collected images to 1600 × 640 and manually label every pixel of each image with one of three classes: ice, water and river bank;
step 1-3: divide the Yellow River ice images and their class labels obtained in step 1-2 into a training set, a validation set and a test set at a ratio of 3:1:1.
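For illustration, a minimal sketch of the 3:1:1 split in Python; the directory layout and file naming below are hypothetical, not specified by the patent:

```python
import random
from pathlib import Path

# Hypothetical layout: one image per sample under images/; label masks are
# assumed to live in a parallel directory with the same file names.
samples = sorted(Path("yellow_river_ice/images").glob("*.png"))
random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(samples)

n = len(samples)
n_train = n * 3 // 5    # 3 parts of 5
n_val = n // 5          # 1 part of 5
train = samples[:n_train]
val = samples[n_train:n_train + n_val]
test = samples[n_train + n_val:]   # remaining ~1 part of 5
```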
Step 2: construct the semantic segmentation model FastICENet;
the semantic segmentation model FastICENet comprises a shallow detail branch, a deep semantic branch and a fusion upsampling module; the shallow detail branch extracts low-level detail information and texture information of the ice image, the deep semantic branch extracts deep semantic information of the ice image, and finally the fusion upsampling module fuses the two branches and upsamples the result to obtain a semantic segmentation result of the same size as the original image;
step 2-1: the shallow detail branch: an input image of size h × w, where h and w are the image height and width, passes sequentially through convolution module one, convolution module two and convolution module three; after the three convolution modules the feature map resolution is h/8 × w/8;
step 2-2: the deep semantic branch:
step 2-2-1: an input image of size h × w passes sequentially through down-sampling module one, down-sampling module two and down-sampling module three, yielding a feature map of resolution h/8 × w/8;
step 2-2-2: the feature map obtained in step 2-2-1 is input into dense connection module one based on the phantom feature map; the resolution of the output feature map remains h/8 × w/8;
step 2-2-3: the feature map obtained in step 2-2-2 is input into down-sampling module four; the resolution of the output feature map is h/16 × w/16;
step 2-2-4: the feature map obtained in step 2-2-3 is input into dense connection module two based on the phantom feature map, and the output feature map is fed separately into attention refinement module one and an average pooling module; the outputs of the two are stacked along the channel dimension, and the resulting feature map is the output of step 2-2-4;
step 2-2-5: the feature map obtained in step 2-2-4 passes through up-sampling module one; the output feature map size is h/8 × w/8;
step 2-2-6: the outputs of step 2-2-2 and step 2-2-5 are jointly input into attention refinement module two; the resolution of the output feature map is h/8 × w/8;
step 2-3: the fusion upsampling module: the outputs of the shallow detail branch and the deep semantic branch are jointly input into a feature fusion module; the output feature map size is h/8 × w/8; the output of the feature fusion module is restored to the original size h × w by up-sampling module two, and the segmentation result is predicted;
Step 3: train the semantic segmentation model FastICENet with the training set and the validation set to obtain the final semantic segmentation model, and test its performance with the test set; a minimal training-loop sketch follows.
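A minimal training-loop sketch for step 3, assuming a FastICENet model assembled from the modules sketched in the paragraphs below; the cross-entropy loss, Adam optimizer and learning rate are assumptions, since the patent does not fix them for FastICENet:

```python
import torch
import torch.nn as nn

def train_fasticenet(model, train_loader, val_loader, num_epochs=100, lr=1e-3):
    """Train the segmentation model and report validation loss per epoch."""
    criterion = nn.CrossEntropyLoss()      # 3 classes: ice, water, river bank
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(num_epochs):
        model.train()
        for images, labels in train_loader:    # labels: (B, H, W) class indices
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        model.eval()                           # validation pass for model selection
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
        print(f"epoch {epoch}: val loss {val_loss / max(len(val_loader), 1):.4f}")
    return model
```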
Preferably, the convolution kernel of convolution module one is 7 × 7 with stride 2 and padding 3, followed by batch normalization and ReLU; convolution modules two and three use 3 × 3 kernels with stride 2 and padding 1, each likewise followed by batch normalization and ReLU.
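The shallow detail branch can be sketched directly in PyTorch; the channel widths (64, 64, 128) are assumptions for illustration, since the text does not state the branch's channel counts:

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Convolution + batch normalization + ReLU, as described above."""
    def __init__(self, in_ch, out_ch, kernel, stride, padding):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel, stride, padding, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class DetailBranch(nn.Module):
    """7x7/s2 then two 3x3/s2 convolutions -> overall 1/8 resolution."""
    def __init__(self, channels=(64, 64, 128)):
        super().__init__()
        c1, c2, c3 = channels
        self.conv1 = ConvModule(3, c1, 7, 2, 3)   # convolution module one
        self.conv2 = ConvModule(c1, c2, 3, 2, 1)  # convolution module two
        self.conv3 = ConvModule(c2, c3, 3, 2, 1)  # convolution module three

    def forward(self, x):
        return self.conv3(self.conv2(self.conv1(x)))
```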
Preferably, down-sampling module one, down-sampling module two, down-sampling module three and down-sampling module four all adopt the following structure:
let Win, Wout and Wconv denote the number of input channels of the feature map, the number of output channels, and the number of convolution-layer output channels in the down-sampling module;
when Wout > Win, the input feature map first passes in parallel through a convolution layer with a 3 × 3 kernel and a 2 × 2 max pooling layer, both with stride 2; the number of channels output by the convolution layer is Wconv = Wout − Win, and the max pooling layer outputs Win channels; the outputs of the convolution layer and the max pooling layer are then stacked along the channel dimension, batch normalized and activated by ReLU, realizing 2× down-sampling of the feature map;
when Wout < Win, the input feature map passes only through a convolution layer with a 3 × 3 kernel and stride 2, followed by batch normalization and ReLU activation, realizing 2× down-sampling by convolution alone;
down-sampling module one has 3 input channels and 15 output channels; down-sampling module two has 15 input channels and 30 output channels; down-sampling module three has 30 input channels and 60 output channels; down-sampling module four has 160 input channels and 160 output channels.
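A sketch of the down-sampling rule above. Down-sampling module four has equal input and output channel counts, a case the text does not explicitly cover; the sketch routes it through the convolution-only path (an assumption):

```python
import torch
import torch.nn as nn

class DownSample(nn.Module):
    """2x down-sampling: parallel 3x3/s2 convolution and 2x2/s2 max pooling
    when out_ch > in_ch (channel stacking), convolution only otherwise.
    The two parallel paths have matching spatial sizes for even inputs."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.expand = out_ch > in_ch
        conv_out = out_ch - in_ch if self.expand else out_ch  # Wconv = Wout - Win
        self.conv = nn.Conv2d(in_ch, conv_out, 3, stride=2, padding=1, bias=False)
        self.pool = nn.MaxPool2d(2, stride=2) if self.expand else None
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.conv(x)
        if self.expand:
            y = torch.cat([y, self.pool(x)], dim=1)  # stack conv and pool outputs
        return self.act(self.bn(y))

# Channel widths from the text: 3->15, 15->30, 30->60, 160->160.
down1, down2, down3 = DownSample(3, 15), DownSample(15, 30), DownSample(30, 60)
down4 = DownSample(160, 160)
```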
Preferably, dense connection module one based on the phantom feature map and dense connection module two based on the phantom feature map have the same structure, defined as follows:
define the phantom module: an ordinary convolution first generates m original feature maps Y′ ∈ R^(h′×w′×m):
Y′ = X * f′
where Y′ is the feature map output by the convolution layer, X is the convolution-layer input, * denotes convolution, f′ ∈ R^(c×k×k×m) is the convolution kernel, m ≤ n, and n is the number of feature maps actually required in the network model; the bias term is omitted for simplicity. The hyperparameters of this convolution (kernel size, stride and padding) are the same as in an ordinary convolution, so that the spatial size (h′ and w′) of the output feature map is preserved.
A series of linear operations is then applied to each original feature map in Y′ to generate s phantom feature maps:
y_ij = Φ_i,j(y′_i), i = 1, …, m, j = 1, …, s
where y′_i is the i-th original feature map in Y′ and Φ_i,j is the j-th linear operation, which generates the j-th phantom feature map y_ij.
The linear operations yield n = m·s feature maps Y = [y_11, y_12, …, y_ij, …, y_ms] as the output data of the phantom module. In the invention the convolution layer uses 1 × 1 kernels and the linear operation Φ is a depthwise convolution applied to the original feature maps Y′ to generate the phantom feature maps; finally the original feature maps and the phantom feature maps are stacked along the channel dimension, and the stacking result is the output of the phantom module.
A plurality of phantom modules are densely connected, i.e. the input of each phantom module is the channel-wise stack of the input feature map of the dense connection module and the output feature maps of all preceding phantom modules.
Dense connection module one based on the phantom feature map has 60 input channels and 160 output channels and uses 5 densely connected phantom modules; dense connection module two based on the phantom feature map has 160 input channels and 320 output channels and uses 8 densely connected phantom modules.
Each of the 13 phantom modules adds 10 channels through its convolution layer and another 10 channels through the linear operations, so the output of each phantom module has 20 more channels than its input.
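A sketch of the phantom module (m = 10, one depthwise linear operation per original map, matching the 10 + 10 = 20 added channels) and of the dense connection; the 3 × 3 depthwise kernel size and the batch normalization/ReLU inside the module are assumptions:

```python
import torch
import torch.nn as nn

class PhantomModule(nn.Module):
    """A 1x1 convolution makes m 'original' maps, a depthwise convolution
    (the linear operation Phi) makes m phantom maps; the two are stacked."""
    def __init__(self, in_ch, m=10):
        super().__init__()
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, m, 1, bias=False),
            nn.BatchNorm2d(m), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(  # depthwise conv as the linear operation
            nn.Conv2d(m, m, 3, padding=1, groups=m, bias=False),
            nn.BatchNorm2d(m), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)  # original + phantom maps

class DenseGhostBlock(nn.Module):
    """Densely connected phantom modules: each module sees the block input
    stacked with the outputs of all preceding modules. Channel arithmetic
    matches the text: 60 + 5*20 = 160 and 160 + 8*20 = 320."""
    def __init__(self, in_ch, num_modules):
        super().__init__()
        self.mods = nn.ModuleList()
        ch = in_ch
        for _ in range(num_modules):
            self.mods.append(PhantomModule(ch))
            ch += 20  # each phantom module contributes 20 new channels
        self.out_ch = ch

    def forward(self, x):
        feats = [x]
        for mod in self.mods:
            feats.append(mod(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

dense1 = DenseGhostBlock(60, 5)    # dense connection module one: 60 -> 160
dense2 = DenseGhostBlock(160, 8)   # dense connection module two: 160 -> 320
```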
Preferably, attention refinement module one and attention refinement module two are implemented as follows: the input feature map undergoes global average pooling, 1 × 1 convolution and batch normalization in sequence, and a channel attention vector is finally obtained through a sigmoid; the vector is multiplied element-wise with the input feature map, and the product is added to the input feature map to obtain the channel-weighted feature map.
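A direct PyTorch reading of this attention refinement module:

```python
import torch
import torch.nn as nn

class AttentionRefinement(nn.Module):
    """Global average pooling -> 1x1 conv -> batch norm -> sigmoid yields a
    channel attention vector; the input is rescaled by it and the product
    is added back to the input (channel-weighted feature map)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # global average pooling
            nn.Conv2d(channels, channels, 1, bias=False),  # 1x1 convolution
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x) + x
```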
Preferably, up-sampling module one and up-sampling module two have the same structure, implemented as follows: assume the input feature map has size h₀ × w₀ × C, where h₀ and w₀ are the feature map height and width and C is the number of channels; the input feature map is passed through a convolution layer with N convolution kernels of size 1 × 1, producing a new feature map of size h₀ × w₀ × N; the new feature map is then reshaped to an output feature map of size r·h₀ × r·w₀ × N/r², where r is the up-sampling factor; up-sampling module one uses an up-sampling factor of 2 and up-sampling module two uses an up-sampling factor of 8.
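Read as a sub-pixel (pixel-shuffle) operation, the up-sampling module can be sketched as follows; the in/out channel counts in the usage lines are illustrative assumptions:

```python
import torch
import torch.nn as nn

class UpSampleModule(nn.Module):
    """1x1 convolution to N = out_ch * r^2 channels, then a pixel-shuffle
    reshape that trades channels for an r-times larger spatial size."""
    def __init__(self, in_ch, out_ch, r):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch * r * r, 1)  # N 1x1 kernels
        self.shuffle = nn.PixelShuffle(r)  # (N, h, w) -> (N/r^2, r*h, r*w)

    def forward(self, x):
        return self.shuffle(self.conv(x))

# up-sampling module one (factor 2) and module two (factor 8)
up1 = UpSampleModule(640, 160, r=2)
up2 = UpSampleModule(128, 3, r=8)   # 3 classes: ice, water, river bank
```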
Preferably, the feature fusion module has the following structure: first, the feature maps output by the shallow detail branch and the deep semantic branch are stacked along the channel dimension and passed through a 1 × 1 convolution with stride 1, followed by batch normalization and a ReLU activation function; second, the output of the first step is globally pooled, passed through a 1 × 1 convolution with stride 1 and a ReLU activation, then through another 1 × 1 convolution with stride 1 and a sigmoid activation, and the result is multiplied element-wise with the output of the first step; third, the product from the second step is added to the output of the first step, and the sum is the output of the feature fusion module.
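A sketch of the feature fusion module following the three steps above; the branch channel widths passed at construction are assumptions:

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """(1) stack branch outputs, 1x1 conv + BN + ReLU; (2) global pool ->
    1x1 conv -> ReLU -> 1x1 conv -> sigmoid, multiplied onto (1);
    (3) add (2) back to (1)."""
    def __init__(self, detail_ch, semantic_ch, out_ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(detail_ch + semantic_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, detail, semantic):
        f = self.fuse(torch.cat([detail, semantic], dim=1))  # step one
        return f * self.gate(f) + f                          # steps two and three
```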
The specific embodiment is as follows:
To verify and illustrate the effectiveness of the method, it is compared with four existing deep learning methods; Table 1 shows the performance (accuracy and speed) of the method of the invention and the other deep-learning-based methods.
TABLE 1 comparison of the method of the present invention with four other deep learning methods
[Table 1 appears as an image in the original publication; it reports the segmentation accuracy (mIoU) and speed (FPS) of the method of the invention and the four comparison methods.]
As can be seen from Table 1, while the accuracy (mIoU) of the method is comparable to that of the other four methods, its speed greatly exceeds theirs, reaching 94.840 FPS.

Claims (8)

1. A real-time semantic segmentation method for UAV aerial images of Yellow River ice, characterized by comprising the following steps:
step 1: constructing a Yellow River ice semantic segmentation data set from the collected UAV aerial ice images, the data set comprising the UAV aerial images of Yellow River ice and the label data; dividing the data set into a training set, a validation set and a test set;
step 2: constructing the semantic segmentation model FastICENet;
the semantic segmentation model FastICENet comprises a shallow detail branch, a deep semantic branch and a fusion upsampling module; the shallow detail branch extracts low-level detail information of the ice image, the deep semantic branch extracts deep semantic information of the ice image, and finally the fusion upsampling module fuses the two branches and upsamples the result to obtain a semantic segmentation result of the same size as the original image;
step 2-1: the shallow detail branch: an input image of size h × w, where h and w are the image height and width, passes sequentially through convolution module one, convolution module two and convolution module three; after the three convolution modules the feature map resolution is h/8 × w/8;
step 2-2: the deep semantic branch:
step 2-2-1: an input image of size h × w passes sequentially through down-sampling module one, down-sampling module two and down-sampling module three, yielding a feature map of resolution h/8 × w/8;
step 2-2-2: the feature map obtained in step 2-2-1 is input into dense connection module one based on the phantom feature map; the resolution of the output feature map remains h/8 × w/8;
step 2-2-3: the feature map obtained in step 2-2-2 is input into down-sampling module four; the resolution of the output feature map is h/16 × w/16;
step 2-2-4: the feature map obtained in step 2-2-3 is input into dense connection module two based on the phantom feature map, and the output feature map is fed separately into attention refinement module one and an average pooling module; the outputs of the two are stacked along the channel dimension, and the resulting feature map is the output of step 2-2-4;
step 2-2-5: the feature map obtained in step 2-2-4 passes through up-sampling module one; the output feature map size is h/8 × w/8;
step 2-2-6: the outputs of step 2-2-2 and step 2-2-5 are jointly input into attention refinement module two; the resolution of the output feature map is h/8 × w/8;
step 2-3: the fusion upsampling module: the outputs of the shallow detail branch and the deep semantic branch are jointly input into a feature fusion module; the output feature map size is h/8 × w/8; the output of the feature fusion module is restored to the original size h × w by up-sampling module two, and the segmentation result is predicted;
step 3: training the semantic segmentation model FastICENet with the training set and the validation set to obtain the final semantic segmentation model, and testing the performance of the final semantic segmentation model with the test set.
2. The real-time semantic segmentation method for UAV aerial images of Yellow River ice according to claim 1, wherein step 1 specifically comprises:
step 1-1: collecting UAV aerial images of Yellow River ice over multiple periods and regions;
step 1-2: cropping the collected images to 1600 × 640 and manually labelling every pixel of each image with one of three classes: ice, water and river bank;
step 1-3: dividing the Yellow River ice images and their class labels obtained in step 1-2 into a training set, a validation set and a test set at a ratio of 3:1:1.
3. The real-time semantic segmentation method for UAV aerial images of Yellow River ice according to claim 1, wherein the convolution kernel of convolution module one is 7 × 7 with stride 2 and padding 3, followed by batch normalization and ReLU; convolution modules two and three use 3 × 3 kernels with stride 2 and padding 1, each likewise followed by batch normalization and ReLU.
4. The real-time semantic segmentation method for UAV aerial images of Yellow River ice according to claim 1, wherein down-sampling module one, down-sampling module two, down-sampling module three and down-sampling module four all adopt the following structure:
let Win, Wout and Wconv denote the number of input channels of the feature map, the number of output channels, and the number of convolution-layer output channels in the down-sampling module;
when Wout > Win, the input feature map first passes in parallel through a convolution layer with a 3 × 3 kernel and a 2 × 2 max pooling layer, both with stride 2; the number of channels output by the convolution layer is Wconv = Wout − Win, and the max pooling layer outputs Win channels; the outputs of the convolution layer and the max pooling layer are then stacked along the channel dimension, batch normalized and activated by ReLU, realizing 2× down-sampling of the feature map;
when Wout < Win, the input feature map passes only through a convolution layer with a 3 × 3 kernel and stride 2, followed by batch normalization and ReLU activation, realizing 2× down-sampling by convolution alone;
down-sampling module one has 3 input channels and 15 output channels; down-sampling module two has 15 input channels and 30 output channels; down-sampling module three has 30 input channels and 60 output channels; down-sampling module four has 160 input channels and 160 output channels.
5. The real-time semantic segmentation method for UAV aerial images of Yellow River ice according to claim 1, wherein dense connection module one based on the phantom feature map and dense connection module two based on the phantom feature map have the same structure, defined as follows:
define the phantom module: an ordinary convolution first generates m original feature maps Y′ ∈ R^(h′×w′×m):
Y′ = X * f′
where Y′ is the feature map output by the convolution layer, X is the convolution input, * denotes the convolution operation, f′ ∈ R^(c×k×k×m) is the convolution kernel, m ≤ n, and n is the number of feature maps actually required in the network model;
a series of linear operations is then applied to each original feature map in Y′ to generate s phantom feature maps:
y_ij = Φ_i,j(y′_i), i = 1, …, m, j = 1, …, s
where y′_i is the i-th original feature map in Y′ and Φ_i,j is the j-th linear operation, which generates the j-th phantom feature map y_ij;
the linear operations yield n = m·s feature maps Y = [y_11, y_12, …, y_ij, …, y_ms] as the output data of the phantom module; finally the original feature maps and the phantom feature maps are stacked along the channel dimension, and the stacking result is the output of the phantom module;
a plurality of phantom modules are densely connected, i.e. the input of each phantom module is the channel-wise stack of the input feature map of the dense connection module and the output feature maps of all preceding phantom modules;
dense connection module one based on the phantom feature map has 60 input channels and 160 output channels and uses 5 densely connected phantom modules;
dense connection module two based on the phantom feature map has 160 input channels and 320 output channels and uses 8 densely connected phantom modules;
each of the 13 phantom modules adds 10 channels through its convolution layer and another 10 channels through the linear operations, so the output of each phantom module has 20 more channels than its input.
6. The real-time semantic segmentation method for UAV aerial images of Yellow River ice according to claim 1, wherein attention refinement module one and attention refinement module two are implemented as follows: the input feature map undergoes global average pooling, 1 × 1 convolution and batch normalization in sequence, and a channel attention vector is finally obtained through a sigmoid; the vector is multiplied element-wise with the input feature map, and the product is added to the input feature map to obtain the channel-weighted feature map.
7. The real-time semantic segmentation method for UAV aerial images of Yellow River ice according to claim 1, wherein up-sampling module one and up-sampling module two have the same structure, implemented as follows: assume the input feature map has size h₀ × w₀ × C, where h₀ and w₀ are the feature map height and width and C is the number of channels; the input feature map is passed through a convolution layer with N convolution kernels of size 1 × 1, producing a new feature map of size h₀ × w₀ × N; the new feature map is then reshaped to an output feature map of size r·h₀ × r·w₀ × N/r², where r is the up-sampling factor; up-sampling module one uses an up-sampling factor of 2 and up-sampling module two uses an up-sampling factor of 8.
8. The real-time semantic segmentation method for UAV aerial images of Yellow River ice according to claim 1, wherein the feature fusion module has the following structure: first, the feature maps output by the shallow detail branch and the deep semantic branch are stacked along the channel dimension and passed through a 1 × 1 convolution with stride 1, followed by batch normalization and a ReLU activation function; second, the output of the first step is globally pooled, passed through a 1 × 1 convolution with stride 1 and a ReLU activation, then through another 1 × 1 convolution with stride 1 and a sigmoid activation, and the result is multiplied element-wise with the output of the first step; third, the product from the second step is added to the output of the first step, and the sum is the output of the feature fusion module.
CN202210415977.4A 2022-04-20 2022-04-20 Real-time semantic segmentation method for yellow river ice unmanned aerial vehicle aerial image Active CN114943835B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210415977.4A CN114943835B (en) 2022-04-20 2022-04-20 Real-time semantic segmentation method for yellow river ice unmanned aerial vehicle aerial image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210415977.4A CN114943835B (en) 2022-04-20 2022-04-20 Real-time semantic segmentation method for yellow river ice unmanned aerial vehicle aerial image

Publications (2)

Publication Number Publication Date
CN114943835A 2022-08-26
CN114943835B 2024-03-12

Family

ID=82908048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210415977.4A Active CN114943835B (en) 2022-04-20 2022-04-20 Real-time semantic segmentation method for yellow river ice unmanned aerial vehicle aerial image

Country Status (1)

Country Link
CN (1) CN114943835B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710863A (en) * 2018-05-24 2018-10-26 东北大学 UAV scene semantic segmentation method and system based on deep learning
WO2020101448A1 (en) * 2018-08-28 2020-05-22 Samsung Electronics Co., Ltd. Method and apparatus for image segmentation
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system
CN111160311A (en) * 2020-01-02 2020-05-15 西北工业大学 Yellow River ice semantic segmentation method based on a multi-attention-mechanism dual-stream fusion network
CN111259898A (en) * 2020-01-08 2020-06-09 西安电子科技大学 Crop segmentation method based on unmanned aerial vehicle aerial image
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field
CN113361373A (en) * 2021-06-02 2021-09-07 武汉理工大学 Real-time semantic segmentation method for aerial image in agricultural scene
CN113658189A (en) * 2021-09-01 2021-11-16 北京航空航天大学 Cross-scale feature fusion real-time semantic segmentation method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
李帅; 郭艳艳; 卫霞: "Semantic segmentation of remote sensing images with down-sampling-based feature fusion" (基于下采样的特征融合遥感图像语义分割), Journal of Test and Measurement Technology, no. 04, 31 December 2020, pages 61-67 *
熊伟; 蔡咪; 吕亚飞; 裴家正: "Sea-land semantic segmentation of remote sensing images based on neural networks" (基于神经网络的遥感图像海陆语义分割方法), Computer Engineering and Applications, no. 15, 31 December 2020, pages 227-233 *
青晨; 禹晶; 肖创柏; 段娟: "Research progress on image semantic segmentation with deep convolutional neural networks" (深度卷积神经网络图像语义分割研究进展), Journal of Image and Graphics, no. 06, 16 June 2020, pages 32-33 *

Also Published As

Publication number Publication date
CN114943835B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN111160311B (en) Yellow river ice semantic segmentation method based on multi-attention machine system double-flow fusion network
CN112651973B (en) Semantic segmentation method based on cascade of feature pyramid attention and mixed attention
CN112085735B (en) Aluminum material image defect detection method based on self-adaptive anchor frame
CN111369563B (en) Semantic segmentation method based on pyramid void convolutional network
CN111612807B (en) Small target image segmentation method based on scale and edge information
Al-Haija et al. Multi-class weather classification using ResNet-18 CNN for autonomous IoT and CPS applications
CN107292875A (en) A kind of conspicuousness detection method based on global Local Feature Fusion
CN111079640B (en) Vehicle type identification method and system based on automatic amplification sample
CN111652273B (en) Deep learning-based RGB-D image classification method
CN115063573A (en) Multi-scale target detection method based on attention mechanism
CN112766283B (en) Two-phase flow pattern identification method based on multi-scale convolution network
CN113221852B (en) Target identification method and device
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN111639697B (en) Hyperspectral image classification method based on non-repeated sampling and prototype network
CN116704431A (en) On-line monitoring system and method for water pollution
CN115966010A (en) Expression recognition method based on attention and multi-scale feature fusion
CN115527072A (en) Chip surface defect detection method based on sparse space perception and meta-learning
CN115049945A (en) Method and device for extracting lodging area of wheat based on unmanned aerial vehicle image
CN117197763A (en) Road crack detection method and system based on cross attention guide feature alignment network
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN114626476A (en) Bird fine-grained image recognition method and device based on Transformer and component feature fusion
CN114170657A (en) Facial emotion recognition method integrating attention mechanism and high-order feature representation
CN114187506A (en) Remote sensing image scene classification method of viewpoint-aware dynamic routing capsule network
CN112084897A (en) Rapid traffic large-scene vehicle target detection method of GS-SSD
CN114943835B (en) Real-time semantic segmentation method for yellow river ice unmanned aerial vehicle aerial image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant