CN110853057B - Aerial image segmentation method based on global and multi-scale full-convolution network - Google Patents

Aerial image segmentation method based on global and multi-scale full-convolution network

Info

Publication number
CN110853057B
CN110853057B (application CN201911087534.1A)
Authority
CN
China
Prior art keywords
layer
convolution
global
network
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911087534.1A
Other languages
Chinese (zh)
Other versions
CN110853057A (en)
Inventor
Ma Jingjing
Wu Linlin
Tang Xu
Jiao Licheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201911087534.1A priority Critical patent/CN110853057B/en
Publication of CN110853057A publication Critical patent/CN110853057A/en
Application granted granted Critical
Publication of CN110853057B publication Critical patent/CN110853057B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses an aerial image segmentation method based on a global and multi-scale full convolution network, comprising the following steps: constructing a global and multi-scale full convolution network; generating a training set; training the network; and inputting the aerial image to be segmented into the trained network for binary segmentation to generate a segmentation mask map. The method segments aerial images with a global and multi-scale full convolution network in which a global module and a multi-scale module are embedded, so that a more refined segmentation mask is extracted; the method is highly robust and achieves high segmentation accuracy.

Description

Aerial image segmentation method based on global and multi-scale full-convolution network
Technical Field
The invention belongs to the technical field of image processing, and further relates to an aerial image segmentation method based on a global and multi-scale full convolution network in the field of image segmentation. The invention can be used to detect building targets in high-resolution aerial images and to segment the regions where buildings are located.
Background
With the continuous development of society, urban construction planning has become a topic of wide concern. As building demand grows, ever more buildings complicate the construction of urban infrastructure, such as traffic route planning, drainage system planning and convenience facility planning. Building detection and segmentation in aerial images can help construction planning departments detect and segment town buildings and build municipal infrastructure. However, aerial images contain rich information and complex spatial details: building targets occupy different areas within one aerial image, shooting angles differ, objects are numerous and complex, appearance styles vary, and buildings are occluded by the surrounding environment to different degrees, all of which pose great challenges for building detection and segmentation in aerial images.
A remote sensing image segmentation method based on the fusion of complete residuals and multi-scale features is proposed in the patent document "Remote sensing image segmentation method combining complete residuals and feature fusion" (application number 201811306585.4, publication number CN109447994A) filed by Shaanxi Normal University. The method is implemented as follows: a convolutional encoding-decoding network is adopted as the segmentation backbone and improved by adding a feature pyramid module that aggregates multi-scale context information; residual units are added to the corresponding convolution layers of the encoder and decoder; the encoder features are fused into the corresponding decoder layers by pixel-wise addition; finally, the improved segmentation network combining complete residuals and multi-scale feature fusion is used to segment the remote sensing image. The drawback of this method is that the convolutional encoding-decoding network is built from multiple convolution layers, and because of the limited size of the convolution kernels it can only extract local information and lacks global information, so the segmentation accuracy is low.
A remote sensing image segmentation method based on a fully convolutional recurrent network is proposed in the paper "RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of High Resolution Remote Sensing Images" (arXiv, May 2018). The method is implemented as follows: the data are processed to construct a training sample set and a test set; a bidirectional network containing a forward stream and a backward stream is constructed as the backbone for semantic segmentation; the forward stream is a convolutional neural network used for feature extraction, through which images yield multi-level convolutional feature maps from shallow to deep, while the backward stream uses recurrent connections over all the features available from the forward stream to achieve high-resolution prediction. The drawbacks of this method are that only the relation between the encoding and decoding parts of the fully convolutional network is considered; the differing contributions of each convolution layer of the decoding part to the final prediction are not considered; multi-scale features are not considered, making it difficult to recognize same-class objects of different sizes in an image; and the simplicity and efficiency of the network are not considered, so the segmentation performance is not high.
Disclosure of Invention
The invention aims to provide an aerial image segmentation method based on global and multi-scale full convolution networks, aiming at the defects of the prior art.
The idea for realizing the purpose of the invention is to construct a global and multi-scale full convolution network for segmenting the aerial image, and embed a global module and a multi-scale module in the global and multi-scale full convolution network so as to improve the segmentation efficiency and the segmentation precision.
The method comprises the following specific steps:
(1) constructing a global and multi-scale full convolution network:
(1a) a global and multi-scale full convolution network is built, whose structure is, in order: input layer → feature extraction layer → first combination module → fully connected layer → deconvolution layer → second combination module → output layer;
the feature extraction layer consists of the five serially connected convolution modules of the VGG16 model;
the first combination module has 7 layers, whose structure is, in order: first convolution layer → transpose layer → first multiplication layer → softmax layer → second multiplication layer → second convolution layer → addition layer;
the structure of the fully connected layer is, in order: max pooling layer → third convolution layer → first dropout layer → fourth convolution layer → second dropout layer;
the deconvolution layer consists of four serially connected deconvolution modules, each with the structure: first upsampling layer → fifth convolution layer → third dropout layer;
the second combination module consists of three serially connected upsampling modules, each composed of a second upsampling layer and a sixth convolution layer;
the output layer consists of a seventh convolution layer and Argmax connected in series and is used to generate the segmentation mask map;
wherein the outputs of the second, third, fourth and fifth convolution modules of the feature extraction layer are connected by pixel-wise addition to the inputs of the first, second, third and fourth deconvolution modules of the deconvolution layer, respectively;
(1b) the parameters of the global and multi-scale full convolution network are set as follows:
the convolution kernel sizes of the first and second convolution layers are set to 1×1 pixels, with stride 1 pixel; the parameters of the feature extraction layer are the same as the network parameters of VGG16;
the input and output feature maps of the first combination module are set to 512, and the intermediate feature maps to 256;
the convolution kernel sizes of the fully connected layer and the deconvolution layer are set to 3×3 pixels, with stride 1 pixel, and the dropout parameters in both are set to 0.5;
the feature maps of each upsampling layer in the second combination module are set to 2, the convolution kernel size of the sixth convolution layer to 1×1 pixel, with stride 1 pixel;
(2) generating a training set:
(2a) 31 aerial images of size 5000×5000 and their corresponding actual class labels are collected, each image containing a background class and a target class;
(2b) each image is cropped into 256×256 patches and every pixel is divided by 255.0 for normalization to form the training set; the corresponding actual class labels are cropped in the same way to form the actual class labels of the training set;
(3) training the global and multi-scale full convolution network:
(3a) the training set is input into the global and multi-scale full convolution network, and the feature map output by the network is taken as the network-predicted segmentation mask map;
(3b) the network weights are iteratively updated with the Adam optimization algorithm until the loss function converges, yielding the trained global and multi-scale full convolution network;
(4) generating a segmentation mask map:
Each aerial image to be segmented is cropped into 256×256 patches, every pixel is divided by 255.0 for normalization, and the patches are input into the trained global and multi-scale full convolution network for binary segmentation to obtain the final segmentation mask map.
Compared with the prior art, the invention has the following advantages:
First, the invention constructs and uses a global module to obtain global information in the feature extraction layer, so that local information is obtained through the convolution layers while global information is obtained through the global module, and the image is segmented using both. This overcomes the prior-art problem that, owing to the limited size of the convolution kernel, only local information can be extracted and global information is lacking, which leads to low segmentation accuracy; the invention therefore has the advantage of high segmentation accuracy.
Second, the invention constructs and uses a multi-scale module to obtain multi-scale information in the deconvolution layer, and replaces the mask map obtained from only the last-level feature map with a mask map obtained from multi-level feature maps concatenated in series. Multi-scale information is thus obtained and the information of every level is fully used. This overcomes the prior-art problem that the differing contributions of each convolution layer of the decoding part to the final prediction, and multi-scale features, are not considered, which makes it difficult to recognize same-class objects of different sizes in an image; the invention therefore recognizes same-class objects of different shapes more accurately.
Third, the connection between the feature extraction layer and the deconvolution layer makes full use of the information extracted by the feature extraction layer, and adding the feature extraction layer outputs to the corresponding deconvolution layers reduces the loss of high-frequency information caused by pooling without increasing the amount of computation. This overcomes the low segmentation performance of prior-art networks that are not simple and efficient; the invention therefore has excellent segmentation performance and high robustness.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the global and multi-scale full convolution network of the present invention;
FIG. 3 is a block diagram of the invention.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
The implementation steps of the present invention are described in further detail with reference to fig. 1.
Step 1, constructing a global and multi-scale full convolution network.
First, a global and multi-scale full convolution network is built, whose structure is, in order: input layer → feature extraction layer → first combination module → fully connected layer → deconvolution layer → second combination module → output layer.
The feature extraction layer consists of the five serially connected convolution modules of the VGG16 model.
The first combination module has 7 layers, whose structure is, in order: first convolution layer → transpose layer → first multiplication layer → softmax layer → second multiplication layer → second convolution layer → addition layer.
The structure of the fully connected layer is, in order: max pooling layer → third convolution layer → first dropout layer → fourth convolution layer → second dropout layer.
The deconvolution layer consists of four serially connected deconvolution modules, each with the structure: first upsampling layer → fifth convolution layer → third dropout layer.
The second combination module consists of three serially connected upsampling modules, each composed of a second upsampling layer and a sixth convolution layer.
The output layer consists of a seventh convolution layer and Argmax connected in series and is used to generate the segmentation mask map.
The outputs of the second, third, fourth and fifth convolution modules of the feature extraction layer are connected by pixel-wise addition to the inputs of the first, second, third and fourth deconvolution modules of the deconvolution layer, respectively.
The convolution kernels of the first and second of the five serially connected convolution modules in the VGG16 model are all 3×3 pixels, with strides set to 2 pixels and 1 pixel in sequence; the convolution kernels of the third, fourth and fifth convolution modules are all 3×3 pixels, with strides set to 2 pixels, 1 pixel and 1 pixel in sequence; weights pre-trained on the ImageNet dataset are used as the initial values of the model.
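For concreteness, the following minimal Python sketch loads such an ImageNet-pretrained VGG16 feature extractor and checks the shape of the fifth convolution module's output for a 256×256 patch. The patent does not name a framework; the use of PyTorch/torchvision here is an illustrative assumption.

```python
import torch
import torchvision

# Sketch (not the patent's own code) of the feature extraction layer:
# the five convolution modules of VGG16, initialised with weights
# pre-trained on ImageNet as the description specifies.
vgg16 = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
feature_extractor = vgg16.features   # conv modules 1-5 with interleaved pooling layers

x = torch.randn(1, 3, 256, 256)      # one normalised 256x256 training patch
features = feature_extractor(x)      # output of the fifth convolution module
print(features.shape)                # torch.Size([1, 512, 8, 8])
```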
The structure of the first combination module in the constructed global and multi-scale full convolution network is further described with reference to fig. 2.
The first combination module in the global and multi-scale full convolution network is composed mainly of 1×1 convolution, transposition, multiplication and addition. The input X is the output feature map of the fifth convolution module of VGG16, and three 1×1 convolution branches θ, φ and g transform X; the θ and φ outputs are combined by transposition and multiplication followed by a softmax operation, the result is multiplied with the g output and passed through a 1×1 convolution, and the input X is then added to obtain the final feature map Z containing global information.
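A possible realisation of this global module, reconstructed from the description above (θ, φ and g as 1×1 convolutions with 512 input and 256 intermediate feature maps, transposition, multiplication, softmax, a final 1×1 convolution and a residual addition of X), is sketched below in PyTorch; the exact wiring is an assumption, not the patent's own code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalModule(nn.Module):
    """Sketch of the first combination (global) module: a non-local-style
    block built from 1x1 convolutions, transposition, matrix multiplication,
    softmax and a residual addition. Channel sizes (512 in/out, 256 internal)
    follow the stated parameter settings; the wiring is an assumption."""
    def __init__(self, in_channels=512, mid_channels=256):
        super().__init__()
        self.theta = nn.Conv2d(in_channels, mid_channels, kernel_size=1, stride=1)
        self.phi = nn.Conv2d(in_channels, mid_channels, kernel_size=1, stride=1)
        self.g = nn.Conv2d(in_channels, mid_channels, kernel_size=1, stride=1)
        self.out = nn.Conv2d(mid_channels, in_channels, kernel_size=1, stride=1)

    def forward(self, x):
        n, c, h, w = x.shape
        theta = self.theta(x).view(n, -1, h * w)   # N x 256 x HW
        phi = self.phi(x).view(n, -1, h * w)       # N x 256 x HW
        g = self.g(x).view(n, -1, h * w)           # N x 256 x HW
        # pairwise similarity between all spatial positions (global context)
        attn = F.softmax(torch.bmm(theta.transpose(1, 2), phi), dim=-1)  # N x HW x HW
        y = torch.bmm(g, attn.transpose(1, 2)).view(n, -1, h, w)
        return self.out(y) + x                     # residual addition of the input X
```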
Second, the parameters of the global and multi-scale full convolution network are set as follows.
The convolution kernel sizes of the first and second convolution layers are set to 1×1 pixels, with stride 1 pixel; the parameters of the feature extraction layer are the same as the network parameters of VGG16.
The input and output feature maps of the first combination module are set to 512, and the intermediate feature maps to 256.
The convolution kernel sizes of the fully connected layer and the deconvolution layer are set to 3×3 pixels, with stride 1 pixel, and the dropout parameters in both are set to 0.5.
The feature maps of each upsampling layer in the second combination module are set to 2, the convolution kernel size of the sixth convolution layer to 1×1 pixel, with stride 1 pixel.
The input feature maps of the first, second, third, fourth and fifth convolution modules in the VGG16 network parameters are set to 3, 64, 128, 256 and 512 in sequence, and the output feature maps to 64, 128, 256, 512 and 512 in sequence.
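The second combination (multi-scale) module described above can be sketched as follows; the input channel counts of the three intermediate decoder outputs and the use of bilinear interpolation for the upsampling layers are illustrative assumptions, while the 1×1 convolutions producing 2 feature maps follow the stated parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleModule(nn.Module):
    """Sketch of the second combination module: three upsampling modules,
    each an upsampling layer followed by a 1x1 convolution producing 2
    feature maps, applied to intermediate outputs of the deconvolution
    layer so the final mask uses multi-level feature maps concatenated
    in series rather than the last level alone."""
    def __init__(self, in_channels=(512, 256, 128), num_maps=2):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(c, num_maps, kernel_size=1, stride=1) for c in in_channels
        ])

    def forward(self, decoder_feats, out_size):
        outs = []
        for feat, conv in zip(decoder_feats, self.convs):
            # second upsampling layer: enlarge to the common output size
            up = F.interpolate(feat, size=out_size, mode='bilinear', align_corners=False)
            # sixth convolution layer: 1x1 convolution, stride 1, 2 feature maps
            outs.append(conv(up))
        return torch.cat(outs, dim=1)   # multi-level maps concatenated in series
```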
And 2, generating a training set.
31 aerial images of size 5000×5000 and their corresponding actual class labels are collected, each image containing a background class and a target class.
Each image is cropped into 256×256 patches and every pixel is divided by 255.0 for normalization to form the training set; the corresponding actual class labels are cropped in the same way to form the actual class labels of the training set.
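A minimal sketch of this training-set generation, assuming non-overlapping row-major tiling (the description fixes only the 256×256 patch size and the division by 255.0):

```python
import numpy as np

def make_training_patches(image, label, patch=256):
    """Crop a 5000x5000 aerial image and its class-label map into 256x256
    patches and normalise pixels by 255.0. The tiling order and the handling
    of the edge remainder are assumptions for illustration."""
    patches, labels = [], []
    h, w = image.shape[:2]
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            patches.append(image[r:r + patch, c:c + patch] / 255.0)
            labels.append(label[r:r + patch, c:c + patch])
    return np.stack(patches), np.stack(labels)
```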
And 3, training the global and multi-scale full convolution network.
The training set is input into the global and multi-scale full convolution network, and the feature map output by the network is taken as the network-predicted segmentation mask map.
The network weights are iteratively updated with the Adam optimization algorithm until the loss function converges, yielding the trained global and multi-scale full convolution network.
The loss function is a sparse softmax cross entropy loss function: it first converts the actual labels from class indices into one-hot codes, then applies softmax to the predicted class labels, and finally computes the cross entropy as the loss value. The cross entropy is calculated as:
H_y'(y) = -∑ y'·log y
where y' is the actual class label of the training set, y is the segmentation mask map predicted for the training set, and log is the base-10 logarithm.
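A NumPy sketch of this loss, following the description's one-hot conversion, softmax, and base-10 logarithm (most frameworks' sparse softmax cross entropy uses the natural logarithm instead):

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels, num_classes=2):
    """Convert class-index labels to one-hot codes, apply softmax to the
    predicted class scores (last axis), and take cross entropy as the loss.
    Averaging over pixels is an assumption for illustration."""
    one_hot = np.eye(num_classes)[labels.reshape(-1)]        # index -> one-hot
    z = logits.reshape(-1, num_classes)
    z = z - z.max(axis=1, keepdims=True)                     # numerically stable softmax
    y = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -np.mean(np.sum(one_hot * np.log10(y + 1e-12), axis=1))
```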
The steps for iteratively updating the network weight values with the Adam optimization algorithm are as follows:
firstly, the training set is divided into several parts according to the following formula:
G = M / Q
where G is the number of parts into which the training set is divided, M is the total number of images in the training set, and Q is the number of images in each part; Q is set according to the scale of the global and multi-scale full convolution network and the size of the input images, and the deeper the network or the larger each input image, the smaller the value of Q;
secondly, any unselected image is taken from the divided training set and input into the global and multi-scale full convolution network, and the network weight values are updated with the following weight update formula:
W_new = W - L × ∂Loss/∂W
where W_new is the updated weight value, W is the initial weight value of the global and multi-scale full convolution network, L is the learning rate of the training, whose value lies in the range [0.00001, 0.001], × denotes multiplication, and ∂Loss/∂W denotes the partial derivative of the loss with respect to the weights;
thirdly, any unselected image is taken from the divided training set and input into the global and multi-scale full convolution network, and the loss value of the loss function is computed after the weight values are updated.
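A compact PyTorch sketch of this training procedure is given below; the epoch count, the learning rate within the stated range, and the names `network` and `loader` are illustrative assumptions, and `CrossEntropyLoss` plays the role of the sparse softmax cross entropy loss.

```python
import torch

def train(network, loader, epochs=50, lr=1e-4):
    optimizer = torch.optim.Adam(network.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()   # sparse softmax cross entropy
    for _ in range(epochs):
        for patches, labels in loader:      # one of the G = M / Q parts
            optimizer.zero_grad()
            logits = network(patches)       # predicted segmentation mask logits
            loss = loss_fn(logits, labels)  # one-hot + softmax + cross entropy
            loss.backward()                 # dLoss/dW, as in the update formula
            optimizer.step()                # W_new = W - L * dLoss/dW plus Adam moments
    return network
```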
And 4, generating a segmentation mask map.
Each aerial image to be segmented is cropped into 256×256 patches, every pixel is divided by 255.0 for normalization, and the patches are input into the trained global and multi-scale full convolution network for binary segmentation to obtain the final segmentation mask map.
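A sketch of this step, assuming non-overlapping tiling and simple stitching of the per-patch binary masks back into a full-size mask (the description itself ends at the per-patch segmentation):

```python
import numpy as np
import torch

def segment_aerial_image(network, image, patch=256):
    """Crop the aerial image into 256x256 patches, normalise by 255.0,
    run the trained network for binary segmentation, and stitch the
    per-patch masks together. Stitching is an assumption."""
    network.eval()
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=np.int64)
    with torch.no_grad():
        for r in range(0, h - patch + 1, patch):
            for c in range(0, w - patch + 1, patch):
                tile = image[r:r + patch, c:c + patch] / 255.0
                x = torch.from_numpy(tile).float().permute(2, 0, 1).unsqueeze(0)
                logits = network(x)                       # 1 x 2 x 256 x 256
                mask[r:r + patch, c:c + patch] = logits.argmax(1).squeeze(0).numpy()
    return mask
```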
The working steps of the overall invention are further described with reference to fig. 3.
The pictures of the training set are input in sequence into the constructed global and multi-scale full convolution network. A feature map is extracted by the feature extraction layer consisting of the first, second, third, fourth and fifth convolution modules, input into the global module, and then passed through the fully connected layer consisting of the max pooling layer, the third and fourth convolution layers, and the first and second dropout layers. The feature map is enlarged by the deconvolution layer consisting of the first, second, third and fourth deconvolution modules; to use multi-level information, multi-level feature maps are obtained through the first, second and third upsamplings and concatenated in series, and the dimension of the concatenated feature map is reduced by the seventh convolution layer to obtain the output segmentation mask map. The addition symbol in fig. 3 denotes pixel-by-pixel addition.
The effect of the present invention will be further described with reference to simulation experiments.
1. Simulation conditions are as follows:
the hardware platform of the simulation experiment of the invention is as follows: CPU is Intel (R) core (TM) i7-8700X, main frequency is 3.2GHz, memory 64GB, GPU is NVIDIA 1080 Ti.
The software platform of the simulation experiment of the invention is as follows: ubuntu operating system and python 3.6.
2. Simulation content and result analysis:
the simulation experiment of the invention is to train the constructed global and multi-scale full convolution networks respectively by using the training images by adopting the invention and three prior arts (full convolution network method, segmentation network method and bidirectional full convolution network method). And (3) segmenting the image to be segmented by using the trained global and multi-scale full convolution network to obtain 25 (5 in each region) segmentation mask images of the image to be segmented.
The training Image and the Image to be segmented used in the simulation experiment are Aerial Image data sets in an Aerial Image Labeling data set Inria initial Image Labeling data set of a French national computer and an automated research institute. The aerial image dataset is collected from ten regions, five of which have real tags, each region has 36 images with a size of 5000 × 5000 × 3 pixels, the image tags are architectural and non-architectural, and the image format is tiff. The simulation experiment of the invention uses five regions with real labels to verify the effectiveness of the invention, and selects 6 th to 36 th aerial images of each region as training images of each region, and 1 st to 5 th aerial images as images to be segmented of each region.
The three prior-art methods adopted in the simulation experiment are:
the prior art full convolution network method refers to an aerial image Segmentation method proposed in the paper "full convolution Networks for magnetic Segmentation", IEEE Conference on computer Vision and Pattern registration "(CVPR, 2014)" published by Long et al, and the method uses an end-to-end convolution neural network and uses deconvolution to perform upsampling, which is referred to as a full convolution network method for short.
The prior-art segmentation network method refers to the aerial image segmentation method proposed by Vijay et al. in the paper "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation" (arXiv, 2016). The method passes the max-pooling indices to the decoder to recover resolution; it is referred to as the segmentation network method for short.
The prior-art bidirectional fully convolutional network method refers to the aerial image segmentation method proposed by Mou et al. in the paper "RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of High Resolution Remote Sensing Images" (arXiv, 2018). The method improves segmentation accuracy through the cyclic action of a forward stream and a backward stream; it is referred to as the bidirectional fully convolutional network method for short.
The segmentation accuracy of the segmentation mask maps of the 25 images to be segmented (5 per region) obtained by the four methods is evaluated with two indexes: accuracy (ACC) and intersection-over-union (IOU). They are calculated with the following formulas, and the results are listed in Table 1:
ACC = number of correctly classified pixels / total number of pixels
IOU = (A ∩ B) / (A ∪ B)
where A denotes the area of the predicted target label, B denotes the area of the real target label, ∩ is the intersection operation, and ∪ is the union operation.
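A small sketch of the two evaluation indexes on binary masks, with ACC as the proportion of correctly classified pixels and IOU computed from the predicted target area A and the real target area B:

```python
import numpy as np

def acc_iou(pred, truth):
    """Compute pixel accuracy and intersection-over-union for binary masks."""
    acc = np.mean(pred == truth)
    a, b = (pred == 1), (truth == 1)          # predicted / real target areas
    iou = np.logical_and(a, b).sum() / np.logical_or(a, b).sum()
    return acc, iou
```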
In Table 1, "Invention" denotes the aerial image segmentation method based on the global and multi-scale full convolution network proposed by the present invention, "FCN" the fully convolutional network method of Long et al., "SegNet" the segmentation network method of Vijay et al., and "RiFCN" the bidirectional fully convolutional network method of Mou et al.; "austin", "chicago", "kitsap", "tyrol_w" and "vienna" are the five regions, each containing 5 segmentation mask maps, and "overall" is the whole set of 25 segmentation mask maps.
TABLE 1 Performance evaluation of the invention and existing aerial remote sensing image semantic segmentation models

Method            austin   chicago   kitsap   tyrol_w   vienna   overall
Invention (IOU)    78.90     69.84    66.87     75.29    80.59     75.97
Invention (ACC)    96.89     92.78    99.27     98.05    94.56     96.35
FCN (IOU)          47.66     53.62    33.70     46.86    60.60     53.82
FCN (ACC)          92.22     88.59    98.58     95.83    88.72     92.79
SegNet (IOU)       74.81     52.83    68.06     65.68    72.90     70.14
SegNet (ACC)       92.52     98.65    97.28     91.36    96.04     95.17
RiFCN (IOU)        76.84     67.45    63.95     73.19    79.18     74.00
RiFCN (ACC)        96.50     91.76    99.14     97.75    93.95     95.82
As can be seen from Table 1, over the whole set of 25 segmentation mask maps the accuracy ACC of the invention is 96.35% and the intersection-over-union is 75.97%; both indexes are higher than those of the 3 prior-art methods, and the accuracy and intersection-over-union in most individual regions are also higher than those of the 3 prior-art methods, proving that the invention can achieve higher aerial image segmentation accuracy.
The above simulation experiments show that: the built global module can extract global information in the feature extraction layer of the aerial image and combine it with local information; the built multi-scale module can extract multi-scale information in the deconvolution layer; and the connection between the feature extraction layer and the deconvolution layer makes full use of the information extracted by the feature extraction layer. The invention thus solves the prior-art problems that only local information can be extracted because of the limited convolution kernel size and global information is lacking, that the differing contributions of each convolution layer of the decoding part to the final prediction and the multi-scale features are not considered, and that network simplicity and efficiency are not considered, which lead to low segmentation accuracy, difficulty in recognizing same-class objects of different sizes, and poor segmentation performance. It is a very practical aerial image segmentation method.

Claims (5)

1. An aerial image segmentation method based on a global and multi-scale full convolution network, characterized in that a global module is constructed and used to obtain global information in a feature extraction layer, a multi-scale module is constructed and used to obtain multi-scale information in a deconvolution layer, and the connection between the feature extraction layer and the deconvolution layer enables the information extracted by the feature extraction layer to be fully used, the method specifically comprising the following steps:
(1) constructing a global and multi-scale full convolution network:
(1a) a global and multi-scale full convolution network is built, whose structure is, in order: input layer → feature extraction layer → first combination module → fully connected layer → deconvolution layer → second combination module → output layer;
the feature extraction layer consists of the five serially connected convolution modules of the VGG16 model;
the first combination module has 7 layers, whose structure is, in order: first convolution layer → transpose layer → first multiplication layer → softmax layer → second multiplication layer → second convolution layer → addition layer;
the structure of the fully connected layer is, in order: max pooling layer → third convolution layer → first dropout layer → fourth convolution layer → second dropout layer;
the deconvolution layer consists of four serially connected deconvolution modules, each with the structure: first upsampling layer → fifth convolution layer → third dropout layer;
the second combination module consists of three serially connected upsampling modules, each composed of a second upsampling layer and a sixth convolution layer;
the output layer consists of a seventh convolution layer and Argmax connected in series and is used to generate the segmentation mask map;
wherein the outputs of the second, third, fourth and fifth convolution modules of the feature extraction layer are connected by pixel-wise addition to the inputs of the first, second, third and fourth deconvolution modules of the deconvolution layer, respectively;
(1b) the parameters of the global and multi-scale full convolution network are set as follows:
the convolution kernel sizes of the first and second convolution layers are set to 1×1 pixels, with stride 1 pixel; the parameters of the feature extraction layer are the same as the network parameters of VGG16;
the input and output feature maps of the first combination module are set to 512, and the intermediate feature maps to 256;
the convolution kernel sizes of the fully connected layer and the deconvolution layer are set to 3×3 pixels, with stride 1 pixel, and the dropout parameters in both are set to 0.5;
the feature maps of each upsampling layer in the second combination module are set to 2, the convolution kernel size of the sixth convolution layer to 1×1 pixel, with stride 1 pixel;
(2) generating a training set:
(2a) 31 aerial images of size 5000×5000 and their corresponding actual class labels are collected, each image containing a background class and a target class;
(2b) each image is cropped into 256×256 patches and every pixel is divided by 255.0 for normalization to form the training set; the corresponding actual class labels are cropped in the same way to form the actual class labels of the training set;
(3) training the global and multi-scale full convolution network:
(3a) the training set is input into the global and multi-scale full convolution network, and the feature map output by the network is taken as the network-predicted segmentation mask map;
(3b) the network weights are iteratively updated with the Adam optimization algorithm until the loss function converges, yielding the trained global and multi-scale full convolution network;
(4) generating a segmentation mask map:
each aerial image to be segmented is cropped into 256×256 patches, every pixel is divided by 255.0 for normalization, and the patches are input into the trained global and multi-scale full convolution network for binary segmentation to obtain the final segmentation mask map.
2. The aerial image segmentation method based on the global and multi-scale full convolution network according to claim 1, wherein the convolution kernels of the first and second of the five serially connected convolution modules in the VGG16 model in step (1a) are all 3×3 pixels, with strides set to 2 pixels and 1 pixel in sequence; the convolution kernels of the third, fourth and fifth convolution modules are all 3×3 pixels, with strides set to 2 pixels, 1 pixel and 1 pixel in sequence; and weights pre-trained on the ImageNet dataset are used as the initial values of the model.
3. The aerial image segmentation method based on the global and multi-scale full convolution network according to claim 1, wherein in step (1b) the input feature maps of the first, second, third, fourth and fifth convolution modules in the VGG16 network parameters are set to 3, 64, 128, 256 and 512 in sequence, and the output feature maps to 64, 128, 256, 512 and 512 in sequence.
4. The aerial image segmentation method based on the global and multi-scale full convolution network according to claim 1, wherein the loss function in step (3b) is a sparse softmax cross entropy loss function: the loss function first converts the actual labels from class indices into one-hot codes, then applies softmax to the predicted class labels, and finally computes the cross entropy as the loss value, the cross entropy being calculated as:
H_y'(y) = -∑ y'·log y
wherein y' is the actual class label of the training set, y is the segmentation mask map predicted for the training set, and log is the base-10 logarithm.
5. The aerial image segmentation method based on the global and multi-scale full convolution network according to claim 1, wherein the steps of iteratively updating the network weight values with the Adam optimization algorithm in step (3b) are as follows:
firstly, the training set is divided into several parts according to the following formula:
G = M / Q
wherein G is the number of parts into which the training set is divided, M is the total number of images in the training set, and Q is the number of images in each part; Q is set according to the scale of the global and multi-scale full convolution network and the size of the input images, and the deeper the network or the larger each input image, the smaller the value of Q;
secondly, any unselected image is taken from the divided training set and input into the global and multi-scale full convolution network, and the network weight values are updated with the following weight update formula:
W_new = W - L × ∂Loss/∂W
wherein W_new is the updated weight value, W is the initial weight value of the global and multi-scale full convolution network, L is the learning rate of the training, whose value lies in the range [0.00001, 0.001], × denotes multiplication, and ∂Loss/∂W denotes the partial derivative of the loss with respect to the weights;
thirdly, any unselected image is taken from the divided training set and input into the global and multi-scale full convolution network, and the loss value of the loss function is computed after the weight values are updated.
CN201911087534.1A 2019-11-08 2019-11-08 Aerial image segmentation method based on global and multi-scale full-convolution network Active CN110853057B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911087534.1A CN110853057B (en) 2019-11-08 2019-11-08 Aerial image segmentation method based on global and multi-scale full-convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911087534.1A CN110853057B (en) 2019-11-08 2019-11-08 Aerial image segmentation method based on global and multi-scale full-convolution network

Publications (2)

Publication Number Publication Date
CN110853057A CN110853057A (en) 2020-02-28
CN110853057B true CN110853057B (en) 2021-10-29

Family

ID=69600177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911087534.1A Active CN110853057B (en) 2019-11-08 2019-11-08 Aerial image segmentation method based on global and multi-scale full-convolution network

Country Status (1)

Country Link
CN (1) CN110853057B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496159B (en) * 2020-03-20 2022-12-23 昆明理工大学 Multi-scale convolution and dynamic weight cost function smoke target segmentation method
CN111640116B (en) * 2020-05-29 2023-04-18 广西大学 Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN111784653B (en) * 2020-06-28 2023-08-01 西安电子科技大学 Multi-scale network MRI pancreas contour positioning method based on shape constraint
CN112183448B (en) * 2020-10-15 2023-05-12 中国农业大学 Method for dividing pod-removed soybean image based on three-level classification and multi-scale FCN
CN114419381B (en) * 2022-04-01 2022-06-24 城云科技(中国)有限公司 Semantic segmentation method and road ponding detection method and device applying same
CN114821174B (en) * 2022-04-24 2024-02-27 西北工业大学 Content perception-based transmission line aerial image data cleaning method
CN116071607B (en) * 2023-03-08 2023-08-08 中国石油大学(华东) Reservoir aerial image classification and image segmentation method and system based on residual error network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169974A (en) * 2017-05-26 2017-09-15 中国科学技术大学 It is a kind of based on the image partition method for supervising full convolutional neural networks more
CN107397658A (en) * 2017-07-26 2017-11-28 成都快眼科技有限公司 A kind of multiple dimensioned full convolutional network and vision blind-guiding method and device
CN107944347A (en) * 2017-11-03 2018-04-20 西安电子科技大学 Polarization SAR object detection method based on multiple dimensioned FCN CRF
CN110288613A (en) * 2019-06-12 2019-09-27 中国科学院重庆绿色智能技术研究院 A kind of histopathology image partition method of very-high solution

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169974A (en) * 2017-05-26 2017-09-15 中国科学技术大学 It is a kind of based on the image partition method for supervising full convolutional neural networks more
CN107397658A (en) * 2017-07-26 2017-11-28 成都快眼科技有限公司 A kind of multiple dimensioned full convolutional network and vision blind-guiding method and device
CN107944347A (en) * 2017-11-03 2018-04-20 西安电子科技大学 Polarization SAR object detection method based on multiple dimensioned FCN CRF
CN110288613A (en) * 2019-06-12 2019-09-27 中国科学院重庆绿色智能技术研究院 A kind of histopathology image partition method of very-high solution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION; Karen Simonyan, Andrew Zisserman; https://arxiv.org/pdf/1409.1556.pdf; 2015-04-10; pp. 1-14 *
Multi-scale face detection with fully convolutional neural networks (in Chinese); Luo Mingzhu, Xiao Yewei; Computer Engineering and Applications; 2018-11-21; Vol. 55, No. 5, pp. 124-128 *

Also Published As

Publication number Publication date
CN110853057A (en) 2020-02-28

Similar Documents

Publication Publication Date Title
CN110853057B (en) Aerial image segmentation method based on global and multi-scale full-convolution network
CN110136170B (en) Remote sensing image building change detection method based on convolutional neural network
CN112070779B (en) Remote sensing image road segmentation method based on convolutional neural network weak supervised learning
CN111898439B (en) Deep learning-based traffic scene joint target detection and semantic segmentation method
CN113780149B (en) Remote sensing image building target efficient extraction method based on attention mechanism
CN111340034B (en) Text detection and identification method and system for natural scene
CN105608454A (en) Text structure part detection neural network based text detection method and system
CN113505842B (en) Automatic urban building extraction method suitable for large-scale regional remote sensing image
CN114821342B (en) Remote sensing image road extraction method and system
CN113256649B (en) Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN111709387B (en) Building segmentation method and system for high-resolution remote sensing image
CN112633140A (en) Multi-spectral remote sensing image urban village multi-category building semantic segmentation method and system
CN113239753A (en) Improved traffic sign detection and identification method based on YOLOv4
CN110992366A (en) Image semantic segmentation method and device and storage medium
CN112991364A (en) Road scene semantic segmentation method based on convolution neural network cross-modal fusion
CN116206185A (en) Lightweight small target detection method based on improved YOLOv7
CN113052106A (en) Airplane take-off and landing runway identification method based on PSPNet network
CN114820655A (en) Weak supervision building segmentation method taking reliable area as attention mechanism supervision
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN116206112A (en) Remote sensing image semantic segmentation method based on multi-scale feature fusion and SAM
CN114973136A (en) Scene image recognition method under extreme conditions
Thati et al. A systematic extraction of glacial lakes for satellite imagery using deep learning based technique
CN114119621A (en) SAR remote sensing image water area segmentation method based on depth coding and decoding fusion network
CN116778318A (en) Convolutional neural network remote sensing image road extraction model and method
CN114743023B (en) Wheat spider image detection method based on RetinaNet model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant