Specific embodiments
In one embodiment, as shown in Figure 1, a remote sensing image segmentation method combining complete residual connections and multi-scale feature fusion is disclosed, comprising the following steps:
S100: improving the backbone network used for segmentation, a convolutional encoder-decoder network, specifically:
S101: using a convolutional encoder-decoder network as the backbone for segmentation, the backbone comprising two components: an encoder and a decoder;
S102: adding to the backbone a feature pyramid module that aggregates multi-scale contextual information;
S103: adding residual units inside the corresponding convolutional layers of the encoder and decoder of the backbone, while fusing the encoder features into the corresponding decoder layers by pixel-wise addition;
S200: segmenting the remote sensing image with the improved image segmentation network combining complete residual connections and multi-scale feature fusion;
S300: outputting the segmentation result of the remote sensing image.
Definition of the image segmentation network combining complete residual connections and multi-scale feature fusion: on the basis of a convolutional encoder-decoder, the network adds complete residual connections between the encoder and decoder and inside their convolutional layers, while a feature pyramid module (FPM) that aggregates multi-scale features is applied to the convolutional features of the last convolution stage of the encoder.
Specifically: first, the base network is a convolutional encoder-decoder network composed of a fully symmetric encoder and decoder. Second, short-range residual connections are added inside each convolutional layer of the encoder and decoder. A residual connection operation takes an input, learns the residual of the input data through a sequence of operation units such as a convolutional layer, a batch normalization unit and a rectified linear unit, and then adds this residual to the input to produce the output. At the same time, the features of each convolution stage of the encoder are fused into the corresponding decoder layers by pixel-wise addition; by analogy with the working principle of the residual unit, the connection of this step is called a long-range residual connection, and the short-range and long-range residual connections together are called the complete residual connections. Finally, when fusing the features of each encoder convolution stage into the corresponding decoder layers, a feature pyramid module (FPM) is applied to the features of the fifth convolution stage of the encoder, thereby aggregating contextual information at different scales, and the resulting multi-scale features are then fused into the corresponding decoder layer. The complete residual connections and the feature pyramid module described above operate simultaneously and belong to the same level of operations.
The above embodiment uses the improved image segmentation network combining complete residual connections and multi-scale feature fusion, which both simplifies the training of the deep network and enhances feature fusion; moreover, the fusion of features of different scales and modes enables the network to extract rich contextual information, cope with variations of target scale, and improve segmentation performance.
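To make the data flow concrete, the following is a minimal PyTorch-style sketch of the described forward pass; the module names (FResMFDNNSketch, enc_stages, fpm, and so on) and the channel bookkeeping of the mirrored stages are illustrative assumptions, not the exact implementation of the invention.

```python
import torch.nn as nn

class FResMFDNNSketch(nn.Module):
    """Sketch only: encoder stages with short-range residual units inside,
    max pooling with saved indices, an FPM on conv5, and long-range
    (pixel-wise addition) fusion into the mirrored decoder."""
    def __init__(self, enc_stages, dec_stages, pools, unpools, fpm):
        super().__init__()
        self.enc_stages = nn.ModuleList(enc_stages)  # 5 encoder conv stages
        self.dec_stages = nn.ModuleList(dec_stages)  # 5 mirrored decoder stages
        self.pools = nn.ModuleList(pools)            # max-pool layers saving indices
        self.unpools = nn.ModuleList(unpools)        # matching unpooling layers
        self.fpm = fpm                               # feature pyramid module

    def forward(self, x):
        skips, indices = [], []
        for stage, pool in zip(self.enc_stages, self.pools):
            x = stage(x)                 # short-range residual units live inside
            skips.append(x)              # kept for the long-range residual connection
            x, idx = pool(x)
            indices.append(idx)
        skips[-1] = self.fpm(skips[-1])  # aggregate multi-scale context at conv5
        for i, (stage, unpool) in enumerate(zip(self.dec_stages, self.unpools)):
            x = unpool(x, indices[-(i + 1)])
            x = stage(x) + skips[-(i + 1)]  # long-range fusion: pixel-wise addition
        return x
```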
In another embodiment, the encoder in step S101 comprises 13 convolutional layers and 5 pooling layers; a decoder is stacked on top of the encoder, and the decoder, a complete mirror of the encoder, comprises 13 convolutional layers and 5 unpooling layers.
In this embodiment, the encoder extracts features from the input data through convolution kernels of a certain size, an implementation that achieves a good feature extraction effect.
In another embodiment, the 13 convolutional layers of the encoder are divided into five convolution stages: the first and second convolution stages each comprise two convolutional layers, and the third, fourth and fifth convolution stages each comprise three convolutional layers.
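The stated 2-2-3-3-3 division can be summarized as a small configuration table; a sketch follows, in which the channel widths are an assumption carried over from the VGG16 weights later used for initialization.

```python
# Hypothetical layout of the 13 encoder convolutions in five stages.
ENCODER_STAGES = [
    {"convs": 2, "channels": 64},   # stage 1
    {"convs": 2, "channels": 128},  # stage 2
    {"convs": 3, "channels": 256},  # stage 3
    {"convs": 3, "channels": 512},  # stage 4
    {"convs": 3, "channels": 512},  # stage 5
]
assert sum(s["convs"] for s in ENCODER_STAGES) == 13  # 13 convolutional layers
```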
In another embodiment, each convolutional layer is followed by a batch normalization unit and a rectified linear unit, where the batch normalization unit normalizes the extracted feature data and the rectified linear unit introduces non-linearity; each convolution stage is followed by a pooling layer.
In this embodiment, the batch normalization unit alleviates the problem of shifting intermediate-layer data distributions during network training, prevents vanishing gradients and accelerates training; the rectified linear unit introduces non-linearity and improves the network's ability to express the data.
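The per-layer unit just described (convolution, then batch normalization, then a rectified linear unit) can be sketched as follows; the 3x3 kernel size and padding are assumptions.

```python
import torch.nn as nn

def conv_bn_relu(in_ch: int, out_ch: int, k: int = 3) -> nn.Sequential:
    """One convolutional unit as described: conv -> BN -> ReLU (sketch)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2),  # feature extraction
        nn.BatchNorm2d(out_ch),  # normalize the extracted feature data
        nn.ReLU(inplace=True),   # introduce the non-linear factor
    )
```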
In another embodiment, the pooling operations in the encoder use max pooling, and the index positions of the maxima are saved.
In this embodiment, saving the max pooling indices enables the unpooling layers to enlarge smaller feature maps and obtain sparse feature maps.
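A minimal sketch of max pooling with saved indices and the corresponding unpooling, using standard PyTorch operators (the 2x2 window is an assumption):

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 256, 256)
y, idx = pool(x)    # downsample and remember where each maximum came from
z = unpool(y, idx)  # write values back at the saved positions -> sparse feature map
assert z.shape == x.shape
```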
In another embodiment, pyramid structures commonly used to extract multi-scale contextual information, such as the spatial pyramid pooling in PSPNet or the ASPP module with atrous (dilated) convolutions in the DeepLab network, aggregate multi-scale information by parallel channel concatenation. On the one hand, this can make the network parameters excessive; on the other hand, the pooling and atrous convolution operations tend to cause local information loss and gridding artifacts respectively, ultimately impairing the local consistency of the feature maps. Therefore, the feature pyramid module (FPM) in this method, whose structure is shown in Fig. 2, first uses 3x3 and 5x5 convolution kernels to extract contextual information at different scales from the original input feature map (conv5), and then integrates it step by step so as to combine contextual features of adjacent scales. A 1x1 convolution is then applied to the original input feature map (conv5) and multiplied pixel-wise with the multi-scale features. Finally, global pooling information is fused to improve the performance of the feature pyramid module. The Upsample in Fig. 2 means that the size of a feature map is enlarged to a given resolution by a deconvolution operation.
In this embodiment, the feature pyramid module reduces the computational burden and causes neither local information loss nor gridding artifacts.
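A loose sketch of such an FPM is given below; the exact wiring of Fig. 2 is not reproduced here, so the branch arrangement, channel widths and the use of nearest-neighbor interpolation in place of deconvolution are all assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class FPMSketch(nn.Module):
    """Illustrative feature pyramid module: multi-scale context from 3x3 and
    5x5 convolutions, gradual pixel-wise aggregation, pixel-wise multiplication
    with a 1x1 projection of conv5, and fused global pooling information."""
    def __init__(self, ch: int):
        super().__init__()
        self.branch3 = nn.Conv2d(ch, ch, 3, padding=1)  # 3x3 context branch
        self.branch5 = nn.Conv2d(ch, ch, 5, padding=2)  # 5x5 context branch
        self.proj = nn.Conv2d(ch, ch, 1)                # 1x1 on the original conv5
        self.gap_proj = nn.Conv2d(ch, ch, 1)            # projects global pooling info

    def forward(self, conv5):
        c3 = self.branch3(conv5)
        c5 = self.branch5(conv5)
        multi = c3 + c5                        # combine adjacent-scale context
        out = self.proj(conv5) * multi         # pixel-wise multiplication
        gap = F.adaptive_avg_pool2d(conv5, 1)  # global pooling information
        out = out + F.interpolate(self.gap_proj(gap), size=out.shape[2:])
        return out
```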
In another embodiment, the feature pyramid module is used before the features of the fifth convolution stage of the encoder are fused into the corresponding decoder layer.
In this embodiment, since the high-level feature maps of the deeper layers have smaller resolution, using larger convolution kernels does not bring an excessive computational burden, so the feature pyramid module is chosen to operate at the conv5 stage.
In another embodiment, the step-by-step integration aggregates the multi-scale information in a gradual pixel-by-pixel addition manner.
In this embodiment, aggregating the multi-scale information by gradual pixel-wise addition takes the hierarchical dependency of the features into account and maintains the local consistency of the feature information at different scales.
In another embodiment, fusing the encoder features into the corresponding decoder layers by pixel-wise addition as described in step S103 is specifically: for the first and second convolution stages of the encoder, only the feature map of the last convolutional layer is selected; for the third, fourth and fifth convolution stages of the encoder, all convolutional feature maps are selected; and the selected maps are fused by pixel-wise addition.
In this embodiment, the loss of feature map resolution is reduced; the selection rule is summarized in the sketch below.
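Simple bookkeeping of the stated selection rule; the VGG-style layer names (conv1_2, conv3_1, ...) are hypothetical labels for illustration only.

```python
# Which encoder feature maps enter the long-range pixel-wise fusion.
FUSED_MAPS = {
    1: ["conv1_2"],                        # stages 1-2: last layer only
    2: ["conv2_2"],
    3: ["conv3_1", "conv3_2", "conv3_3"],  # stages 3-5: all convolutional layers
    4: ["conv4_1", "conv4_2", "conv4_3"],
    5: ["conv5_1", "conv5_2", "conv5_3"],
}
```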
In another embodiment, residual units as shown in Fig. 3 are added inside the corresponding convolution stages of the encoder and decoder, referred to as short-range residual connections. In Fig. 3, $X_l$ and $y$ denote the input and output of a residual unit, and $F(X_l)$ denotes the residual learned by the unit, obtained through a series of operations such as convolutional layers, batch normalization (BN) units and rectified linear units (ReLU). The convolutional layers extract features, the batch normalization units normalize the extracted feature data, and the rectified linear units introduce non-linearity. The unit computes $y = F(X_l) + X_l$; in the special case where the residual $F(X_l) = 0$, the output equals the input. By analogy with the principle of the residual unit of Fig. 3, the feature fusion in step S103 can be regarded as a long-range residual connection, which together with the short-range residual connections constitutes the complete residual connections. On the one hand, this solves the vanishing-gradient problem that appears as deep networks grow deeper; on the other hand, regarding the contour information that deep networks lose through convolution operations, the complete residual connections fuse not only the multi-scale features but also the original input information of the layer, thereby compensating for the lost information to a certain extent and further enhancing feature fusion.
In this embodiment, the residual units effectively prevent gradients from vanishing.
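A sketch of the short-range residual unit of Fig. 3, computing y = F(X_l) + X_l; the depth of F (two conv-BN pairs) and the final ReLU are assumptions about details left open here.

```python
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Short-range residual unit sketch: y = F(x) + x."""
    def __init__(self, ch: int):
        super().__init__()
        self.residual = nn.Sequential(       # F(x): conv/BN/ReLU operations
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # If the learned residual is zero, the output equals the input.
        return self.relu(self.residual(x) + x)
```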
In another embodiment, a workstation running a 64-bit Ubuntu system is used, with an Intel(R) Xeon(R) E5-2690 v3 2.6 GHz processor, 256 GB of memory and a 4 TB hard disk as hardware configuration. The whole network is trained on the Caffe deep learning platform, accelerated during training with one NVIDIA Tesla K40c GPU with 12 GB of video memory. The network parameters are initialized from VGG16 weights pre-trained on the ImageNet dataset; the remaining layer parameters are initialized with the MSRA initialization method proposed by He et al. (2015), which, considering only the number of inputs n, makes the weights obey a Gaussian distribution with mean 0 and variance 2/n. During training, the learning rate is fixed at 0.0001, the batch_size is 5, gamma is 1, the weight decay is 0.0002, the momentum is 0.99, and the maximum number of iterations is 100000.
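The original training runs on Caffe; purely as an illustration of the stated solver settings, a PyTorch-style equivalent might look as follows (the function name and the use of kaiming_normal_ for the MSRA scheme are assumptions):

```python
import torch
import torch.nn as nn

def configure_training(net: nn.Module) -> torch.optim.Optimizer:
    """Mirror the stated solver settings: fixed lr 0.0001 (gamma = 1),
    momentum 0.99, weight decay 0.0002; batch size 5, up to 100000 iterations."""
    def msra_init(m):
        if isinstance(m, nn.Conv2d):
            # He et al. (2015): zero-mean Gaussian with variance 2/n
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
    net.apply(msra_init)  # layers not covered by the pre-trained VGG16 weights
    return torch.optim.SGD(net.parameters(), lr=0.0001,
                           momentum=0.99, weight_decay=0.0002)
```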
In the back-propagation phase of training, the error is computed by the cross-entropy loss function, and the weights of the whole network are updated by stochastic gradient descent. The cross-entropy loss function is defined as follows:

$L(p, l; \theta) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K}\sigma(l_i = k)\,\log p_{k,i}$

where $l_i$ denotes the true label at pixel $i$, $p_{k,i}$ denotes the output probability that pixel $i$ belongs to the $k$-th class, $K$ denotes the total number of classes, $N$ denotes the total number of pixels in the batch of images, and $\sigma(\cdot)$ denotes an indicator function that equals 1 when $l_i = k$ and 0 otherwise. $l$ denotes the set of true labels, $p$ denotes the output of the last convolutional layer in the decoder, $\theta$ denotes the parameter set of the loss function, and log is taken to base 10 by default.
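A sketch of this per-pixel loss; applying a softmax to the decoder output to obtain p_{k,i}, and dividing by ln 10 to obtain the base-10 logarithm, are assumptions consistent with the definitions above.

```python
import torch
import torch.nn.functional as F

def cross_entropy_base10(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean over the N pixels of a batch of -log10 p_{l_i, i}.
    logits: (B, K, H, W) decoder output; labels: (B, H, W) true classes (long)."""
    log10p = F.log_softmax(logits, dim=1) / torch.log(torch.tensor(10.0))
    # sigma(l_i = k) picks out the probability of the true class at each pixel
    picked = log10p.gather(1, labels.unsqueeze(1)).squeeze(1)
    return -picked.mean()
```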
In the field of deep learning, convolutional neural networks use the back-propagation algorithm to propagate the error at the end of the network back to each layer, so that these layers can update their own weights, ultimately making each layer of the convolutional neural network better at extracting features. The standard procedure of the back-propagation (BP) algorithm comprises a forward propagation phase and a backward propagation phase. In the forward propagation phase, the features of the input image are learned from the initially given weights and a prediction is finally obtained at the end of the network; there is an error between this prediction and the actually given label value, and no weight update is involved in this phase. In order for the weights of each layer of the network to better model the distribution of image features, this error needs to be passed back layer by layer in the backward propagation phase to update the weights of each layer. After the weights have been updated by multiple forward and backward propagation passes, the prediction finally learned by the network comes closer to the true label value. Stochastic gradient descent is used when updating the weights. The aforementioned error is computed by a loss function; in this method, the cross-entropy loss function is used to compute the error between the forward propagation output and the true label value.
In another embodiment, the performance of the proposed network in segmenting remote sensing images is verified on the following two datasets, to which data augmentation is applied; they are described as follows:
(1) ISPRS Vaihingen Challenge Dataset: the benchmark dataset of the ISPRS 2D semantic labeling challenge in Vaihingen, composed of 3-band IRRG (near-infrared, red, green) image data and the corresponding digital surface model (DSM) and normalized digital surface model (NDSM) data. The dataset contains 33 images of unequal size with a ground sampling distance of 9 cm, 16 of which have label maps; each image is labeled with six classes, namely impervious surfaces (Impervious surfaces), building (Building), low vegetation (Low vegetation), tree (Tree), car (Car) and clutter or background (Clutter/Background). From the 16 labeled images, 12 are randomly selected as the training set, 2 as the validation set and 2 as the test set. This dataset is relatively small for training a deep network, so image patches of 256x256 are selected in the experiments to train the network. The training and validation sets divided as above are also relatively small for training a deep network, so we augment the data of the training and validation sets with a two-stage method. In the first stage, for a given image, a sliding window of size 256x256 with a stride of 128 is first used to crop patches from the IRRG image and its corresponding label map, and image patches are then extracted at 3 fixed positions (namely the upper right, lower left and lower right corners). In the second stage, all image patches are first rotated by 90, 180 and 270 degrees, and all rotated patches are then mirror-flipped horizontally and vertically. Finally, 15000 training samples and 2045 validation samples are obtained. Both augmentation stages are sketched below.
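A sketch of the two augmentation stages using PIL; the helper names and default sizes are illustrative.

```python
from PIL import Image

def sliding_crops(img: Image.Image, size: int = 256, stride: int = 128):
    """First stage: crop size x size patches with the given stride."""
    w, h = img.size
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            yield img.crop((left, top, left + size, top + size))

def rotate_and_mirror(patch: Image.Image):
    """Second stage: rotate by 90/180/270 degrees, then mirror each rotation
    horizontally and vertically."""
    out = []
    for angle in (90, 180, 270):
        r = patch.rotate(angle, expand=True)
        out.append(r)
        out.append(r.transpose(Image.FLIP_LEFT_RIGHT))  # horizontal mirror
        out.append(r.transpose(Image.FLIP_TOP_BOTTOM))  # vertical mirror
    return out
```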
(2) Road Detection Dataset: this dataset was collected from Google Earth by Cheng et al. (2017), together with manually labeled road segmentation reference maps and the corresponding centerline reference maps, and is currently the largest road dataset. It contains 224 high-resolution images with a spatial resolution of 1.2 m; each image has at least 600x600 pixels, and the roads are about 12~15 pixels wide. We randomly divide the 224 images into 180 training images, 14 validation images and 30 test images. In the experiments, image patches of 300x300 are selected to train the network. Likewise, the training and validation sets are augmented with a two-stage method. In the first stage, for a given image, image patches are first extracted at 4 fixed positions (namely the upper left, upper right, lower left and lower right corners), and a sliding window of size 300x300 is then used to randomly crop 25 image patches from the original image and the label map. In the second stage, all image patches are first rotated in steps of 90 degrees and then flipped in the horizontal and vertical directions. Finally, 31320 training samples and 2436 validation samples are obtained.
In another embodiment, in order to verify the effectiveness of the image segmentation network of the present invention, it is compared with the following networks, described as follows: four semantic segmentation networks, namely FCN8s (Long et al., 2015), DeconvNet (Noh et al., 2015), SegNet (Badrinarayanan et al., 2017) and U-Net (Ronneberger et al., 2015).
Among these four semantic segmentation networks, FCN8s has the simplest structure: the encoding part of the VGG16-based FCN8s network comprises 15 convolutional layers and 5 pooling layers, and its decoding part enlarges the feature maps of the third, fourth and fifth convolution stages by deconvolution and adds them successively for feature fusion before finally performing pixel-wise class prediction. The DeconvNet, SegNet and U-Net networks can all be classified as fully symmetric encoder-decoder networks of comparable structural depth; their encoders are all completed by convolution and pooling operations, the decoders of DeconvNet and SegNet are completed by unpooling and deconvolution (or convolution) operations, and the decoder of the U-Net network is completed by deconvolution operations only. Compared with FCN8s, the decoding process of such encoder-decoder networks is deeper. In terms of feature fusion, both the FCN8s and U-Net networks perform feature fusion: FCN8s successively adds and fuses the feature maps of the third, fourth and fifth stages of the encoder, while the U-Net network copies and fuses the last-layer feature map of each convolution stage of the encoder into the corresponding decoder layers, so that more feature information is fused and the fusion manner is more complex. The DeconvNet and SegNet networks do not use feature fusion in the decoding process; they only progressively enlarge the high-level features of the encoder to feature maps of the same size as the input image and finally perform pixel-wise class prediction.
The image segmentation network of the present invention can also be classified as an encoder-decoder network and is very similar in structure to the U-Net network, but differs in four respects. First, the fusion manner differs: the image segmentation network of the present invention fuses the feature maps of the encoder into the corresponding decoder layers by pixel-wise addition, whereas the U-Net network performs feature fusion by channel concatenation; compared with channel concatenation, pixel-wise addition does not add extra parameters to the network. Second, the fused content differs: since the successive convolution and pooling operations in the encoder degrade feature map resolution, the image segmentation network of the present invention selects for fusion the last convolution feature of the first and second stages and all convolution features of the third, fourth and fifth stages, whereas the U-Net network selects only the last-layer feature of each convolution stage of the encoder. Third, multi-scale features are fused: before the feature maps of the fifth stage are fused into the corresponding layer, the image segmentation network of the present invention extracts multi-scale feature information with the feature pyramid module and can thus cope with multi-scale variations of targets, whereas the U-Net network does not fuse features of different scales. Fourth, complete residual connections: the image segmentation network of the present invention adds residual connections inside the corresponding convolutional layers of the encoder and decoder, which together with the feature fusion in the network constitute the complete residual connections; these allow gradients to propagate directly to any convolutional layer and simplify the training process, whereas the U-Net network uses no residual connections.
In another embodiment, to quantitatively evaluate the performance of the segmentation networks, the following evaluation metrics are used; their explanations and definitions are as follows: F1-score, overall accuracy (OA), and intersection over union (IOU).
The F1-score is the harmonic mean of precision (P) and recall (R) and is a comprehensive evaluation metric; overall accuracy (OA) measures the percentage of correctly labeled pixels in the total number of image pixels. They are defined respectively as follows:

$P = \frac{TP}{TP + FP}$, $R = \frac{TP}{TP + FN}$, $F1 = \frac{2PR}{P + R}$, $OA = \frac{TP + TN}{TP + TN + FP + FN}$

where TP (true positive) denotes positive-class pixels judged as positive; FP (false positive) denotes negative-class pixels judged as positive; FN (false negative) denotes positive-class pixels judged as negative; and TN (true negative) denotes negative-class pixels judged as negative.
IOU is a standard metric for semantic segmentation, denoting the ratio of the number of pixels in the intersection of the predicted set and the true label set to the number of pixels in their union. It is defined as follows:

$IOU = \frac{|P_{gt} \cap P_m|}{|P_{gt} \cup P_m|}$

where $P_{gt}$ is the pixel set of the ground-truth map, $P_m$ is the pixel set of the predicted image, "∩" and "∪" denote the intersection and union operations respectively, and $|\cdot|$ denotes counting the number of pixels in the set.
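The metrics above reduce to simple confusion counts; a sketch for the binary (road/background) case:

```python
import numpy as np

def binary_metrics(pred: np.ndarray, gt: np.ndarray):
    """P, R, F1, OA and IOU from the confusion counts defined above."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    oa = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn)  # |Pgt ∩ Pm| / |Pgt ∪ Pm| for the positive class
    return p, r, f1, oa, iou
```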
In another embodiment, experiments are carried out on the ISPRS Vaihingen test set as follows: on ISPRS Vaihingen, the segmentation results of this method and of the advanced deep networks are shown in Fig. 4. The input image size of all networks is 256x256, all inputs are IRRG three-channel color images only, and the output is a predicted label map of the same size as the input image. From top to bottom, Fig. 4 shows the IRRG image, the label map, and the segmentation results of FCN8s, DeconvNet, SegNet, U-Net and FRes-MFDNN.
The targets in each image vary in size and shape, and there is a certain amount of shadow occlusion. For example, the low vegetation and trees in the first and fifth images are densely distributed; the heights of trees and buildings produce large shadowed areas in the original images, and some shadows occlude cars and road surfaces. As can be seen from Fig. 4, the segmentation results of the FCN8s and DeconvNet networks are poor; the DeconvNet results differ considerably from the ground-truth label maps, with blurred details at object edges and discontinuous segmentation inside single targets. Compared with FCN8s, the SegNet network, thanks to its deeper decoding process and its use of the location indices obtained during pooling, produces segmentation results closer to the ground-truth label maps, better preserves the detail information of targets, and mis-segments less than the FCN8s and DeconvNet networks. U-Net copies and fuses the features of the corresponding encoder stages into the corresponding decoder stages, so its segmentation results are closer to the ground-truth label maps and the target details are relatively clear. Because the network in this method uses complete residual connections between the corresponding encoder and decoder layers and fuses the multi-scale information of the high-level features, its segmentation results are very close to the ground-truth label maps, with clearer target details and fewer mis-segmentations; this shows that, to a certain extent, this method can cope with the diversity of target sizes and the influence of shadows in the original images and improves segmentation accuracy.
Fig. 5 gives the quantitative evaluation results corresponding to Fig. 4, where bold represents the best result and underline represents the second-best result. Precision (P) and recall (R) measure the completeness and correctness of the segmentation respectively; ideally, both precision and recall are high. The metrics of this method are the highest on every image; in addition, its average precision and average recall are about 3% and 2% higher than the second-best results respectively. Both the qualitative and quantitative results show that this method is closer to the ground-truth maps in urban remote sensing image segmentation and performs better.
The evaluation results of each deep network and of this method on the ISPRS Vaihingen test images are shown in Fig. 6(a) and Fig. 6(b). Although Fig. 6(a) and Fig. 6(b) show that some comparison algorithms achieve good results on the IOU and F1 metrics, this method is optimal in the IOU and F1 values of each class and in the average performance over the whole test set. Specifically, the average IOU of this method is about 6% higher than the second-best result (U-Net), and its average F1 value is about 4% higher than the second-best result, which fully demonstrates the effectiveness of this method in urban remote sensing image segmentation.
Fig. 7 gives the comparison between this method and methods from documents with good current segmentation performance. Paisitkriangkrai et al. (2015) proposed a CNN+RF segmentation network combining a CNN with a random forest (RF), in which the CNN is mainly used to extract features and the RF is used for classification. Volpi and Tuia (2017) proposed applying a deconvolution network to remote sensing image segmentation; their network is composed of a symmetric encoder and decoder, the encoder completed by eight convolutional layers and three pooling layers and the decoder mirroring the encoder, with a 1x1 convolutional layer connecting the encoding and decoding processes. Sherrah (2016) used atrous convolutions to segment remote sensing images and smoothed the segmentation results with a CRF; Maggiori et al. (2016) incorporated a CRF at the end of an encoder-decoder network as post-processing in the training process of the deep network. Audebert et al. (2017) used a symmetric encoder-decoder network, the encoder composed of convolutional and pooling layers and the decoder composed of deconvolution and unpooling layers. The above experimental results are taken from the original documents, and each method uses roughly the same number of training samples. The comparison results in Fig. 7 show that in the F1 value of every class and in the overall segmentation accuracy, the segmentation effect of this method is better than all compared methods.
In order to further verify the segmentation performance of this method, three images of the ISPRS Vaihingen dataset without label maps, area4, area31 and area35, are fed into each comparison network for testing; partial results are shown in Fig. 8, which shows, from left to right, the original image and the segmentation results of FRes-MFDNN, U-Net, SegNet, FCN8s and DeconvNet. With no label maps for reference, it can be seen by comparing against the original images (first column) that this method is better than the other comparison networks in the correctness, completeness and object-boundary smoothness of the segmentation.
In another embodiment, experiments are carried out on the Road Detection dataset as follows: on the Road Detection Dataset, the segmentation results of this method and of each deep network are shown in Figure 9. The input of all networks is a 300x300 RGB three-channel image, and the output is a prediction map of the same size as the input image, where black represents background and white represents road. From top to bottom, Fig. 9 shows the RGB image, the label map, and the segmentation results of FRes-MFDNN, U-Net, SegNet, DeconvNet and FCN8s.
The first row of Fig. 9 gives five images that differ in spectral information and background complexity; parts of the roads are occluded by trees and cars; in the fourth image, some residential houses are extremely close to the road in spectral information; and the fourth and fifth images also contain visibly trampled loess road surfaces. All these factors add a certain challenge to the segmentation. As can be seen from Fig. 9, the segmentation results of the FCN8s and DeconvNet networks differ considerably from the ground-truth label maps, with large mis-segmented and missed areas and poor continuity of the segmented roads. The segmentation results of the SegNet network are more similar to the ground-truth label maps, with mis-segmented areas clearly reduced compared with DeconvNet, although missed segmentations still exist. The segmentation results of the U-Net network and of this method are the most similar to the ground-truth label maps, and mis-segmentations and omissions are also greatly reduced compared with the other networks. Compared with the U-Net network, the segmentation results of this method have more complete detail information; where cars and trees occlude the road, the segmented road edges are smoother and the spatial consistency is higher.
Figure 10 gives the evaluation results corresponding to Fig. 9, where bold represents the best value and underline represents the second-best value. Similarly, although some comparison methods achieve good results in precision or recall, the metrics of this method are almost the highest on every image, and its average precision and average recall are 2% and 3% higher than those of the second-best method respectively. Both the qualitative and quantitative results show that this method is closer to the ground-truth maps in segmenting remote sensing road images and performs better.
Figure 11 gives the average IOU and average F1 value of each method on the Road Detection test set. It can be seen that the average IOU and average F1 value of this method are both clearly higher than those of the other methods; the average IOU is even 4% higher than that of the second-best U-Net method, and the average F1 value reaches 93%, fully demonstrating the good segmentation performance of this method on this dataset.
Figure 12 compares this method with existing road segmentation research methods on the Road Detection dataset, including each method's average IOU, average F1 value, training time (h) and inference time for one image (s/p denotes seconds per image) on this dataset.
In Figure 12, Zhang et al. (2018) proposed the Res-unet network comprising a three-layer encoder and a three-layer decoder, in which the encoding process is completed by convolution operations and the decoding process by bilinear interpolation; the last-layer feature map of each encoder stage is copied and fused into the corresponding decoder stage, and residual connections are introduced in the encoder and decoder. Ronneberger et al. (2015) proposed U-Net for medical image segmentation, and many studies currently apply it to remote sensing image segmentation tasks. Panboonyuen et al. (2017) proposed the ELU-SegNet structure, which replaces the ReLU activation function with the ELU activation function on the basis of the SegNet network. Cheng et al. (2017) proposed the Cascaded-net structure composed of a four-layer encoder and a four-layer decoder, where the encoder is completed by convolution and pooling operations and the decoder by deconvolution and unpooling. The Res-unet, ELU-SegNet and Cascaded-net networks were all proposed for road segmentation applications. The above results were all obtained in experiments on the Road Detection dataset, on the same Caffe deep learning platform configuration used to train the network of this method. It can be seen from Figure 12 that although this method takes slightly more training and inference time than the Res-unet and U-Net networks, the gap is small, and this method outperforms the other methods in both average IOU and average F1 value.
In order to further verify the performance of each comparison network in segmenting remote sensing road images, we collected images of city blocks over St. Louis, USA, from Google Maps; all images are three-channel RGB color images with a spatial resolution of 2.0 m. They are fed into each trained network for testing, and partial results are shown in Figure 13, which shows, from left to right, the original image and the segmentation results of the network of this method, U-Net, SegNet, DeconvNet and FCN8s.
Although the collected images differ from the Road Detection dataset used to train the networks in background complexity, spectral information and spatial resolution, it can be seen from Figure 13 that, compared with the other comparison methods, this method segments the roads better, effectively rejects most of the background interference, and overcomes the influence of the difference in spatial resolution. This also fully demonstrates the robustness of this method in segmenting remote sensing road images.
Although the embodiments of the present invention have been described above in conjunction with the accompanying drawings, the present invention is not limited to the above specific embodiments and application fields. The above specific embodiments are merely illustrative and instructive, not restrictive; those skilled in the art, under the enlightenment of this specification and without departing from the scope protected by the claims of the present invention, may also make many other forms, all of which fall within the scope of protection of the present invention.