CN115471754A - Remote sensing image road extraction method based on multi-dimensional and multi-scale U-net network - Google Patents

Remote sensing image road extraction method based on multi-dimensional and multi-scale U-net network

Info

Publication number
CN115471754A
Authority
CN
China
Prior art keywords
network
feature
scale
road
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210941960.2A
Other languages
Chinese (zh)
Inventor
陶于祥
何哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202210941960.2A priority Critical patent/CN115471754A/en
Publication of CN115471754A publication Critical patent/CN115471754A/en
Pending legal-status Critical Current

Classifications

    • G06V20/13 — Scenes; Terrestrial scenes; Satellite images
    • G06V10/24 — Image preprocessing; Aligning, centring, orientation detection or correction of the image
    • G06V10/75 — Pattern recognition or machine learning; Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/806 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V20/182 — Terrestrial scenes; Network patterns, e.g. roads or rivers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image road extraction method based on a multi-dimensional and multi-scale U-net network. Firstly, the coding network combines a residual structure with an attention feature fusion mechanism to perform multi-scale extraction of road feature information. Secondly, an atrous spatial pyramid pooling (ASPP) module is added to the bridge network to perform multi-scale feature extraction on the road information. Finally, a feature alignment module is added to the decoding network to correct the inaccurate correspondence between high-level and low-level features caused by the non-learnable nature of the upsampling operation and the repeated application of up- and down-sampling. The method calculates the model loss with a composite loss function combining cross entropy and the Dice coefficient, alleviating the imbalance between positive and negative samples in remote sensing road data sets and improving the road extraction results of the model.

Description

Remote sensing image road extraction method based on multi-dimensional and multi-scale U-net network
Technical Field
The invention belongs to the technical field of remote sensing image processing, and particularly relates to a remote sensing image road extraction method based on an improved U-net network.
Background
Road information is essential to daily life and travel; roads form the backbone of transportation and have long supported the development of human civilization. Road extraction is of great value in many applications, such as autonomous driving, city planning, intelligent transportation systems, emergency risk management, and the updating of geographical information.
At present, remote sensing image road extraction methods fall mainly into traditional methods and deep-learning-based methods. Traditional methods usually extract roads with manually designed features and can be further divided into pixel-based methods and object-oriented methods. Pixel-based methods extract roads mainly by analyzing differences in spectral features and can generally be classified into spectral analysis, threshold segmentation, and edge detection methods. Pixel-based methods make full use of the gray values of the image and can extract roads well from remote sensing images with clear imagery, simple backgrounds, and sparse road networks. However, such methods are prone to salt-and-pepper noise and do not distinguish background interference well. Object-oriented methods treat the road as an object, take the road object as a whole, and identify it by information modeling; they offer better noise resistance and applicability, but easily confuse and misclassify ground objects that are spatially adjacent and similar in shape.
With the rapid growth of available data and computing power, deep learning techniques have achieved tremendous success in computer vision. Deep learning is increasingly applied to extracting information from high-resolution remote sensing images because of its good performance and generalization capability. Unlike traditional methods, which require road features of the remote sensing image to be designed manually, deep learning actively learns feature representations by training a neural network: it automatically learns shallow features during repeated network iterations and then progressively learns deeper abstract features. Deep learning methods can mine high-level road features to improve the effectiveness of computer vision tasks; they have strong adaptive learning and feature fitting capabilities and offer great advantages in the accuracy and degree of automation of road extraction.
Mnih and Hinton (2013) first applied deep learning to road extraction and constructed the Massachusetts road data set. Jonathan Long et al. subsequently proposed the Fully Convolutional Network (FCN), which moved from simple image classification to pixel-level classification by replacing fully connected layers with standard convolutional layers, preserving the spatial information of the original input image and greatly improving segmentation. Fully convolutional networks extended from the FCN have therefore been applied increasingly, especially to road extraction. However, as the resolution of remote sensing images rises, the detailed features of road areas expressed in the images become more complex, road-surface interference (such as buildings and trees) increases, and many non-road areas (urban buildings, vegetable greenhouses, and the like) exhibit features highly similar to roads. CN110807376A, an urban and suburban road extraction method and apparatus based on remote sensing images, obtains GIS image information from a digital map to produce label data and training/test data; constructs an initial road extraction network model based on the U-Net network; trains the initial model with the label data and training/test data to obtain a road extraction model capable of recognizing roads; and detects remote sensing images with that model to extract road targets automatically. By constructing an improved U-Net network, that application improves the accuracy of urban and suburban road extraction from remote sensing images.
Firstly, in that patent the two branches of the residual structure are fused by feature summation; this fusion merely assigns fixed weights to the features, does not consider changes in feature content, and is inefficient. The method disclosed by the present invention instead fuses the two branches of the residual structure by attention feature fusion, which dynamically and adaptively fuses the received features in a scale-aware manner, compensates for semantic differences between branches, and enhances the network's ability to learn the global and local information of the remote sensing image. Secondly, because up- and down-sampling are applied repeatedly, the high-level and low-level features connected in the decoding network correspond inaccurately, and connecting misaligned features merely by channel concatenation may harm subsequent learning. That patent does not consider this problem. To address it, the present invention adds a feature alignment module to the decoding network, which dynamically establishes the positional correspondence between features of different levels and improves the decoder's ability to reconstruct fine details.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. A remote sensing image road extraction method based on a multi-dimensional and multi-scale U-net network is provided. The technical scheme of the invention is as follows:
a remote sensing image road extraction method based on a multi-dimensional and multi-scale U-net network comprises the following steps:
step 1, selecting the public Massachusetts road data set as raw data, and performing preprocessing steps including cropping and data enhancement;
step 2, inputting the preprocessed data into a coding network, wherein the coding network combines a residual structure and an attention feature fusion mechanism to perform multi-scale extraction of road feature information;
step 3, using the output of the coding network as the input of a bridge network, to which an atrous spatial pyramid pooling (ASPP) module is added; the ASPP module comprises parallel atrous convolution layers that are equivalent to several different receptive fields and sample in parallel at multiple scales, achieving multi-scale feature fusion of deep features;
step 4, decoding network stage: the feature map is gradually restored to the input image size by upsampling; a feature alignment module (FAM) is added to the decoding network, which takes the high-level features and the low-level features of the corresponding coding-network layer as input to generate a semantic flow, and uses the semantic flow to adjust the feature maps of two adjacent levels, producing feature output with high resolution and strong semantics;
step 5, finally, changing the number of channels to 2 through a 1 × 1 convolutional layer, and testing the model on the test set of the Massachusetts data set;
and, during model training, calculating the model loss with a composite loss function combining a cross entropy loss function and a Dice loss function.
Further, the step 1 specifically includes:
the image size of the Massachusetts data set is 1500 multiplied by 1500, and a 256 multiplied by 256 area is arranged to cut all the images of the original data set; high-grade data of 256 multiplied by 3 wave bands are input into a built coding network as input data to extract road information.
Further, the step 2 inputs the preprocessed data into a coding network, and the coding network combines a residual structure and an attention feature fusion mechanism to perform multi-scale extraction of the road feature information, specifically including:
the coding network comprises a convolution sequence block (CSB) and an attention residual learning unit (ARLU); the preprocessed RGB image is converted into high-dimensional features by the convolution sequence block, and multi-scale, multi-level features are then generated by the attention residual learning unit; in the attention residual learning unit, a residual unit replaces the ordinary neural network unit, and the identity mapping branch and the residual branch in the residual unit are fused through an attention feature fusion module. Fusing the two branches of the residual structure by attention feature fusion allows the network to extract information at multiple scales from the feature map along the channel dimension while keeping the network lightweight.
Further, in step 3 the ASPP module of the bridge network comprises 5 parallel branches: a 1 × 1 convolution branch, three 3 × 3 dilated convolution branches, and a global average pooling branch; the 1 × 1 convolution branch and the global average pooling branch are equivalent to using the minimum and maximum receptive fields, respectively, to retain the inherent characteristics of the input, while the other three branches are set with different dilation rates to describe image features at different scales.
Further, the step 4 specifically includes: at the decoding network stage the feature map is restored step by step to the input image size by upsampling. A feature alignment module connects the high-level features of the decoding network with the low-level features of the corresponding coding-network layer. In the feature alignment module, the high-level features first pass through a transposed convolution that changes the image size and number of channels; the changed high-level features are then concatenated with the low-level features, and a convolution operation generates the semantic flow. Guided by the semantic flow, the feature alignment module corrects the inaccurate correspondence between high-level and low-level features caused by repeated up- and down-sampling, so that the semantic information in the high-level features flows better into the low-level features; this closes the semantic and resolution gap between the high-level and low-level features and guides the model to recover the initial resolution while retaining rich semantic information.
Further, the model prediction stage in step 5 specifically includes: changing the number of feature map channels to 2 through a 1 × 1 convolutional layer to generate the final prediction map; and inputting the preprocessed test images of the Massachusetts data set into the trained model.
Further, a composite loss function composed of a cross entropy loss function and a Dice loss function is used to calculate the model loss; the cross entropy loss function and the Dice loss function are defined as follows:

L_BCE = -(1/N) Σ_{i=1}^{N} [g_i log p_i + (1 - g_i) log(1 - p_i)]

L_D = 1 - 2 Σ_{i=1}^{N} g_i p_i / (Σ_{i=1}^{N} g_i + Σ_{i=1}^{N} p_i)

wherein N represents the total number of pixels, g_i represents the true label value of pixel i, and p_i represents the predicted value of pixel i;

the composite loss function is defined as follows:

L = L_BCE + L_D
the invention has the following advantages and beneficial effects:
the innovation of the invention mainly comprises the matching of the steps 2, 3 and 4 of the claims. Step 2, residual learning is introduced, so that network training is easier, and the degradation problem of the deep network is solved to a great extent; meanwhile, the attention characteristic fusion module is used for fusing the two branches of the residual error structure, so that the semantic difference between different branches is made up, and the learning capability of the network on the global and local information of the remote sensing image is enhanced. Step 3, the bridge network uses ASPP, a convolution kernel receiving domain is enlarged through parallel expansion convolution layers, information extraction and fusion of multiple scales are further carried out on high-level features, and connectivity of roads is improved. And 4, adding a feature alignment module in the decoding network, dynamically establishing the position corresponding relation between different layers of features, solving the problem of dislocation between different layers of features and improving the capability of reconstructing precise details of the decoding network.
Drawings
FIG. 1 is a diagram of a remote sensing image road extraction model framework based on MMS-UNet according to the preferred embodiment of the invention.
FIG. 2 is the convolution sequence block (CSB) in the coding network.
FIG. 3 is the attention residual learning unit (ARLU) in the coding network.
FIG. 4 is the attention feature fusion module (AFF).
FIG. 5 is the ASPP module.
FIG. 6 is the feature alignment module (FAM).
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
the invention aims to solve the problems that in practical application, the model is limited in receptive field, the fusion efficiency of the identity mapping branch and the residual error branch in the residual error structure is low, the fusion efficiency of high-level and low-level features is low and the like. Aiming at the problems, a remote sensing image road extraction method based on multi-dimensional multi-scale U-net is provided. The method fully utilizes information from different layers through an attention feature fusion method, expands an acceptance domain of a convolution kernel by using ASPP, then adds a feature module to align and fuse high-layer and low-layer information, and upsamples the information to the size of an input image to generate a prediction graph. Tests on Massachusetts public road data sets show that the method has good effect of predicting images.
Fig. 1 shows the network structure of the present invention, wherein:
Step (1) is the preprocessing of the model input data: the images of the Massachusetts data set are 1500 × 1500, and a 256 × 256 window is set to crop all images of the original data set. The resulting 256 × 256 × 3 three-band high-resolution data are input into the constructed MMS-UNet network model to extract road information.
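As an illustration, a minimal preprocessing sketch in Python is given below. The 256 × 256 tile size follows the description; the file path, the non-overlapping cropping scheme, and the particular flip/rotation augmentations are assumptions of the sketch.

```python
import numpy as np
from PIL import Image

TILE = 256  # crop size stated in the description

def crop_to_tiles(image: np.ndarray, tile: int = TILE):
    """Cut an image (e.g. 1500 x 1500) into non-overlapping tile x tile patches;
    edge remainders are discarded in this simple variant."""
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

def augment(patch: np.ndarray):
    """Simple data enhancement: horizontal/vertical flips and a 90-degree rotation."""
    yield patch
    yield np.fliplr(patch)
    yield np.flipud(patch)
    yield np.rot90(patch)

# hypothetical file name from a local copy of the Massachusetts road data set
img = np.asarray(Image.open("massachusetts/train/sat/example.tiff"))
patches = [a for p in crop_to_tiles(img) for a in augment(p)]
```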
Step (2) combines a residual structure and an attention feature fusion module in the coding network to extract road features. The coding network comprises a convolution sequence block (CSB) and an attention residual learning unit (ARLU). The convolution sequence block is composed of two stacked convolution sequences, each consisting of a 3 × 3 convolution layer, a batch normalization layer, and a ReLU layer. The input RGB image is converted into high-dimensional features by the convolution sequence block, and multi-scale, multi-level features are then generated by the attention residual learning unit. In the attention residual learning unit, a residual unit replaces the ordinary neural network unit, and the identity mapping branch and the residual branch in the residual unit are fused through an attention feature fusion module. The attention feature fusion module first fuses the identity mapping branch output F_id and the residual branch output F_res by feature summation; the fused result A is input into a multi-scale channel attention module (MS-CAM), in which one branch G(A) obtains global channel context information by global average pooling, while the other branch L(A) obtains local channel context information directly by point convolution. The two branches G(A) and L(A) are fused by a summation operation; letting M denote the output of the MS-CAM:

G(A) = B(PW_2(δ(B(PW_1(g(A))))))   (1)

L(A) = B(PW_2(δ(B(PW_1(A)))))   (2)

M(A) = σ(G(A) + L(A))   (3)

where g(·) denotes the global average pooling operation (GAP); PW_1 and PW_2 denote point convolutions with kernel sizes C/r × C × 1 × 1 and C × C/r × 1 × 1, respectively, where r is the channel reduction ratio; B denotes the batch normalization operation (BN), δ the ReLU activation function, and σ the Sigmoid function. The output F of the attention feature fusion module (AFF) can therefore be expressed by equation (4):

F = M(A) ⊗ F_id + (1 − M(A)) ⊗ F_res   (4)

where ⊗ denotes element-wise multiplication. The module adds local information to global information through point convolution, so the network can extract information at multiple scales from the feature map along the channel dimension while remaining lightweight. The attention residual learning unit connects the residual branch and the identity mapping branch through the attention feature fusion module, dynamically and adaptively fusing the received features in a scale-aware manner; this compensates for semantic differences between branch features, enhances the network's learning of the global and local information of the remote sensing image, and improves the accuracy of road recognition.
Step (3) introduces an ASPP module into the bridge network to perform multi-scale feature extraction on deep features. The ASPP module contains 5 branches: a 1 × 1 convolution branch, three 3 × 3 dilated convolution branches, and a global average pooling branch. The 1 × 1 convolution branch and the global average pooling branch are equivalent to using the minimum and maximum receptive fields, respectively, to retain the inherent characteristics of the input; the other three branches are set with dilation rates of 6, 12, and 18 and sample features from the feature map. Finally, the outputs of the five branches are fused by feature concatenation, and the number of channels is adjusted with a 1 × 1 convolutional layer.
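The following is a minimal PyTorch sketch of such an ASPP module under the branch layout just described; the channel counts, the BN/ReLU placement, and the bilinear re-broadcast of the pooled branch are assumptions of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """ASPP with the five parallel branches described above: a 1x1 convolution,
    three 3x3 dilated convolutions with rates 6/12/18, and global average
    pooling; the outputs are concatenated and reduced by a 1x1 convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()

        def conv_branch(k: int, rate: int) -> nn.Sequential:
            pad = rate if k == 3 else 0
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=rate),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        self.b1x1 = conv_branch(1, 1)
        self.b_r6, self.b_r12, self.b_r18 = (conv_branch(3, r) for r in (6, 12, 18))
        self.b_gap = nn.Sequential(nn.AdaptiveAvgPool2d(1), conv_branch(1, 1))
        self.project = nn.Sequential(  # 1x1 convolution that adjusts the channel count
            nn.Conv2d(5 * out_ch, out_ch, 1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # broadcast the pooled branch back to the feature-map size before concatenation
        gap = F.interpolate(self.b_gap(x), size=(h, w), mode="bilinear", align_corners=False)
        feats = [self.b1x1(x), self.b_r6(x), self.b_r12(x), self.b_r18(x), gap]
        return self.project(torch.cat(feats, dim=1))
```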
Step (4): at the decoding network stage, the feature map is restored step by step to the input image size by upsampling. The output F_H of the bridge network first passes through a transposed convolution so that its number of channels and image size match the corresponding coding-network level features F_L; F_H and F_L are then fused by concatenation and input into a 3 × 3 convolutional layer to generate the semantic flow S_L, as shown in equation (1):

S_L = Conv_1(Cat(T(F_H), F_L))   (1)

where Conv_1(·) denotes a 3 × 3 convolution operation, Cat(·) a concatenation operation, and T(·) a transposed convolution operation. The resulting semantic flow S_L ∈ R^(H × W × 2) gives the offset between the high-level and the low-level features in the two spatial directions. Each pixel p_L on the low-level feature map is mapped to the corresponding pixel p_H on the high-level feature map, and the value at p_H is obtained by bilinear interpolation over its four neighboring points, achieving semantic alignment of the high-level and low-level features, as shown in equations (2) and (3):

p_H = p_L + S_L(p_L)   (2)

F_H(p_H) = Σ_{p ∈ N(p_H)} w_p F_H(p)   (3)

where N(p_H) denotes the neighboring points of pixel p_H on the high-level feature map and w_p denotes the bilinear kernel weight estimated from the distances on the warped grid. F_L is passed through a 1 × 1 convolution that changes its number of channels and is then summed with F_H(p_H) to obtain the output F_out of the feature alignment module:

F_out = Conv_2(F_L) + F_H(p_H)   (4)

where Conv_2(·) denotes a 1 × 1 convolution operation. Through the feature alignment module, the semantic and resolution gap between the high-level and low-level features is closed, and the model is guided to recover the initial resolution while retaining rich semantic information.
Step (5): in the final prediction stage of the model, the number of feature map channels is changed to 2 through a 1 × 1 convolutional layer to generate the final prediction map. The test images of the Massachusetts data set are preprocessed and input into the trained model; the experimental results show that the improved U-net network model obtains better road extraction results.
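The prediction stage can be sketched as follows; the 64-channel decoder output is a hypothetical placeholder.

```python
import torch
import torch.nn as nn

decoder_features = torch.randn(1, 64, 256, 256)  # placeholder decoder output; 64 channels assumed
head = nn.Conv2d(64, 2, kernel_size=1)           # 1x1 convolution to 2 classes (background / road)
logits = head(decoder_features)                  # (1, 2, 256, 256)
road_mask = logits.argmax(dim=1)                 # per-pixel class map used as the prediction
```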
Step (6): the road extraction task uses a composite loss function composed of a cross entropy loss function and a Dice loss function. In remote sensing images a road is a narrow region that occupies only a very small proportion of the whole image, so there is a highly imbalanced class distribution between road and background. The cross entropy loss function evaluates every pixel of the segmentation result, which can lead to overfitting toward the class with more samples when the image has a class imbalance problem: when extracting roads from remote sensing images, the network becomes biased toward learning the background, and its ability to extract the foreground target decreases. The Dice coefficient treats all pixels of one class as a whole and computes the proportion of the intersection of the two within that whole, so it is not affected by the large number of background pixels and performs well under sample imbalance. The cross entropy loss function and the Dice loss function are therefore defined as follows:

L_BCE = -(1/N) Σ_{i=1}^{N} [g_i log p_i + (1 - g_i) log(1 - p_i)]   (1)

L_D = 1 - 2 Σ_{i=1}^{N} g_i p_i / (Σ_{i=1}^{N} g_i + Σ_{i=1}^{N} p_i)   (2)

wherein N represents the total number of pixels, g_i represents the true label value of pixel i, and p_i represents the predicted value of pixel i;

the composite loss function is defined as follows:

L = L_BCE + L_D   (3)
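A minimal sketch of this composite loss follows, written for a single-channel road-probability map; with the 2-channel output of step (5), the softmax road channel would serve as p_i (an assumption), and the epsilon term is added for numerical stability.

```python
import torch

def composite_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """L = L_BCE + L_D, Eqs. (1)-(3): pred holds per-pixel road probabilities
    p_i in (0, 1); target holds binary ground-truth labels g_i."""
    p, g = pred.reshape(-1), target.reshape(-1).float()
    l_bce = -(g * torch.log(p + eps) + (1 - g) * torch.log(1 - p + eps)).mean()  # Eq. (1)
    l_dice = 1 - (2 * (p * g).sum()) / (p.sum() + g.sum() + eps)                 # Eq. (2)
    return l_bce + l_dice                                                        # Eq. (3)

# usage on random tensors
pred = torch.sigmoid(torch.randn(2, 1, 256, 256))
target = torch.randint(0, 2, (2, 1, 256, 256))
loss = composite_loss(pred, target)
```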
the systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement the information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (7)

1. A remote sensing image road extraction method based on a multi-dimensional and multi-scale U-net network is characterized by comprising the following steps:
step 1, selecting the public Massachusetts road data set as raw data, and performing preprocessing steps including cropping and data enhancement;
step 2, inputting the preprocessed data into a coding network, wherein the coding network combines a residual structure and an attention feature fusion mechanism to perform multi-scale extraction of road feature information;
step 3, using the output of the coding network as the input of a bridge network, to which an atrous spatial pyramid pooling (ASPP) module is added; the ASPP module comprises parallel atrous convolution layers that are equivalent to several different receptive fields and sample in parallel at multiple scales, achieving multi-scale feature fusion of deep features;
step 4, decoding network stage: the feature map is gradually restored to the input image size by upsampling; a feature alignment module (FAM) is added to the decoding network, which takes the high-level features and the low-level features of the corresponding coding-network layer as input to generate a semantic flow, and uses the semantic flow to adjust the feature maps of two adjacent levels, producing feature output with high resolution and strong semantics;
step 5, finally, changing the number of channels to 2 through a 1 × 1 convolutional layer, and testing the model on the test set of the Massachusetts data set;
and, during model training, calculating the model loss with a composite loss function combining a cross entropy loss function and a Dice loss function.
2. The method for extracting the remote sensing image road based on the multi-dimensional multi-scale U-net network according to claim 1, wherein the step 1 specifically comprises:
the image size of the Massachusetts data set is 1500 multiplied by 1500, and a 256 multiplied by 256 area is arranged to cut all the images of the original data set; high-grade data of 256 multiplied by 3 wave bands are input into a built coding network as input data to extract road information.
3. The method for extracting a remote sensing image road based on a multi-dimensional and multi-scale U-net network as claimed in claim 1, wherein the step 2 inputs the preprocessed data into a coding network, and the coding network combines a residual structure and an attention feature fusion mechanism to perform multi-scale extraction of road feature information, specifically comprising:
the coding network comprises a convolution sequence block (CSB) and an attention residual learning unit (ARLU); the preprocessed RGB image is converted into high-dimensional features by the convolution sequence block, and multi-scale, multi-level features are then generated by the attention residual learning unit; in the attention residual learning unit, a residual unit replaces the ordinary neural network unit, and the identity mapping branch and the residual branch in the residual unit are fused through an attention feature fusion module; fusing the two branches of the residual structure by attention feature fusion allows the network to extract information at multiple scales from the feature map along the channel dimension while keeping the network lightweight.
4. The method for extracting a remote sensing image road based on a multi-dimensional and multi-scale U-net network according to claim 1, wherein in step 3 the ASPP module of the bridge network comprises 5 parallel branches: a 1 × 1 convolution branch, three 3 × 3 dilated convolution branches, and a global average pooling branch; the 1 × 1 convolution branch and the global average pooling branch are equivalent to using the minimum and maximum receptive fields, respectively, to retain the inherent characteristics of the input, while the other three branches are set with different dilation rates to describe image features at different scales.
5. The method for extracting a remote sensing image road based on a multi-dimensional and multi-scale U-net network according to claim 1, wherein the step 4 specifically comprises: at the decoding network stage the feature map is restored step by step to the input image size by upsampling, and a feature alignment module connects the high-level features of the decoding network with the low-level features of the corresponding coding-network layer; in the feature alignment module, the high-level features first pass through a transposed convolution that changes the image size and number of channels, the changed high-level features are then concatenated with the low-level features, and a convolution operation generates the semantic flow; guided by the semantic flow, the feature alignment module corrects the inaccurate correspondence between high-level and low-level features caused by repeated up- and down-sampling, so that the semantic information in the high-level features flows better into the low-level features, closing the semantic and resolution gap between them and guiding the model to recover the initial resolution while retaining rich semantic information.
6. The method for extracting a remote sensing image road based on a multi-dimensional multi-scale U-net network as claimed in claim 5, wherein the model prediction stage in step 5 specifically comprises: changing the number of feature map channels to 2 through a 1 × 1 convolutional layer to generate the final prediction map; and inputting the preprocessed test images of the Massachusetts data set into the trained model.
7. The method for extracting the remote sensing image road based on the multi-dimensional multi-scale U-net network according to claim 6, wherein model loss is calculated by using a composite loss function consisting of a cross entropy loss function and a Dice loss function, and the cross entropy loss function and the Dice loss function are respectively defined as follows:
L_BCE = -(1/N) Σ_{i=1}^{N} [g_i log p_i + (1 - g_i) log(1 - p_i)]

L_D = 1 - 2 Σ_{i=1}^{N} g_i p_i / (Σ_{i=1}^{N} g_i + Σ_{i=1}^{N} p_i)

wherein N represents the total number of pixels, g_i represents the true label value of pixel i, and p_i represents the predicted value of pixel i; the composite loss function is defined as follows:

L = L_BCE + L_D
CN202210941960.2A 2022-08-08 2022-08-08 Remote sensing image road extraction method based on multi-dimensional and multi-scale U-net network Pending CN115471754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210941960.2A CN115471754A (en) 2022-08-08 2022-08-08 Remote sensing image road extraction method based on multi-dimensional and multi-scale U-net network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210941960.2A CN115471754A (en) 2022-08-08 2022-08-08 Remote sensing image road extraction method based on multi-dimensional and multi-scale U-net network

Publications (1)

Publication Number Publication Date
CN115471754A true CN115471754A (en) 2022-12-13

Family

ID=84367584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210941960.2A Pending CN115471754A (en) 2022-08-08 2022-08-08 Remote sensing image road extraction method based on multi-dimensional and multi-scale U-net network

Country Status (1)

Country Link
CN (1) CN115471754A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116343063A (en) * 2023-05-26 2023-06-27 南京航空航天大学 Road network extraction method, system, equipment and computer readable storage medium
CN116343063B (en) * 2023-05-26 2023-08-11 南京航空航天大学 Road network extraction method, system, equipment and computer readable storage medium
CN116721351A (en) * 2023-07-06 2023-09-08 内蒙古电力(集团)有限责任公司内蒙古超高压供电分公司 Remote sensing intelligent extraction method for road environment characteristics in overhead line channel
CN116994231A (en) * 2023-08-01 2023-11-03 无锡车联天下信息技术有限公司 Method and device for determining left-behind object in vehicle and electronic equipment

Similar Documents

Publication Publication Date Title
CN113780296B (en) Remote sensing image semantic segmentation method and system based on multi-scale information fusion
Guo et al. CDnetV2: CNN-based cloud detection for remote sensing imagery with cloud-snow coexistence
CN115471754A (en) Remote sensing image road extraction method based on multi-dimensional and multi-scale U-net network
CN112668494A (en) Small sample change detection method based on multi-scale feature extraction
CN110555841B (en) SAR image change detection method based on self-attention image fusion and DEC
CN111860233B (en) SAR image complex building extraction method and system based on attention network selection
CN111753677B (en) Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
CN112396607A (en) Streetscape image semantic segmentation method for deformable convolution fusion enhancement
CN113505792B (en) Multi-scale semantic segmentation method and model for unbalanced remote sensing image
CN113256649B (en) Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN112561876A (en) Image-based pond and reservoir water quality detection method and system
CN112508079B (en) Fine identification method, system, equipment, terminal and application of ocean frontal surface
CN112733693B (en) Multi-scale residual error road extraction method for global perception high-resolution remote sensing image
CN115471467A (en) High-resolution optical remote sensing image building change detection method
CN116797787B (en) Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network
CN115272278A (en) Method for constructing change detection model for remote sensing image change detection
CN115830471A (en) Multi-scale feature fusion and alignment domain self-adaptive cloud detection method
Thati et al. A systematic extraction of glacial lakes for satellite imagery using deep learning based technique
CN116977747B (en) Small sample hyperspectral classification method based on multipath multi-scale feature twin network
CN113378642A (en) Method for detecting illegal occupation buildings in rural areas
CN112686184A (en) Remote sensing house change detection method based on neural network
CN117152435A (en) Remote sensing semantic segmentation method based on U-Net3+
CN115327544B (en) Little-sample space target ISAR defocus compensation method based on self-supervision learning
CN111505738A (en) Method and equipment for predicting meteorological factors in numerical weather forecast
CN114549958B (en) Night and camouflage target detection method based on context information perception mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination