CN115457379A

CN115457379A - Remote sensing image road extraction method and system combining semantic segmentation and angle prediction

Info

Publication number: CN115457379A
Application number: CN202210775721.4A
Authority: CN
Inventors: 付莹; 梁帅哲
Original assignee: Yangtze River Delta Research Institute Of Beijing University Of Technology Jiaxing; Beijing Institute of Technology BIT
Current assignee: Yangtze River Delta Research Institute Of Beijing University Of Technology Jiaxing; Beijing Institute of Technology BIT
Priority date: 2022-07-01
Filing date: 2022-07-01
Publication date: 2022-12-09

Abstract

The invention relates to a method and a system for extracting a remote sensing image road by combining semantic segmentation and angle prediction, and belongs to the technical field of computer vision. The invention calculates the road angle information from the real road information, explicitly reflects the geometric characteristic that the road inclination angle has certain stability, and provides a road angle truth value for comparison for the neural network to constrain the geometric characteristic. According to the method, an angle prediction module and a feature fusion part are designed in a core model, road angle features are predicted and fused, and explicit constraint conditions related to angle increase of a corresponding loss function are designed, so that the model can better utilize various information contained in a remote sensing image to improve the accuracy of a final road prediction result. The method can improve the accuracy of predicting the road information under the complex real remote sensing image on the premise of no need of special equipment and higher generation speed.

Description

Remote sensing image road extraction method and system combining semantic segmentation and angle prediction

Technical Field

The invention relates to a method and a system for extracting a remote sensing image road, in particular to a method and a system for extracting a remote sensing image road by combining semantic segmentation and angle prediction, and belongs to the technical field of computer vision.

Background

The remote sensing image is a ground surface image formed by collecting ground surface electromagnetic radiation information from a satellite, an airplane or other aircrafts. The remote sensing image contains abundant ground information, and can provide important basis for analysis of earth surface conditions and resources. In recent years, with the continuous development of artificial satellite technology, the number and quality of artificial satellites for acquiring remote sensing images are obviously improved, and the acquisition cost of high-resolution remote sensing images is further reduced.

Road extraction is an important task in the field of remote sensing image analysis and processing, and the task target is to acquire the position information of a road and mark the position information in the form of a road marking map. Road information is of great importance to travel planning of people, plays an important role in commercial fields such as logistics, take-out, online car booking, shared vehicles and the like, and plays an important role in emergency response, traffic management, city planning and the like.

According to the traditional road information acquisition method, a large amount of manpower and material resources are consumed for on-site surveying and mapping, road vector data are acquired and recorded, and then the road vector data are imported into a computer and are drawn into a road network graph convenient for people to watch. The process is tedious and long, high cost needs to be consumed, the field mapping process is greatly influenced by the surface conditions, and quick response and updating are difficult to achieve under extreme weather and emergency conditions.

Considering that the remote sensing image is easier to obtain, faster, lower in cost and free from the influence of ground conditions, the method has better adaptability, and the road information of the ground surface can be usually reflected in the remote sensing image, and the method for extracting the road according to the remote sensing image becomes an object worthy of study.

Since the 70 s of the 20 th century, the academia has put forward many methods for road extraction tasks from different perspectives. According to the basic principle of the method, the traditional road extraction method can be divided into three categories, namely a template matching-based method, a knowledge-based method and an object analysis-based method. The template matching method is a relatively mature method in the early traditional road extraction method, can effectively combine the radiation characteristic and the geometric characteristic of a road for judgment, and can realize man-machine interaction by methods of manually setting seed points or template initial contours and the like, so that the method has certain error correction capability and better application value. The knowledge-based method generally achieves the purpose of extracting roads by using the knowledge through establishing a road-related knowledge model, particularly a hypothesis testing model between the road-related knowledge and a remote sensing image processing result. The object analysis method mainly identifies the radiation and texture characteristics of the remote sensing image, utilizes an early segmentation method to segment the remote sensing image, and extracts roads through classification or post-processing methods. The traditional road extraction methods achieve certain results in road extraction tasks in simpler scenes, are also used in part of commercial software, and show practical application value. However, in the face of more complex real remote sensing images, under the interference of various factors such as shadows, weather, seasons, vegetation and the like, the detection effect of the road extraction methods is still easily affected.

In recent years, the road extraction effect based on remote sensing images is remarkably improved due to rapid development of deep learning technology. Deep learning is a new research direction in the field of machine learning, and usually a deep nonlinear neural network is used to realize approximation of a complex function, and suitable data and a loss function are used to train the complex function, so that parameters stored in the neural network are changed, statistical rules are learned from a large amount of known data, and therefore, unknown events are predicted. With the continuous development of new ideas of new algorithms, human intelligence endows machines with higher and higher understanding capacity on image information, and the model can automatically extract road information from more complex and real remote sensing images gradually.

In 2010, mnih et al proposed a remote sensing image road extraction method based on a restricted Boltzmann machine, which is the first attempt of a deep learning technology in the field and obtains a result obviously superior to that of the traditional method. Then, the more road extraction methods regard the task as a semantic segmentation task, and the semantic segmentation model taking the deep convolutional neural network as a core is used for road region extraction. Compared with the traditional method for extracting the road by depending on one or more artificially summarized road characteristic models, the semantic segmentation model using the deep convolutional neural network can learn required composite characteristics from a large number of remote sensing images, and the characteristics are comprehensively utilized to improve the accuracy of road extraction.

However, when the current semantic segmentation method is used for extracting a remote sensing image road, phenomena such as road fracture, road boundary blurring, road width mutation and the like in an extraction result are still common. Therefore, from the viewpoints of improving the accuracy and effect of the remote sensing image road extraction method, reducing the acquisition cost of road information, improving the updating speed and timeliness of road information, and better serving the needs of people such as travel, business, management, planning and the like, an urgent need exists for a real remote sensing image which can automatically extract road information from the remote sensing image and has complex conditions and more interference. The method and the system for extracting the remote sensing image road can still keep better performance.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art, and creatively provides a method and a system for extracting a remote sensing image road by combining semantic segmentation and angle prediction in order to effectively solve the technical problem of extracting the remote sensing image road. The method can improve the accuracy of extracting the road information under the complex real remote sensing image on the premise of no need of special equipment and higher extraction speed.

The innovation points of the invention are as follows: the remote sensing image road extraction combining semantic segmentation and angle prediction is divided into a training stage and a using stage.

In the training stage, firstly, the road angle is calculated according to the real road information in the data set to obtain the angle value corresponding to each road category pixel, and real road angle information is formed. Then, extracting road information according to the input remote sensing image by using a road extraction algorithm model to generate a prediction result of the road information, and synchronously generating a prediction result of a road angle in the prediction process. After prediction is carried out, the road extraction algorithm model compares the predicted road information and road angles with corresponding real values respectively, calculates a loss function and updates parameters in the model according to the loss function. And repeating the prediction and comparison processes until a set termination condition is met, and storing the structure of the network and the model parameters.

In the using stage, firstly, a network model is established according to the model structure and parameters stored in the training stage, the collected remote sensing image images are respectively input into the model for calculation, and the road information extracted and predicted by the model is stored. And if the used remote sensing image has corresponding real road information, comparing and evaluating the predicted road information and the real road information so as to judge the quality of the effect of the model predicted road information.

In order to achieve the purpose, the invention adopts the following technical scheme.

A remote sensing image road extraction method combining semantic segmentation and angle prediction comprises the following steps:

step 1: and (5) a training stage.

And iteratively modifying the parameter dictionary of the algorithm model by using the remote sensing image-road information pairing data set.

Specifically, step 1 comprises the steps of:

step 1.1: and calculating the road angle according to the real road information in the data set (calculation can be performed in modes of Hough transform and the like) to obtain the angle value corresponding to each road category pixel and obtain the real road angle information.

Preferably, the following method may be employed:

firstly, the real road information is refined by using a morphological method to obtain road skeleton information. The purpose is to reduce subsequent calculation amount, thereby improving calculation efficiency.

Then, for each pixel in the road skeleton information, taking a neighborhood road skeleton information image taking the pixel as a center, calculating a straight line fitting the road skeleton of the area by using Hough transform on the neighborhood image, and taking the inclination angle of the straight line as the road angle corresponding to the pixel of the central point.

And then, according to the angle information of the road skeleton points, deducing the angle information of other points on the road according to a proximity principle to obtain complete road angle information.

Further, all road angle information is stored for training. For example, the storage may be in a matrix format.

Step 1.2: the road extraction algorithm model extracts road information according to the input remote sensing image, generates a prediction result of the road information, and synchronously generates a prediction result of a road angle in the prediction process.

Preferably, the following method may be employed:

and randomly selecting a plurality of remote sensing images from the training data set, and inputting the remote sensing images into a road extraction algorithm model.

The road extraction algorithm model outputs the prediction information of the road information corresponding to one remote sensing image for each remote sensing image, and a prediction result of the road angle is synchronously generated in the process.

The width and the height of the road information prediction information are consistent with those of the input remote sensing image, the solving space is 0 or 1,0 represents that the pixel is not predicted as a road, and 1 represents that the pixel is predicted as a road. The road angle information predicted by the model is in a matrix format, and the solution space of the road angle information is any legal angle value. For example, the range of angle values may be defined as any real number on [ - π/2, π/2). The length and width dimensions of the matrix are consistent with those of the input remote sensing image.

Specifically, the road extraction algorithm model includes three parts: the device comprises a feature extraction part, a preliminary prediction part and a feature fusion part.

After the remote sensing image is input into the road extraction algorithm model, firstly, the remote sensing image passes through a feature extraction part, and the feature extraction part extracts the image features of the remote sensing image by using a self-attention mechanism-based Transformer structure. The method comprises the following steps:

the characteristic extraction part firstly uses the image blocks to cut the layers, and splits the input RGB remote sensing image of H multiplied by W multiplied by 3 into N multiplied by p ² The x 3 form is to split the original remote sensing image with the height of H, the width of W and 3 channels into N image blocks, the width and the height of each image block are both p, and the number of channels is still 3.

Passing the tiles through a linear embedding layer, passing p of each tile ² The x 3-dimensional tensor projects to a vector mapping of arbitrary dimensions, where the linear embedded layer is essentially a fully connected layer. The purpose is to combine each p ² The 3 RGB tiles are mapped into linear vectors for use by subsequent structures.

These linear vectors are input from the Swin block of the attention mechanism. In the Swin block, these restricted vectors are first regularized and then input into a Window-based Multi-head Self-Attention Module W-MSA (Window Multi-head Self-Attention Attention) or a Shift Window-based Multi-head Self-Attention Module SW-MSA (Shifted Window Multi-head Self-Attention Attention).

The window-based multi-head self-attention module takes the image blocks segmented by the image block segmentation layers as a unit, and calculates the self-attention in each image block, so that the calculation efficiency of the model is obviously improved, but the information communication among the cross-windows is limited. Thus, the model alternates the window-based multi-headed self-attention module with the shifted-window based multi-headed self-attention module in every two consecutive adjacent Swin blocks.

In the multi-head self-attention module based on the shift window, the model respectively translates each window by 1/2 of the window size in the transverse direction and the longitudinal direction, and performs reverse translation after self-attention is calculated so as to complete information exchange between the windows.

After the self-attention module, the model performs residual join and continues to process features using the regularization layer and the multi-layer perceptron layer. The multi-layer perceptron is an infrastructure in deep learning, which takes full connectivity as a basic principle. In the Swin structure segment following the feature extraction section, the model uses a tile fusion layer instead of a linear embedding layer, which downsamples the current feature map at the front of each Swin structure segment.

After the characteristics are extracted from the remote sensing image through the characteristic extraction part, the obtained characteristics are input into the preliminary prediction part by the model, and the area and the angle of the road are preliminarily predicted.

In the preliminary prediction part, the model respectively uses two modules: and the semantic segmentation module and the angle prediction module are used as decoders for decoding the feature maps.

The semantic segmentation module aims at performing preliminary road region prediction, and the angle prediction module aims at performing road angle prediction.

In both decoders, the prediction results are not formed by directly upsampling the feature map using a convolutional network, but two sets of values, a set of masks M and a set of mask-to-class mapping matrices C, are predicted using a pixel decoder and a standard transform decoder in parallel. The mask group M comprises N masks, and the size of each mask is H multiplied by W and is consistent with that of an input image; the mapping matrix C has a size N x (K + 1) and relates the N masks to the distribution of K classes to be predicted (with an additional empty class label) in the form of a matrix multiplication with the set of masks M. The pixel decoder uses conventional convolutional upsampling, while the transform decoder uses multiple attention layers for upsampling. In the angle prediction module, an additional tanh layer is used to constrain the output to-1, and is multiplied by pi/2 to map to the angle space when the angle is calculated subsequently.

The semantic segmentation module and the angle prediction module respectively obtain preliminary road region prediction and road angle prediction, and the predicted values are compared with corresponding truth values to calculate a loss function.

After the preliminary road region prediction and the road angle prediction are obtained by the preliminary prediction part, the road region prediction characteristic and the road angle characteristic are fused by the characteristic fusion part, so that the accuracy degree of the road prediction is further improved.

Specifically, the model connects the original remote sensing image, the preliminary road prediction result, the road angle prediction result and the mask set M generated in the preliminary road prediction process into a multi-channel feature map, and inputs the multi-channel feature map into the final feature fusion module to obtain the final road prediction result. The feature fusion module upsamples the feature map using a number of convolutional layers.

Step 1.3: the road extraction algorithm model compares the predicted road information and road angle with the corresponding real values respectively, calculates a loss function and updates the parameters in the model according to the loss function.

Preferably, the following method may be employed:

for effective training of the model, 3 sub-loss functions are used at different modules respectively, and the overall loss function is expressed as follows:

wherein,

the overall loss function is expressed as a function of,

a penalty function representing a result of constraining the preliminary prediction component to produce a preliminary road prediction,

a penalty function representing the result of constraining the preliminary prediction part to produce an angle prediction,

representing a penalty function that constrains the final road prediction.

In particular, for the semantic segmentation module in the preliminary prediction part, use is made of

The mask set prediction result M of the module and the mask-to-class mapping matrix prediction result C of the module are constrained.

For mask set M, a combination of focus loss and dice loss is used for constraint.

For the mapping matrix C, a cross-entropy classification penalty is used for the constraint.

Thus, the penalty function for constraining the preliminary road prediction results

Expressed as:

wherein,

respectively representing a cross entropy classification loss function, a focus loss function and a dice loss function; c and c ^gt Respectively representing the prediction result and the truth value of the mapping matrix C; m and m ^gt Respectively showing the prediction result and the true value of the mask set M.

For the angle prediction module in the preliminary prediction section, the module predicts an angle value corresponding to each road pixel, expressed as a value between-pi/2 and pi/2. In consideration of this point, a loss function for restricting the angle prediction result is designed

It is expressed as:

wherein x and y respectively represent the values of horizontal and vertical coordinates in the mask/matrix; r is a radical of hydrogen ^gt Truth values, t and t representing road regions ^gt Respectively representing the predicted result and the true value of the road angle matrix.

Finally, a part of the loss function is also needed

To constrain the final road prediction. Feature fusion of the last step androad prediction is performed using a plurality of convolution layers, the result of which can be considered as a pixel-level semantic segmentation result, and therefore, a loss function of a pixel classification type (such as a loss function of cross entropy loss, focus loss, lov-sz loss, or a combination thereof) is used to constrain the final prediction result.

Step 1.4: and (4) repeating the step 1.2 and the step 1.3 until the set termination condition is met, and storing the structure and the model parameters of the network.

In the training process, the remote sensing image is continuously and repeatedly input into the model, parameters in the model are dynamically adjusted according to the loss function until preset conditions are met (for example, the training round reaches a certain amount or certain evaluation index is better than a certain preset value), the training process is stopped, and the model parameters are stored.

And 2, step: and (4) a use stage.

And (3) extracting road information according to the remote sensing image by using the model parameters obtained in the training stage in the step (1).

Further, if the real road information corresponding to the remote sensing image exists, the extracted road information and the real road information are compared and evaluated, and the effect of extracting the road information by the model is judged to be good or bad.

Specifically, step 2 comprises the steps of:

step 2.1: and establishing a network model according to the model structure and parameters stored in the training stage, respectively inputting the acquired remote sensing image images into the model for operation, and storing the extracted and predicted road information of the model.

The remote sensing image used by the model is usually in an RGB format and is formed by splicing a plurality of tiles with the same size.

If the collected remote sensing image is not in the format, the complete remote sensing image is cut and stored firstly. And then, creating a network model according to the model structure and parameters stored in the training stage, inputting the remote sensing image tiles into the model, predicting the model through a feature extraction part, a preliminary prediction part and a feature fusion part respectively, and automatically storing the road information finally obtained by prediction.

The road information is in a binary mask or binary matrix format, and the length and width of the road information are consistent with those of the input remote sensing image tiles.

Step 2.2: and if the used remote sensing image has corresponding real road information, comparing and evaluating the predicted road information and the real road information to judge the effect of the model predicted road information.

In order to objectively judge the accuracy of the model predicted road information, the predicted road information and the real road information may be directly compared and evaluated based on accuracy indexes such as an IoU (interaction over unit) score and an F1 score. The IoU score is a common evaluation index in the semantic segmentation problem, is a ratio of intersection area to union area between a real area and a prediction area corresponding to a certain semantic category, and is expressed by a calculation formula:

wherein TP, FP, FN represent the number of pixels of true positive, false positive and false negative, respectively.

The F1 score is an index used for measuring the accuracy of the binary classification in statistics, and is calculated according to the accuracy and the recall ratio, and the calculation formula is expressed as follows:

wherein, P and R represent accuracy and recall respectively, and TP, FP and FN represent the pixel numbers of true positive, false positive and false negative respectively.

Further, based on the method, the invention provides a remote sensing image road extraction system combining semantic segmentation and angle prediction. The system comprises: the system comprises a real road information preprocessing subsystem, a remote sensing image preprocessing subsystem, a feature extraction subsystem, a primary prediction subsystem, a feature fusion subsystem, a loss function calculation subsystem and a generated result evaluation subsystem.

The real road information preprocessing subsystem is used for calculating real road angle information based on the real road information, and is convenient for the loss function calculating subsystem to compare the real road angle information with predicted road angle information to calculate a loss function so as to train a model better. The road framework angle estimation method comprises a road framework extraction module, a road framework pixel-by-pixel angle calculation module and a road non-framework point angle inference module.

The remote sensing image preprocessing subsystem comprises a remote sensing image amplifying module and a remote sensing image formatting module. The remote sensing image augmentation module is used for carrying out operations such as random cutting, rotation and turning on the remote sensing image so as to increase a remote sensing image data set and diversify the remote sensing image as much as possible, so that the training model has stronger generalization capability; the remote sensing image formatting module is used for converting the remote sensing image stored in the bitmap format into a tensor format and normalizing the tensor format to meet the format requirement of the characteristic extraction subsystem on input data;

the feature extraction subsystem is used for extracting road features of the remote sensing image and providing road feature information convenient to use for the follow-up preliminary prediction subsystem. Comprises a segment cutting layer, a linear embedding layer, a segment fusion layer and a Swin block structure.

The preliminary prediction subsystem comprises a semantic segmentation module and an angle prediction module which are respectively used for carrying out preliminary prediction on the region and the angle of the road based on the road characteristics extracted by the characteristic extraction subsystem. These preliminary prediction results will be temporarily saved for the subsequent loss function calculation subsystem to calculate the loss function and adjust the model parameters.

The feature fusion subsystem comprises a plurality of convolution layers and is used for fusing road region prediction features and road angle features, and generating a final road prediction result by compositely utilizing the features, so that the accuracy of road prediction is further improved;

the loss function calculation subsystem comprises a preliminary road prediction loss function calculation module, a preliminary angle prediction loss function calculation module and a final road prediction loss function calculation module. The system comprises a primary road prediction loss function calculation module, a primary angle prediction loss function calculation module and a final road prediction loss function calculation module, wherein the primary road prediction loss function calculation module is used for calculating a loss function according to real road information and a primary road prediction result provided by a semantic segmentation module in a primary prediction subsystem, the primary angle prediction loss function calculation module is used for calculating a loss function according to the real road angle information and a road angle prediction matrix provided by an angle prediction module in the primary prediction subsystem, and the final road prediction loss function calculation module is used for calculating the loss function according to the real road information and a final road prediction result provided by a feature fusion subsystem. The loss function is propagated reversely to update parameters of each neural network layer in the feature extraction subsystem, the preliminary prediction subsystem and the feature fusion subsystem, and correct the road result finally predicted by the system.

And the generated result evaluation subsystem is used for comparing the final road prediction result with the real road information, automatically calculating various road precision evaluation indexes such as IoU (IoU) scores and F1 scores, serving as supplement of subjective feelings of a user and helping the user quantitatively know the accuracy of the system prediction road information.

The connection relationship among the above-mentioned component systems is:

the real road information preprocessing subsystem is connected with the loss function calculation subsystem and the generated result evaluation subsystem and provides real road information and a real road angle matrix for the loss function calculation subsystem and the generated result evaluation subsystem;

the remote sensing image preprocessing subsystem is connected with the feature extraction subsystem and the feature fusion subsystem and provides processed remote sensing image pictures meeting the requirements for the feature extraction subsystem and the feature fusion subsystem;

the characteristic extraction subsystem is connected with the remote sensing image preprocessing subsystem and the preliminary prediction subsystem, provides extracted road characteristics for the preliminary prediction subsystem, and simultaneously receives a loss value returned by the preliminary prediction subsystem for training the self;

the primary prediction subsystem is connected with the feature extraction subsystem, the feature fusion subsystem and the loss function calculation subsystem respectively, provides a primary road prediction result, a primary angle prediction result and features generated in the primary road prediction process for the feature fusion subsystem, and simultaneously receives a loss value returned by the feature fusion subsystem for training the primary prediction subsystem; in addition, the generated preliminary road prediction result and the preliminary angle prediction result are provided for the loss function calculation subsystem, and the returned loss value is received from the loss function calculation subsystem to train the loss function calculation subsystem;

the characteristic fusion subsystem is connected with the remote sensing image preprocessing subsystem, the preliminary prediction subsystem, the loss function calculation subsystem and the generated result evaluation subsystem, fuses various characteristics provided by the remote sensing image preprocessing subsystem and the preliminary prediction subsystem to generate final road prediction information, provides the final road prediction information to the loss function calculation subsystem and the generated result evaluation subsystem, and simultaneously receives a loss value calculated by the loss function calculation subsystem according to the final road prediction information to train the system;

the loss function calculation subsystem is connected with the real road information preprocessing subsystem, the preliminary prediction subsystem and the feature fusion subsystem, compares the real road information provided by the real road information preprocessing subsystem with the preliminary road prediction information provided by the preliminary prediction subsystem and the final road prediction information provided by the feature fusion subsystem to calculate a loss value, compares the real angle matrix provided by the real road information preprocessing subsystem with the preliminary angle prediction result provided by the preliminary prediction subsystem to calculate a loss value, and correspondingly transmits the loss values back to the preliminary prediction subsystem and the feature fusion subsystem respectively for training a model;

the generated result evaluation subsystem is connected with the real road information preprocessing subsystem and the characteristic fusion subsystem, and compares and calculates an evaluation index according to the real road information provided by the real road information preprocessing subsystem and the final road prediction information provided by the characteristic fusion subsystem, so as to help a user to quantitatively know the accuracy of the system prediction road information.

The working process of the system is as follows:

step 1: all real road information is sequentially input into a real road information preprocessing subsystem, a corresponding real road angle matrix is calculated and sequentially stored, so that the real road angle matrix is convenient for subsequent repeated use;

step 2: inputting the remote sensing image into a remote sensing image preprocessing subsystem, performing data augmentation, normalization and formatting, and storing the processed remote sensing image in a memory;

and step 3: the feature extraction subsystem reads the remote sensing image from the remote sensing image preprocessing subsystem, and extracts road features after processing through the neural network layer;

and 4, step 4: the preliminary prediction subsystem reads the road characteristics, processes the road characteristics by using a semantic segmentation module and an angle prediction module respectively to obtain a preliminary road prediction result and a preliminary angle prediction result, and simultaneously stores an intermediate result and a mask set M generated by the semantic segmentation module;

and 5: the feature fusion subsystem reads the remote sensing image, the preliminary road prediction result, the preliminary angle prediction result and the mask set M, fuses the features and generates a final road prediction result according to the composite features;

step 6: the loss function calculation subsystem compares the real road information with the preliminary road prediction information and calculates a loss value, and the loss value is transmitted back to the preliminary prediction subsystem for training; comparing the real angle matrix with the preliminary angle prediction result, calculating a loss value, and transmitting the loss value back to the preliminary prediction subsystem for training; comparing the real road information with the final road prediction information, calculating a loss value, and transmitting the loss value back to the feature fusion subsystem for training;

and 7: each neural network layer in the characteristic fusion subsystem carries out parameter modification according to the loss value returned by the loss function calculation subsystem to complete one training iteration and return the loss value of the first layer to the preliminary prediction subsystem;

and 8: each neural network layer in the preliminary prediction subsystem calculates the loss value returned by the subsystem and the loss value returned by the feature fusion subsystem according to the loss function to carry out parameter modification, completes one training iteration and returns the loss value of the first layer to the feature extraction subsystem;

and step 9: each neural network layer in the feature extraction subsystem modifies parameters according to the loss value returned by the preliminary prediction subsystem, and one training iteration is completed;

step 10: repeatedly executing the step 2 to the step 9, and sending the real road information and the final road prediction information to a generated result evaluation subsystem to calculate indexes after all the training data are traversed and used once; if the index meets a preset condition or the total iteration number reaches a preset upper limit, stopping training and executing the step 11;

step 11: and (5) executing the step (2) to the step (5), generating final road prediction information, and sending the real road information and the final road prediction information to a generation result evaluation subsystem to calculate a final index.

Advantageous effects

Compared with the prior art, the invention has the following advantages:

1. the method comprises the steps of calculating road angles pixel by original or collected road information of a data set, explicitly reflecting the geometric characteristic that a road inclination angle has certain stability, and providing a road angle truth value for comparison by constraining the geometric characteristic for a neural network;

2. according to the method, an angle prediction module and a feature fusion part are designed in a model, the angle prediction module is used for predicting road angle features, and a feature fusion part is used for fusing a remote sensing image, preliminarily predicted road information and features and predicted road angle features, so that the road is finally predicted on the basis, and the accuracy of a final road prediction result is improved by better utilizing various types of information contained in the remote sensing image;

3. the invention predicts the loss function of the result by designing the constraint angle

And adding explicit constraint conditions for an angle prediction module in the model to enable the structure of the model to be effectively converged.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a schematic diagram of a core algorithm model according to the method of the present invention.

Fig. 3 is a detailed structural diagram of a feature extraction part in a core algorithm model of the method of the present invention, wherein the upper half is a structural diagram of a trunk feature extractor, and the lower half is an internal structural diagram of two consecutive adjacent Swin blocks in the trunk feature extractor.

Fig. 4 is a detailed structural diagram of the preliminary prediction part in the core algorithm model of the method of the present invention, wherein the upper half is a structural diagram of the semantic segmentation module in the preliminary prediction part, and the lower half is a structural diagram of the angle prediction module in the preliminary prediction part.

FIG. 5 is a schematic diagram of the system of the present invention.

Detailed Description

For a better understanding of the objects and advantages of the present invention, reference is made to the following description taken in conjunction with the accompanying drawings and examples.

Examples

The embodiment discloses a remote sensing image road extraction method combining semantic segmentation and angle prediction, which comprises a training stage and a using stage.

In the training stage, firstly, according to real road information in a data set, calculating road angles in modes of Hough transformation and the like to obtain angle values corresponding to each road category pixel and form real road angle information; then extracting road information according to the input remote sensing image by using a road extraction algorithm model to generate a prediction result of the road information, and synchronously generating a prediction result of a road angle in the prediction process; after prediction is carried out, the road extraction algorithm model respectively compares the predicted road information and road angles with corresponding real values, calculates a loss function and updates parameters in the model according to the loss function; and then, repeating the second step and the third step until the set termination condition is met, and storing the structure and the model parameters of the network. In the using stage, a network model is established according to the model structure and parameters stored in the training stage, the acquired remote sensing image images are respectively input into the model for operation, and the extracted and predicted road information of the model is stored; and if the used remote sensing image has corresponding real road information, comparing and evaluating the predicted road information and the real road information to judge the quality of the effect of the model predicted road information. The flow chart of this embodiment is shown in fig. 1.

The traditional road information acquisition method needs to consume a large amount of manpower and material resources to carry out field mapping, acquire and record road vector data, then import the road vector data into a computer and draw a road network map convenient for people to watch, and the process is complicated and long, and high cost needs to be consumed. In recent years, the road extraction effect based on remote sensing images is remarkably improved due to rapid development of deep learning technology. Most of the existing road extraction methods based on the deep learning technology regard the task as a semantic segmentation task, and a semantic segmentation model taking a deep convolutional neural network as a core is used for extracting road regions. Compared with the traditional method for extracting the road by depending on one or more artificially summarized road characteristic models, the semantic segmentation model using the deep convolutional neural network can learn required composite characteristics from a large number of remote sensing images, and the characteristics are comprehensively utilized to improve the accuracy of road extraction. However, when the current semantic segmentation method is used for extracting a remote sensing image road, phenomena such as road fracture, road boundary blurring, road width mutation and the like in an extraction result are still common. The method and the system for extracting the remote sensing image road by combining semantic segmentation and angle prediction provide a core algorithm with an angle prediction module and a feature fusion module, and the model can better utilize various information contained in the remote sensing image to improve the accuracy of a final road prediction result by adding the modes of prediction, fusion and explicit constraint on road angle features in the model. The schematic diagram of the main network model structure of the method and the system is shown in fig. 2. The detailed structure of the feature extraction part in the model is shown in fig. 3, and the detailed structure of the preliminary prediction part in the model is shown in fig. 4.

The remote sensing image road extraction system combining semantic segmentation and angle prediction comprises a real road information preprocessing subsystem, a remote sensing image preprocessing subsystem, a feature extraction subsystem, a preliminary prediction subsystem, a feature fusion subsystem, a loss function calculation subsystem and a generated result evaluation subsystem. The structure and connection relationship of the above subsystems are shown in fig. 5.

The details of the present embodiment are as follows.

A remote sensing image road extraction method combining semantic segmentation and angle prediction comprises two stages of training and using. The method comprises the following steps:

step 1: in the training stage, a remote sensing image-road information pairing data set is used, a parameter dictionary of the algorithm model is modified in an iterative mode and stored:

step 1.1: according to the real road information in the data set, the road angle is calculated through Hough transform and other modes, the angle value corresponding to each road category pixel is obtained, and the real road angle information is formed.

Firstly, refining real road information by using a morphological method to obtain road skeleton information; the purpose of this step is to reduce the amount of calculation in the following steps and to improve the calculation efficiency. Secondly, for each pixel in the road skeleton information, taking a neighborhood road skeleton information image taking the pixel as a center, using Hough transform on the neighborhood image, calculating a straight line fitting the road skeleton of the area, and taking the inclination angle of the straight line as the road angle corresponding to the pixel of the central point. And thirdly, according to the angle information of the road skeleton points, deducing the angle information of other points on the road according to a proximity principle to obtain complete road angle information. Finally, all the road angle information is stored in a matrix format for training.

And randomly selecting a plurality of remote sensing images from the training data set to input into a road extraction algorithm model. The model outputs prediction information of road information corresponding to one remote sensing image to each remote sensing image, and a prediction result of a road angle is synchronously generated in the process. The width and the height of the road information prediction information are consistent with those of the input remote sensing image, and the solving space is 0 or 1, wherein 0 represents that the pixel is not predicted as a road, and 1 represents that the pixel is predicted as a road. The road angle information predicted by the model is in a matrix format, the solution space of the road angle information is any legal angle value, and the range of the angle value is usually limited to any real number on [ - π/2, π/2); the length and width dimensions of the matrix are consistent with those of the input remote sensing image.

The road extraction algorithm model comprises three main parts: the device comprises a feature extraction part, a preliminary prediction part and a feature fusion part.

After the remote sensing image is input into the algorithm model, firstly, the remote sensing image passes through a feature extraction part, and the feature extraction part uses a self-attention mechanism-based Transformer structure to extract image features of the remote sensing image. The characteristic extraction part firstly splits an input RGB remote sensing image of H multiplied by W multiplied by 3 into N multiplied by p by using a graph block slicing layer ² The x 3 form is to split the original remote sensing image with three channels, i.e. the height is H and the width is W, into N image blocks, wherein the width and the height of each image block are both p, and the number of channels is still 3. Next, the blocks pass through a linear embedding layer, and p of each block is obtained ² The x 3-dimensional tensor projects to a vector mapping of arbitrary dimensions, where the linear embedded layer is essentially a fully connected layer. The significance of this step is to combine each p ² The 3 × RGB tiles are mapped into linear vectors for use in the next structure. These vectors are then input into the Swin block from the attention mechanism. In Swin block, these vectors are first regularized and input to a Window-based Multi-head Self-Attention Module W-MSA (Window Multi-head Self-Attention Attention) or a Shift Window-based Multi-head Self-Attention Module SW-MSA (Shifted Window Multi-head Self-Attention Attention). The window-based multi-head self-attention module takes the image blocks segmented by the image block segmentation layers as a unit, and calculates the self-attention in each image block, so that the calculation efficiency of the model is obviously improved, but the information communication among the cross-windows is limited. Thus, the model uses a window-based multi-headed self-attention module in which the mode alternates with a shifted window-based multi-headed self-attention module in which the mode alternates between every two consecutive adjacent Swin blocksAnd respectively translating each window by 1/2 of the size of the window in the transverse direction and the longitudinal direction, calculating attention, and then performing reverse translation to finish information exchange between the windows. After the self-attention module, the model is connected with residual errors, and the characteristics are processed by using the regularization layer and the multi-layer perceptron layer continuously. The multi-layer perceptron is an infrastructure in deep learning, which takes full connectivity as a basic principle. In the latter Swin structure segments of the feature extraction part, the model uses a block fusion layer instead of a linear embedding layer, which downsamples the current feature map at the front end of each Swin structure segment.

After the characteristics are extracted from the remote sensing image through the characteristic extraction part, the obtained characteristics are input into the preliminary prediction part by the model, and the area and the angle of the road are preliminarily predicted. In the preliminary prediction part, the model respectively uses two modules: and the semantic segmentation module and the angle prediction module are used as decoders for decoding the feature map, wherein the semantic segmentation module aims at performing preliminary road region prediction, and the angle prediction module aims at performing road angle prediction. Instead of directly upsampling the feature map using a convolutional network to form a prediction result in the two decoders, a pixel decoder and a standard transform decoder are used in parallel to predict two sets of values, a set of masks M and a set of mask-to-class mapping matrices C. The mask group M comprises N masks, and the size of each mask is H multiplied by W and is consistent with that of an input image; the mapping matrix C has a size N x (K + 1) and relates the N masks to the distribution of K classes to be predicted (with an additional empty class label) in the form of a matrix multiplication with the set of masks M. The pixel decoder uses conventional convolutional upsampling, while the transform decoder uses multiple attention layers for upsampling. An additional tanh layer is used in the angle prediction module to constrain the output to-1 and multiply by pi/2 to map to the angle space when the angle is calculated subsequently. The semantic segmentation module and the angle prediction module respectively obtain preliminary road region prediction and road angle prediction, and the predicted values are compared with corresponding truth values to calculate a loss function.

After the preliminary road region prediction and the road angle prediction are obtained through the preliminary prediction part, the model fuses the road region prediction features and the road angle features through the feature fusion part, and the accuracy degree of the road prediction is further improved. Specifically, the model connects the original remote sensing image, the preliminary road prediction result, the road angle prediction result and a mask set M generated in the preliminary road prediction process into a multi-channel feature map, and inputs the multi-channel feature map into a final feature fusion module to obtain a final road prediction result. The feature fusion module upsamples the feature map using a plurality of convolutional layers.

Step 1.3: the road extraction algorithm model compares the predicted road information and road angle with the corresponding real values respectively, calculates a loss function and updates parameters in the model according to the loss function.

To train the model effectively, 3 sub-loss functions are used at different modules, respectively. The overall loss function can be expressed as follows

Wherein

The overall loss function is expressed as a function of,

a loss function representing a result of constraining the preliminary prediction component to produce a preliminary road prediction,

representing a penalty function that constrains the final road prediction.

The mask set prediction result M of the module and the mask-to-class mapping matrix prediction result C of the module are constrained. For the mask group M, the combination of focal point loss and dice loss is adopted for constraint; for the mapping matrix C, the conventional cross-entropy classification penalty is used for the constraint. Thus, the penalty function for constraining the preliminary road prediction results

Can be expressed as

Wherein

Respectively a cross entropy classification loss function, a focus loss function and a dice loss function; c and c ^gt Respectively representing the prediction result and the true value of the mapping matrix C; m and m ^gt Respectively showing the prediction result and the true value of the mask set M.

For the angle prediction module in the preliminary prediction part, the module predicts the angle value corresponding to each road pixel, which is expressed by the value between-pi/2 and pi/2. In consideration of this point, a loss function for restricting the angle prediction result is designed

Which can be represented as

Where x, y represent the values of the abscissa and ordinate, respectively, in the mask or matrix, r ^gt Truth values, t and t representing road regions ^gt Respectively representing the predicted result and the true value of the road angle matrix.

Finally, a part of the loss function is also needed

To constrain the final road prediction. The feature fusion and road prediction of the last step are performed by using a plurality of convolution layers, and the result can be regarded as a semantic segmentation result of a pixel level, so that a loss function of a pixel classification type is used for constraining a final prediction result, such as a cross entropy loss, a focus loss, a lov a sz loss and other loss functions, or a combination of the loss functions.

In the training process, the remote sensing images are continuously and repeatedly input into the model, parameters in the model are dynamically adjusted according to the loss function until a certain preset condition is met, and the training process is stopped and the model parameters are stored if the training turn reaches a certain amount or an evaluation index is superior to a certain preset value.

Step 2: and (3) a using stage, namely extracting and storing road information according to the remote sensing image by using the model parameters obtained in the training stage in the step 1. Further, if the real road information corresponding to the remote sensing image exists, the extracted road information and the real road information can be compared and evaluated, and the effect of extracting the road information by the model is judged to be good or bad.

The remote sensing image used by the model is usually in an RGB format and is formed by splicing a plurality of tiles with the same size. If the collected remote sensing image is not in the format, the complete remote sensing image is cut and stored. And then, creating a network model according to the model structure and parameters stored in the training stage, inputting the remote sensing image tiles into the model, predicting the model by a feature extraction part, a preliminary prediction part and a feature fusion part respectively, and automatically storing road information obtained by final prediction, wherein the road information is in a binary mask or binary matrix format, and the length and width of the road information are consistent with those of the input remote sensing image tiles.

Step 2.2: and if the used remote sensing image has corresponding real road information, comparing and evaluating the predicted road information and the real road information to judge the quality of the effect of the model predicted road information.

In order to objectively judge the accuracy of the model predicted road information, the predicted road information and the actual road information may be directly compared and evaluated based on accuracy indexes such as an IoU (interaction over unit) score and an F1 score. The IoU score is a common evaluation index in the semantic segmentation problem, and is a ratio of intersection area and union area between a real area and a prediction area corresponding to a certain semantic category. The calculation formula can be expressed as

Where TP, FP, FN represent the number of pixels of true positive, false positive and false negative, respectively.

The F1 score is an index used for measuring the accuracy rate of two classifications in statistics and is calculated according to the accuracy rate and the recall rate, and a calculation formula can be expressed as

Where P and R represent accuracy and recall, respectively, and TP, FP, FN represent the number of pixels of true positive, false positive, and false negative, respectively.

In addition, for better illustrating the objects and advantages of the present invention, the inventive system is further described below with reference to the accompanying drawings.

Examples

The embodiment discloses a remote sensing image road extraction system combining semantic segmentation and angle prediction, which specifically comprises the following components: the system comprises a real road information preprocessing subsystem, a remote sensing image preprocessing subsystem, a feature extraction subsystem, a preliminary prediction subsystem, a feature fusion subsystem, a loss function calculation subsystem and a generated result evaluation subsystem.

The real road information preprocessing subsystem comprises a road skeleton extraction module, a road skeleton pixel-by-pixel angle calculation module and a road non-skeleton point angle inference module, and is used for calculating real road angle information based on the real road information, so that the loss function calculation subsystem can compare the real road angle information with predicted road angle information to calculate a loss function, and a model can be trained better.

The remote sensing image preprocessing subsystem comprises a remote sensing image amplifying module and a remote sensing image formatting module. The remote sensing image augmentation module is used for carrying out operations such as random cutting, rotation and turning on the remote sensing image so as to increase a remote sensing image data set and diversify the remote sensing image as much as possible, so that the training model has stronger generalization capability; the remote sensing image formatting module is used for converting the remote sensing image stored in the bitmap format into a tensor format and normalizing the tensor format to meet the format requirement of the feature extraction subsystem on input data;

the feature extraction subsystem comprises a block segmentation layer, a linear embedding layer, a block fusion layer and a Swin block structure and is used for extracting road features of the remote sensing image and providing road feature information convenient to use for the subsequent preliminary prediction subsystem;

the preliminary prediction subsystem comprises a semantic segmentation module and an angle prediction module, and preliminary prediction is carried out on the regions and the angles of the roads respectively based on the road characteristics extracted by the characteristic extraction subsystem. These preliminary prediction results will be temporarily saved for the subsequent loss function calculation subsystem to calculate the loss function and adjust the model parameters;

the feature fusion subsystem comprises a plurality of convolution layers, road region prediction features and road angle features are fused, and the features are compositely utilized to generate a final road prediction result, so that the accuracy of road prediction is further improved;

the loss function calculation subsystem comprises a preliminary road prediction loss function calculation module, a preliminary angle prediction loss function calculation module and a final road prediction loss function calculation module. The system comprises a primary road prediction loss function calculation module, a primary angle prediction loss function calculation module and a final road prediction loss function calculation module, wherein the primary road prediction loss function calculation module is used for calculating a loss function according to real road information and a primary road prediction result provided by a semantic segmentation module in a primary prediction subsystem, the primary angle prediction loss function calculation module is used for calculating a loss function according to the real road angle information and a road angle prediction matrix provided by an angle prediction module in the primary prediction subsystem, and the final road prediction loss function calculation module is used for calculating a loss function according to the real road information and a final road prediction result provided by a feature fusion subsystem. The loss function is propagated reversely to update parameters of each neural network layer in the feature extraction subsystem, the preliminary prediction subsystem and the feature fusion subsystem, and correct the road result finally predicted by the system.

And the generated result evaluation subsystem is used for comparing the final road prediction result with the real road information, automatically calculating various road precision evaluation indexes such as an IoU score and an F1 score, serving as supplement of subjective feeling of a user, and helping the user to quantitatively know the accuracy of the system prediction road information.

The connection relationship among the above-mentioned component systems is:

the preliminary prediction subsystem is connected with the feature extraction subsystem, the feature fusion subsystem and the loss function calculation subsystem, provides a preliminary road prediction result, a preliminary angle prediction result and features generated in the preliminary road prediction process for the feature fusion subsystem, and simultaneously receives a loss value returned by the feature fusion subsystem for training the system; in addition, the generated preliminary road prediction result and the preliminary angle prediction result are provided for the loss function calculation subsystem, and the returned loss value is received from the loss function calculation subsystem to train the loss function calculation subsystem;

the feature fusion subsystem is connected with the remote sensing image preprocessing subsystem, the preliminary prediction subsystem, the loss function calculation subsystem and the generated result evaluation subsystem, fuses various features provided by the remote sensing image preprocessing subsystem and the preliminary prediction subsystem to generate final road prediction information, provides the final road prediction information to the loss function calculation subsystem and the generated result evaluation subsystem, and simultaneously receives a loss value calculated by the loss function calculation subsystem according to the final road prediction information to train the self-body;

The use method of the system comprises the following steps:

and 3, step 3: the feature extraction subsystem reads the remote sensing image from the remote sensing image preprocessing subsystem, and extracts road features after processing through the neural network layer;

and step 8: calculating loss values returned by the subsystem and the characteristic fusion subsystem according to the loss function by each neural network layer in the preliminary prediction subsystem to modify parameters, finishing one training iteration, and returning the loss value of the first layer to the characteristic extraction subsystem;

step 10: repeatedly executing the steps 2 to 9, and sending the real road information and the final road prediction information to a generated result evaluation subsystem to calculate indexes after all the training data are traversed and used once; if the index meets the preset condition or the total iteration number reaches the preset upper limit, stopping training and executing the step 11;

step 11: and (5) executing the steps 2 to 5, generating final road prediction information, and sending the real road information and the final road prediction information to a generation result evaluation subsystem to calculate a final index.

The above detailed description is further intended to illustrate the objects, technical solutions and advantages of the present invention, and it should be understood that the above detailed description is only an example of the present invention and should not be used to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A remote sensing image road extraction method combining semantic segmentation and angle prediction is characterized by comprising a training stage and a using stage;

in the training stage, a remote sensing image-road information pairing data set is used for iteratively modifying a parameter dictionary of the algorithm model;

firstly, calculating road angles according to real road information in a data set to obtain an angle value corresponding to each road category pixel and form real road angle information;

then, extracting road information according to the input remote sensing image by using a road extraction algorithm model, generating a prediction result of the road information, and synchronously generating a prediction result of a road angle in the prediction process;

after prediction is carried out, the road extraction algorithm model respectively compares the predicted road information and road angles with corresponding real values, calculates a loss function and updates parameters in the model according to the loss function;

repeating the prediction and comparison processes until a set termination condition is met, and storing the structure of the network and the model parameters;

in the use stage, firstly, a network model is established according to the model structure and parameters stored in the training stage, the collected remote sensing image images are respectively input into the model for calculation, and the road information extracted and predicted by the model is stored; and if the used remote sensing image has corresponding real road information, comparing and evaluating the predicted road information and the real road information so as to judge the quality of the effect of the model predicted road information.

2. The method for extracting the remote sensing image road by combining semantic segmentation and angle prediction as claimed in claim 1, wherein in a training phase, the road angle is calculated according to real road information in a data set to obtain an angle value corresponding to each road class pixel and obtain real road angle information, and the implementation method is as follows:

firstly, refining real road information by using a morphological method to obtain road skeleton information;

then, for each pixel in the road skeleton information, taking a neighborhood road skeleton information image taking the pixel as a center, calculating a straight line fitting the road skeleton of the area by using Hough transform on the neighborhood image, and taking the inclination angle of the straight line as a road angle corresponding to the pixel of the central point;

3. The method for extracting the road of the remote sensing image by combining the semantic segmentation and the angle prediction as claimed in claim 1, wherein in the training phase, a road extraction algorithm model extracts road information according to the input remote sensing image, generates a prediction result of the road information, and synchronously generates the prediction result of the road angle in the prediction process, and the specific method is as follows:

firstly, randomly selecting a plurality of remote sensing images from a training data set, and inputting the remote sensing images into a road extraction algorithm model;

the road extraction algorithm model outputs prediction information of road information corresponding to one remote sensing image for each remote sensing image, and a prediction result of a road angle is synchronously generated in the process;

the width and the height of the road information prediction information are consistent with those of the input remote sensing image, the solving space is 0 or 1,0 represents that the pixel is not predicted as a road, and 1 represents that the pixel is predicted as a road; the road angle information predicted by the model is in a matrix format, and the solution space of the road angle information is any legal angle value; the length and width of the matrix are consistent with those of the input remote sensing image;

the road extraction algorithm model comprises three parts: a feature extraction part, a preliminary prediction part and a feature fusion part;

after the remote sensing image is input into a road extraction algorithm model, firstly passing through a feature extraction part, wherein the feature extraction part uses a self-attention mechanism-based Transformer structure to extract image features of the remote sensing image;

after extracting features from the remote sensing image through the feature extraction part, inputting the obtained features into the preliminary prediction part by the model, and preliminarily predicting the area and the angle of the road;

after the preliminary road area prediction and the road angle prediction are obtained by the preliminary prediction part, the road area prediction feature and the road angle feature are fused by the feature fusion part.

4. The method as claimed in claim 3, wherein the feature extraction part divides the input H x W x 3 RGB remote sensing image into N x p remote sensing images by using the image segmentation layer ² A form of x 3, namely splitting an original remote sensing image with the height of H, the width of W and 3 channels into N image blocks, wherein the width and the height of each image block are both p, and the number of channels is still 3;

passing the tiles through a linear embedding layer, passing p of each tile ² The x 3-dimensional tensor is projected onto a vector map of arbitrary dimensions, where the linear embedding layer is essentially a fully connected layer, with the goal of mapping each p ² The RGB small blocks multiplied by 3 are mapped into linear vectors for subsequent use;

these linear vectors are input into the Swin block from the attention mechanism; in the Swin block, after regularizing the restricted row vectors, inputting the regularized restricted row vectors into a multi-head self-attention module W-MSA based on a window or a multi-head self-attention module SW-MSA based on a shift window;

the multi-head self-attention module based on the window takes the image blocks segmented by the image block segmentation layer as a unit, and the multi-head self-attention module based on the window and the multi-head self-attention module based on the shift window are alternately used in every two continuous adjacent Swin blocks by the model;

in a multi-head self-attention module based on a shifting window, a model respectively translates each window by 1/2 of the size of the window in the transverse direction and the longitudinal direction, and performs reverse translation after self-attention is calculated so as to complete information exchange between the windows;

after the self-attention module, performing residual error connection on the model, and continuously processing the characteristics by using a regularization layer and a multilayer perceptron layer;

in the Swin structure section behind the feature extraction part, the model uses a block fusion layer to replace a linear embedding layer, and the model downsamples a current feature map at the front end of each Swin structure section;

in the preliminary prediction part, the model respectively uses two modules: the semantic segmentation module and the angle prediction module are used as a decoder to decode the feature map;

the semantic segmentation module aims at performing preliminary road region prediction, and the angle prediction module aims at performing road angle prediction;

in the two decoders, the feature map is not directly up-sampled by using a convolutional network to form a prediction result, but a pixel decoder and a standard transform decoder are used in parallel to predict two groups of values, namely a group of masks M and a group of mask-to-class mapping matrix C; the mask group M comprises N masks, and the size of each mask is H multiplied by W and is consistent with that of an input image; the size of the mapping matrix C is nx (K + 1), which relates the N masks to the distribution of K classes to be predicted, in the form of a matrix multiplication with the mask set M; the pixel decoder uses conventional convolutional upsampling, while the transform decoder uses multiple attention layers for upsampling; in an angle prediction module, an extra tanh layer is used for constraining the output to be-1, and multiplying the output by pi/2 to map to an angle space when the angle is calculated subsequently;

5. The method for extracting the remote sensing image road by combining semantic segmentation and angle prediction as claimed in claim 3, wherein the model connects an original remote sensing image, a preliminary road prediction result, a road angle prediction result and a mask set M generated in the preliminary road prediction process into a multi-channel feature map, and inputs the multi-channel feature map into a final feature fusion module to obtain a final road prediction result; the feature fusion module upsamples the feature map using a number of convolutional layers.

6. The method for extracting the remote sensing image road by combining semantic segmentation and angle prediction as claimed in claim 4, wherein the road extraction algorithm model respectively compares the predicted road information and road angle with the corresponding real values, calculates the loss function and updates the parameters in the model according to the loss function, and the specific implementation method is as follows:

wherein,

the overall loss function is expressed as a function of,

a loss function representing a constraint final road prediction result;

for semantic segmentation modules in preliminary prediction parts, use is made of

Constraining a mask group prediction result M of the module and a mapping matrix prediction result C from the mask of the module to the category;

for the mask group M, the combination of focal point loss and dice loss is adopted for constraint;

for the mapping matrix C, using cross entropy classification loss to carry out constraint;

Expressed as:

wherein,

respectively representing a cross entropy classification loss function, a focus loss function and a dice loss function; c and c ^gt Respectively representing the prediction result and the true value of the mapping matrix C; m and m ^gt Respectively representing the prediction result and the true value of the mask set M;

for an angle prediction module in the preliminary prediction part, predicting an angle value corresponding to each road pixel by the module, wherein the value is represented by a value between-pi/2 and pi/2; in consideration of this point, a loss function for constraining the angle prediction result is designed

It is expressed as:

wherein x and y respectively represent the values of horizontal and vertical coordinates in the mask/matrix; r is ^gt Truth values, t and t representing road regions ^gt Respectively representing the prediction result and the true value of the road angle matrix;

finally, a part of the loss function is also needed

To constrain the final road prediction result; the feature fusion and road prediction of the last step are carried out by using a plurality of convolution layers, and the result can be regarded as a semantic segmentation result of a pixel level, so that a loss function of a pixel classification type is used for constraining a final prediction result;

and in the training process, continuously and repeatedly inputting the remote sensing image into the model, and dynamically adjusting parameters in the model according to the loss function until a preset condition is met, stopping the training process, and storing the model parameters.

7. The method for extracting the remote sensing image road by combining semantic segmentation and angle prediction as claimed in claim 1, wherein in the using stage, if the collected remote sensing image is not in the format, the complete remote sensing image is firstly cut and stored; then, a network model is established according to a model structure and parameters stored in a training stage, the remote sensing image tiles are input into the model, the model is respectively predicted through a feature extraction part, a preliminary prediction part and a feature fusion part, and road information obtained through final prediction is automatically stored;

the road information is in a binary mask or binary matrix format, and the length and width of the road information are consistent with those of the input remote sensing image tile;

and if the used remote sensing image has corresponding real road information, comparing and evaluating the predicted road information and the real road information to judge the effect of the model predicted road information.

8. The method for extracting the remote sensing image road by combining semantic segmentation and angle prediction as claimed in claim 7, wherein the predicted road information is directly compared and evaluated with the real road information based on precision indexes including an IoU score and an F1 score for objectively judging the accuracy degree of the model predicted road information;

the IoU score is the ratio of the intersection area and the union area between a real area and a prediction area corresponding to a certain semantic category, and the calculation formula is as follows:

wherein TP, FP and FN respectively represent the number of pixels with true positive, false positive and false negative;

the F1 score is calculated according to the accuracy and the recall ratio, and the calculation formula is as follows:

wherein, P and R represent accuracy and recall respectively, and TP, FP and FN represent the pixel number of true positive, false positive and false negative respectively.

9. A remote sensing image road extraction system combining semantic segmentation and angle prediction is characterized by comprising a real road information preprocessing subsystem, a remote sensing image preprocessing subsystem, a feature extraction subsystem, a preliminary prediction subsystem, a feature fusion subsystem, a loss function calculation subsystem and a generated result evaluation subsystem;

the real road information preprocessing subsystem is used for calculating real road angle information based on the real road information, so that the loss function calculation subsystem can compare the real road angle information with predicted road angle information to calculate a loss function, and a model can be trained better; the device comprises a road skeleton extraction module, a road skeleton pixel-by-pixel angle calculation module and a road non-skeleton point angle inference module;

the remote sensing image preprocessing subsystem comprises a remote sensing image amplification module and a remote sensing image formatting module; the remote sensing image augmentation module is used for carrying out operations such as random cutting, rotation and turning on the remote sensing image so as to increase a remote sensing image data set and diversify the remote sensing image as much as possible, so that the training model has stronger generalization capability; the remote sensing image formatting module is used for converting the remote sensing image stored in the bitmap format into a tensor format and normalizing the tensor format to meet the format requirement of the feature extraction subsystem on input data;

the feature extraction subsystem is used for extracting road features of the remote sensing image and providing road feature information convenient to use for the subsequent preliminary prediction subsystem; comprises a segment cutting layer, a linear embedding layer, a segment fusion layer and a Swin block structure;

the preliminary prediction subsystem comprises a semantic segmentation module and an angle prediction module, and is respectively used for preliminary prediction of the region and the angle of the road based on the road characteristics extracted by the characteristic extraction subsystem; these preliminary prediction results will be temporarily saved for the subsequent loss function calculation subsystem to calculate the loss function and adjust the model parameters;

the feature fusion subsystem comprises a plurality of convolution layers and is used for fusing the road region prediction features and the road angle features and compositely utilizing the features to generate a final road prediction result so as to further improve the accuracy of road prediction;

the loss function calculation subsystem comprises a preliminary road prediction loss function calculation module, a preliminary angle prediction loss function calculation module and a final road prediction loss function calculation module; the system comprises a primary road prediction loss function calculation module, a primary angle prediction loss function calculation module, a final road prediction loss function calculation module and a characteristic fusion subsystem, wherein the primary road prediction loss function calculation module is used for calculating a loss function according to real road information and a primary road prediction result provided by a semantic segmentation module in the primary prediction subsystem; the loss function is propagated reversely to update parameters of each neural network layer in the feature extraction subsystem, the preliminary prediction subsystem and the feature fusion subsystem, and correct the road result finally predicted by the system;

the generated result evaluation subsystem is used for comparing the final road prediction result with the real road information, automatically calculating various road precision evaluation indexes such as an IoU score and an F1 score, serving as supplement of subjective feeling of a user and helping the user to quantitatively know the accuracy degree of the road information predicted by the system;

the connection relationship among the above-mentioned component systems is:

10. The remote sensing image road extraction system combining semantic segmentation and angle prediction as claimed in claim 9, characterized in that the system works as follows:

and 6: the loss function calculation subsystem compares the real road information with the preliminary road prediction information and calculates a loss value, and transmits the loss value back to the preliminary prediction subsystem for training; comparing the real angle matrix with the preliminary angle prediction result, calculating a loss value, and transmitting the loss value back to the preliminary prediction subsystem for training; comparing the real road information with the final road prediction information, calculating a loss value, and transmitting the loss value back to the feature fusion subsystem for training;

and step 9: each neural network layer in the characteristic extraction subsystem modifies parameters according to the loss value returned by the preliminary prediction subsystem to complete one training iteration;

step 10: repeatedly executing the step 2 to the step 9, and sending the real road information and the final road prediction information to a generated result evaluation subsystem to calculate an index after all the training data are subjected to one-time traversal use; if the index meets a preset condition or the total iteration number reaches a preset upper limit, stopping training and executing the step 11;