CN112749578A - Remote sensing image automatic road extraction method based on deep convolutional neural network - Google Patents

Remote sensing image automatic road extraction method based on deep convolutional neural network

Info

Publication number
CN112749578A
Authority
CN
China
Prior art keywords
decoder
input
remote sensing
sensing image
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911039232.7A
Other languages
Chinese (zh)
Inventor
钱启
周健
张一明
闫灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Star Map Co ltd
Original Assignee
Zhongke Star Map Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Star Map Co ltd filed Critical Zhongke Star Map Co ltd
Priority to CN201911039232.7A priority Critical patent/CN112749578A/en
Publication of CN112749578A publication Critical patent/CN112749578A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/10: Terrestrial scenes
    • G06V 20/182: Network patterns, e.g. roads or rivers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2148: Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a method for automatic road extraction from remote sensing images based on a deep convolutional neural network, which comprises: collecting remote sensing image data and constructing a training data set; constructing a deep semantic segmentation network model in encoder-decoder mode as the deep convolutional neural network model; training the deep semantic segmentation network model with the training data set; and inputting test data into the trained deep semantic segmentation network model to obtain the road extraction result of the remote sensing image. In this way, remote sensing images can be segmented in complex scenes and the road information in them accurately extracted.

Description

Remote sensing image automatic road extraction method based on deep convolutional neural network
Technical Field
Embodiments of the present invention relate generally to the fields of deep learning, computer vision and remote sensing satellite image processing, and more particularly to a method for automatic road extraction from remote sensing images based on a deep convolutional neural network.
Background
Road information plays a very important role in modern society, so research on road extraction methods based on remote sensing images is of great significance. Remote sensing image road extraction has many application scenarios, such as automatic driving and vehicle navigation, map generation, city planning, intelligent transportation, and land use monitoring. The accuracy of road extraction not only significantly influences the recognition of other ground objects such as vehicles and buildings, but is also one of the key technologies in research fields such as natural disaster early warning, military strikes, and unmanned vehicle path planning.
Remote sensing image road extraction can be divided into semi-automatic and automatic extraction according to the degree of automation of the algorithm. A semi-automatic extraction algorithm requires the starting point of the road or designated seed points to be set manually, while an automatic extraction algorithm extracts the road entirely by computer, without adding subjective human judgment.
With the development of artificial intelligence, fully using a computer's capacity to learn from and process images when interpreting remote sensing images can save a great deal of manpower and time and greatly improve working efficiency. Therefore, methods that automatically extract road information from remote sensing images are becoming the mainstream research direction. Existing road extraction methods fall into three types: pixel-based, object-oriented, and deep learning-based. Deep learning-based road extraction methods perform best.
Road segmentation from satellite images is a very challenging task; compared with general segmentation tasks, it is unique and difficult in the following ways. In satellite images, the target roads generally occupy a small proportion of the image, and rivers, railways and the like are so similar to roads that even human eyes find them hard to distinguish. Road bifurcations and junctions are complex, which places high demands on the identification precision of road extraction. Roads in satellite images are often narrow, have prior connectivity, may cross-connect with one another, and can span the whole picture. In addition, the connectivity problem in road identification is prominent, with many extracted roads broken. These challenges make road extraction from satellite images difficult, and traditional image segmentation methods struggle to meet them.
Disclosure of Invention
According to the embodiment of the invention, a remote sensing image automatic road extraction scheme based on a deep convolutional neural network is provided.
In a first aspect of the invention, a remote sensing image automatic road extraction method based on a deep convolutional neural network is provided. The method comprises the following steps:
collecting remote sensing image data and constructing a training data set;
constructing a deep semantic segmentation network model of an encoder-decoder mode as a deep convolutional neural network model;
training the deep semantic segmentation network model by using the training data set;
and inputting the test data into the trained deep semantic segmentation network model to obtain a road extraction result of the remote sensing image.
Further, before the training data set is constructed, data enhancement is performed on the acquired remote sensing image data, and the data-enhanced remote sensing image data are used as the training data set.
Further, the construction process of the deep semantic segmentation network model of the encoder-decoder mode comprises the following steps:
sequentially constructing an encoder and a decoder to complete the construction of a deep semantic segmentation network model of an encoder-decoder mode; wherein
The process of constructing the encoder includes:
inputting an input image into a basic classification network module pre-trained on ImageNet, feeding the resulting output into a hole convolution module group, cascading the feature extraction results output by the hole convolution module group, performing feature fusion by 1 × 1 convolution, and outputting a feature map to complete the encoder construction; wherein the output feature map of the encoder serves as the first input of the decoder, the first bottom-layer feature map output by the basic classification network module serves as the second input of the decoder, and the second bottom-layer feature map output by the basic classification network module serves as the third input of the decoder;
the process of constructing a decoder includes: the first decoder and the second decoder are constructed in sequence.
Further, the first decoder construction process comprises:
performing a first up-sampling on the first input of the decoder, then cascading it with the second input of the decoder, then performing two consecutive convolutions to obtain a first feature map, and performing a second up-sampling on the first feature map to obtain a first decoding feature map;
the second decoder construction process comprises:
and cascading a third input of the decoder with the first decoding feature map, then performing continuous convolution twice to obtain a second decoding feature map, performing second up-sampling on the second decoding feature map to obtain a result map of the deep semantic segmentation network model, and completing construction of the deep semantic segmentation network model.
Further, in the first decoder construction process, before the cascading, channel dimensionality reduction is performed with 1 × 1 convolutions on the first up-sampled first input of the decoder and on the second input of the decoder;
in the second decoder construction process, before the cascading, channel dimensionality reduction is performed with 1 × 1 convolutions on the third input of the decoder and on the first decoding feature map.
Further, the hole convolution module group comprises a plurality of branches, each branch comprising a hole convolution filter with the same convolution kernel size but a different hole rate.
Further, before the test data are input into the trained deep semantic segmentation network model, the input picture of the test data is cut into overlapping tiles, data enhancement is applied to the cut tiles, the enhanced test data are input into the trained deep semantic segmentation network model for road feature extraction and segmentation to obtain segmentation results for each tile, and the tiles are then fused by averaging to obtain the road extraction result of the remote sensing image.
Further, the road extraction result of the remote sensing image is post-processed, that is, optimized by the closing operation of mathematical morphology (dilation followed by erosion), to obtain the optimized road extraction result of the remote sensing image.
In a second aspect of the invention, an electronic device is provided. The electronic device includes a memory on which a computer program is stored, and a processor which implements the method described above when executing the program.
In a third aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the method according to the first aspect of the invention.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the invention, nor to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
The method has a good segmentation effect on remote sensing images in complex scenes and, while extracting roads accurately, greatly improves working efficiency.
Drawings
The above and other features, advantages and aspects of various embodiments of the present invention will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 shows a flow chart of a method for automatic road extraction of remote sensing images based on deep convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a model structure based on a deep convolutional neural network according to an embodiment of the present invention;
FIG. 3 illustrates a block diagram of the hole convolution module group in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a morphological processing method in the field of digital image processing according to an embodiment of the present invention;
FIG. 5 is a graph showing a comparison of experimental segmentation results before and after using a morphological processing method according to an embodiment of the present invention;
FIG. 6 illustrates a block diagram of an exemplary electronic device capable of implementing embodiments of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
In addition, the term "and/or" herein merely describes an association between related objects and indicates that three relationships are possible; for example, A and/or B may mean: A alone, A and B together, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
In the invention, based on the deep convolutional neural network model, remote sensing images are segmented in complex scenes and the road information in them is accurately extracted.
Fig. 1 shows a flowchart of a method for automatically extracting a road from a remote sensing image based on a deep convolutional neural network according to an embodiment of the present invention.
As shown, the method includes:
s101, collecting remote sensing image data and constructing a training data set.
The acquired remote sensing image data contain roads, and the road scenes include urban roads and rural roads.
In an embodiment of the invention, after the remote sensing image data are collected, data enhancement is applied to them, and the data-enhanced remote sensing image data serve as the final training data set.
In this embodiment, during data enhancement, the data sample set is randomly subjected to operations such as 90°, 180° and 270° rotation, horizontal flipping, vertical flipping, gamma transformation, blurring, and Gaussian noise. To amplify the data set further, a specified number of random crops are taken. In addition, to eliminate color shift between different images, the images are color-transformed. Data enhancement mainly enriches the variety of the training data set and improves the training effect.
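As a minimal illustrative sketch (not the patent's own code), the enhancement operations above can be approximated in Python with NumPy as follows; the application probabilities, gamma range, and noise level are assumptions chosen for illustration:
```python
import numpy as np

def augment(image: np.ndarray, mask: np.ndarray, rng=np.random):
    """Jointly augment an image tile and its road mask."""
    k = rng.randint(4)                       # rotate by 0/90/180/270 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.rand() < 0.5:                     # horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.rand() < 0.5:                     # vertical flip
        image, mask = np.flipud(image), np.flipud(mask)
    if rng.rand() < 0.5:                     # gamma transform (image only)
        gamma = rng.uniform(0.7, 1.5)
        image = np.clip((image / 255.0) ** gamma * 255.0, 0, 255)
    if rng.rand() < 0.5:                     # additive Gaussian noise
        image = np.clip(image + rng.normal(0, 8, image.shape), 0, 255)
    return image.astype(np.uint8), mask
```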
S102, constructing a deep semantic segmentation network model of an encoder-decoder mode as a deep convolutional neural network model.
Fig. 2 is a schematic diagram illustrating a model structure based on a deep convolutional neural network according to an embodiment of the present invention.
The construction process of the deep semantic segmentation network model of the coder-decoder mode comprises the following steps:
sequentially constructing an encoder and a decoder to complete the construction of a deep semantic segmentation network model of an encoder-decoder mode; wherein
The process of constructing the encoder includes:
inputting an input image into a basic classification network module pre-trained on ImageNet, feeding the resulting output into a hole convolution module group, cascading the feature extraction results output by the hole convolution module group, performing feature fusion by 1 × 1 convolution, and outputting a feature map to complete the encoder construction; wherein the output feature map of the encoder serves as the first input of the decoder, the first bottom-layer feature map output by the basic classification network module, for example at 1/4 size, serves as the second input of the decoder, and the second bottom-layer feature map, for example at 1/2 size, serves as the third input of the decoder.
In one embodiment of the present invention, when constructing the encoder, a basic classification network module pre-trained on ImageNet is first selected, and a hole convolution module group is then added on top of it.
As an embodiment, the basic classification network module uses a ResNet101 network as its network structure.
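For illustration only, the pretrained backbone could be tapped for the three encoder outputs roughly as below using torchvision; the layer choices (the stem ReLU output at 1/2 size, layer1 at 1/4 size, layer4 as block4) are plausible assumptions rather than the patent's stated wiring, and in practice the last residual stages would use dilated convolutions so the encoder output stride stays at 16, matching the decoder's 4× up-sampling:
```python
import torch.nn as nn
from torchvision.models import resnet101

class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet101(weights="IMAGENET1K_V1")  # ImageNet-pretrained weights
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu)  # 1/2, 64 ch
        self.pool = net.maxpool                                  # down to 1/4
        self.layer1 = net.layer1                                 # 1/4, 256 ch
        self.rest = nn.Sequential(net.layer2, net.layer3, net.layer4)

    def forward(self, x):
        low2 = self.stem(x)                  # second bottom-layer feature map
        low4 = self.layer1(self.pool(low2))  # first bottom-layer feature map
        out = self.rest(low4)                # block4 features for the hole convs
        return out, low4, low2
```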
FIG. 3 shows a block diagram of the hole convolution module group according to an embodiment of the invention.
The hole convolution module group comprises a plurality of branches, and each branch comprises a hole convolution filter with the same convolution kernel size and different hole rates.
In this embodiment, the hole convolution module group mainly consists of 4 branches, each branch containing a hole convolution filter with a 3 × 3 convolution kernel and a certain hole rate.
As an example, the hole rates of these 4 branches are 6, 12, 18 and 24, respectively. The feature extraction results of the 4 branches are then cascaded, and feature fusion is performed through a 1 × 1 convolution. Since the underlying network used by the invention is ResNet101, in FIG. 3 the input to the hole convolution module group is the feature map of block4 in ResNet101.
The key concept in the hole convolution module group is the hole convolution itself. A hole convolution adjusts the receptive field of the filter by changing the value of its hole rate, thereby capturing multi-scale features. For each location i on the output feature map y and a convolution filter w, the hole convolution acts on the input feature map x according to equation (1), where the hole rate r is the stride with which the input signal is sampled and k runs over the positions of the convolution filter w. Notably, when the hole rate r is 1, the hole convolution degenerates to an ordinary convolution.
y[i] = ∑_k x[i + r·k] · w[k]    (1)
The hole convolution module group is added mainly so that, combined with the basic classification network module, it enlarges the receptive field of the network filters, enriches image context information, and produces multi-scale dense feature maps, which makes it convenient to segment targets at multiple scales and better completes the semantic segmentation task.
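A minimal PyTorch sketch of such a hole convolution module group follows; the four 3 × 3 branches with hole rates 6/12/18/24, the cascade, and the 1 × 1 fusion follow the description above, while the channel counts are assumptions (2048 input channels matching block4 of ResNet101):
```python
import torch
import torch.nn as nn

class HoleConvGroup(nn.Module):
    def __init__(self, in_ch=2048, branch_ch=256, rates=(6, 12, 18, 24)):
        super().__init__()
        # one 3x3 hole convolution per branch; dilation is the hole rate r
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        # 1x1 convolution fusing the cascaded branch outputs
        self.fuse = nn.Conv2d(branch_ch * len(rates), branch_ch, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]  # same spatial size
        return self.fuse(torch.cat(feats, dim=1))        # cascade, then fuse
```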
The process of constructing the decoder includes constructing the first decoder and the second decoder in sequence. Combining the first and second decoding modules substantially improves segmentation detail and the positioning accuracy of target boundaries.
The first decoder construction process includes:
the first input of the decoder is cascaded with the second input of the decoder after first up-sampling, then two times of continuous convolution are carried out to obtain a first characteristic diagram, and second up-sampling is carried out on the first characteristic diagram to obtain a first decoding characteristic diagram. By the first decoder, the segmentation characteristic map generated by the encoder can be effectively improved, and the refinement of the boundary information is facilitated.
In an embodiment of the present invention, the output feature map of the encoder is up-sampled 4× bilinearly and then concatenated with the 1/4-size bottom-layer feature map generated by the basic network in the encoder. Both feature maps then have the same spatial resolution.
In the first decoder construction process, before the cascading, channel dimensionality reduction with 1 × 1 convolutions is applied to the first up-sampled first input of the decoder and to the second input of the decoder, so that the bottom-layer features are reduced in dimension.
As an embodiment of the present invention, before cascading, a 1 × 1 convolution is first used for channel dimensionality reduction, so that the bottom-layer features are reduced to a unified dimensionality; the cascading is then performed. Two 3 × 3 convolutions are applied consecutively after the cascade; this effectively refines the cascaded feature map and yields a better segmentation. The resulting feature map is up-sampled 2× bilinearly to obtain the final output feature map of the first decoder.
The second decoder construction process includes:
and cascading a third input of the decoder with the first decoding feature map, then performing continuous convolution twice to obtain a second decoding feature map, performing second up-sampling on the second decoding feature map to obtain a result map of the deep semantic segmentation network model, and completing construction of the deep semantic segmentation network model.
In the second decoder construction process, before the cascading, channel dimensionality reduction with 1 × 1 convolutions is applied to the third input of the decoder and to the first decoding feature map, so that the bottom-layer features are reduced in dimension.
In this embodiment of the invention, a 1 × 1 convolution is first used for channel dimensionality reduction so that the bottom-layer features are reduced to a unified dimensionality; the 1/2-size bottom-layer feature map generated by the basic network in the encoder is then cascaded with the feature map generated by the first decoding module, two consecutive 3 × 3 convolutions generate the feature map, and a 2× bilinear up-sampling produces the final result map of the semantic segmentation network model. Constructing the second decoding module further recovers the boundary details of the target, improving segmentation detail and the positioning accuracy of the target boundary.
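Under the same assumptions (256 channels for the encoder output and the 1/4-size features, 64 channels for the 1/2-size features, encoder output stride 16), the two decoding stages can be sketched as:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def up(x, scale):
    return F.interpolate(x, scale_factor=scale, mode="bilinear",
                         align_corners=False)

class TwoStageDecoder(nn.Module):
    def __init__(self, enc_ch=256, low4_ch=256, low2_ch=64, mid_ch=256,
                 num_classes=2):
        super().__init__()
        self.reduce_enc = nn.Conv2d(enc_ch, mid_ch, 1)  # 1x1 channel reduction
        self.reduce4 = nn.Conv2d(low4_ch, 48, 1)
        self.reduce2 = nn.Conv2d(low2_ch, 48, 1)
        self.refine1 = nn.Sequential(                   # two 3x3 convolutions
            nn.Conv2d(mid_ch + 48, mid_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU())
        self.refine2 = nn.Sequential(
            nn.Conv2d(mid_ch + 48, mid_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(mid_ch, num_classes, 3, padding=1))

    def forward(self, enc_out, low4, low2):
        # first decoder: 4x up-sample, cascade with 1/4 features, refine, 2x up
        x = torch.cat([up(self.reduce_enc(enc_out), 4), self.reduce4(low4)], 1)
        x = up(self.refine1(x), 2)
        # second decoder: cascade with 1/2 features, refine, 2x up to full size
        x = torch.cat([x, self.reduce2(low2)], 1)
        return up(self.refine2(x), 2)
```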
S103, training the deep semantic segmentation network model by using the training data set.
In an embodiment of the invention, the basic network is adjusted: the 1000 classes of its last layer are changed to 2 classes, namely road and background, so that the basic network better suits the binary semantic segmentation task of automatic road extraction.
In one embodiment of the invention, regarding the definition of the loss function, cross-entropy loss is used as the optimization target of the deep convolutional neural network, which effectively formulates the semantic segmentation problem addressed by the invention.
In one embodiment of the invention, regarding the choice of optimizer, a standard SGD optimizer is used to solve the loss function. The optimizer traverses the data samples in batches during optimization, which helps the deep convolutional neural network converge quickly.
In one embodiment of the invention, a poly learning rate strategy is used for the learning rate setting. Compared with reducing the learning rate after a fixed number of iteration steps, this strategy is more effective and lowers the final loss value of the deep convolutional neural network.
The poly learning rate strategy first sets an initial learning rate, which is then dynamically updated during training according to equation (2):
lr = lr_init × (1 - iter / max_iter)^power    (2)
where power is 0.9, iter denotes the current iteration number, max_iter denotes the set total number of iterations, lr_init is the initial learning rate, and lr is the updated learning rate.
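The training configuration above (2-class head, cross-entropy loss, SGD, poly schedule of equation (2)) can be sketched as follows; the stand-in model, dummy batches, and hyper-parameter values are illustrative assumptions:
```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 2, kernel_size=1)       # stand-in for the 2-class network
criterion = nn.CrossEntropyLoss()            # road vs. background
base_lr, power, max_iter = 0.01, 0.9, 1000   # assumed hyper-parameters

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=1e-4)

def poly_lr(iteration):
    # equation (2): lr = lr_init * (1 - iter / max_iter) ** power
    return base_lr * (1 - iteration / max_iter) ** power

for it in range(max_iter):
    images = torch.randn(2, 3, 64, 64)           # dummy image batch
    labels = torch.randint(0, 2, (2, 64, 64))    # dummy road/background masks
    for group in optimizer.param_groups:         # dynamic poly LR update
        group["lr"] = poly_lr(it)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```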
In an embodiment of the invention, mIoU is adopted as the evaluation index of algorithm performance during training. It is the most common metric in deep learning semantic segmentation and an important index of image segmentation precision; it measures the overall segmentation effect of the algorithm, and the larger the value, the better the segmentation.
For each picture, the intersection-over-union (IoU) score over pixels is defined in equation (3):
IoU_i = TP_i / (TP_i + FP_i + FN_i)    (3)
where, for picture i, TP_i denotes the number of pixels correctly predicted as road pixels, FP_i the number of pixels incorrectly predicted as road pixels, and FN_i the number of pixels incorrectly predicted as non-road pixels.
Assuming there are n pictures, the mean intersection-over-union score mIoU over all pictures is the final evaluation index, as given in equation (4):
mIoU = (1/n) ∑_{i=1}^{n} IoU_i    (4)
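Equations (3) and (4) translate directly into code; the sketch below assumes binary masks with 1 for road and 0 for background:
```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    tp = np.logical_and(pred == 1, gt == 1).sum()  # correctly predicted road
    fp = np.logical_and(pred == 1, gt == 0).sum()  # wrongly predicted as road
    fn = np.logical_and(pred == 0, gt == 1).sum()  # road predicted as non-road
    return tp / (tp + fp + fn + 1e-10)             # equation (3)

def miou(preds, gts) -> float:
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))  # eq. (4)
```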
And S104, inputting the test data into the trained deep semantic segmentation network model to obtain a road extraction result of the remote sensing image.
Considering that remote sensing images are generally large, before the test data are input into the trained deep semantic segmentation network model, the input picture of the test data is cut into overlapping tiles, data enhancement is applied to the cut tiles, the enhanced test data are fed to the trained deep semantic segmentation network model for road feature extraction and segmentation to obtain segmentation results for each tile, and the tiles are then fused by averaging to obtain the road extraction result of the remote sensing image.
During testing, cutting the image and directly feeding the cut tile data into the trained model for road extraction gives a better prediction effect and higher road extraction accuracy.
As an embodiment of the invention, data enhancement in the testing stage specifically includes horizontal and vertical flipping of the image, rotation by certain angles, and the like, so that the predicted image variants are richer and the prediction effect improves.
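The overlapped tiling, flip-based test-time augmentation, and average fusion can be sketched as follows; the tile size, stride, threshold, and the predict_tile callback (which maps a tile to a per-pixel road probability map) are illustrative assumptions, and border handling is omitted for brevity:
```python
import numpy as np

def predict_large_image(image, predict_tile, tile=512, stride=384):
    h, w = image.shape[:2]
    prob = np.zeros((h, w), dtype=np.float32)
    count = np.zeros((h, w), dtype=np.float32)
    for y in range(0, max(h - tile, 0) + 1, stride):
        for x in range(0, max(w - tile, 0) + 1, stride):
            patch = image[y:y + tile, x:x + tile]
            preds = [predict_tile(patch),                        # original
                     np.fliplr(predict_tile(np.fliplr(patch))),  # h-flip TTA
                     np.flipud(predict_tile(np.flipud(patch)))]  # v-flip TTA
            prob[y:y + tile, x:x + tile] += np.mean(preds, axis=0)
            count[y:y + tile, x:x + tile] += 1
    return (prob / np.maximum(count, 1)) > 0.5   # averaged, fused road mask
```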
In a preferred embodiment of the invention, the road extraction result of the remote sensing image is post-processed.
Fig. 4 is a schematic diagram illustrating a morphological processing method in the field of digital image processing according to an embodiment of the present invention.
In the post-processing, optimization is performed by the closing operation of mathematical morphology, dilation followed by erosion, to obtain the optimized road extraction result of the remote sensing image.
The basic operations of mathematical morphology include erosion, dilation, opening, and closing.
As an embodiment, the invention uses the closing operation when post-processing the prediction results. The closing operation first dilates the image and then erodes it; a schematic diagram is shown in FIG. 4. Closing can fill small holes in objects, connect neighboring objects, and smooth their boundaries without significantly changing their area. Since the connectivity problem of road identification is prominent in road extraction, with many broken roads, and given these properties of the closing operation, the invention adopts closing-based morphological optimization to reduce broken roads as much as possible.
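A minimal sketch of this closing-based post-processing with OpenCV follows; the 5 × 5 structuring element is an assumed size:
```python
import cv2
import numpy as np

def close_roads(mask: np.ndarray, ksize: int = 5) -> np.ndarray:
    """Dilate then erode the binary road mask to reconnect broken segments."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (ksize, ksize))
    return cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_CLOSE, kernel)
```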
FIG. 5 shows a comparison of experimental segmentation results before and after using the morphological processing method according to an embodiment of the present invention: FIG. 5(a) shows the segmentation result before the morphological processing, and FIG. 5(b) the result after. It can be seen that the method effectively improves the connectivity of road identification.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules illustrated are not necessarily required to practice the invention.
The above is a description of an embodiment of the method, and the following is a further description of the solution of the present invention through an embodiment of the electronic device implementing the above method of the present invention.
FIG. 6 illustrates a block diagram of an exemplary electronic device capable of implementing embodiments of the present invention.
As shown, an electronic device comprises a memory having stored thereon a computer program and a processor implementing the method of fig. 1 when executing the program.
The device 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 602 or loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 can also store various programs and data necessary for the operation of the device 600. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processing unit 601 executes the respective methods and processes described above, for example, methods S101 to S104. For example, in some embodiments, methods S101-S104 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the CPU 601, one or more steps of the methods S101-S104 described above may be performed. Alternatively, in other embodiments, the CPU 601 may be configured to perform the methods S101-S104 by any other suitable means (e.g., by means of firmware).
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a complex programmable logic device (CPLD), and the like.
Program code for implementing the methods of the present invention may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the invention. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A remote sensing image automatic road extraction method based on a deep convolutional neural network is characterized by comprising the following steps:
collecting remote sensing image data and constructing a training data set;
constructing a deep semantic segmentation network model of an encoder-decoder mode as a deep convolutional neural network model;
training the deep semantic segmentation network model by using the training data set;
and inputting the test data into the trained deep semantic segmentation network model to obtain a road extraction result of the remote sensing image.
2. The method according to claim 1, wherein before constructing the training data set, data enhancement is performed on the acquired remote sensing image data to obtain data-enhanced remote sensing image data as the training data set.
3. The method according to claim 1, wherein the construction process of the deep semantic segmentation network model of the encoder-decoder mode comprises:
sequentially constructing an encoder and a decoder to complete the construction of a deep semantic segmentation network model of an encoder-decoder mode; wherein
The process of constructing the encoder includes:
inputting an input image into a basic classification network module pre-trained on ImageNet, feeding the resulting output into a hole convolution module group, cascading the feature extraction results output by the hole convolution module group, performing feature fusion by 1 × 1 convolution, and outputting a feature map to complete the encoder construction; wherein the output feature map of the encoder serves as the first input of the decoder, the first bottom-layer feature map output by the basic classification network module serves as the second input of the decoder, and the second bottom-layer feature map output by the basic classification network module serves as the third input of the decoder;
the process of constructing a decoder includes: the first decoder and the second decoder are constructed in sequence.
4. The method of claim 3,
the first decoder construction process comprises:
performing a first up-sampling on the first input of the decoder, then cascading it with the second input of the decoder, then performing two consecutive convolutions to obtain a first feature map, and performing a second up-sampling on the first feature map to obtain a first decoding feature map;
the second decoder construction process comprises:
cascading the third input of the decoder with the first decoding feature map, then performing two consecutive convolutions to obtain a second decoding feature map, and performing a second up-sampling on the second decoding feature map to obtain the result map of the deep semantic segmentation network model, completing construction of the deep semantic segmentation network model.
5. The method of claim 4, wherein in the first decoder construction process, prior to the concatenation, 1 × 1 convolutions are used for channel dimensionality reduction on the first up-sampled first input of the decoder and on the second input of the decoder;
in the second decoder construction process, prior to the concatenation, 1 × 1 convolutions are used for channel dimensionality reduction on the third input of the decoder and on the first decoding feature map.
6. The method of claim 3, wherein the hole convolution module group comprises a plurality of branches, each branch comprising a hole convolution filter with the same convolution kernel size but a different hole rate.
7. The method according to claim 1, wherein before the test data are input into the trained deep semantic segmentation network model, the input picture of the test data is cut into overlapping tiles, data enhancement is applied to the cut tiles, the enhanced test data are input into the trained deep semantic segmentation network model for road feature extraction and segmentation to obtain segmentation results for each tile, and the tiles are then fused by averaging to obtain the road extraction result of the remote sensing image.
8. The method according to claim 1, characterized in that the road extraction result of the remote sensing image is post-processed, i.e. optimized by the closing operation of mathematical morphology (dilation followed by erosion), to obtain the optimized road extraction result of the remote sensing image.
9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, wherein the processor, when executing the program, implements the method of any of claims 1-8.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN201911039232.7A 2019-10-29 2019-10-29 Remote sensing image automatic road extraction method based on deep convolutional neural network Pending CN112749578A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911039232.7A CN112749578A (en) 2019-10-29 2019-10-29 Remote sensing image automatic road extraction method based on deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911039232.7A CN112749578A (en) 2019-10-29 2019-10-29 Remote sensing image automatic road extraction method based on deep convolutional neural network

Publications (1)

Publication Number Publication Date
CN112749578A 2021-05-04

Family

ID=75640155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911039232.7A Pending CN112749578A (en) 2019-10-29 2019-10-29 Remote sensing image automatic road extraction method based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN112749578A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139617A (en) * 2021-05-10 2021-07-20 郑州大学 Power transmission line autonomous positioning method and device and terminal equipment
CN113139617B (en) * 2021-05-10 2023-04-07 郑州大学 Power transmission line autonomous positioning method and device and terminal equipment
CN113326799A (en) * 2021-06-22 2021-08-31 长光卫星技术有限公司 Remote sensing image road extraction method based on EfficientNet network and direction learning
CN114283343A (en) * 2021-12-20 2022-04-05 北京百度网讯科技有限公司 Map updating method, training method and equipment based on remote sensing satellite image
CN114283343B (en) * 2021-12-20 2023-09-26 北京百度网讯科技有限公司 Map updating method, training method and device based on remote sensing satellite image
CN116503729A (en) * 2023-03-17 2023-07-28 中国自然资源航空物探遥感中心 Road extraction method and device applied to remote sensing digital image
CN117789042A (en) * 2024-02-28 2024-03-29 中国地质大学(武汉) Road information interpretation method, system and storage medium
CN117789042B (en) * 2024-02-28 2024-05-14 中国地质大学(武汉) Road information interpretation method, system and storage medium

Similar Documents

Publication Publication Date Title
CN112749578A (en) Remote sensing image automatic road extraction method based on deep convolutional neural network
CN111274865A (en) Remote sensing image cloud detection method and device based on full convolution neural network
CN109003297B (en) Monocular depth estimation method, device, terminal and storage medium
CN108764039B (en) Neural network, building extraction method of remote sensing image, medium and computing equipment
CN107564009B (en) Outdoor scene multi-target segmentation method based on deep convolutional neural network
CN110675339A (en) Image restoration method and system based on edge restoration and content restoration
CN114898352A (en) Method for simultaneously realizing image defogging and license plate detection
CN114494821B (en) Remote sensing image cloud detection method based on feature multi-scale perception and self-adaptive aggregation
CN113569724B (en) Road extraction method and system based on attention mechanism and dilation convolution
CN114155200B (en) Remote sensing image change detection method based on convolutional neural network
CN115546768A (en) Pavement marking identification method and system based on multi-scale mechanism and attention mechanism
CN113658200A (en) Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN115223063A (en) Unmanned aerial vehicle remote sensing wheat new variety lodging area extraction method and system based on deep learning
CN114418987B (en) Retina blood vessel segmentation method and system with multi-stage feature fusion
CN116912257A (en) Concrete pavement crack identification method based on deep learning and storage medium
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN113989287A (en) Urban road remote sensing image segmentation method and device, electronic equipment and storage medium
CN116883679B (en) Ground object target extraction method and device based on deep learning
CN115358962B (en) End-to-end visual odometer method and device
CN116778318A (en) Convolutional neural network remote sensing image road extraction model and method
CN114821651B (en) Pedestrian re-recognition method, system, equipment and computer readable storage medium
CN116310871A (en) Inland water extraction method integrating cavity space pyramid pooling
CN115830054A (en) Crack image segmentation method based on multi-window high-low frequency visual converter
CN115330703A (en) Remote sensing image cloud and cloud shadow detection method based on context information fusion
CN112446292A (en) 2D image salient target detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination