CN113792742A - Semantic segmentation method of remote sensing image and training method of semantic segmentation model - Google Patents

Semantic segmentation method of remote sensing image and training method of semantic segmentation model

Info

Publication number
CN113792742A
CN113792742A
Authority
CN
China
Prior art keywords
semantic segmentation
images
remote sensing
sub
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111094429.8A
Other languages
Chinese (zh)
Inventor
金博夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111094429.8A priority Critical patent/CN113792742A/en
Publication of CN113792742A publication Critical patent/CN113792742A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a semantic segmentation method for remote sensing images and a training method for a semantic segmentation model, and relates to fields such as remote sensing image processing, computer vision, big data, artificial intelligence and deep learning. A specific implementation includes: acquiring a target remote sensing image; performing semantic segmentation on the target remote sensing image with different semantic segmentation models respectively to obtain predicted images respectively corresponding to the semantic segmentation models; and determining the semantic segmentation result of the target remote sensing image according to the voting result of each pixel point on each predicted image. The technical solution of the present disclosure can provide a semantic segmentation algorithm for various remote sensing images by exploiting the differences between models, and effectively improves the recognition accuracy and precision of semantic segmentation of remote sensing images.

Description

Semantic segmentation method of remote sensing image and training method of semantic segmentation model
Technical Field
The present disclosure relates to the field of image technologies, in particular to the fields of remote sensing image processing, computer vision, big data, artificial intelligence, deep learning, and the like, and more particularly to a method, an apparatus, a device, a system, a storage medium, and a computer program product for semantic segmentation of a remote sensing image, and to a training method for a semantic segmentation model.
Background
China is currently in a period of accelerated transformation from traditional agriculture to modern agriculture, in which regional plots differ greatly and planting structures are complex. Large-scale remote sensing image data of crops are obtained through remote sensing measurement by observation satellites and unmanned aerial vehicles, and the crops are classified by an artificial intelligence algorithm for remote sensing image segmentation to identify the types of crops, buildings, and the like. This improves the accuracy of crop identification, reduces dependence on manual field investigation, and improves crop identification efficiency and agricultural management capability. However, semantic segmentation of such images is characterized by imbalance between categories, high noise, similarity between different categories, weak scene generalization capability, and the like; segmentation is therefore difficult and poses great challenges for technicians.
Disclosure of Invention
The present disclosure provides a semantic segmentation method, apparatus, device, system, storage medium and computer program product for remote sensing images, and a training method, apparatus, device, system, storage medium and computer program product for a semantic segmentation model.
According to a first aspect of the present disclosure, a semantic segmentation method for a remote sensing image is provided, which includes:
acquiring a target remote sensing image;
performing semantic segmentation on the target remote sensing image by using different semantic segmentation models respectively to obtain predicted images corresponding to the semantic segmentation models respectively;
and determining the semantic segmentation result of the target remote sensing image according to the voting result of each pixel point on each predicted image.
According to a second aspect of the present disclosure, there is provided a training method of a semantic segmentation model, including:
obtaining a sample remote sensing image;
respectively carrying out segmentation processing on the sample remote sensing image according to different size standards to obtain sample sub-image sets respectively corresponding to the size standards;
and respectively training different initial networks by utilizing the sample sub-images in each sample sub-image set to obtain a plurality of semantic segmentation models, wherein the plurality of semantic segmentation models are used for performing semantic segmentation on the same target remote sensing image.
According to a third aspect of the present disclosure, there is provided a semantic segmentation apparatus for a remote sensing image, including:
the target remote sensing image acquisition module is used for acquiring a target remote sensing image;
the prediction module is used for performing semantic segmentation on the target remote sensing image by utilizing different semantic segmentation models respectively to obtain predicted images corresponding to the semantic segmentation models respectively;
and the voting module is used for determining the semantic segmentation result of the target remote sensing image according to the voting result of each pixel point on each predicted image.
According to a fourth aspect of the present disclosure, there is provided a training apparatus for a semantic segmentation model, including:
the sample remote sensing image module is used for acquiring a sample remote sensing image;
the segmentation module is used for respectively carrying out segmentation processing on the sample remote sensing image according to different size standards so as to obtain sample sub-image sets respectively corresponding to the size standards;
and the training module is used for respectively training different initial networks by utilizing the sample sub-images in each sample sub-image set so as to obtain a plurality of semantic segmentation models, wherein the plurality of semantic segmentation models are used for performing semantic segmentation on the same target remote sensing image.
According to a fifth aspect of the present disclosure, a semantic segmentation system for a remote sensing image is provided, which includes a semantic segmentation device for a remote sensing image provided in any embodiment of the present disclosure and a training device for a semantic segmentation model provided in any embodiment of the present disclosure.
According to a sixth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method provided by any of the embodiments of the present disclosure.
According to a seventh aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided by any of the embodiments of the present disclosure.
According to an eighth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method provided by any of the embodiments of the present disclosure.
According to the technical solution of the embodiments of the present disclosure, a semantic segmentation algorithm for various remote sensing images can be provided by exploiting the differences between models, effectively improving the recognition accuracy and precision of semantic segmentation of remote sensing images.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method of semantic segmentation of remote sensing images according to an embodiment of the present disclosure;
FIG. 2 is a diagram of one example of an application according to an embodiment of the present disclosure;
FIG. 3 is a flow diagram of a method of training a semantic segmentation model according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of a semantic segmentation apparatus for remote sensing images according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of a training apparatus for a semantic segmentation model according to an embodiment of the present disclosure;
FIG. 6 is a diagram of an example of an application of a semantic segmentation system for remote sensing images according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device used to implement methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
At present, work on semantic segmentation of remote sensing images focuses mainly on optimizing the algorithm model, and in particular on optimizing a single model. Such optimization is cumbersome and tends to neglect practical problems such as high noise, imbalance between different categories, and poor generalization capability. The present disclosure aims to provide a training method for a semantic segmentation model, and a semantic segmentation method and system for remote sensing images, so as to improve the stability and generalization capability of semantic segmentation of remote sensing images.
FIG. 1 shows a flow diagram of a method of training a semantic segmentation model according to an embodiment of the present disclosure. As shown in fig. 1, the training method includes:
step S101: obtaining a sample remote sensing image;
step S102: respectively carrying out segmentation processing on the sample remote sensing image according to different size standards to obtain sample sub-image sets respectively corresponding to the size standards;
step S103: and respectively training different initial networks by utilizing the sample sub-images in each sample sub-image set to obtain a plurality of semantic segmentation models, wherein the plurality of semantic segmentation models are used for performing semantic segmentation on the same target remote sensing image.
In the embodiments of the present disclosure, the remote sensing image (a sample remote sensing image or a target remote sensing image) can be acquired by means such as remote sensing measurement by a remote sensing satellite or an unmanned aerial vehicle.
The resolution of a sample remote sensing image is very large, often more than ten thousand pixels on a side, so the system needs to segment the input sample remote sensing image, which improves the prediction speed of the model and reduces memory usage. In addition, since the sample remote sensing images differ in size, the segmentation process makes the sizes of the images input to the same initial network uniform.
Exemplarily, for the same sample remote sensing image, segmentation processing with a first size standard is performed to obtain a plurality of sample sub-images of the first size standard, forming a first sample sub-image set, where the first sample sub-images are used to train a first initial network to obtain a first semantic segmentation model; segmentation processing with a second size standard is performed to obtain a plurality of sample sub-images of the second size standard, forming a second sample sub-image set, where the second sample sub-images are used to train a second initial network to obtain a second semantic segmentation model; segmentation processing with a third size standard is performed to obtain a plurality of sample sub-images of the third size standard, forming a third sample sub-image set, where the third sample sub-images are used to train a third initial network to obtain a third semantic segmentation model; and so on.
In step S102, a sliding window may be used to segment the sample remote sensing image. Illustratively, the sliding window size can be set to 1024 × 1024, 512 × 512, or the like according to the configuration of the machine, so that segmentation processing with different size standards is performed on the sample remote sensing image. For example, the sliding window size is set to 1024 × 1024 and the sample remote sensing image is segmented to obtain a plurality of 1024 × 1024 sample sub-images, which form a sample sub-image set with a size standard of 1024 × 1024. The sliding window size is then set to 512 × 512 and the sample remote sensing image is segmented to obtain a plurality of 512 × 512 sample sub-images, which form a sample sub-image set with a size standard of 512 × 512.
To balance the positive and negative categories, regions in which the proportion of null (no-data) pixels is greater than a first threshold (e.g., 6/7) may be filtered out directly during the segmentation process; when the background class proportion of an image is less than a second threshold (e.g., 1/3), the sliding window step size is shortened and the sampling rate is increased.
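By way of a non-limiting illustration (not part of the original disclosure), the following Python sketch shows one possible way to implement the sliding-window segmentation with the null-ratio filter and the shortened step for low-background tiles described above; the class ids, thresholds, and helper names are assumptions introduced here for illustration only.

```python
# Illustrative sketch only: sliding-window segmentation of a large sample
# remote sensing image with the filters described above. The no-data id (255),
# background id (0), and the (H, W, C)/(H, W) array layouts are assumptions.
import numpy as np

def tile_sample_image(image, label, window=1024, stride=None,
                      null_threshold=6 / 7, background_threshold=1 / 3):
    """Cut (image, label) into window x window sample sub-images."""
    stride = stride or window
    tiles = []
    h, w = label.shape
    y = 0
    while y + window <= h:
        x = 0
        while x + window <= w:
            img_tile = image[y:y + window, x:x + window]
            lbl_tile = label[y:y + window, x:x + window]

            # Directly filter regions whose null (no-data) ratio exceeds the first threshold.
            if np.mean(lbl_tile == 255) > null_threshold:
                x += stride
                continue

            tiles.append((img_tile, lbl_tile))

            # When the background class proportion is below the second threshold,
            # shorten the sliding-window step to increase the sampling rate.
            if np.mean(lbl_tile == 0) < background_threshold:
                x += stride // 2
            else:
                x += stride
        y += stride
    return tiles
```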
Further, the system may perform enhancement processing on the sample sub-images, such as sequential segmentation, random segmentation, blurring, illumination adjustment, and noise addition (Gaussian noise, salt-and-pepper noise). The sample sub-images and their corresponding label maps can also be rotated by different angles or mirrored together. Illustratively, the enhancement processing may use the open-source Geospatial Data Abstraction Library (GDAL), and enhancement functions may also be written as needed.
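As a further non-limiting sketch (the noise parameters and value range are assumptions introduced here), the rotation and mirror operations applied jointly to a sample sub-image and its label map could look as follows; in practice GDAL or purpose-written enhancement functions may be used instead.

```python
# Illustrative sketch only: geometric augmentation applied identically to a
# sample sub-image and its label map, plus Gaussian noise on the image alone.
import numpy as np

def augment(img_tile, lbl_tile, rng=None):
    rng = rng or np.random.default_rng()
    augmented = [(img_tile, lbl_tile)]

    # Rotations by 90, 180, and 270 degrees, applied to image and label together.
    for k in (1, 2, 3):
        augmented.append((np.rot90(img_tile, k), np.rot90(lbl_tile, k)))

    # Horizontal and vertical mirroring.
    augmented.append((np.flip(img_tile, axis=1), np.flip(lbl_tile, axis=1)))
    augmented.append((np.flip(img_tile, axis=0), np.flip(lbl_tile, axis=0)))

    # Gaussian noise added to the image only; the label map is left untouched.
    noisy = img_tile.astype(np.float32) + rng.normal(0.0, 5.0, img_tile.shape)
    augmented.append((np.clip(noisy, 0, 255).astype(img_tile.dtype), lbl_tile))
    return augmented
```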
Illustratively, the initial networks may include networks such as SegNet, U-Net, and DeepLabV3+. For example, as shown in fig. 2, the first initial network may be SegNet, trained with sample sub-images of the 512 × 512 size standard to obtain the first semantic segmentation model; the second initial network may be U-Net, trained with sample sub-images of the 1024 × 1024 size standard to obtain the second semantic segmentation model; and the third initial network may be DeepLabV3+, trained with sample sub-images of the 512 × 1024 size standard to obtain the third semantic segmentation model.
In this embodiment, when SegNet is used for semantic segmentation, a Conditional Random Field (CRF) module is added at the end of SegNet for post-processing, which can improve the segmentation result at picture edges. U-Net adopts a fully convolutional neural network and combines high-level and low-level features through down-sampling and up-sampling; it performs well on small data sets, trains quickly, and can obtain good results in a short time. DeepLabV3+ also performs multi-scale information fusion through an encoder-decoder structure; in this embodiment, the backbone network of DeepLabV3+ may adopt models such as Xception-65, ResNet-101, and DenseNet-121 to improve the robustness and running speed of semantic segmentation.
Illustratively, the hyper-parameters of each initial network may be set, for example, the number of samples selected for one training step (batch size) is set to 18, the number of epochs is set to 20, and so on. During training, the groups of model parameters that perform well can be retained, and a loss/accuracy curve is plotted.
In one embodiment, step S103 may include: for any initial network, in the training process of the initial network, the network parameters of the initial network are adjusted by using the cosine annealing learning rate.
Compared with the conventional approach of fixing the learning rate to a value within a certain range, for example 0.001, 0.0001, or 0.00001, this embodiment adopts a cosine annealing learning rate: the learning rate is adjusted so that it decreases rapidly and then increases again, and this process is repeated continuously as the number of epochs increases, which prevents the model from falling into a local optimum.
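As a non-limiting sketch of this training detail, assuming a PyTorch training loop (the stand-in model, optimizer settings, and restart period are assumptions introduced here), the cosine annealing schedule could be set up as follows.

```python
# Illustrative sketch only: a cosine annealing learning-rate schedule with warm
# restarts, so the learning rate decays along a cosine curve and then jumps back
# up as the epochs advance. The stand-in model and hyper-parameters are assumptions.
import torch

model = torch.nn.Conv2d(3, 5, kernel_size=3, padding=1)   # stand-in for a segmentation network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=5, T_mult=2)                            # restart after 5 epochs, then 10, 20, ...

for epoch in range(20):
    # ... run one training epoch over the sample sub-images here ...
    optimizer.step()        # placeholder for the per-batch parameter updates
    scheduler.step()        # advance the cosine schedule once per epoch
```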
For the multiple semantic segmentation models obtained after training, different semantic segmentation models are used to perform semantic segmentation on the same target remote sensing image to obtain multiple predicted images, and a voting decision is performed on the multiple predicted images to obtain the final semantic segmentation result. According to the technical solution of the embodiments of the present disclosure, semantic segmentation algorithms for various remote sensing images can be provided based on the differences between models, which effectively improves the recognition accuracy and precision of semantic segmentation of remote sensing images and improves stability and generalization capability.
FIG. 3 illustrates a method of semantic segmentation of a remote sensing image according to an embodiment of the disclosure. As shown in fig. 3, the method includes:
step S301: acquiring a target remote sensing image;
step S302: performing semantic segmentation on the target remote sensing image by using different semantic segmentation models respectively to obtain predicted images corresponding to the semantic segmentation models respectively;
step S303: and determining the semantic segmentation result of the target remote sensing image according to the voting result of each pixel point on each predicted image.
Different semantic segmentation models are used for performing semantic segmentation on the same target remote sensing image so as to obtain a plurality of predicted images, and voting decision is performed on the plurality of predicted images so as to obtain a final semantic segmentation result.
The target remote sensing image may be a remote sensing image in a test set that has a corresponding semantic segmentation label, or a remote sensing image provided by a user for which a semantic segmentation result needs to be predicted according to the method of the embodiments of the present disclosure.
Illustratively, the differences between the semantic segmentation models lie in the size standards of the input images and in the network structures. For example, the different semantic segmentation models include the first, second, and third semantic segmentation models from the training process described above.
In one embodiment, step S302 may include: for any semantic segmentation model, according to the size standard of the semantic segmentation model, segmenting a target remote sensing image into a plurality of target sub-images, respectively inputting the plurality of target sub-images into the semantic segmentation model according to a preset sequence to obtain prediction sub-images respectively corresponding to the plurality of target sub-images, and splicing the prediction sub-images according to the preset sequence to obtain a prediction image corresponding to the semantic segmentation model.
Any semantic segmentation model can be a first semantic segmentation model, a second semantic segmentation model or a third semantic segmentation model. The following is exemplified with a first semantic segmentation model.
Illustratively, a zero-padding operation may first be performed on the target remote sensing image, and an all-zero backup image file (bak) of the same size as the target remote sensing image is produced at the same time. The size standard of the first semantic segmentation model is 512 × 512, so the target remote sensing image is padded up to a multiple of 512 and then segmented, with a step size of 512, into a plurality of 512 × 512 target sub-images. The target sub-images obtained by segmentation are then fed into the first semantic segmentation model in a preset order for prediction, yielding a plurality of predicted sub-images. The predicted sub-images are then placed at the corresponding positions of the backup image (bak) in the preset order, thereby stitching them into a large image. Finally, the large image is cropped to the same size as the target remote sensing image to obtain the predicted image of the first semantic segmentation model for the target remote sensing image.
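The prediction flow just described can be summarised, purely as a non-limiting sketch, in the following Python function; `predict_tile` is an assumed stand-in for one forward pass of the first semantic segmentation model.

```python
# Illustrative sketch only: zero-pad the target image to a multiple of the size
# standard, predict tile by tile in a preset (row-by-row) order, paste each
# predicted sub-image into an all-zero backup canvas ("bak"), and crop back to
# the original size. `predict_tile` is an assumed stand-in for one model call.
import numpy as np

def predict_whole_image(image, predict_tile, window=512):
    h, w = image.shape[:2]
    pad_h = (window - h % window) % window
    pad_w = (window - w % window) % window
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))   # zero-padding operation
    canvas = np.zeros(padded.shape[:2], dtype=np.uint8)        # all-zero image of the same size

    for y in range(0, padded.shape[0], window):                # preset order: row by row
        for x in range(0, padded.shape[1], window):
            tile = padded[y:y + window, x:x + window]
            canvas[y:y + window, x:x + window] = predict_tile(tile)

    return canvas[:h, :w]                                      # crop back to the original size
```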
By segmenting the input target remote sensing image, the system can improve the prediction speed of the model and reduce memory usage.
In one embodiment, segmenting the target remote sensing image into a plurality of target sub-images according to the size standard of the semantic segmentation model may include: setting a sliding window according to the size standard of the semantic segmentation model; and adopting a sliding window step length larger than the size of the sliding window to segment the target remote sensing image.
By segmenting the target remote sensing image with overlapping sliding windows, the inaccurately predicted edges of each window can be discarded during prediction and only the central area of each prediction result is retained, which improves the accuracy and precision of the prediction.
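One common way to realise this, given only as a hedged sketch (the overlap width and the handling of the outermost image border are assumptions not specified in the disclosure), is to let adjacent windows overlap and write back only the central core of each window's prediction:

```python
# Illustrative sketch only: overlapping-window prediction that discards a margin
# of each window's prediction and keeps only the central core. The margin width
# is an assumption; the outermost image border would need separate handling.
import numpy as np

def predict_with_overlap(padded, predict_tile, window=512, margin=64):
    core = window - 2 * margin                     # size of the retained central area
    h, w = padded.shape[:2]
    canvas = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h - window + 1, core):
        for x in range(0, w - window + 1, core):
            pred = predict_tile(padded[y:y + window, x:x + window])
            canvas[y + margin:y + margin + core,
                   x + margin:x + margin + core] = pred[margin:-margin, margin:-margin]
    return canvas
```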
In one embodiment, the inputting the plurality of target sub-images into the semantic segmentation model respectively according to a preset order to obtain prediction sub-images corresponding to the plurality of target sub-images respectively includes: and for any target sub-image, rotating the target sub-image by different angles to obtain a plurality of sub-images to be detected, respectively inputting the plurality of sub-images to be detected into the semantic segmentation model to obtain a plurality of sub-images to be selected, and determining a prediction sub-image corresponding to the target sub-image from the plurality of sub-images to be selected.
This test-time augmentation approach performs multiple predictions and then averages the candidate sub-images to obtain the predicted sub-image, which can improve accuracy. Since it lengthens the prediction time, this approach can be enabled according to time requirements and actual needs.
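As a non-limiting sketch of this test-time augmentation step (the rotation set and the averaging of per-class probability maps are assumptions made here for illustration), one possible implementation is:

```python
# Illustrative sketch only: test-time augmentation for a single target sub-image.
# The tile is rotated to several angles, each rotation is predicted, the per-class
# probability maps are rotated back and averaged, and the per-pixel argmax of the
# average is used as the predicted sub-image. `predict_probs` (returning an
# H x W x num_classes array) is an assumed stand-in for the model.
import numpy as np

def tta_predict(tile, predict_probs):
    accumulated = None
    for k in range(4):                                  # 0, 90, 180, 270 degree rotations
        probs = predict_probs(np.rot90(tile, k))        # candidate sub-image (probabilities)
        probs = np.rot90(probs, -k)                     # rotate the prediction back
        accumulated = probs if accumulated is None else accumulated + probs
    return np.argmax(accumulated / 4.0, axis=-1)        # averaged, then per-pixel class
```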
In one embodiment, stitching the prediction sub-images may include: and carrying out image post-processing for filling holes and/or removing small connected domains.
In one example, a sample sub-image in a sample sub-image set may show, for instance, rubble or a tree inside a field, while its corresponding label map is still labeled as field. During prediction, for target sub-images containing such rubble or trees, the predicted sub-images output by the semantic segmentation model will normally predict these objects, so holes can appear in the stitched predicted image (large image). The hole-filling image post-processing fills these unreasonable results with the surrounding field class, thereby improving prediction accuracy.
In another example, during stitching, some small connected domains with an area smaller than a threshold may appear within a preset range of the stitched image; the image post-processing for removing small connected domains deletes these small connected domains, removing some unreasonable results and improving the prediction accuracy.
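Purely as a non-limiting sketch (class ids, the minimum area, the fill class, and the choice of SciPy routines are assumptions introduced for illustration), the two post-processing steps could be combined as follows:

```python
# Illustrative sketch only: fill holes inside each class region of the stitched
# class map and delete connected components smaller than a minimum area.
import numpy as np
from scipy import ndimage

def postprocess(pred, num_classes=5, min_area=256, fill_class=0):
    out = pred.copy()
    for c in range(num_classes):
        mask = out == c
        # Fill holes completely enclosed by this class (e.g. rubble inside a field).
        filled = ndimage.binary_fill_holes(mask)
        out[filled & ~mask] = c

        # Remove small connected domains of this class.
        labels, n = ndimage.label(out == c)
        sizes = np.bincount(labels.ravel())
        for comp_id in range(1, n + 1):
            if sizes[comp_id] < min_area:
                out[labels == comp_id] = fill_class     # replace with the assumed background class
    return out
```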
In one embodiment, step S302 may include: and under the condition that the semantic segmentation model is a two-classification model, aiming at the classification task of each class, performing one-round prediction on the target remote sensing image by using the two-classification model to obtain classification images respectively corresponding to the classes, and merging and overlapping the classification images to obtain a predicted image corresponding to the semantic segmentation model.
The method of the disclosed embodiments can be used for multi-classification tasks. Illustratively, the multi-classification task may be a five-classification task, the semantic segmentation model may be a two-classification model, and for each classification task, the two-classification model performs individual prediction, and uses corresponding two-classification labels to further obtain five classification images, and the five classification images are merged and superimposed to obtain one finished five-classification predicted image, that is, the five-classification predicted image of the semantic segmentation model (the two-classification model).
For a multi-classification task, multiple rounds of prediction based on binary classification models are more accurate than prediction directly using a multi-classification model, so this approach can improve the prediction accuracy.
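A hedged sketch of this per-class prediction and merging step (the callables in `binary_models` stand in for the trained binary classification models and are assumptions) might look like:

```python
# Illustrative sketch only: build a multi-class predicted image from per-class
# binary classification models. Each model returns an H x W foreground-probability
# map for its class; the maps are stacked and merged by a per-pixel argmax.
import numpy as np

def merge_binary_predictions(image, binary_models):
    class_maps = [model(image) for model in binary_models]   # one round of prediction per class
    stacked = np.stack(class_maps, axis=-1)                   # H x W x num_classes
    return np.argmax(stacked, axis=-1)                        # merged multi-class predicted image
```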
Of course, to save time, a multi-classification model may instead be used to perform a single round of prediction and directly obtain a multi-class predicted image, which is not limited in the embodiments of the present disclosure.
Further, in step S303, voting is performed for each pixel point on the predicted images output by the semantic segmentation models, and the category that receives the most votes is taken as the category of that pixel point. This ensemble learning and model fusion approach can correct some obviously mis-predicted points.
Exemplarily, as shown in fig. 2, after the target remote sensing image is predicted by the first semantic segmentation model, the second semantic segmentation model and the third semantic segmentation model, the first predicted image, the second predicted image and the third predicted image are respectively obtained; the category variables are encoded by one-hot encoding, and the semantic segmentation result (image) of the target remote sensing image is obtained by the argmax function.
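As a final non-limiting sketch (the number of classes and the equal weighting of the models are assumptions), the per-pixel voting with one-hot encoding and argmax described above could be written as:

```python
# Illustrative sketch only: pixel-level majority voting over the predicted images
# of the different semantic segmentation models, via one-hot encoding followed by
# a per-pixel argmax. The number of classes is an assumption.
import numpy as np

def vote(predicted_images, num_classes=5):
    # predicted_images: list of H x W integer class maps, one per model.
    votes = np.zeros(predicted_images[0].shape + (num_classes,), dtype=np.int32)
    for pred in predicted_images:
        votes += np.eye(num_classes, dtype=np.int32)[pred]    # one-hot encode and accumulate
    return np.argmax(votes, axis=-1)                          # most-voted class for each pixel
```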
Fig. 4 shows a block diagram of a semantic segmentation apparatus for remote sensing images according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus includes:
a target remote sensing image obtaining module 401, configured to obtain a target remote sensing image;
the prediction module 402 is configured to perform semantic segmentation on the target remote sensing image by using different semantic segmentation models respectively to obtain predicted images corresponding to the semantic segmentation models respectively;
and the voting module 403 is configured to determine a semantic segmentation result of the target remote sensing image according to a voting result of each pixel point on each predicted image.
In one embodiment, the prediction module 402 is specifically configured to:
for any semantic segmentation model, according to the size standard of the semantic segmentation model, segmenting a target remote sensing image into a plurality of target sub-images, respectively inputting the plurality of target sub-images into the semantic segmentation model according to a preset sequence to obtain prediction sub-images respectively corresponding to the plurality of target sub-images, and splicing the prediction sub-images according to the preset sequence to obtain a prediction image corresponding to the semantic segmentation model.
In one embodiment, the prediction module 402 is specifically configured to:
setting a sliding window according to the size standard of the semantic segmentation model;
and adopting a sliding window step length larger than the size of the sliding window to segment the target remote sensing image.
In one embodiment, the prediction module 402 is specifically configured to:
and for any target sub-image, rotating the target sub-image by different angles to obtain a plurality of sub-images to be detected, respectively inputting the plurality of sub-images to be detected into the semantic segmentation model to obtain a plurality of sub-images to be selected, and determining a prediction sub-image corresponding to the target sub-image from the plurality of sub-images to be selected.
In one embodiment, the prediction module 402 is specifically configured to: and carrying out image post-processing for filling holes and/or removing small connected domains.
In one embodiment, the prediction module 402 is specifically configured to:
and under the condition that the semantic segmentation model is a two-classification model, aiming at the classification task of each class, performing one-round prediction on the target remote sensing image by using the two-classification model to obtain classification images respectively corresponding to the classes, and merging and overlapping the classification images to obtain a predicted image corresponding to the semantic segmentation model.
Fig. 5 shows a block diagram of a training apparatus of a semantic segmentation model according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus includes:
a sample remote sensing image module 501, configured to obtain a sample remote sensing image;
a segmentation module 502, configured to perform segmentation processing on the sample remote sensing image according to different size standards, so as to obtain sample sub-image sets corresponding to the size standards respectively;
the training module 503 is configured to train different initial networks respectively by using the sample sub-images in each sample sub-image set to obtain a plurality of semantic segmentation models, where the plurality of semantic segmentation models are used to perform semantic segmentation on the same target remote sensing image.
In one embodiment, the training module 503 is specifically configured to:
for any initial network, in the training process of the initial network, the network parameters of the initial network are adjusted by using the cosine annealing learning rate.
The embodiment of the disclosure also provides a semantic segmentation system of the remote sensing image, which comprises the semantic segmentation device of the remote sensing image of any embodiment and the training device of any embodiment.
In one application example, as shown in fig. 6, the system may include a data processing unit, a core algorithm unit, and an ensemble learning unit. Wherein, the data processing unit can comprise a data preprocessing subunit and a data enhancement subunit. The data preprocessing subunit is configured to perform the image segmentation processing in the above-described embodiment, and the data enhancement subunit is configured to perform the enhancement processing on the sample sub-image in the above-described embodiment. The core algorithm unit includes a training subunit and a prediction subunit, which are respectively used for executing the training process and the prediction process in the above embodiments. The ensemble learning unit is configured to perform the voting process in the above-described embodiment.
The functions of each module in each apparatus in the embodiments of the present disclosure may refer to the corresponding description in the above method, and are not described herein again.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic apparatus can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
A plurality of components in the electronic device are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the respective methods and processes described above. For example, in some embodiments, the various methods described above may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the respective methods described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g., by means of firmware) to perform the various methods described above.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (20)

1. A semantic segmentation method for remote sensing images comprises the following steps:
acquiring a target remote sensing image;
performing semantic segmentation on the target remote sensing image by using different semantic segmentation models respectively to obtain predicted images corresponding to the semantic segmentation models respectively;
and determining the semantic segmentation result of the target remote sensing image according to the voting result of each pixel point on each predicted image.
2. The method according to claim 1, wherein the semantic segmentation of the target remote sensing image by using different semantic segmentation models respectively to obtain a predicted image corresponding to each semantic segmentation model respectively comprises:
for any semantic segmentation model, segmenting the target remote sensing image into a plurality of target sub-images according to the size standard of the semantic segmentation model, respectively inputting the plurality of target sub-images into the semantic segmentation model according to a preset sequence to obtain prediction sub-images respectively corresponding to the plurality of target sub-images, and splicing the prediction sub-images according to the preset sequence to obtain a prediction image corresponding to the semantic segmentation model.
3. The method of claim 2, wherein segmenting the target remote sensing image into a plurality of target sub-images according to a size criterion of the semantic segmentation model comprises:
setting a sliding window according to the size standard of the semantic segmentation model;
and adopting a sliding window step length larger than the size of the sliding window to segment the target remote sensing image.
4. The method of claim 2, wherein the step of inputting the plurality of target sub-images into the semantic segmentation model respectively in a preset order to obtain prediction sub-images corresponding to the plurality of target sub-images respectively comprises:
and for any target sub-image, rotating the target sub-image by different angles to obtain a plurality of sub-images to be detected, respectively inputting the plurality of sub-images to be detected into the semantic segmentation model to obtain a plurality of sub-images to be selected, and determining a prediction sub-image corresponding to the target sub-image from the plurality of sub-images to be selected.
5. The method of claim 2, wherein stitching prediction sub-images comprises:
and carrying out image post-processing for filling holes and/or removing small connected domains.
6. The method according to claim 1, wherein the semantic segmentation of the target remote sensing image by using different semantic segmentation models respectively to obtain a predicted image corresponding to each semantic segmentation model respectively comprises:
and under the condition that the semantic segmentation model is a two-classification model, aiming at the classification task of each class, performing one-round prediction on the target remote sensing image by using the two-classification model to obtain classification images respectively corresponding to the classes, and merging and overlapping the classification images to obtain a predicted image corresponding to the semantic segmentation model.
7. A training method of a semantic segmentation model comprises the following steps:
obtaining a sample remote sensing image;
respectively carrying out segmentation processing on the sample remote sensing image according to different size standards to obtain sample sub-image sets respectively corresponding to the size standards;
and respectively training different initial networks by utilizing the sample sub-images in each sample sub-image set to obtain a plurality of semantic segmentation models, wherein the plurality of semantic segmentation models are used for performing semantic segmentation on the same target remote sensing image.
8. The training method of claim 7, wherein training different initial networks using the sample subimages in each sample subimage set comprises:
for any initial network, in the training process of the initial network, the network parameters of the initial network are adjusted by using the cosine annealing learning rate.
9. A semantic segmentation device for remote sensing images comprises:
the target remote sensing image acquisition module is used for acquiring a target remote sensing image;
the prediction module is used for performing semantic segmentation on the target remote sensing image by utilizing different semantic segmentation models respectively to obtain predicted images corresponding to the semantic segmentation models respectively;
and the voting module is used for determining the semantic segmentation result of the target remote sensing image according to the voting result of each pixel point on each predicted image.
10. The apparatus of claim 9, wherein the prediction module is specifically configured to:
for any semantic segmentation model, segmenting the target remote sensing image into a plurality of target sub-images according to the size standard of the semantic segmentation model, respectively inputting the plurality of target sub-images into the semantic segmentation model according to a preset sequence to obtain prediction sub-images respectively corresponding to the plurality of target sub-images, and splicing the prediction sub-images according to the preset sequence to obtain a prediction image corresponding to the semantic segmentation model.
11. The apparatus of claim 10, wherein the prediction module is specifically configured to:
setting a sliding window according to the size standard of the semantic segmentation model;
and adopting a sliding window step length larger than the size of the sliding window to segment the target remote sensing image.
12. The apparatus of claim 10, wherein the prediction module is specifically configured to:
and for any target sub-image, rotating the target sub-image by different angles to obtain a plurality of sub-images to be detected, respectively inputting the plurality of sub-images to be detected into the semantic segmentation model to obtain a plurality of sub-images to be selected, and determining a prediction sub-image corresponding to the target sub-image from the plurality of sub-images to be selected.
13. The apparatus of claim 10, wherein the prediction module is specifically configured to:
and carrying out image post-processing for filling holes and/or removing small connected domains.
14. The apparatus of claim 9, wherein the prediction module is specifically configured to:
and under the condition that the semantic segmentation model is a two-classification model, aiming at the classification task of each class, performing one-round prediction on the target remote sensing image by using the two-classification model to obtain classification images respectively corresponding to the classes, and merging and overlapping the classification images to obtain a predicted image corresponding to the semantic segmentation model.
15. A training apparatus for a semantic segmentation model, comprising:
the sample remote sensing image module is used for acquiring a sample remote sensing image;
the segmentation module is used for respectively segmenting the sample remote sensing image according to different size standards to obtain sample sub-image sets respectively corresponding to the size standards;
and the training module is used for respectively training different initial networks by utilizing the sample sub-images in each sample sub-image set so as to obtain a plurality of semantic segmentation models, wherein the plurality of semantic segmentation models are used for performing semantic segmentation on the same target remote sensing image.
16. The training apparatus of claim 15, wherein the training module is specifically configured to:
for any initial network, in the training process of the initial network, the network parameters of the initial network are adjusted by using the cosine annealing learning rate.
17. A semantic segmentation system for remote sensing images, comprising a semantic segmentation device for remote sensing images according to any one of claims 9 to 14 and a training device for semantic segmentation models according to claim 15 or 16.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.
19. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1 to 8.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
CN202111094429.8A 2021-09-17 2021-09-17 Semantic segmentation method of remote sensing image and training method of semantic segmentation model Pending CN113792742A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111094429.8A CN113792742A (en) 2021-09-17 2021-09-17 Semantic segmentation method of remote sensing image and training method of semantic segmentation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111094429.8A CN113792742A (en) 2021-09-17 2021-09-17 Semantic segmentation method of remote sensing image and training method of semantic segmentation model

Publications (1)

Publication Number Publication Date
CN113792742A true CN113792742A (en) 2021-12-14

Family

ID=78878847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111094429.8A Pending CN113792742A (en) 2021-09-17 2021-09-17 Semantic segmentation method of remote sensing image and training method of semantic segmentation model

Country Status (1)

Country Link
CN (1) CN113792742A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114565759A (en) * 2022-02-22 2022-05-31 北京百度网讯科技有限公司 Image semantic segmentation model optimization method and device, electronic equipment and storage medium
CN114677567A (en) * 2022-05-27 2022-06-28 成都数联云算科技有限公司 Model training method and device, storage medium and electronic equipment
CN115359261A (en) * 2022-10-21 2022-11-18 阿里巴巴(中国)有限公司 Image recognition method, computer-readable storage medium, and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977624A (en) * 2017-11-30 2018-05-01 国信优易数据有限公司 A kind of semantic segmentation method, apparatus and system
CN108319972A (en) * 2018-01-18 2018-07-24 南京师范大学 A kind of end-to-end difference online learning methods for image, semantic segmentation
WO2019024808A1 (en) * 2017-08-01 2019-02-07 北京市商汤科技开发有限公司 Training method and apparatus for semantic segmentation model, electronic device and storage medium
CN110245710A (en) * 2019-06-18 2019-09-17 腾讯科技(深圳)有限公司 Training method, the semantic segmentation method and device of semantic segmentation model
CN110390251A (en) * 2019-05-15 2019-10-29 上海海事大学 A kind of pictograph semantic segmentation method based on the processing of multiple neural network Model Fusion
CN111582175A (en) * 2020-05-09 2020-08-25 中南大学 High-resolution remote sensing image semantic segmentation method sharing multi-scale countermeasure characteristics
CN111783782A (en) * 2020-05-29 2020-10-16 河海大学 Remote sensing image semantic segmentation method fusing and improving UNet and SegNet
CN111860207A (en) * 2020-06-29 2020-10-30 中山大学 Multi-scale remote sensing image ground object classification method, system, device and medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019024808A1 (en) * 2017-08-01 2019-02-07 北京市商汤科技开发有限公司 Training method and apparatus for semantic segmentation model, electronic device and storage medium
CN107977624A (en) * 2017-11-30 2018-05-01 国信优易数据有限公司 A kind of semantic segmentation method, apparatus and system
CN108319972A (en) * 2018-01-18 2018-07-24 南京师范大学 A kind of end-to-end difference online learning methods for image, semantic segmentation
CN110390251A (en) * 2019-05-15 2019-10-29 上海海事大学 A kind of pictograph semantic segmentation method based on the processing of multiple neural network Model Fusion
CN110245710A (en) * 2019-06-18 2019-09-17 腾讯科技(深圳)有限公司 Training method, the semantic segmentation method and device of semantic segmentation model
CN111582175A (en) * 2020-05-09 2020-08-25 中南大学 High-resolution remote sensing image semantic segmentation method sharing multi-scale countermeasure characteristics
CN111783782A (en) * 2020-05-29 2020-10-16 河海大学 Remote sensing image semantic segmentation method fusing and improving UNet and SegNet
CN111860207A (en) * 2020-06-29 2020-10-30 中山大学 Multi-scale remote sensing image ground object classification method, system, device and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG QUANXIN: "Image Classification and Adversarial Technology in Deep Learning", 31 March 2020, Beijing: Beijing Institute of Technology Press, page 69 *
LUO SHANWEI et al.: "Few-shot instance segmentation based on a dual-similarity Siamese network", Journal of Wuhan University of Science and Technology, vol. 43, no. 1, 29 February 2020 (2020-02-29), pages 59-66 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114565759A (en) * 2022-02-22 2022-05-31 北京百度网讯科技有限公司 Image semantic segmentation model optimization method and device, electronic equipment and storage medium
CN114677567A (en) * 2022-05-27 2022-06-28 成都数联云算科技有限公司 Model training method and device, storage medium and electronic equipment
CN114677567B (en) * 2022-05-27 2022-10-14 成都数联云算科技有限公司 Model training method and device, storage medium and electronic equipment
CN115359261A (en) * 2022-10-21 2022-11-18 阿里巴巴(中国)有限公司 Image recognition method, computer-readable storage medium, and electronic device

Similar Documents

Publication Publication Date Title
CN113326764B (en) Method and device for training image recognition model and image recognition
US11321593B2 (en) Method and apparatus for detecting object, method and apparatus for training neural network, and electronic device
CN108171260B (en) Picture identification method and system
CN113792742A (en) Semantic segmentation method of remote sensing image and training method of semantic segmentation model
CN113780296A (en) Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN113379718A (en) Target detection method and device, electronic equipment and readable storage medium
CN112560874A (en) Training method, device, equipment and medium for image recognition model
CN113139543B (en) Training method of target object detection model, target object detection method and equipment
CN114743196B (en) Text recognition method and device and neural network training method
CN110633594A (en) Target detection method and device
CN112989970A (en) Document layout analysis method and device, electronic equipment and readable storage medium
CN113361578A (en) Training method and device of image processing model, electronic equipment and storage medium
CN113344089A (en) Model training method and device and electronic equipment
CN115620081A (en) Training method of target detection model, target detection method and device
CN113537192B (en) Image detection method, device, electronic equipment and storage medium
CN113569911A (en) Vehicle identification method and device, electronic equipment and storage medium
CN113989760A (en) Method, device and equipment for detecting lane line by high-precision map and storage medium
CN113378857A (en) Target detection method and device, electronic equipment and storage medium
CN113344121B (en) Method for training a sign classification model and sign classification
CN115457329A (en) Training method of image classification model, image classification method and device
CN115761698A (en) Target detection method, device, equipment and storage medium
CN113139463B (en) Method, apparatus, device, medium and program product for training a model
CN115482248A (en) Image segmentation method and device, electronic device and storage medium
CN114913339A (en) Training method and device of feature map extraction model
CN114612725A (en) Image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination