CN112347976A - Region extraction method and device for remote sensing satellite image, electronic equipment and medium


Publication number: CN112347976A (application CN202011318422.5A; granted as CN112347976B)
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: feature map, decoding, coding, feature, image
Inventors: 刘晓, 梅树起
Assignee (original and current): Tencent Technology Shenzhen Co Ltd
Legal status: Granted; Active

Classifications

    • G06V 20/13: Scenes; scene-specific elements; terrestrial scenes; satellite images
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/253: Pattern recognition; fusion techniques of extracted features
    • G06N 3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/25: Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components


Abstract

The application relates to the field of computer technology, in particular to the field of artificial intelligence, and provides a method, an apparatus, an electronic device and a medium for extracting a region from a remote sensing satellite image, which improve the accuracy and precision of target-region extraction in remote sensing satellite images. The method comprises the following steps: extracting an image to be detected from a remote sensing satellite image of a region to be detected; encoding the image to be detected to obtain a coding feature map; analyzing the coding feature map with different receptive fields to obtain different target feature maps; performing feature fusion based on each target feature map, and predicting the category information of each pixel in the image to be detected from the fusion result; and extracting a target region in the image to be detected based on the category information of each pixel. Because the coding feature map is analyzed with different receptive fields, hard-to-resolve detail information is weakened and overly fine-grained feature parts are ignored, so the accuracy and precision of target-region extraction in remote sensing satellite images are effectively improved, and maps produced on this basis are more accurate.

Description

Region extraction method and device for remote sensing satellite image, electronic equipment and medium
Technical Field
The application relates to the technical field of computers, in particular to the technical field of artificial intelligence, and provides a method and a device for extracting a region of a remote sensing satellite image, electronic equipment and a medium.
Background
Currently, feature extraction for relevant scenes in remote sensing satellite images, such as buildings, green spaces and water systems, is mainly based on semantic segmentation techniques from deep learning. Taking a green-space region extraction scenario as an example: because base-map elements such as green spaces, vacant land, roads and buildings are both independent of and associated with one another, learning several categories at once (for example, the two categories in green-space extraction: green space and open ground) tends to produce excessive confusion at the boundaries. In addition, the extracted boundaries are not smooth enough and the results are overly fragmented, falling short of the expected effect of smooth edges and complete individual regions. That is, the semantic segmentation techniques in the related art are low in both accuracy and precision of feature extraction.
In addition, feature extraction of target regions such as urban green-space areas from remote sensing satellite images is a problem that current map base production needs to solve automatically or semi-automatically. How to extract correct and concise target regions from remote sensing satellite images is therefore a problem that current feature extraction has to solve.
Disclosure of Invention
The embodiment of the application provides a method and a device for extracting a region of a remote sensing satellite image, electronic equipment and a medium, which are used for improving the accuracy and precision of extracting a target region in the remote sensing satellite image.
The method for extracting the region of the remote sensing satellite image comprises the following steps:
acquiring a remote sensing satellite image of a region to be detected, and extracting an image to be detected in the remote sensing satellite image;
coding the image to be detected to obtain a coding feature map;
analyzing the coding feature map with different receptive fields to obtain different target feature maps;
performing feature fusion based on each target feature map, and predicting to obtain category information of each pixel in the image to be detected according to a fusion result;
and extracting a target area in the image to be detected based on the category information of each pixel.
The region extraction apparatus for remote sensing satellite images provided by the embodiment of the application comprises:
the image acquisition unit is used for acquiring a remote sensing satellite image of a region to be detected and extracting an image to be detected in the remote sensing satellite image;
the coding unit is used for coding the image to be detected to obtain a coding feature map;
the decoding unit is used for analyzing the coding feature map with different receptive fields to obtain different target feature maps, performing feature fusion based on each target feature map, and predicting the category information of each pixel in the image to be detected from the fusion result;
and the region extraction unit is used for extracting a target region in the image to be detected based on the category information of each pixel.
Optionally, the apparatus further comprises:
the training unit is used for acquiring a training sample data set from the sample remote sensing satellite image;
performing multiple rounds of iterative training on the untrained area extraction network model according to the training samples in the training sample data set to obtain a trained area extraction network model; wherein, each round of iterative training executes the following processes:
selecting at least one training sample from the training sample data set; for any one of these training samples, inputting the sample image contained in that training sample into the untrained region extraction network model, and obtaining the category information of each pixel in the sample image output by the untrained region extraction network model;
and adjusting the network parameters of the untrained area extraction network model according to the difference between the category information of each pixel in the sample image and the label marked on the sample image.
An electronic device provided by an embodiment of the present application comprises a processor and a memory, where the memory stores program code which, when executed by the processor, causes the processor to execute the steps of any one of the above methods for extracting a region from a remote sensing satellite image.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to enable the computer device to execute the steps of any one of the above-mentioned methods for extracting the region of the remote sensing satellite image.
An embodiment of the present application provides a computer-readable storage medium comprising program code which, when run on an electronic device, causes the electronic device to execute the steps of any one of the above methods for extracting a region from a remote sensing satellite image.
The beneficial effects of this application are as follows:
The embodiment of the application provides a region extraction method and apparatus for remote sensing satellite images, an electronic device, and a medium. After encoding is completed in the encoding stage, the decoding stage performs a decoding operation on the result of the encoding stage. Because different receptive fields are used to analyze the coding feature map during decoding, the pixel-by-pixel classification of the coding feature map draws on several receptive fields whose information is encoded and decoded, weakening hard-to-resolve detail information and ignoring overly fine-grained feature parts; the final per-pixel category information is obtained by prediction on this basis, which improves the accuracy and precision of target-region extraction in remote sensing satellite images.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is an alternative diagram of a semantic segmentation model in the related art;
fig. 2 is an alternative schematic diagram of an application scenario in an embodiment of the present application;
fig. 3 is a schematic flowchart of a method for extracting a region of a remote sensing satellite image according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an area extraction network model in an embodiment of the present application;
FIG. 5 is an alternative schematic diagram of a scanning subnetwork in the embodiment of the present application;
fig. 6 is a schematic flowchart of a training method for a region extraction network model in an embodiment of the present application;
fig. 7 is a timing diagram illustrating an implementation of the complete region extraction method for a remote sensing satellite image according to an embodiment of the present application;
fig. 8 is a schematic structural diagram illustrating a composition of a region extraction device for remote sensing satellite images according to an embodiment of the present disclosure;
fig. 9 is a schematic diagram of a hardware component structure of an electronic device to which an embodiment of the present application is applied.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
Some concepts related to the embodiments of the present application are described below.
Receptive field: the input region "seen" by a neuron in a neural network. In a convolutional neural network, the value of an element on a feature map is computed from a certain region of the input image; that region is the element's receptive field. In the embodiment of the present application, the scanning subnetworks use convolution kernels of different sizes, which implements feature extraction with different receptive fields and thus yields different receptive-field information, that is, features of different scales.
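To make the effect of kernel size concrete, the following sketch (not from the patent; it uses the standard receptive-field recurrence for stacked convolutions, r_out = r_in + (k - 1) * j, where j is the cumulative stride before the layer) shows how larger kernels widen the receptive field:

    def receptive_field(layers):
        """layers: list of (kernel_size, stride) pairs, input to output."""
        r, j = 1, 1  # receptive field and cumulative stride ("jump")
        for k, s in layers:
            r += (k - 1) * j
            j *= s
        return r

    print(receptive_field([(3, 1)]))          # 3: a single 3x3 convolution
    print(receptive_field([(12, 1)]))         # 12: a single 12x12 convolution
    print(receptive_field([(3, 2), (3, 1)]))  # 7: stride compounds the growth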
Remote sensing satellite image: film, photographs or video recording the electromagnetic radiation of various ground features. A ground feature is a general term for objects on the ground (such as mountains, forests and buildings) and non-object entities (such as provincial and county boundaries), and generally refers to relatively stationary things on the earth's surface.
Category information: in the embodiment of the application, region extraction from a remote sensing satellite image is in essence a process of classifying each pixel in the image to be detected. For example, green-space extraction is a two-class problem with the categories green space and open ground: each pixel in the image to be detected is assigned to one of the two, and the category information indicates whether a given pixel belongs to the green space, that is, to the category of the target region. The same holds for multi-class problems, where the category information indicates which class a given pixel belongs to: class A, class B, class C, and so on.
Region extraction network model: the neural network model for region extraction from remote sensing satellite images provided in the embodiment of the application. It mainly comprises two parts: an encoding part and a decoding part. The encoding part performs feature extraction on the input image to obtain the image's feature map; the decoding part labels the provided feature map pixel by pixel with class information to complete the segmentation task. Compared with the related art, the decoding part in the embodiment of the application introduces a scanning module. In an optional implementation, the decoding part comprises a decoding subnetwork and scanning subnetworks: the decoding subnetwork decodes the coding feature map and restores its features to the pixel level to obtain a pixel-level feature map; the scanning subnetworks perform feature extraction on the pixel-level feature map to obtain target feature maps, where different scanning subnetworks use different convolution kernels and different convolution kernels correspond to different receptive fields.
The embodiments of the present application relate to Artificial Intelligence (AI) and machine learning techniques, and are designed based on computer vision and Machine Learning (ML) technologies within AI.
Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence.
Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. The artificial intelligence technology mainly comprises a computer vision technology, a natural language processing technology, machine learning/deep learning and other directions. With the research and progress of artificial intelligence technology, artificial intelligence is researched and applied in a plurality of fields, such as common smart homes, smart customer service, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, robots, smart medical treatment and the like.
Machine learning is a multi-field cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer.
Machine learning is the core of artificial intelligence, is the fundamental way to endow computers with intelligence, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning and inductive learning. In the embodiment of the present application, a region extraction network model based on machine learning or deep learning learns from sample images whose pixels carry category labels, so that the category information of each pixel can be predicted for input images extracted from remote sensing satellite imagery.
The region extraction method of the remote sensing satellite image provided by the embodiment of the application can be divided into two parts, including a training part and an application part; in the training part, a regional extraction network model is trained by the machine learning technology, so that after a sample image in a training sample passes through the regional extraction network model, the class information of each pixel in the sample image is obtained, and model parameters are continuously adjusted by an optimization algorithm to obtain a trained regional extraction network model; the application part is used for obtaining the target area in the remote sensing satellite image by using the area extraction network model obtained by training in the training part so as to be used for making a map and the like.
The following briefly introduces the design concept of the embodiments of the present application:
the urban green land is an important infrastructure in cities, is an important component in urban ecological systems, and plays a positive role in improving urban ecological environment and maintaining harmony between people and nature. The ecological effect of the urban green land is closely related to the spatial distribution pattern of the urban green land in the city, and in order to fully play the role of an urban cleaner of the urban green land, the urban green land needs to be reasonably planned, constructed and managed, and the change situation of the urban green land needs to be objectively and accurately mastered.
Currently, feature extraction for relevant scenes (buildings, green spaces and water systems) in remote sensing satellite images is mainly based on semantic segmentation techniques from deep learning, most of which perform the segmentation task with a structure as shown in fig. 1. As shown in fig. 1, the main flow is divided into two processes, encoding and decoding: the encoding part is responsible for extracting the features of the input image to obtain the image's feature map, and the decoding part labels the provided feature map pixel by pixel with class information to complete the segmentation task.
However, the semantic segmentation techniques in the related art are low in both accuracy and precision of feature extraction. How to extract correct and concise target regions from remote sensing satellite images is therefore a problem that current region extraction has to solve.
In view of this, the embodiment of the application provides a region extraction method and apparatus for remote sensing satellite images, an electronic device, and a medium. After encoding is completed in the encoding stage, the decoding stage performs a decoding operation on the result of the encoding stage. Because different receptive fields are used to analyze the coding feature map during decoding, the pixel-by-pixel classification of the coding feature map draws on several receptive fields whose information is encoded and decoded, weakening hard-to-resolve detail information and ignoring overly fine-grained feature parts; the final per-pixel category information is obtained by prediction on this basis, which improves the accuracy and precision of target-region extraction in remote sensing satellite images.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it should be understood that the preferred embodiments described herein are merely for illustrating and explaining the present application, and are not intended to limit the present application, and that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Fig. 2 is a schematic view of an application scenario according to an embodiment of the present application. The application scenario includes two terminal devices 210 and a server 220, and the application's operation interface can be logged into through the terminal devices 210. The terminal devices 210 and the server 220 can communicate with each other through a communication network.
In an alternative embodiment, the communication network is a wired network or a wireless network. The terminal device 210 and the server 220 may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
In this embodiment, the terminal device 210 is an electronic device used by a user, and the electronic device may be a computer device having a certain computing capability and running instant messaging software and a website or social contact software and a website, such as a personal computer, a mobile phone, a tablet computer, a notebook, an e-book reader, a smart home, and the like. Each terminal device 210 and the server 220 are connected via a wireless Network, and the server 220 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and an artificial intelligence platform.
The application related to the embodiment of the application can be a software client, a webpage client, an applet client and the like, and the server is an application server corresponding to the software client, the webpage client, the applet client and the like, and the specific type of the client is not limited.
In the embodiment of the present application, the area extraction network model may be deployed on the server 220 for training, a large number of training samples may be stored in the server 220, the training samples include a plurality of sample images extracted based on the sample remote sensing satellite images, and each pixel on the sample images is labeled with a category label for training the area extraction network model. Optionally, after the area extraction network model is obtained based on the training method in the embodiment of the present application through training, the trained area extraction network model may be directly deployed on the server 220 or the terminal device 210. Generally, the area extraction network model is directly deployed on the server 220, and in the embodiment of the present application, the area extraction network model is often used for extracting a target area from a remote sensing satellite image of an area to be detected.
It should be noted that the region extraction method for the remote sensing satellite image provided in the embodiment of the present application can be applied to various application scenarios including image classification tasks related to the remote sensing satellite image. For example, in a green extraction scenario, two categories would be classified: greenery patches and open spaces. Accordingly, the training samples used in different scenarios are different.
In a possible application scenario, the training samples in the present application may be stored by using a cloud storage technology. A distributed cloud storage system (hereinafter, referred to as a storage system) refers to a storage system that integrates a large number of storage devices (storage devices are also referred to as storage nodes) of different types in a network through application software or application interfaces to cooperatively work by using functions such as cluster application, grid technology, and a distributed storage file system, and provides a data storage function and a service access function to the outside.
In a possible application scenario, to reduce communication latency, servers 220 may be deployed in different regions; or, to balance load, different servers 220 may each serve the regions corresponding to different terminal devices 210. Multiple servers 220 can share data through a blockchain, and together they form a data sharing system. For example, one terminal device 210 located at site a is communicatively coupled to one server 220, and another terminal device 210 located at site b is communicatively coupled to a different server 220.
Each server 220 in the data sharing system has a node identifier corresponding to that server 220, and each server 220 in the data sharing system may store the node identifiers of the other servers 220 in the system, so that a generated block can be broadcast to the other servers 220 according to their node identifiers. Each server 220 may maintain a node identifier list as shown in the following table, storing the server 220 name together with its node identifier. The node identifier may be an IP (Internet Protocol) address or any other information that can identify the node; table 1 uses IP addresses only as an example.
TABLE 1

Server name    Node identifier
Node 1         119.115.151.174
Node 2         118.116.189.145
Node N         119.124.789.258
The method for extracting a target region provided by the exemplary embodiment of the present application is described below with reference to the accompanying drawings in conjunction with the application scenarios described above, and it should be noted that the application scenarios described above are only shown for the convenience of understanding the spirit and principle of the present application, and the embodiments of the present application are not limited in this respect.
In the application, target region extraction mainly serves image classification tasks related to remote sensing satellite images: an input image to be detected, taken from a remote sensing satellite image, is processed by a trained region extraction network model, the category information of each pixel in the image to be detected is determined by the model, and finally regions are divided based on the category information of each pixel to extract the target region in the image to be detected. Therefore, in the application, the region extraction network model is trained first, and the trained region extraction network model is then applied to extract the target region.
Referring to fig. 3, an implementation flow chart of a method for extracting a region of a remote sensing satellite image according to an embodiment of the present application is shown, and the specific implementation flow of the method is as follows:
s31: acquiring a remote sensing satellite image of a region to be detected, and extracting an image to be detected in the remote sensing satellite image;
the area to be measured may refer to a certain place, a certain city, or the like. In the case where the remote sensing satellite image is a picture, the image to be measured may be one selected from a plurality of images (only one image is not required to be selected). Under the condition that the remote sensing satellite image is a video, frame splitting processing is carried out on the video to obtain a frame-by-frame image, and a target image is selected from the frame-by-frame image to serve as an image to be detected.
The target image may be determined according to actual requirements, for example, when the target area is a green space, the target image is an image including the green space; for example, when the target area is a building, the target image is an image including the building.
S32: coding an image to be detected to obtain a coding feature map;
Optionally, encoding the image to be detected may be implemented with a trained region extraction network model, which comprises two main parts, an encoding part and a decoding part. Specifically, the image to be detected is input into the trained region extraction network model and encoded by the encoding part of the model to obtain the coding feature map.
S33: analyzing the coding feature map with different receptive fields to obtain different target feature maps;
S34: performing feature fusion based on each target feature map, and predicting the category information of each pixel in the image to be detected from the fusion result;
Here, steps S33 and S34 are in fact the process of decoding the coding feature map. In this embodiment of the present application, the decoding process may be implemented by the decoding part of the trained region extraction network model, specifically:
the coding feature map is input into the decoding part of the region extraction network model and analyzed there with different receptive fields to obtain different target feature maps; feature fusion is then performed based on each target feature map, and the category information of each pixel in the image to be detected is predicted from the fusion result.
S35: and extracting a target area in the image to be detected based on the category information of each pixel.
In the embodiment of the application, after the category information of each pixel is obtained, the pixels whose category information belongs to the target region can be determined; together they form the final target region, which is then extracted from the image to be detected.
For example, taking a green-space extraction scenario with binary classification, the target region is the green space, and the pixel value of each pixel may be 1 or 0, where a pixel value of 1 indicates that the pixel belongs to the green space and a pixel value of 0 indicates that it does not. Based on the finally obtained category information of each pixel, the target region in the image to be detected can be determined, achieving high-precision, high-accuracy target region extraction. The boundary of a target region extracted in this way is smoother and less prone to confusion.
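As an illustration of this last step, the following sketch (an assumption-based example using NumPy and OpenCV, neither of which the patent names) turns such a per-pixel 0/1 class map into individual target-region masks:

    import numpy as np
    import cv2  # assumed available; the patent does not prescribe any library

    def extract_target_regions(class_map: np.ndarray):
        """class_map: HxW array of per-pixel labels, 1 = target (e.g. green space)."""
        mask = (class_map == 1).astype(np.uint8)
        # Connected components separate the mask into individual target regions.
        num_labels, labels = cv2.connectedComponents(mask)
        return [(labels == i).astype(np.uint8) for i in range(1, num_labels)]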
The area extraction process of the remote sensing satellite image in the embodiment of the present application will be described in detail below with reference to fig. 4 and 5.
Referring to fig. 4, a regional extraction network model in the embodiment of the present application mainly includes an encoding portion and a decoding portion:
the coding part adopts ResNet (Residual Neural Network) as a main structure, and mainly comprises a convolutional layer, a normalization layer, an activation layer and other structures. The convolutional layer is responsible for extracting features and is the main functional layer. The embodiment of the application adopts a cross-layer connection structure, and mainly uses C3, C4 and C5 in a ResNet structure as a cross-layer connection input part so as to strengthen detailed information in a decoding process. The Input (Input) of the encoding section is the image to be measured.
As shown in fig. 4, the encoding part comprises 5 feature layers: C1, C2, C3, C4 and C5. The feature maps of these layers shrink progressively in size, and the high-level information becomes more and more abstract.
Specifically, the feature map obtained by filtering the image to be detected through the C1 feature layer is substantially the same as the original image; the process is simple filtering. The C2 feature layer filters the previous result again and, with the size reduced, yields a result close to the previous one but with less detail information. By analogy, the degree of abstraction rises from the primary coding feature map (the C3 filtering result) through the middle-level coding feature map (the C4 filtering result) to the final-level coding feature map (the C5 filtering result); the final-level coding feature map is the most abstract and the smallest, and can barely be recognized from the original image. The primary and final-level coding feature maps also retain some finer detail information, such as inflection points and small region extents.
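A minimal sketch of such an encoding part follows, assuming a torchvision ResNet-50 backbone with a dilated last stage so that C5 keeps C4's spatial size (an assumption made so that S5 and S4 can be stitched directly in the decoding part described later); the layer names and channel widths are standard ResNet facts, but the patent does not prescribe this exact configuration:

    import torch
    from torchvision.models import resnet50

    class Encoder(torch.nn.Module):
        """ResNet backbone exposing C3, C4, C5 for cross-layer connections."""
        def __init__(self):
            super().__init__()
            # Dilating the last stage keeps C5 at C4's spatial size.
            r = resnet50(weights=None,
                         replace_stride_with_dilation=[False, False, True])
            self.stem = torch.nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
            self.layer1, self.layer2 = r.layer1, r.layer2
            self.layer3, self.layer4 = r.layer3, r.layer4

        def forward(self, x):
            x = self.stem(x)       # C1-level filtering
            c2 = self.layer1(x)    # C2: close to the input, little detail lost
            c3 = self.layer2(c2)   # C3: primary coding feature map (512 ch)
            c4 = self.layer3(c3)   # C4: middle-level coding feature map (1024 ch)
            c5 = self.layer4(c4)   # C5: final-level, most abstract (2048 ch)
            return c3, c4, c5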
After encoding is completed in the encoding stage, the decoding stage performs a decoding operation on the result of the encoding stage, which mainly involves processing the primary, middle-level and final-level coding feature maps.
Optionally, the decoding part in the area extraction network model in the embodiment of the present application includes a decoding subnetwork and at least two scanning subnetworks. The decoding sub-network is used for decoding the coding feature map and restoring the features in the coding feature map to a pixel level to obtain a pixel level feature map; the scanning sub-networks perform feature extraction on the pixel level feature map to obtain a target feature map, convolution kernels adopted when feature extraction is performed on the basis of different scanning sub-networks are different, and different convolution kernels correspond to different receptive fields.
In the embodiment of the present application, the decoding part is illustrated with three scanning subnetworks. As shown in fig. 4, these are SCAN1, SCAN2 and SCAN3.
The decoding part is mainly used for restoring the characteristics obtained by the coding part to a pixel level and classifying the characteristics. Each upsampling layer consists of deconvolution and convolution. The upsampling layers shown in fig. 4 and 5 include U45, USCD, and the like.
In the embodiment of the application, the feature maps of the S5, S4, S3, S45, S345 and the SSCAN feature layer are obtained by performing convolution and normalization processing on the feature maps of the C5, C4, C3, M45, M345 and the MSCAN feature layer, respectively.
Specifically, the coding feature map comprises a primary coding feature map obtained through the C3 feature layer, a middle-level coding feature map obtained through the C4 feature layer, and a final-level coding feature map obtained through the C5 feature layer. After the primary coding feature map of the C3 feature layer is convolved and normalized by the decoding subnetwork, the S3 feature map, that is, the primary decoding feature map in the embodiment of the present application, is obtained; similarly, the middle-level coding feature map of the C4 feature layer is convolved and normalized by the decoding subnetwork to obtain the S4 feature map, that is, the middle-level decoding feature map; and the final-level coding feature map of the C5 feature layer is convolved and normalized by the decoding subnetwork to obtain the S5 feature map, that is, the final-level decoding feature map.
M45, M345 and MSCAN are obtained by stitching, through a convolutional layer, S5 with S4, U45 with S3, and the outputs of the three SCAN modules, respectively.
Specifically, the feature map of the M45 feature layer is obtained by stitching the outputs of S5 and S4 through a convolutional layer; it is the first stitched feature map in the embodiment of the present application, retaining more information and more detailed feature information. Convolving and normalizing M45 yields the feature map of the S45 feature layer, that is, the first decoding feature map; further feature extraction on the first decoding feature map in S45, followed by up-sampling, yields the feature map of the U45 feature layer, that is, the second decoding feature map, whose size is the same as that of the primary coding feature map. On this basis, the second decoding feature map of the U45 feature layer and the primary decoding feature map of the S3 feature layer are stitched through the convolutional layer M345 in the decoding subnetwork to obtain the feature map of the M345 feature layer, that is, the second stitched feature map. The second stitched feature map of the M345 feature layer is then convolved and normalized to obtain the feature map of the S345 feature layer, that is, the pixel-level feature map, hereinafter referred to as the pixel-level feature map S345 or simply S345 (feature maps of other feature layers may be referred to by similar shorthand, for example the abstract feature map USCD, which is not specifically limited herein). The size of the pixel-level feature map is the same as that of the primary coding feature map; as shown in fig. 4, S345 is the same size as C3.
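The wiring just described can be sketched as follows (channel widths, kernel sizes and the bilinear up-sampling are illustrative assumptions; only the topology of S5 with S4 into M45, then S45 and U45, then U45 with S3 into M345 and S345 follows the text):

    import torch
    import torch.nn as nn

    def conv_norm(cin, cout):
        # "Convolution and normalization processing" as used throughout the text.
        return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                             nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

    class DecoderSubnet(nn.Module):
        def __init__(self, c3_ch, c4_ch, c5_ch, ch=256):
            super().__init__()
            self.s3 = conv_norm(c3_ch, ch)        # primary decoding feature map
            self.s4 = conv_norm(c4_ch, ch)        # middle-level decoding feature map
            self.s5 = conv_norm(c5_ch, ch)        # final-level decoding feature map
            self.m45 = nn.Conv2d(2 * ch, ch, 1)   # first stitched feature map
            self.s45 = conv_norm(ch, ch)          # first decoding feature map
            self.up45 = nn.Upsample(scale_factor=2, mode='bilinear',
                                    align_corners=False)
            self.m345 = nn.Conv2d(2 * ch, ch, 1)  # second stitched feature map
            self.s345 = conv_norm(ch, ch)         # pixel-level feature map

        def forward(self, c3, c4, c5):
            s3, s4, s5 = self.s3(c3), self.s4(c4), self.s5(c5)
            m45 = self.m45(torch.cat([s5, s4], dim=1))
            u45 = self.up45(self.s45(m45))        # U45: matches C3's size
            m345 = self.m345(torch.cat([u45, s3], dim=1))
            return self.s345(m345)                # S345, same size as C3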
In summary: after the coding feature map input into the decoding part of the region extraction network model has been decoded into the pixel-level feature map S345, feature extraction can be performed on S345 by the scanning subnetworks.
In the embodiment of the present application, feature extraction is mainly performed on the pixel-level feature map S345 produced by the preceding decoding subnetwork, based on the ScanModule (scan module) shown in fig. 4. The ScanModule comprises three modules with similar basic structures, called SCAN modules; SCAN1, SCAN2 and SCAN3 shown in fig. 4 denote these three basic modules, that is, the scanning subnetworks in the embodiment of the present application. The choice of three scanning subnetworks listed in the embodiment of the present application was obtained by empirical analysis, and more scanning subnetworks may also be used.
Referring to fig. 5, which shows a schematic structural diagram of a SCAN module in the embodiment of the present application, the portion inside the dashed frame is the SCAN module, which specifically comprises the feature layers SCD, USCD, SSC, MSCD and SSCD. The input of the module is the pixel-level feature map S345, and the outputs of all SCAN modules are stitched into the MSCAN feature layer.
The feature maps of the SSCD and SSC feature layers are obtained by convolving and normalizing the feature maps of the MSCD and S345 feature layers, respectively; the feature map of the USCD feature layer is obtained by up-sampling the feature map of the SCD feature layer; and the feature map of the MSCD feature layer is obtained by stitching the feature maps of the USCD and SSC feature layers through a convolutional layer.
Optionally, the scanning subnetwork may be divided into an abstract feature extraction layer, a detail feature extraction layer, a feature fusion layer and a target feature extraction layer. The abstract feature extraction layer comprises the SCD and USCD feature layers, the detail feature extraction layer comprises the SSC feature layer, the feature fusion layer comprises the MSCD feature layer, and the target feature extraction layer comprises the SSCD feature layer.
Specifically, for any scanning subnetwork, the target feature map is extracted as follows:
based on the abstract feature extraction layer in the scanning subnetwork, the pixel-level feature map S345 input into the subnetwork is down-sampled to obtain SCD and then up-sampled to obtain the abstract feature map USCD, whose size is the same as that of the pixel-level feature map S345, as shown in fig. 5;
based on the detail feature extraction layer in the scanning subnetwork, features are extracted from the pixel-level feature map S345 to obtain the detail feature map SSC, whose size is also the same as that of the pixel-level feature map S345, as shown in fig. 5;
further, the abstract feature map USCD and the detail feature map SSC are stitched based on the feature fusion layer to obtain the feature fusion map MSCD;
and features are extracted from the feature fusion map MSCD based on the target feature extraction layer to obtain the target feature map (the SSCD output).
In the embodiment of the present application, during abstract feature extraction, the different scanning subnetworks use convolution kernels of different sizes when down-sampling S345 to obtain SCD. For example, SCAN1, SCAN2 and SCAN3 perform the convolution operation with kernels of size 3x3, 6x6 and 12x12 respectively, extracting features of different scales. A subsequent up-sampling operation ensures that the USCD feature layers of all branches end up the same size.
In detail feature extraction, SSC is obtained from S345 by feature extraction, and likewise the different scanning subnetworks use convolution kernels of different sizes. For example, SCAN1, SCAN2 and SCAN3 perform the convolution operation with kernels of size 3x3, 6x6 and 12x12 respectively, with padding added so that the SSC feature layers keep the same size as the S345 feature layer; the padding sizes are 1, 3 and 6 respectively. The final detail feature map SSC obtained from the detail feature extraction layer is thus the same size as S345.
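One SCAN branch can be sketched as below. The kernel and padding values follow the text; the stride of the SCD down-sampling and the bilinear interpolation back to S345's size are assumptions made so the concatenation is well defined (the stated paddings do not preserve size exactly for the even kernels):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScanBranch(nn.Module):
        """One SCAN module; (k, pad) is (3, 1), (6, 3) or (12, 6) in the text."""
        def __init__(self, ch, k, pad):
            super().__init__()
            # Abstract path: SCD down-samples (stride = k is an assumption),
            # then up-sampling yields USCD, a coarse encode-decode pass.
            self.scd = nn.Conv2d(ch, ch, kernel_size=k, stride=k)
            # Detail path: SSC re-extracts local features at full resolution.
            self.ssc = nn.Sequential(nn.Conv2d(ch, ch, k, padding=pad),
                                     nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
            self.mscd = nn.Conv2d(2 * ch, ch, 1)  # feature fusion layer
            self.sscd = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                      nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

        def forward(self, s345):
            size = s345.shape[-2:]
            # Interpolating both paths back to S345's exact size keeps the
            # concatenation valid regardless of kernel parity.
            uscd = F.interpolate(self.scd(s345), size=size, mode='bilinear',
                                 align_corners=False)
            ssc = F.interpolate(self.ssc(s345), size=size, mode='bilinear',
                                align_corners=False)
            return self.sscd(self.mscd(torch.cat([uscd, ssc], dim=1)))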
In the embodiment of the application, by using convolution kernels of different scales, the ScanModule can extract features of different dimensions from a feature layer based on different receptive fields. The detail feature extraction further refines the encoded features, while the abstract feature extraction simulates an encoding-decoding process, weakening hard-to-resolve detail information and ignoring overly fine-grained feature parts.
Finally, the target feature maps output by the three scanning subnetworks are stitched through a convolutional layer to obtain the feature fusion map MSCAN. MSCAN is then convolved and normalized to obtain the first class feature map SSCAN, whose size is the same as that of the primary coding feature map C3. To obtain the category information of each pixel in the image to be detected, the first class feature map SSCAN is further up-sampled into a second class feature map of the same size as the image to be detected; the pixel value of each pixel in the second class feature map obtained in this way represents that pixel's category information. For example, in the two-class green-space extraction scenario, the pixel value of each pixel in the second class feature map may be 1 or 0, where 1 indicates that the pixel belongs to the green space and 0 indicates that it does not.
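A sketch of the corresponding prediction head, reusing the hypothetical ScanBranch above (the two-class output and bilinear up-sampling are illustrative assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScanHead(nn.Module):
        def __init__(self, ch=256, num_classes=2):
            super().__init__()
            self.branches = nn.ModuleList([ScanBranch(ch, 3, 1),
                                           ScanBranch(ch, 6, 3),
                                           ScanBranch(ch, 12, 6)])
            self.mscan = nn.Conv2d(3 * ch, ch, 1)  # stitch the three outputs
            self.sscan = nn.Sequential(            # first class feature map
                nn.Conv2d(ch, num_classes, 3, padding=1),
                nn.BatchNorm2d(num_classes))

        def forward(self, s345, out_size):
            fused = self.mscan(torch.cat([b(s345) for b in self.branches], dim=1))
            logits = self.sscan(fused)
            # Up-sampling to the input image's size gives the second class
            # feature map; argmax over channels yields per-pixel categories.
            return F.interpolate(logits, size=out_size, mode='bilinear',
                                 align_corners=False)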
Taking the green-space extraction scenario as an example, base maps of green spaces are currently produced purely by hand. Deep-learning-based green-space feature extraction makes an automatic or semi-automatic, labor-saving process possible, and on this basis the region extraction method for remote sensing satellite images in the embodiment of the application further improves the accuracy and precision of green-space extraction from remote sensing satellite images.
The following describes in detail the training process of the area extraction network model listed in the embodiments of the present application.
Referring to fig. 6, the training method for the region extraction network model in the embodiment of the present application specifically comprises the following steps:
s61: acquiring a training sample data set from a sample remote sensing satellite image;
in the embodiment of the present application, the training sample data set includes a plurality of training samples, each of the training samples includes a sample image, and the sample image is labeled with a category label of each pixel. The sample image in the embodiment of the application is extracted based on the sample remote sensing satellite image.
S62: performing multiple rounds of iterative training on the untrained area extraction network model according to training samples in the training sample data set to obtain a trained area extraction network model;
in the embodiment of the application, when multiple rounds of iterative training are performed on an untrained area extraction network model, the number of iterations may be a preset value, and the training is stopped when the number of iterations reaches an upper limit, so as to obtain a trained area extraction network model. The training may be stopped when the model converges according to the actual situation, and a trained region extraction network model may be obtained, which is not specifically limited herein.
Wherein, each round of iterative training executes the following processes:
S621: selecting at least one training sample from the training sample data set; for any one of these training samples, inputting the sample image contained in that training sample into the untrained region extraction network model, and obtaining the category information of each pixel in the sample image output by the untrained region extraction network model;
S622: adjusting the network parameters of the untrained region extraction network model according to the difference between the category information of each pixel in the sample image and the labels marked on the sample image.
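A minimal sketch of one such training iteration (steps S621 and S622), assuming a cross-entropy loss, a standard gradient-based optimizer, and a model that returns per-pixel logits; the patent names neither the loss nor the optimizer:

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, sample_image, label_map):
        """label_map: per-pixel category labels (N x H x W, dtype long)."""
        optimizer.zero_grad()
        logits = model(sample_image)       # S621: predicted category information
        loss = F.cross_entropy(logits, label_map)
        loss.backward()                    # S622: the difference between the
        optimizer.step()                   # prediction and the labels drives
        return loss.item()                 # the parameter adjustment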
After the trained region extraction network model is obtained by the above method, the region extraction process for remote sensing satellite images described earlier can be carried out with the trained model. Labeling and classifying the image to be detected pixel by pixel with the region extraction network model provided in the embodiment of the application can effectively improve the accuracy and precision of target region extraction.
Fig. 7 shows a timing chart for implementing the complete method for extracting the region of the remote sensing satellite image. The specific implementation flow of the method is as follows:
step S701: acquiring a remote sensing satellite image of a region to be detected, and extracting an image to be detected in the remote sensing satellite image;
step S702: inputting the image to be detected into the trained region extraction network model;
step S703: coding the image to be detected based on a coding part in the region extraction network model to obtain a coding feature map;
step S704: inputting the coding feature map into a decoding sub-network in a decoding part of the region extraction network model, decoding the coding feature map by the decoding sub-network, and restoring the features in the coding feature map to a pixel level to obtain a pixel level feature map;
step S705: extracting features from the pixel-level feature map through the different scanning subnetworks respectively to obtain the target feature maps;
step S706: stitching the target feature maps extracted by the different scanning subnetworks through a convolutional layer to obtain a target stitched feature map;
step S707: performing convolution and normalization processing on the target stitched feature map to obtain a first class feature map;
step S708: up-sampling the first class feature map to obtain a second class feature map, the size of which is the same as that of the image to be detected; the pixel value of each pixel in the second class feature map represents the pixel's category information.
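Putting the hypothetical sketches above together, the flow of steps S701 to S708 looks roughly like this (all class names are the assumed ones introduced earlier, and the input size is arbitrary):

    import torch

    encoder = Encoder()
    decoder = DecoderSubnet(c3_ch=512, c4_ch=1024, c5_ch=2048)
    head = ScanHead()

    image = torch.randn(1, 3, 512, 512)    # the image to be detected
    c3, c4, c5 = encoder(image)            # S703: encoding
    s345 = decoder(c3, c4, c5)             # S704: pixel-level feature map
    logits = head(s345, image.shape[-2:])  # S705-S708: fuse and up-sample
    class_map = logits.argmax(dim=1)       # per-pixel category information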
Based on the same inventive concept, the embodiment of the application also provides a region extraction device of the remote sensing satellite image. As shown in fig. 8, a schematic structural diagram of an area extraction device 800 for remote sensing satellite images may include:
the image acquisition unit 801 is used for acquiring a remote sensing satellite image of a region to be detected and extracting an image to be detected in the remote sensing satellite image;
the encoding unit 802 is configured to encode an image to be detected to obtain an encoding feature map;
a decoding unit 803, configured to analyze the coding feature map with different receptive fields to obtain different target feature maps, perform feature fusion based on each target feature map, and predict the category information of each pixel in the image to be detected from the fusion result;
and a region extraction unit 804, configured to extract a target region in the image to be detected based on the category information of each pixel.
Optionally, the encoding unit 802 is specifically configured to:
inputting an image to be detected into the trained area extraction network model, and coding the image to be detected based on a coding part in the area extraction network model to obtain a coding feature map; and
the decoding unit 803 is specifically configured to:
inputting the coding feature map into the decoding part of the region extraction network model, analyzing the coding feature map with different receptive fields in the decoding part to obtain different target feature maps; performing feature fusion based on each target feature map, and predicting the category information of each pixel in the image to be detected from the fusion result;
the trained regional extraction network model is obtained by training according to a training sample data set marked with class labels, and training samples in the training sample data set comprise sample images extracted based on sample remote sensing satellite images.
Optionally, the decoding part comprises a decoding subnetwork and at least two scanning subnetworks; the decoding sub-network is used for decoding the coding feature map and restoring the features in the coding feature map to a pixel level to obtain a pixel level feature map; the scanning sub-networks perform feature extraction on the pixel level feature map to obtain a target feature map, convolution kernels adopted when feature extraction is performed on the basis of different scanning sub-networks are different, and different convolution kernels correspond to different receptive fields.
Optionally, the decoding unit 803 is specifically configured to:
inputting the coding feature map into a decoding sub-network in a decoding part, decoding the coding feature map through the decoding sub-network, and restoring the features in the coding feature map to a pixel level to obtain a pixel level feature map;
extracting features from the pixel level feature map through different scanning sub-networks respectively to obtain the target feature maps;
splicing the target feature maps extracted based on different scanning sub-networks through a convolutional layer to obtain a target splicing feature map;
performing convolution and normalization processing on the target splicing feature map to obtain a first class feature map;
and performing up-sampling on the first class feature map to obtain a second class feature map, wherein the size of the second class feature map is the same as that of the image to be detected, and the pixel value of each pixel in the second class feature map represents the class information of the pixel.
Optionally, the encoding feature map includes a primary encoding feature map, a middle-level encoding feature map and a final-level encoding feature map, where the primary encoding feature map is obtained by encoding the image to be detected at least once, the size of the primary encoding feature map is smaller than that of the image to be detected, the middle-level encoding feature map is obtained by further encoding the primary encoding feature map, the size of the middle-level encoding feature map is smaller than that of the primary encoding feature map, the final-level encoding feature map is obtained by filtering the middle-level encoding feature map, and the size of the final-level encoding feature map is the same as that of the middle-level encoding feature map; and
the decoding unit 803 is specifically configured to:
inputting the final-level coding feature map into the decoding sub-network in the decoding part, and performing convolution and normalization processing on the final-level coding feature map through the decoding sub-network to obtain a final-level decoding feature map; and
inputting the middle-level coding feature map into the decoding sub-network in the decoding part, and performing convolution and normalization processing on the middle-level coding feature map through the decoding sub-network to obtain a middle-level decoding feature map, wherein the size of the middle-level decoding feature map is the same as that of the final-level decoding feature map;
splicing the middle-level decoding feature map and the final-level decoding feature map through the convolutional layers in the decoding sub-network to obtain a first splicing feature map;
performing convolution and normalization processing on the first splicing feature map to obtain a first decoding feature map;
performing up-sampling on the first decoding feature map to obtain a second decoding feature map, wherein the size of the second decoding feature map is the same as that of the primary coding feature map; and
inputting the primary coding feature map into a decoding sub-network in a decoding part, and performing convolution and normalization processing on the primary coding feature map through the decoding sub-network to obtain a primary decoding feature map, wherein the size of the primary decoding feature map is the same as that of the primary coding feature map;
splicing the primary decoding feature map and the second decoding feature map through the convolution layers in the decoding sub-network to obtain a second spliced feature map;
and performing convolution and normalization processing on the second spliced feature map to obtain the pixel-level feature map, wherein the size of the pixel-level feature map is the same as that of the primary coding feature map. A code sketch of this decoding sub-network is given below.
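The multi-level fusion above can be sketched as follows. This is an assumption-laden illustration: the channel widths c_prim, c_mid, c_fin and c, the 3x3 kernels, the ReLU activations, and the bilinear up-sampling are not specified by the patent; only the convolution-and-normalization steps, the splicing order, and the size constraints are.

```python
# Sketch of the decoding sub-network; hyper-parameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn(in_ch, out_ch):
    """Convolution followed by normalization, as used throughout the decoder."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DecodingSubNetwork(nn.Module):
    def __init__(self, c_prim=64, c_mid=256, c_fin=512, c=128):
        super().__init__()
        self.dec_fin = conv_bn(c_fin, c)    # final-level coding -> decoding map
        self.dec_mid = conv_bn(c_mid, c)    # middle-level coding -> decoding map
        self.splice1 = conv_bn(2 * c, c)    # first splicing, then conv + norm
        self.dec_prim = conv_bn(c_prim, c)  # primary coding -> decoding map
        self.splice2 = conv_bn(2 * c, c)    # second splicing, then conv + norm

    def forward(self, primary, middle, final):
        fin_dec = self.dec_fin(final)       # final-level decoding feature map
        mid_dec = self.dec_mid(middle)      # same size as fin_dec per the patent
        first_dec = self.splice1(torch.cat([mid_dec, fin_dec], dim=1))
        # Up-sample the first decoding map to the primary map's size.
        second_dec = F.interpolate(first_dec, size=primary.shape[-2:],
                                   mode="bilinear", align_corners=False)
        prim_dec = self.dec_prim(primary)   # primary decoding feature map
        spliced = torch.cat([prim_dec, second_dec], dim=1)
        return self.splice2(spliced)        # pixel-level feature map
```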
Optionally, the scanning sub-network comprises: an abstract feature extraction layer, a detail feature extraction layer, a feature fusion layer and a target feature extraction layer; the decoding unit 803 is specifically configured to:
for each scanning sub-network, performing down-sampling on a pixel level feature map input into the scanning sub-network based on an abstract feature extraction layer in the scanning sub-network, and then performing up-sampling to obtain an abstract feature map, wherein the size of the abstract feature map is the same as that of the pixel level feature map;
performing feature extraction on a pixel-level feature map input into a scanning sub-network based on a detail feature extraction layer in the scanning sub-network to obtain a detail feature map, wherein the size of the detail feature map is the same as that of the pixel-level feature map;
splicing the abstract feature map and the detail feature map based on the feature fusion layer to obtain a feature fusion map;
and performing feature extraction on the feature fusion map based on the target feature extraction layer to obtain the target feature map. One such scanning sub-network is sketched below.
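One scanning sub-network might look like the following sketch. The pooling factor, channel widths, and activations are assumptions; the kernel size k is the parameter that differs between sub-networks and therefore sets each branch's receptive field, as the patent describes.

```python
# Sketch of one scanning sub-network; hyper-parameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScanningSubNetwork(nn.Module):
    def __init__(self, in_ch=128, out_ch=64, k=3):
        super().__init__()
        pad = k // 2
        # Abstract feature extraction layer: down-sample, convolve, and later
        # up-sample back, capturing coarse context.
        self.abstract = nn.Sequential(
            nn.MaxPool2d(2),
            nn.Conv2d(in_ch, out_ch, k, padding=pad),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Detail feature extraction layer: full-resolution features.
        self.detail = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, padding=pad),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Target feature extraction layer applied to the fused (spliced) maps.
        self.target = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, k, padding=pad),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.out_channels = out_ch

    def forward(self, pixel_level):
        a = self.abstract(pixel_level)
        # Restore the abstract map to the pixel-level feature map's size.
        a = F.interpolate(a, size=pixel_level.shape[-2:],
                          mode="bilinear", align_corners=False)
        d = self.detail(pixel_level)
        fused = torch.cat([a, d], dim=1)   # feature fusion layer (splicing)
        return self.target(fused)          # target feature map

# At least two branches with different kernels give different receptive fields:
# branches = [ScanningSubNetwork(k=3), ScanningSubNetwork(k=5)]
```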
Optionally, the apparatus further comprises:
a training unit 805, configured to obtain a training sample data set from a sample remote sensing satellite image;
performing multiple rounds of iterative training on the untrained area extraction network model according to training samples in the training sample data set to obtain a trained area extraction network model; wherein, each round of iterative training executes the following processes:
selecting at least one training sample from a training sample data set, inputting a sample image contained in the training sample into an untrained area extraction network model aiming at any training sample, and obtaining the category information of each pixel in the sample image output by the untrained area extraction network model;
and adjusting the network parameters of the untrained area extraction network model according to the difference between the class information of each pixel in the sample image and the label marked on the sample image. A sketch of one such training round is given below.
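A single training round of the kind described above could be sketched as follows. The cross-entropy loss, the Adam optimizer, and the learning rate are assumptions; the patent states only that network parameters are adjusted according to the per-pixel difference between predictions and the annotated labels.

```python
# Sketch of one training round; loss and optimizer are assumed, not disclosed.
import torch
import torch.nn as nn

def train_one_round(model, samples, lr=1e-3, device="cpu"):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()          # per-pixel classification loss
    for sample_image, label_mask in samples:   # label_mask: (N, H, W) class ids
        sample_image = sample_image.to(device)
        label_mask = label_mask.to(device).long()
        class_map = model(sample_image)          # (N, num_classes, H, W)
        loss = criterion(class_map, label_mask)  # difference from the labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                         # adjust network parameters
```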
For convenience of description, the above parts are described as separate modules (or units) according to functional division. Of course, when implementing the present application, the functionality of the various modules (or units) may be implemented in one or more pieces of software or hardware.
After the method and the device for extracting the region of the remote sensing satellite image according to the exemplary embodiment of the present application are introduced, an electronic device according to another exemplary embodiment of the present application is introduced next.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," a "module," or a "system."
Based on the same inventive concept as the method embodiments, an embodiment of the present application further provides an electronic device. In one embodiment, the electronic device may be a server, such as the server 220 shown in fig. 2. In this embodiment, the electronic device may be configured as shown in fig. 9, and include a memory 901, a communication module 903, and one or more processors 902.
The memory 901 is used for storing computer programs executed by the processor 902. The memory 901 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, programs required by at least one function, and the like, and the data storage area may store data created according to the use of the device, operation instruction sets, and the like.
The memory 901 may be a volatile memory, such as a random-access memory (RAM); the memory 901 may also be a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or the memory 901 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 901 may also be a combination of the above memories.
The processor 902 may include one or more central processing units (CPUs), a digital processing unit, and the like. The processor 902 is configured to implement the above-described region extraction method for remote sensing satellite images when calling the computer program stored in the memory 901.
The communication module 903 is used for communicating with terminal equipment and other servers.
The embodiment of the present application does not limit the specific connection medium among the memory 901, the communication module 903, and the processor 902. In fig. 9, the memory 901 and the processor 902 are connected through a bus 904, which is represented by a thick line; the connection manner between other components is merely illustrative and not limiting. The bus 904 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean that there is only one bus or one type of bus.
The memory 901, as a computer storage medium, stores computer-executable instructions for implementing the region extraction method for remote sensing satellite images according to the embodiments of the present application. The processor 902 is configured to execute the above-mentioned region extraction method, as shown in fig. 3.
In some possible embodiments, various aspects of the region extraction method for remote sensing satellite images provided by the present application may also be implemented in the form of a program product including program code. When the program product runs on a computer device, the program code causes the computer device to perform the steps of the region extraction method according to the various exemplary embodiments described above in this specification; for example, the computer device may perform the steps shown in fig. 3.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit in the embodiment of the present application may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a stand-alone product. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof that contribute to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such changes and modifications of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such changes and modifications.

Claims (15)

1. A region extraction method of a remote sensing satellite image is characterized by comprising the following steps:
acquiring a remote sensing satellite image of a region to be detected, and extracting an image to be detected in the remote sensing satellite image;
coding the image to be detected to obtain a coding feature map;
analyzing the coding feature map by adopting different receptive fields to obtain different target feature maps;
performing feature fusion based on each target feature map, and predicting to obtain category information of each pixel in the image to be detected according to a fusion result;
and extracting a target area in the image to be detected based on the category information of each pixel.
2. The method of claim 1, wherein the coding the image to be detected to obtain a coding feature map specifically comprises:
inputting the image to be detected into a trained area extraction network model, and coding the image to be detected based on a coding part in the area extraction network model to obtain a coding feature map; and
analyzing the coding feature map by adopting different receptive fields to obtain different target feature maps; performing feature fusion based on each target feature map, and predicting to obtain category information of each pixel in the image to be detected according to a fusion result, wherein the method specifically comprises the following steps:
inputting the coding feature map into a decoding part in the region extraction network model, analyzing the coding feature map based on different receptive fields in the decoding part, and acquiring different target feature maps; performing feature fusion based on each target feature map, and predicting to obtain category information of each pixel in the image to be detected according to a fusion result;
the trained region extraction network model is obtained by training according to a training sample data set marked with class labels, and training samples in the training sample data set comprise sample images extracted based on sample remote sensing satellite images.
3. The method of claim 2, wherein the decoding part comprises a decoding sub-network and at least two scanning sub-networks; the decoding sub-network is used for decoding the coding feature map and restoring the features in the coding feature map to a pixel level to obtain a pixel level feature map; and the scanning sub-networks perform feature extraction on the pixel level feature map to obtain target feature maps, wherein different convolution kernels are adopted when feature extraction is performed based on different scanning sub-networks, and different convolution kernels correspond to different receptive fields.
4. The method according to claim 3, wherein the inputting the coding feature map into the decoding part of the area extraction network model, and analyzing the coding feature map based on different receptive fields in the decoding part to obtain different target feature maps specifically comprises:
inputting the coding feature map into a decoding sub-network in the decoding part, decoding the coding feature map through the decoding sub-network, and restoring the features in the coding feature map to a pixel level to obtain a pixel level feature map;
extracting features from the pixel level feature map through different scanning sub-networks respectively to obtain the respective target feature maps;
the performing feature fusion based on each target feature map, and predicting to obtain category information of each pixel in the image to be detected according to a fusion result specifically includes:
splicing the target feature maps extracted based on different scanning sub-networks through a convolutional layer to obtain a target splicing feature map;
performing convolution and normalization processing on the target splicing feature map to obtain a first class feature map;
and performing up-sampling on the first class feature map to obtain a second class feature map, wherein the size of the second class feature map is the same as that of the image to be detected, and the pixel value of each pixel in the second class feature map represents the class information of the pixel.
5. The method according to claim 4, wherein the coding feature map comprises a primary coding feature map, a middle-level coding feature map and a final-level coding feature map, wherein the primary coding feature map is obtained by coding the image to be detected at least once, and the size of the primary coding feature map is smaller than that of the image to be detected; the middle-level coding feature map is obtained by further coding the primary coding feature map, and the size of the middle-level coding feature map is smaller than that of the primary coding feature map; and the final-level coding feature map is obtained by filtering the middle-level coding feature map, and the size of the final-level coding feature map is the same as that of the middle-level coding feature map; and
inputting the coding feature map into a decoding sub-network in the decoding part, decoding the coding feature map through the decoding sub-network, and restoring the features in the coding feature map to a pixel level to obtain a pixel level feature map, which specifically comprises:
inputting the final-level coding feature map into a decoding sub-network in the decoding part, and performing convolution and normalization processing on the final-level coding feature map through the decoding sub-network to obtain a final-level decoding feature map; and
inputting the middle-level coding feature map into a decoding sub-network in the decoding part, and performing convolution and normalization processing on the middle-level coding feature map through the decoding sub-network to obtain a middle-level decoding feature map, wherein the size of the middle-level decoding feature map is the same as that of the final-level decoding feature map;
splicing the middle-level decoding feature map and the final-level decoding feature map through convolutional layers in the decoding sub-network to obtain a first splicing feature map;
performing convolution and normalization processing on the first splicing feature map to obtain a first decoding feature map;
performing up-sampling on the first decoding feature map to obtain a second decoding feature map, wherein the size of the second decoding feature map is the same as that of the primary coding feature map; and
inputting the primary coding feature map into a decoding sub-network in the decoding part, and performing convolution and normalization processing on the primary coding feature map through the decoding sub-network to obtain a primary decoding feature map, wherein the size of the primary decoding feature map is the same as that of the primary coding feature map;
splicing the primary decoding feature map and the second decoding feature map through convolutional layers in the decoding sub-network to obtain a second spliced feature map;
and performing convolution and normalization processing on the second spliced feature map to obtain the pixel-level feature map, wherein the size of the pixel-level feature map is the same as that of the primary coding feature map.
6. The method of claim 4, wherein the scanning sub-network comprises: an abstract feature extraction layer, a detail feature extraction layer, a feature fusion layer and a target feature extraction layer;
when the feature extraction is performed on the pixel level feature map through different scanning subnetworks to obtain each target feature map, the method specifically includes, for each scanning subnetwork:
performing down-sampling on a pixel level feature map input into the scanning sub-network based on an abstract feature extraction layer in the scanning sub-network, and then performing up-sampling to obtain an abstract feature map, wherein the size of the abstract feature map is the same as that of the pixel level feature map;
performing feature extraction on a pixel-level feature map input into the scanning sub-network based on a detail feature extraction layer in the scanning sub-network to obtain a detail feature map, wherein the size of the detail feature map is the same as that of the pixel-level feature map;
splicing the abstract feature map and the detail feature map based on a feature fusion layer to obtain a feature fusion map;
and performing feature extraction on the feature fusion graph based on a target feature extraction layer to obtain a target feature graph.
7. The method of any one of claims 2 to 6, wherein the area extraction network model is trained by:
acquiring a training sample data set from a sample remote sensing satellite image;
performing multiple rounds of iterative training on the untrained area extraction network model according to the training samples in the training sample data set to obtain a trained area extraction network model; wherein, each round of iterative training executes the following processes:
selecting at least one training sample from the training sample data set, inputting a sample image contained in the training sample into an untrained area extraction network model aiming at any training sample, and obtaining the category information of each pixel in the sample image output by the untrained area extraction network model;
and adjusting the network parameters of the untrained area extraction network model according to the difference between the category information of each pixel in the sample image and the label marked on the sample image.
8. A region extraction device for remote sensing satellite images is characterized by comprising:
the image acquisition unit is used for acquiring a remote sensing satellite image of a region to be detected and extracting an image to be detected in the remote sensing satellite image;
the coding unit is used for coding the image to be detected to obtain a coding feature map;
the decoding unit is used for analyzing the coding feature map by adopting different receptive fields to obtain different target feature maps, performing feature fusion based on each target feature map, and predicting to obtain category information of each pixel in the image to be detected according to a fusion result;
and the region extraction unit is used for extracting a target region in the image to be detected based on the category information of each pixel.
9. The apparatus of claim 8, wherein the encoding unit is specifically configured to:
inputting the image to be detected into a trained area extraction network model, and coding the image to be detected based on a coding part in the area extraction network model to obtain a coding feature map; and
the decoding unit is specifically configured to:
inputting the coding feature map into a decoding part in the region extraction network model, analyzing the coding feature map based on different receptive fields in the decoding part, and acquiring different target feature maps; performing feature fusion based on each target feature map, and predicting to obtain category information of each pixel in the image to be detected according to a fusion result;
the trained region extraction network model is obtained by training according to a training sample data set marked with class labels, and training samples in the training sample data set comprise sample images extracted based on sample remote sensing satellite images.
10. The apparatus of claim 9, wherein the decoding part comprises a decoding sub-network and at least two scanning sub-networks; the decoding sub-network is used for decoding the coding feature map and restoring the features in the coding feature map to a pixel level to obtain a pixel level feature map; and the scanning sub-networks perform feature extraction on the pixel level feature map to obtain target feature maps, wherein different convolution kernels are adopted when feature extraction is performed based on different scanning sub-networks, and different convolution kernels correspond to different receptive fields.
11. The apparatus of claim 10, wherein the decoding unit is specifically configured to:
inputting the coding feature map into a decoding sub-network in the decoding part, decoding the coding feature map through the decoding sub-network, and restoring the features in the coding feature map to a pixel level to obtain a pixel level feature map;
extracting features from the pixel level feature map through different scanning sub-networks respectively to obtain the respective target feature maps;
splicing the target feature maps extracted based on different scanning sub-networks through a convolutional layer to obtain a target splicing feature map;
performing convolution and normalization processing on the target splicing feature map to obtain a first class feature map;
and performing up-sampling on the first class feature map to obtain a second class feature map, wherein the size of the second class feature map is the same as that of the image to be detected, and the pixel value of each pixel in the second class feature map represents the class information of the pixel.
12. The apparatus of claim 11, wherein the coding feature map comprises a primary coding feature map, a middle-level coding feature map and a final-level coding feature map, wherein the primary coding feature map is obtained by coding the image to be detected at least once, and the size of the primary coding feature map is smaller than that of the image to be detected; the middle-level coding feature map is obtained by further coding the primary coding feature map, and the size of the middle-level coding feature map is smaller than that of the primary coding feature map; and the final-level coding feature map is obtained by filtering the middle-level coding feature map, and the size of the final-level coding feature map is the same as that of the middle-level coding feature map; and
the decoding unit is specifically configured to:
inputting the final-level coding feature map into a decoding sub-network in the decoding part, and performing convolution and normalization processing on the final-level coding feature map through the decoding sub-network to obtain a final-level decoding feature map; and
inputting the middle-level coding feature map into a decoding sub-network in the decoding part, and performing convolution and normalization processing on the middle-level coding feature map through the decoding sub-network to obtain a middle-level decoding feature map, wherein the size of the middle-level decoding feature map is the same as that of the final-level decoding feature map;
splicing the middle-level decoding feature map and the final-level decoding feature map through convolutional layers in the decoding sub-network to obtain a first splicing feature map;
performing convolution and normalization processing on the first splicing feature map to obtain a first decoding feature map;
performing up-sampling on the first decoding feature map to obtain a second decoding feature map, wherein the size of the second decoding feature map is the same as that of the primary coding feature map; and
inputting the primary coding feature map into a decoding sub-network in the decoding part, and performing convolution and normalization processing on the primary coding feature map through the decoding sub-network to obtain a primary decoding feature map, wherein the size of the primary decoding feature map is the same as that of the primary coding feature map;
splicing the primary decoding feature map and the second decoding feature map through convolutional layers in the decoding sub-network to obtain a second spliced feature map;
and performing convolution and normalization processing on the second spliced feature map to obtain the pixel-level feature map, wherein the size of the pixel-level feature map is the same as that of the primary coding feature map.
13. The apparatus of claim 11, wherein the scanning sub-network comprises: an abstract feature extraction layer, a detail feature extraction layer, a feature fusion layer and a target feature extraction layer;
the decoding unit is specifically configured to:
for each scanning sub-network, performing down-sampling on a pixel level feature map input into the scanning sub-network based on an abstract feature extraction layer in the scanning sub-network, and then performing up-sampling to obtain an abstract feature map, wherein the size of the abstract feature map is the same as that of the pixel level feature map;
performing feature extraction on a pixel-level feature map input into the scanning sub-network based on a detail feature extraction layer in the scanning sub-network to obtain a detail feature map, wherein the size of the detail feature map is the same as that of the pixel-level feature map;
splicing the abstract feature map and the detail feature map based on a feature fusion layer to obtain a feature fusion map;
and performing feature extraction on the feature fusion graph based on a target feature extraction layer to obtain a target feature graph.
14. An electronic device, comprising a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 7.
15. A computer-readable storage medium, characterized in that it comprises program code for causing an electronic device to carry out the steps of the method according to any one of claims 1 to 7, when said program code is run on said electronic device.
CN202011318422.5A 2020-11-23 2020-11-23 Region extraction method and device for remote sensing satellite image, electronic equipment and medium Active CN112347976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011318422.5A CN112347976B (en) 2020-11-23 2020-11-23 Region extraction method and device for remote sensing satellite image, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN112347976A 2021-02-09
CN112347976B CN112347976B (en) 2022-09-23

Family

ID=74364565

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505627A (en) * 2021-03-31 2021-10-15 北京苍灵科技有限公司 Remote sensing data processing method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674742A (en) * 2019-09-24 2020-01-10 电子科技大学 Remote sensing image road extraction method based on DLinkNet
CN111242231A (en) * 2020-01-17 2020-06-05 西安建筑科技大学 Strip mine road model construction method based on P-LinkNet network
US10699129B1 (en) * 2019-11-15 2020-06-30 Fudan University System and method for video captioning
CN111460921A (en) * 2020-03-13 2020-07-28 华南理工大学 Lane line detection method based on multitask semantic segmentation
CN111462124A (en) * 2020-03-31 2020-07-28 武汉卓目科技有限公司 Remote sensing satellite cloud detection method based on Deep L abV3+



Similar Documents

Publication Publication Date Title
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN112232293B (en) Image processing model training method, image processing method and related equipment
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
CN110990631A (en) Video screening method and device, electronic equipment and storage medium
CN111739027B (en) Image processing method, device, equipment and readable storage medium
CN110807437B (en) Video granularity characteristic determination method and device and computer-readable storage medium
CN111062964B (en) Image segmentation method and related device
CN111783712A (en) Video processing method, device, equipment and medium
CN111311475A (en) Detection model training method and device, storage medium and computer equipment
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN112613356B (en) Action detection method and device based on deep attention fusion network
CN111833360A (en) Image processing method, device, equipment and computer readable storage medium
CN111432206A (en) Video definition processing method and device based on artificial intelligence and electronic equipment
CN114581710A (en) Image recognition method, device, equipment, readable storage medium and program product
CN117095019B (en) Image segmentation method and related device
CN114064974A (en) Information processing method, information processing apparatus, electronic device, storage medium, and program product
CN117576264B (en) Image generation method, device, equipment and medium
CN112347976B (en) Region extraction method and device for remote sensing satellite image, electronic equipment and medium
CN114091551A (en) Pornographic image identification method and device, electronic equipment and storage medium
CN116630302A (en) Cell image segmentation method and device and electronic equipment
CN116958729A (en) Training of object classification model, object classification method, device and storage medium
CN115239590A (en) Sample image generation method, device, equipment, medium and program product
Guo et al. Image saliency detection based on geodesic‐like and boundary contrast maps
CN114782720A (en) Method, device, electronic device, medium, and program product for determining matching of document
CN114529761A (en) Video classification method, device, equipment, medium and product based on classification model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant