CN116894955A - Target extraction method, target extraction device, electronic equipment and storage medium - Google Patents

Target extraction method, target extraction device, electronic equipment and storage medium Download PDF

Info

Publication number
CN116894955A
Authority
CN
China
Prior art keywords
feature map
feature
global
encoder
feature set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310930932.5A
Other languages
Chinese (zh)
Inventor
王雷
陈铖
于博
陈方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS
Priority to CN202310930932.5A
Publication of CN116894955A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Processing (AREA)

Abstract

The disclosure provides a target extraction method, a target extraction device, an electronic device and a storage medium. The target extraction method comprises the following steps: acquiring a satellite remote sensing image to be detected; preprocessing the satellite remote sensing image to be detected to obtain a plurality of patches with a preset pixel size; and performing the following operations for each patch: inputting the patch into an encoder and outputting a first feature map; inputting the first feature map into a decoder, the decoder comprising a global-local extraction operation that upsamples and merges intermediate feature maps of the encoder and intermediate feature maps of the decoder a plurality of times, the operation comprising: normalizing and convolving the input feature map to extract local features from the input feature map, obtaining a first feature set; extracting global features from the input feature map based on a multi-head self-attention mechanism, obtaining a second feature set; and processing and fusing the first feature set and the second feature set to obtain a second feature map, wherein the second feature map comprises a plurality of identified targets. The encoder and the decoder form a Global-Local-Attention network structure.

Description

Target extraction method, target extraction device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of image processing, and in particular to a target extraction method, a target extraction device, an electronic device and a storage medium.
Background
China is a major animal husbandry and grassland country, and the extraction of cattle and sheep from imagery is an important technical means for the statistical analysis of cattle and sheep husbandry in regions such as Inner Mongolia. Because cattle and sheep are numerous and widely distributed, manual monitoring and statistics are time-consuming and labor-intensive, monitoring with dedicated physical hardware is costly to maintain, and the results are affected by factors such as climate and environment. It is therefore necessary to develop efficient, low-cost monitoring techniques that extract the cattle and sheep in a target area and analyze their distribution, providing a solid data basis for situation statistics, pasture maintenance and other needs of the grassland animal husbandry industry.
With the development of technology, and driven by demands such as politics and national defense security, satellite technology has advanced rapidly; one benefit is that the remote sensing images now available cover wider areas, are more numerous, and are more timely. China's high-spatial-resolution remote sensing satellite images, including those from the Ziyuan-3 and Gaofen-2 satellites, have already been widely applied to remote sensing target extraction. At present, there is little research on large-scale cattle and sheep extraction; because cattle and sheep targets are small and their features are easily confused with the background, the extraction process poses a great challenge.
Disclosure of Invention
In order to solve the problems in the prior art, an embodiment of the present disclosure provides a target extraction method, comprising: acquiring a satellite remote sensing image to be detected; preprocessing the satellite remote sensing image to be detected to obtain a plurality of patches with a preset pixel size; and performing the following operations for each patch: inputting the patch into an encoder and outputting a first feature map; inputting the first feature map into a decoder, the decoder comprising a global-local extraction operation of upsampling and merging intermediate feature maps of the encoder and intermediate feature maps of the decoder a plurality of times, wherein the global-local extraction comprises: normalizing and convolving the input feature map to extract local features from the input feature map, obtaining a first feature set; extracting global features from the input feature map based on a multi-head self-attention mechanism, obtaining a second feature set; and processing and fusing the first feature set and the second feature set to obtain a second feature map, wherein the second feature map comprises a plurality of identified targets; wherein the encoder and the decoder form a Global-Local-Attention network structure.
According to an embodiment of the present disclosure, the encoder includes four residual blocks based on ResNet18. Inputting each patch into the encoder to obtain the first feature map includes: inputting each patch into the encoder, extracting local features through ResNet18, performing normalization and maximum pooling through a normalization layer and a maximum pooling layer, outputting four feature maps of different sizes, and taking one of the feature maps as the first feature map.
According to an embodiment of the present disclosure, normalizing and convolving the input feature map to extract local features from the input feature map to obtain the first feature set includes: normalizing the input feature map; convolving the normalized feature map with two convolution kernels of different sizes, respectively; and fusing the convolution results with a concat function and normalizing them to obtain the first feature set.
According to an embodiment of the present disclosure, the decoder further includes a Global Connect module contained in the GBlock. Extracting global spatial features from the input feature map based on the multi-head self-attention mechanism to obtain the second feature set includes: extracting two feature subgraphs from the feature map; spatially associating corresponding positions of the two feature subgraphs; and sequentially sliding to extract the next two feature subgraphs from the feature map and spatially associating them, so as to obtain spatial association information of the input feature map and output the second feature set.
According to an embodiment of the present disclosure, before inputting the first feature set and the second feature set into a multi-layer perceptron, the method further includes: inputting the first feature set and the second feature set into a residual shrinkage network module, wherein the first feature set and the second feature set each include a plurality of feature data, and the residual shrinkage network module is used to enhance the optimization capability of the model.
According to an embodiment of the present disclosure, preprocessing the satellite remote sensing image to be detected to obtain a plurality of patches with a preset pixel size includes: cutting each satellite remote sensing image into a plurality of patches of 1024×1024 pixels.
A second aspect of the present disclosure provides a target extraction device, comprising: a data acquisition module for acquiring a satellite remote sensing image to be detected; a data preprocessing module for preprocessing the satellite remote sensing image to be detected to obtain a plurality of patches with a preset pixel size; and a detection module that inputs the patch into an encoder and outputs a first feature map; inputs the first feature map into a decoder; normalizes and convolves the input feature map to extract local features from the input feature map, obtaining a first feature set; extracts global features from the input feature map based on a multi-head self-attention mechanism, obtaining a second feature set; and processes and fuses the first feature set and the second feature set to obtain a second feature map, wherein the second feature map comprises a plurality of identified targets; wherein the encoder and the decoder form a Global-Local-Attention network structure.
According to an embodiment of the present disclosure, the encoder includes four residual blocks, which are based on ResNet18.
A third aspect of the present disclosure provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the target extraction method provided by the first aspect of the present disclosure when executing the computer program.
A fourth aspect of the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the target extraction method provided by the first aspect of the present disclosure.
According to the target extraction method, the device, the electronic device and the storage medium of the present disclosure, a Global-Local-Attention network structure is adopted to build a cattle and sheep target extraction model based on high-spatial-resolution remote sensing images: multi-scale local features are extracted in the encoder, global spatial features and local features are combined in the decoder, and cattle and sheep features are finally extracted from the input high-spatial-resolution remote sensing image to obtain the final cattle and sheep target extraction result, improving the effectiveness of the model. Meanwhile, an Attention Block module is added to the Global-Local-Attention network structure in the decoder to extract global and local features, in which the global spatial features of the target cattle and sheep are extracted, the global spatial extraction capability of the model is enhanced, and the extraction results of the target cattle and sheep are optimized.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 schematically illustrates a flow chart of a target extraction method according to an embodiment of the disclosure;
FIG. 2 schematically illustrates a Global-Local-Attention network architecture diagram according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a structural diagram of a Transformer according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a structural schematic of a GBlock module and an LBlock module according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a structural diagram of a Global Connect module according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a target extraction result map and a true distribution map according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a target extraction apparatus according to an embodiment of the disclosure;
FIG. 8 schematically shows a block diagram of an electronic device adapted to implement the method described above according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression like "at least one of A, B and C, etc." is used, it should generally be interpreted in accordance with its meaning as commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where an expression like "at least one of A, B or C, etc." is used, it should likewise be interpreted in accordance with the ordinary understanding of one skilled in the art (e.g., "a system having at least one of A, B or C" would include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Some of the block diagrams and/or flowchart illustrations are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, when executed by the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon, the computer program product being for use by or in connection with an instruction execution system.
Fig. 1 schematically illustrates a flow chart of a target extraction method according to an embodiment of the disclosure. As shown in fig. 1, the method includes: steps S101 to S104.
In operation S101, a satellite remote sensing image to be detected is acquired. The satellite remote sensing image to be detected includes multiple scenes of satellite remote sensing images, and each scene includes a plurality of target cattle and sheep.
In the embodiment of the disclosure, a satellite remote sensing image acquired by a satellite at a certain moment is obtained and used as the satellite remote sensing image to be detected, i.e., the original dataset. The satellite remote sensing image to be detected includes a plurality of satellite remote sensing images, each of which contains a plurality of targets; the targets may be one or more kinds of livestock such as pigs, sheep and cattle. Target identification is then performed on the basis of each satellite remote sensing image.
For example, satellite remote sensing images with a resolution of 0.5 m are collected by satellite over a grassland of Inner Mongolia; specifically, samples of flocks and herds in the grassland animal husbandry area are collected by visual interpretation and used as the original dataset.
In operation S102, the satellite remote sensing image to be detected is preprocessed to obtain a plurality of patches with preset pixel sizes.
In the embodiment of the disclosure, the satellite remote sensing image to be detected is preprocessed to obtain a plurality of patches with a preset pixel size, and the patches are divided into a training dataset and a test dataset according to a preset ratio. Specifically: each satellite remote sensing image is cut into a plurality of patches of 1024×1024 pixels, and the green, blue and near-infrared channels are selected from the image channels, yielding a plurality of three-channel patches of 1024×1024 pixels.
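For illustration only, the cutting and channel-selection step described above can be sketched in Python as follows. This is a minimal sketch, not part of the claimed method; the function name tile_image and the band indices are assumptions that depend on the actual sensor band order.

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int = 1024) -> list:
    """Cut a (bands, H, W) remote sensing array into tile x tile patches,
    keeping only the green, blue and near-infrared channels."""
    GREEN, BLUE, NIR = 1, 2, 3  # assumed band indices for the example
    img = image[[GREEN, BLUE, NIR], :, :]
    _, h, w = img.shape
    patches = []
    for y in range(0, h - tile + 1, tile):       # full tiles only
        for x in range(0, w - tile + 1, tile):
            patches.append(img[:, y:y + tile, x:x + tile])
    return patches
```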
Further, the plurality of 1024×1024-pixel images containing targets are randomly divided into a training dataset, a test dataset and a verification dataset according to a preset ratio. The training dataset accounts for 60% and is used to train the target extraction model to obtain a trained model. Correspondingly, the test dataset accounts for 20% and is used to evaluate and check the trained model to obtain the extraction results of the target livestock. The verification dataset accounts for 20% and is used to validate the trained target extraction model.
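A minimal sketch of this 60/20/20 random split, assuming the patches are held in a Python list; the helper name split_dataset is illustrative:

```python
import random

def split_dataset(patches, ratios=(0.6, 0.2, 0.2), seed=42):
    """Randomly split patches into training, test and verification sets."""
    items = list(patches)
    random.Random(seed).shuffle(items)
    n_train = int(ratios[0] * len(items))
    n_test = int(ratios[1] * len(items))
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    verification = items[n_train + n_test:]      # remaining ~20%
    return train, test, verification
```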
Note that the pixel size of the patches includes, but is not limited to, 1024×1024 pixels; it may also be 512×512 pixels or the like, which is not limited by the embodiments of the present disclosure.
In operation S103, the patch is input into the encoder, and a first feature map is output; the first feature map is input into the decoder, which includes a global-local extraction operation of upsampling and merging intermediate feature maps of the encoder and intermediate feature maps of the decoder a plurality of times, wherein the global-local extraction includes: normalizing and convolving the input feature map to extract local features from the input feature map, obtaining a first feature set; and extracting global features from the input feature map based on a multi-head self-attention mechanism, obtaining a second feature set.
Specifically, for each patch, at least one target in the patch is identified using a Global-Local-Attention network structure, which includes an encoder and a decoder; the patch is input into the encoder, and the first feature map is output.
FIG. 2 schematically illustrates a Global-Local-Attention network architecture diagram according to an embodiment of the present disclosure. As shown in FIG. 2, the Global-Local-Attention network structure includes an encoder and a decoder, and the overall network follows a UNet structure.
In the embodiment of the present disclosure, the training dataset obtained in step S102 is input into the target extraction model for training, so as to obtain a trained target extraction model. The target extraction model adopts the Global-Local-Attention network structure and mainly consists of an encoder and a decoder.
In the embodiment of the disclosure, the encoder mainly consists of four ResBlocks (residual blocks) based on ResNet18, so that the overall model remains lightweight while the encoder extracts multi-scale local features. The feature map size is then reduced after passing through a normalization layer and a maximum pooling layer, expanding the receptive field. Finally, through the four ResBlocks, four feature maps of sizes 64×192×192, 128×96×96, 256×48×48 and 512×24×24 are output to represent the multi-scale local features of the satellite remote sensing image to be detected, and the 512×24×24 feature map is output as the first feature map.
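As a non-authoritative sketch of the encoder just described, the four residual stages of ResNet18 can be exposed with PyTorch/torchvision as follows; the spatial sizes in the comments follow the disclosure and scale with the input patch size:

```python
import torch
from torch import nn
from torchvision.models import resnet18

class Encoder(nn.Module):
    """Sketch: ResNet18-based encoder producing four multi-scale maps."""

    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        # stem: 7x7 conv + BatchNorm + ReLU + max pooling (reduces size
        # and expands the receptive field, per the description above)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.res1, self.res2 = backbone.layer1, backbone.layer2
        self.res3, self.res4 = backbone.layer3, backbone.layer4

    def forward(self, x):
        x = self.stem(x)
        r1 = self.res1(x)    # e.g. 64 x 192 x 192 per the disclosure
        r2 = self.res2(r1)   # 128 x 96 x 96
        r3 = self.res3(r2)   # 256 x 48 x 48
        r4 = self.res4(r3)   # 512 x 24 x 24 -> the first feature map
        return r1, r2, r3, r4
```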
Next, the first feature map is subjected to a convolution operation to reduce the number of channels and is input into the decoder, which includes the global-local extraction operation, wherein the global-local extraction includes: normalizing and convolving the input feature map to extract local features from the input feature map, obtaining a first feature set; and extracting global features from the input feature map based on a multi-head self-attention mechanism, obtaining a second feature set.
In the disclosed embodiment, the four feature maps output by the encoder are denoted Res1, Res2, Res3 and Res4 in FIG. 2. In the decoder, a 3×3 convolution is applied to Res4, changing its size from 512×24×24 to 64×24×24 and yielding UpBlock1; an upsampling operation is applied to UpBlock1 to obtain UpBlock2 of size 64×48×48; UpBlock2 and Res3 are then used as the input features of GLBlock1.
FIG. 3 schematically shows a structural diagram of a Transformer according to an embodiment of the present disclosure. As shown in FIG. 3, the decoder uses GLBlock, composed of a Transformer-based global extraction module and a convolution-based local extraction module, as the decoder backbone. Compared with a generic Transformer, the embodiment of the disclosure adds an Attention Block module to extract targets. Meanwhile, Batch Normalization is adopted for data normalization to accelerate training.
Specifically, FIG. 4 schematically illustrates a structural diagram of the GBlock module and the LBlock module according to an embodiment of the present disclosure. As shown in FIG. 4, the Attention Block module mainly comprises two sub-modules: an LBlock module for local feature extraction on the input feature map, and a GBlock module for global feature extraction on the input feature map. With this structure, the multi-scale local information and the global context information of the cattle and sheep features can be fully utilized, improving the extraction of the target cattle and sheep.
In an embodiment of the present disclosure, the first feature map is input into the decoder, which includes a global-local extraction operation for upsampling and merging the intermediate feature maps of the encoder and the decoder. The local extraction operation includes normalization and convolution of the input feature map and is used to extract local features from the input feature map to obtain the first feature set. Specifically: the input feature map is normalized, convolutions of two sizes, 1×1 and 3×3, are applied, the two convolution results are fused by a concat function, and Batch Normalization is applied to obtain the first feature set output by the LBlock module.
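A minimal PyTorch sketch of the LBlock flow just described (normalization, parallel 1×1 and 3×3 convolutions, concat fusion, Batch Normalization); the channel counts and the absence of a final projection are assumptions for the example:

```python
import torch
from torch import nn

class LBlock(nn.Module):
    """Sketch of the local extraction module."""

    def __init__(self, channels: int):
        super().__init__()
        self.norm_in = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm_out = nn.BatchNorm2d(2 * channels)  # concat doubles channels

    def forward(self, x):
        x = self.norm_in(x)                                   # normalize input
        y = torch.cat([self.conv1(x), self.conv3(x)], dim=1)  # concat fusion
        return self.norm_out(y)                               # first feature set
```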
In the embodiment of the present disclosure, GBlock adopts a multi-head self-attention mechanism for global feature extraction on the input feature map to obtain the second feature set. Specifically, GBlock further includes a Global Connect module on top of the multi-head self-attention mechanism: two feature subgraphs are extracted from the feature map; corresponding positions of the two feature subgraphs are spatially associated; and the next two feature subgraphs are extracted by sliding in sequence and spatially associated with the input feature map, so as to obtain the spatial association information of the feature map and output the second feature set.
FIG. 5 schematically illustrates a structural diagram of the Global Connect module according to an embodiment of the present disclosure. As shown in FIG. 5, the Global Connect module generates two feature subgraphs W1 and W2 of size 16×16 and associates the features at their corresponding positions, which requires a sliding module in the specific implementation. Specifically, two feature subgraphs W1 and W2 of size 16×16 are extracted from the input feature map, and the corresponding positions of W1 and W2 are spatially associated through two sliding modules of size 4×4; by sliding within the feature subgraphs, the sliding modules can associate the entire feature subgraph. Applying this procedure over the whole input feature map yields the spatial association information of the feature map, and the second feature set is output.
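The disclosure describes the Global Connect pairing only at a high level, so the following window-attention sketch is one possible interpretation rather than the actual GBlock: multi-head self-attention is applied inside non-overlapping 16×16 windows of the feature map (H and W are assumed divisible by the window size):

```python
import torch
from torch import nn

class GBlockSketch(nn.Module):
    """Windowed multi-head self-attention, an interpretation of GBlock."""

    def __init__(self, channels: int, heads: int = 4, window: int = 16):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        s = self.window
        # split the map into non-overlapping s x s windows
        xw = x.unfold(2, s, s).unfold(3, s, s)   # (B, C, H/s, W/s, s, s)
        xw = xw.permute(0, 2, 3, 4, 5, 1).reshape(-1, s * s, c)
        out, _ = self.attn(xw, xw, xw)           # attention inside each window
        out = out.reshape(b, h // s, w // s, s, s, c)
        out = out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        return out                               # second feature set (sketch)
```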
In operation S104, the first feature set and the second feature set are processed and fused to obtain a second feature map.
Specifically, the first feature set and the second feature set are fused by a concat operation to finally obtain the second feature map, which is used to represent a plurality of targets. Meanwhile, after the concat fusion of the first feature set and the second feature set, the method further includes: inputting the first feature set and the second feature set into a residual shrinkage network module, wherein the first feature set and the second feature set each include a plurality of feature data, and the residual shrinkage network module is used to enhance the optimization capability of the model.
In the disclosed embodiment, the decoder is mainly composed of three global-local extraction backbone modules, each consisting of a global attention extraction module and a local feature extraction module.
In the global attention extraction module, the input feature map passes through a layer of multi-head self-attention and a Global Connect module for global extraction. The local feature extraction module normalizes the feature map through a normalization layer, then applies 1×1 and 3×3 convolutions respectively, weights and sums the two convolution results, and finally normalizes to complete the local extraction.
In the decoder, the UpBlock2 feature map obtained by upsampling UpBlock1 is input, its channel number is changed by a 3×3 convolution, the convolution result and the feature map output by Res3 are summed with weights, and GLBlock1 is obtained through the global-local extraction module.
Then, GLBlock1 is upsampled to obtain UpBlock3; UpBlock3 and Res2 are used as the input features of GLBlock2, which is obtained through the global-local extraction module; GLBlock2 is upsampled to obtain UpBlock4, and UpBlock4 and Res1 are used as the input features of GLBlock3, which is obtained through the global-local extraction module. GLBlock3 is used to represent the plurality of targets in the satellite remote sensing image to be detected.
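The decoder data flow described in the last two paragraphs can be summarized in a short sketch; reduce_conv and the glblock callables stand for the 3×3 channel-reduction convolution and the global-local extraction modules, and their two-argument signature is an assumption:

```python
import torch.nn.functional as F

def decode(res1, res2, res3, res4, reduce_conv, glblock1, glblock2, glblock3):
    """Sketch of the decoder flow of FIG. 2."""
    def up(x):  # 2x upsampling between decoder stages
        return F.interpolate(x, scale_factor=2, mode="bilinear",
                             align_corners=False)

    upblock1 = reduce_conv(res4)        # 512x24x24 -> 64x24x24
    upblock2 = up(upblock1)             # -> 64x48x48
    glb1 = glblock1(upblock2, res3)     # GLBlock1 merges with Res3
    glb2 = glblock2(up(glb1), res2)     # GLBlock2 merges with Res2
    glb3 = glblock3(up(glb2), res1)     # GLBlock3 merges with Res1
    return glb3                         # represents the detected targets
```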
In the embodiment of the disclosure, during training of the target extraction model, the Adam optimization algorithm may be used for model learning, with the learning rate set to 0.01. A cosine annealing learning rate optimization strategy is adopted, the learning rate is increased threefold every 20 batches, the optimal model is selected, and the model is trained for 100 batches. All experiments were performed on NVIDIA GeForce 3060 Ti graphics cards. The loss function adopts a multi-head segmentation architecture with soft cross entropy as the main loss, computed as:

$$L_{sce} = -\sum_{k} y_k \log \hat{y}_k$$

where $y_k$ denotes the probability of class $k$, and $\hat{y}_k$ is given by the Softmax function:

$$\hat{y}_k = \frac{e^{x_k}}{\sum_{j} e^{x_j}}$$

The auxiliary loss adopts the Dice loss:

$$L_{dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$

where $X$ denotes the predicted feature matrix and $Y$ denotes the ground-truth feature matrix. The final loss function is:

$$L = L_{sce} + \alpha L_{dice}$$

where $L$ denotes the total loss and $\alpha$ denotes the weight parameter, initialized to 0.37.
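A hedged sketch of this combined loss in PyTorch for a two-class (background/livestock) segmentation head; plain cross entropy stands in for the soft cross entropy, and the foreground-channel Dice is an assumption of how X and Y are compared:

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target, alpha=0.37, eps=1e-6):
    """L = L_sce + alpha * L_dice, with alpha initialized to 0.37.

    logits: (B, 2, H, W) raw scores; target: (B, H, W) integer labels.
    """
    l_sce = F.cross_entropy(logits, target)        # main loss
    prob = torch.softmax(logits, dim=1)[:, 1]      # foreground probability
    t = target.float()
    inter = (prob * t).sum()
    l_dice = 1 - (2 * inter + eps) / (prob.sum() + t.sum() + eps)
    return l_sce + alpha * l_dice
```

In training, this loss would be minimized with the Adam optimizer (learning rate 0.01) under the cosine annealing schedule described above.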
In the embodiment of the disclosure, the test dataset can be input into the trained target extraction model for testing. The trained target extraction model is evaluated and checked with the test dataset to obtain the extraction results of the targets. Two test images are randomly selected from the test dataset, and the corresponding extraction results and true distribution maps are displayed.
FIG. 6 schematically illustrates target extraction result maps and true distribution maps according to an embodiment of the present disclosure. As shown in FIG. 6, FIGS. 6a1 and 6b1 are the original pseudo-color composite images (green, blue and near-infrared bands), FIGS. 6a2 and 6b2 are binary maps of the livestock herd extraction results, and FIGS. 6a3 and 6b3 are binary maps of the true livestock herd distribution. The target extraction model provided by the embodiment of the disclosure can effectively extract the target livestock from complex background ground objects.
Meanwhile, in order to evaluate the extraction results of the model more objectively, the Recall and Precision are calculated according to the following formulas by combining the true distribution map of the livestock herd with the binary map of the extraction results, together with the two comprehensive evaluation parameters IoU (Intersection over Union) and F1_measure:

$$Recall = \frac{TP}{TP + FN}, \qquad Precision = \frac{TP}{TP + FP}$$

$$IoU = \frac{TP}{TP + FP + FN}, \qquad F1\_measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

where TP denotes the number of true target-livestock pixels extracted as target livestock; TN denotes the number of true background pixels identified by the model as background ground objects; FP denotes the number of true background pixels misclassified as target livestock; and FN denotes the number of true target-livestock pixels misclassified as background. Further, the target extraction accuracy statistics based on the Global-Local-Attention network structure shown in Table 1 below can be obtained:
TABLE 1 Target extraction accuracy statistics based on the Global-Local-Attention network structure
As shown in Table 1 above, the method proposed by the embodiment of the present disclosure achieves high precision and recall, indicating that most of the targets are accurately extracted.
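The four metrics above follow directly from the pixel counts defined earlier; a small sketch, assuming pred and truth are binary maps with 1 marking target livestock:

```python
import numpy as np

def evaluate(pred: np.ndarray, truth: np.ndarray):
    """Compute Recall, Precision, IoU and F1_measure from binary maps."""
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    recall = tp / (tp + fn)                      # assumes tp + fn > 0
    precision = tp / (tp + fp)
    iou = tp / (tp + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, iou, f1
```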
According to the target extraction method of the present disclosure, a Global-Local-Attention network structure is adopted to build a cattle and sheep target extraction model based on high-spatial-resolution remote sensing images: multi-scale local features are extracted in the encoder, global spatial features and local features are combined in the decoder, and cattle and sheep features are finally extracted from the input high-spatial-resolution remote sensing image to obtain the final cattle and sheep target extraction result, improving the effectiveness of the model. Meanwhile, an Attention Block module (comprising the GBlock global feature extraction sub-module and the LBlock local feature extraction sub-module) is added to the Transformer module in the decoder to extract global and local features respectively, in which the global feature extraction sub-module GBlock extracts the global spatial features of the target cattle and sheep, enhancing the global spatial extraction capability of the model and optimizing the extraction results of the target cattle and sheep.
Fig. 7 schematically illustrates a block diagram of a target extraction apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the target extraction device 700 based on the high spatial resolution remote sensing image includes: a data acquisition module 710, a data preprocessing module 720, and a detection module 730. The apparatus 700 may be used to implement the target extraction method described with reference to fig. 1.
The data acquisition module 710 is configured to acquire a satellite remote sensing image to be detected; the satellite remote sensing image to be detected includes multiple scenes of satellite remote sensing images, and each scene includes a plurality of target livestock. The data acquisition module 710 may be used, for example, to perform step S101 described above with reference to FIG. 1, which will not be repeated here.
The data preprocessing module 720 is configured to preprocess the satellite remote sensing image to be detected to obtain a plurality of patches with a preset pixel size, and to divide the patches into a training dataset and a test dataset according to a preset ratio. The data preprocessing module 720 may be used, for example, to perform step S102 described above with reference to FIG. 1, which will not be repeated here.
The detection module 730 inputs the patch into the Global-Local-Attention network structure to detect a plurality of targets in the patch. The Global-Local-Attention network structure includes an encoder and a decoder: the patch is input into the encoder, and a first feature map is output; the first feature map is input into the decoder, which includes a global-local extraction operation of upsampling and merging intermediate feature maps of the encoder and intermediate feature maps of the decoder a plurality of times, wherein the global-local extraction includes: normalizing and convolving the input feature map to extract local features from the input feature map, obtaining a first feature set; extracting global features from the input feature map based on a multi-head self-attention mechanism, obtaining a second feature set; and processing and fusing the first feature set and the second feature set to obtain a second feature map used to represent the plurality of targets. The detection module 730 may be used, for example, to perform steps S101 to S104 described above with reference to FIG. 1, which will not be repeated here.
According to an embodiment of the disclosure, the detection module inputs each patch into the encoder to obtain the first feature map as follows: the encoder includes four residual blocks based on ResNet18; the patch is input into the residual blocks, normalized and max-pooled by a normalization layer and a maximum pooling layer to obtain four feature maps of different sizes, and the feature map obtained by the last layer is output from the encoder as the first feature map.
Any number of the modules, sub-modules, units and sub-units according to embodiments of the present disclosure, or at least part of the functionality of any number of them, may be implemented in one module, and any one or more of them may be split into multiple modules. Any one or more of the modules, sub-modules, units and sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system on package or an application specific integrated circuit (ASIC), or by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of, or a suitable combination of, software, hardware and firmware. Alternatively, one or more of the modules, sub-modules, units and sub-units according to embodiments of the present disclosure may be at least partially implemented as computer program modules which, when executed, may perform the corresponding functions.
For example, any of the data acquisition module 710, the data preprocessing module 720 and the detection module 730 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the data acquisition module 710, the data preprocessing module 720 and the detection module 730 may be implemented at least in part as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system on package or an application specific integrated circuit (ASIC), or by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of, or a suitable combination of, software, hardware and firmware. Alternatively, at least one of the data acquisition module 710, the data preprocessing module 720 and the detection module 730 may be at least partially implemented as a computer program module which, when executed, may perform the corresponding functions.
Fig. 8 schematically shows a block diagram of an electronic device adapted to implement the method described above, according to an embodiment of the disclosure. The electronic device shown in fig. 8 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 8, the electronic device 800 described in the present embodiment includes: a processor 801 which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
In the RAM 803, various programs and data required for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the program may be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
According to embodiments of the present disclosure, the method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the processor 801, the above-described functions defined in the apparatus of the embodiments of the present disclosure are performed. The above-described apparatuses, devices, means, modules, units, etc. may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/means described in the above embodiments; or may exist alone without being assembled into the apparatus/device/means. The above-described computer-readable storage medium carries one or more programs which, when executed, implement the target extraction method according to the embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 802 and/or the RAM 803 and/or one or more memories other than the ROM 802 and the RAM 803 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method shown in the flowchart. When the computer program product runs on a computer device, the program code causes the computer device to carry out the target extraction method based on high-spatial-resolution remote sensing images provided by the embodiments of the present disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed over a network medium in the form of a signal, downloaded and installed via the communication section 809, and/or installed from the removable medium 811. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to wireless and wired media, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, the program code for carrying out the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, "C" and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
It should be noted that each functional module in each embodiment of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatuses, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined in various combinations and/or combinations, even if such combinations or combinations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or combined without departing from the spirit and teachings of the present disclosure. All such combinations and/or combinations fall within the scope of the present disclosure.
While the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. The scope of the disclosure should, therefore, not be limited to the above-described embodiments, but should be determined not only by the following claims, but also by the equivalents of the following claims.

Claims (10)

1. A method of extracting a target, comprising:
acquiring a satellite remote sensing image to be detected;
preprocessing the satellite remote sensing image to be detected to obtain a plurality of patches with preset pixel sizes;
For each of the patches, the following is performed:
inputting the patch to an encoder and outputting a first feature map;
inputting the first feature map to a decoder comprising a global-local extraction operation of upsampling and merging intermediate feature maps of the encoder and intermediate feature maps of the decoder multiple times, wherein the global-local extraction comprises: normalizing and convolving an input feature map to extract local features from the input feature map to obtain a first feature set; extracting global features from the input feature map based on a multi-head self-attention mechanism to obtain a second feature set; and processing and fusing the first feature set and the second feature set to obtain a second feature map, wherein the second feature map comprises a plurality of identified targets;
wherein the encoder and the decoder form a Global-Local-Attention network structure.
2. The method of claim 1, wherein the encoder comprises four residual blocks based on ResNet18, and wherein inputting the patch to the encoder to obtain the first feature map comprises:
inputting each patch into the ResNet18 basic backbone network in the encoder, and normalizing the patch by a normalization layer;
inputting the normalized patch to a maximum pooling layer for maximum pooling processing, outputting four feature maps with different sizes, and taking one of the feature maps as the first feature map.
3. The method of claim 1, wherein normalizing and convolving the input feature map to extract local features from the input feature map to obtain the first feature set comprises:
normalizing the input feature map;
convolving the normalized feature map with two convolution kernels of different sizes, respectively;
and fusing the convolution results with a concat function and normalizing them to obtain the first feature set.
4. The method of claim 1, wherein the decoder further comprises a Global Connect module, and wherein extracting global features from the input feature map based on the multi-head self-attention mechanism to obtain the second feature set comprises:
extracting two feature subgraphs from the input feature map;
spatially associating corresponding positions of the two feature subgraphs;
and sequentially sliding to extract the next two feature subgraphs from the input feature map and spatially associating them with the input feature map, so as to obtain spatial association information of the input feature map and output the second feature set.
5. The method for extracting a target according to claim 1, wherein the processing and fusing the first feature set and the second feature set to obtain a second feature map further comprises:
and inputting the first feature set and the second feature set into a residual shrinkage network module to enhance the optimization capability of the model.
6. The target extraction method according to claim 1, wherein preprocessing the satellite remote sensing image to be detected to obtain a plurality of patches with preset pixel sizes comprises:
cutting the satellite remote sensing image to be detected into a plurality of patches of 1024×1024 pixels.
7. A target extraction device, characterized by comprising:
the data acquisition module is used for acquiring a satellite remote sensing image to be detected;
the data preprocessing module is used for preprocessing the satellite remote sensing image to be detected to obtain a plurality of patches with preset pixel sizes;
the detection module is used for determining a first feature map according to the patch; normalizing and convolving an input feature map to extract local features from the input feature map to obtain a first feature set; extracting global features from the input feature map based on a multi-head self-attention mechanism to obtain a second feature set; and processing and fusing the first feature set and the second feature set to obtain a second feature map, wherein the second feature map comprises a plurality of identified targets.
8. The target extraction device of claim 7, wherein the detection module comprises a Global-Local-Attention network structure formed by an encoder and a decoder, the encoder comprising four residual blocks based on ResNet18.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the target extraction method of any of claims 1-6.
10. A computer-readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the target extraction method according to any one of claims 1 to 6.
CN202310930932.5A 2023-07-27 2023-07-27 Target extraction method, target extraction device, electronic equipment and storage medium Pending CN116894955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310930932.5A CN116894955A (en) 2023-07-27 2023-07-27 Target extraction method, target extraction device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310930932.5A CN116894955A (en) 2023-07-27 2023-07-27 Target extraction method, target extraction device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116894955A (en) 2023-10-17

Family

ID=88310679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310930932.5A Pending CN116894955A (en) 2023-07-27 2023-07-27 Target extraction method, target extraction device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116894955A (en)

Similar Documents

Publication Publication Date Title
Wang et al. A deep learning approach incorporating YOLO v5 and attention mechanisms for field real-time detection of the invasive weed Solanum rostratum Dunal seedlings
CN110321963B (en) Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional space spectrum features
CN108846835B (en) Image change detection method based on depth separable convolutional network
Wang et al. A new attention-based CNN approach for crop mapping using time series Sentinel-2 images
CN111178206B (en) Building embedded part detection method and system based on improved YOLO
Klodt et al. Field phenotyping of grapevine growth using dense stereo reconstruction
CN109902567B (en) Data processing method and system for rapidly evaluating vegetation health condition
CN104881865A (en) Forest disease and pest monitoring and early warning method and system based on unmanned plane image analysis
CN114463637B (en) Winter wheat remote sensing identification analysis method and system based on deep learning
CN103353988A (en) Method for evaluating performance of heterogeneous SAR (synthetic aperture radar) image feature matching algorithm
CN103093243B (en) The panchromatic remote sensing image clouds of high-resolution sentences method
Kilic et al. An accurate car counting in aerial images based on convolutional neural networks
CN115439654B (en) Method and system for finely dividing weakly supervised farmland plots under dynamic constraint
CN112215217B (en) Digital image recognition method and device for simulating doctor to read film
Ye et al. Pine pest detection using remote sensing satellite images combined with a multi-scale attention-UNet model
Doycheva et al. Implementing textural features on GPUs for improved real-time pavement distress detection
CN110084222A (en) A kind of vehicle checking method based on multiple target angle point pond neural network
Cao et al. Case instance segmentation of small farmland based on Mask R-CNN of feature pyramid network with double attention mechanism in high resolution satellite images
CN116778351A (en) Livestock monitoring and extracting method for animal husbandry based on high-spatial-resolution remote sensing image
CN111553184A (en) Small target detection method and device based on electronic purse net and electronic equipment
CN104504399B (en) A kind of multispectral data supervised classification method of the linear Correlation Information Entropy of combination
CN116844055A (en) Lightweight SAR ship detection method and system
CN116894955A (en) Target extraction method, target extraction device, electronic equipment and storage medium
CN116883793A (en) Multi-granularity space sampling method for precision evaluation of remote sensing classification result
Touati et al. Partly uncoupled siamese model for change detection from heterogeneous remote sensing imagery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination