CN116597316A - Remote sensing image recognition method and device, equipment and readable storage medium - Google Patents

Remote sensing image recognition method and device, equipment and readable storage medium

Info

Publication number
CN116597316A
CN116597316A
Authority
CN
China
Prior art keywords
feature map
remote sensing
sensing image
shallow
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310672124.3A
Other languages
Chinese (zh)
Inventor
何恒超
杨勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202310672124.3A priority Critical patent/CN116597316A/en
Publication of CN116597316A publication Critical patent/CN116597316A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/176 Urban or other man-made structures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a remote sensing image recognition method, device, equipment, and readable storage medium. It relates to the field of computer technology and addresses the currently low accuracy of target detection in high-resolution remote sensing images. The method comprises the following steps: acquiring a remote sensing image; inputting the remote sensing image into a trained feature extraction network and determining a feature map of the remote sensing image; inputting the feature map into a region candidate network and determining candidate regions and correction position information of the remote sensing image; and correcting the candidate regions with the correction position information to determine a target prediction region.

Description

Remote sensing image recognition method and device, equipment and readable storage medium
Technical Field
The present application relates to the field of computer technology, and in particular to a remote sensing image recognition method, device, equipment, and readable storage medium.
Background
With the continued launch of high-resolution satellites in China and the rapid development of unmanned aerial vehicle (UAV) technology, the variety of obtainable remote sensing images keeps growing and their resolution keeps improving. These rich, high-resolution remote sensing images provide strong data support for remote sensing target detection and state recognition.
The commonly used target detection methods are one-stage and two-stage detectors based on deep learning, such as the YOLO series and the Fast RCNN series. When applied to detection in natural scenes, YOLO-series and Fast RCNN-series detection models achieve good results. In complex scenes, however, the targets in a remote sensing image of cooling towers are small in scale, densely arranged, and captured at high resolution, and many different objects may share the same or similar features; in practice this produces a large number of false detections and low detection accuracy.
Disclosure of Invention
The application provides a remote sensing image recognition method, device, equipment, and readable storage medium to address the currently low accuracy of target detection in high-resolution remote sensing images.
To achieve the above purpose, the present application adopts the following technical solutions:
In a first aspect, the present application provides a remote sensing image recognition method, comprising: acquiring a remote sensing image; inputting the acquired remote sensing image into a trained feature extraction network and determining a feature map of the remote sensing image; inputting the feature map into a region candidate network and determining candidate regions and correction position information of the remote sensing image; and correcting the candidate regions with the correction position information to determine a target prediction region.
In the recognition method provided by the application, a remote sensing image is acquired and input into the trained feature extraction network to determine its feature map; the feature map is then input into the region candidate network to extract candidate regions and correction position information; finally, the candidate regions are corrected with the correction position information, yielding a target prediction region of higher accuracy.
In one possible implementation, inputting the remote sensing image into the trained feature extraction network and determining a feature map of the remote sensing image comprises: inputting the remote sensing image into a residual network and determining at least one shallow feature map of the remote sensing image; and performing feature fusion on the at least one shallow feature map to obtain at least one deep feature map.
In one possible implementation, the residual network comprises a first residual block, a second residual block, a third residual block, a fourth residual block, a fifth residual block, and an attention module. Inputting the remote sensing image into the residual network and determining at least one shallow feature map comprises: inputting the remote sensing image into the first residual block to determine a first shallow feature map; inputting the first shallow feature map into the second residual block and the attention module to determine a second shallow feature map; inputting the second shallow feature map into the third residual block and the attention module to determine a third shallow feature map; inputting the third shallow feature map into the fourth residual block and the attention module to determine a fourth shallow feature map; and inputting the fourth shallow feature map into the fifth residual block and the attention module to determine a fifth shallow feature map.
In this implementation, passing each shallow feature map through a residual block and the attention module gives key targets in the resulting feature map more weight, focusing the model on those targets and reducing its attention to other information. This alleviates information overload while improving task-processing efficiency and accuracy.
In one possible implementation, the at least one shallow feature map comprises the second, third, fourth, and fifth shallow feature maps. Performing feature fusion on the at least one shallow feature map to obtain at least one deep feature map comprises: convolving the fifth shallow feature map to determine a first deep feature map; upsampling and pixel-adding the fourth shallow feature map and the first deep feature map to determine a second deep feature map; upsampling and pixel-adding the third shallow feature map and the second deep feature map to determine a third deep feature map; and upsampling and pixel-adding the second shallow feature map and the third deep feature map to determine a fourth deep feature map.
In this implementation, upsampling and pixel-adding the output of one residual block and attention module with the feature map of the next level produces a feature map that carries the feature information of both levels, enriching the feature information while improving the accuracy of subsequent feature fusion.
In one possible implementation, inputting the feature map into the region candidate network and determining candidate regions and correction position information of the remote sensing image comprises: inputting the feature map into the region candidate network and determining candidate proposal regions of the feature map through anchor boxes; and inputting the feature map and the candidate proposal regions into a pooling layer and a fully connected layer to determine the candidate regions and correction position information of the remote sensing image.
In a second aspect, the present application further provides a remote sensing image recognition device, comprising an acquisition module and a determining module.
The acquisition module is used for acquiring the remote sensing image.
The determining module is configured to input the remote sensing image into the trained feature extraction network and determine a feature map of the remote sensing image; input the feature map into the region candidate network and determine candidate regions and correction position information of the remote sensing image; and correct the candidate regions with the correction position information to determine a target prediction region.
A possible implementation manner, the determining module is specifically configured to input the remote sensing image into the residual network, and determine at least one shallow feature map of the remote sensing image.
The device for identifying the remote sensing image provided by the application further comprises:
and the fusion module is used for carrying out feature fusion on the at least one shallow feature map to obtain at least one deep feature map of the at least one shallow feature map.
In one possible implementation, the residual network includes: a first residual block, a second residual block, a third residual block, a fourth residual block, a fifth residual block, and an attention module.
The determining module is specifically configured to input the remote sensing image into the first residual block, and determine a first shallow feature map. And inputting the first shallow feature map into a second residual block and an attention module, and determining a second shallow feature map. And inputting the second shallow feature map into a third residual block and an attention module, and determining the third shallow feature map. And inputting the third shallow feature map into a fourth residual block and an attention module, and determining the fourth shallow feature map. And inputting the fourth shallow feature map into a fifth residual block and an attention module, and determining the fifth shallow feature map.
In one possible implementation, the at least one shallow feature map comprises the second, third, fourth, and fifth shallow feature maps.
The fusion module is specifically configured to convolve the fifth shallow feature map and determine a first deep feature map. And carrying out up-sampling and pixel addition on the fourth shallow layer characteristic map and the first deep layer characteristic map, and determining a second deep layer characteristic map. And carrying out up-sampling and pixel addition on the third shallow layer characteristic map and the second deep layer characteristic map, and determining the third deep layer characteristic map. And carrying out up-sampling and pixel addition on the second shallow layer characteristic map and the third deep layer characteristic map, and determining a fourth deep layer characteristic map.
In one possible implementation manner, the determining module is specifically configured to input the feature map into the area candidate network, and determine a candidate suggestion area of the feature map through an anchor frame. And inputting the feature map and the candidate suggested region into a pooling layer and a full-connection layer, and determining the candidate region and the correction position information of the remote sensing image.
In a third aspect, the present application provides a remote sensing image recognition device having the function of implementing the remote sensing image recognition method of the first aspect or any one of its possible implementations. The function may be implemented in hardware, or in hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function described above.
In a fourth aspect, a computer readable storage medium is provided, in which instructions are stored which, when run on a computer, cause the computer to perform the method of identifying a remote sensing image of the first aspect or any of the possible implementations of the first aspect.
For detailed descriptions of the second to fourth aspects and their various implementations, and for analysis of their advantages and technical effects, reference may be made to the first aspect and its various implementations; details are not repeated here.
These and other aspects of the application will be more readily apparent from the following description.
Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of a remote sensing image recognition system according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a remote sensing image recognition method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a feature extraction network according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a regional candidate network according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of a training method of a detection model according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for identifying a remote sensing image according to an embodiment of the present application;
FIG. 7 is another specific flowchart of a remote sensing image recognition method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a remote sensing image recognition device according to an embodiment of the present application;
fig. 9 is another schematic structural diagram of a remote sensing image recognition device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a remote sensing image recognition device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
In the description of the present application, unless otherwise specified, "/" indicates an "or" relationship between the associated objects; for example, A/B may mean A or B. "And/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. Unless otherwise indicated, "a plurality of" means two or more. "At least one of" the listed items means any combination of those items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be singular or plural.
To describe the technical solutions of the embodiments clearly, the words "first", "second", and so on are used to distinguish identical or similar items having substantially the same function and effect. Those skilled in the art will understand that these words do not limit quantity or order of execution, and do not necessarily indicate a difference. Likewise, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation; any embodiment or design described as "exemplary" or "for example" should not be construed as preferred over or more advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a concrete fashion that is readily understood.
In addition, the network architecture and the service scenario described in the embodiments of the present application are for more clearly describing the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided by the embodiments of the present application, and as a person of ordinary skill in the art can know, with evolution of the network architecture and appearance of a new service scenario, the technical solution provided by the embodiments of the present application is also applicable to similar technical problems.
For ease of understanding, related art terms related to the present application will be explained first.
A feature map is an image-like representation that carries features. The features may include color features, texture features, shape features, spatial-relationship features, and the like.
An attention mechanism allocates limited computing resources to the more important parts of a task. In neural network learning, for example, a model with more parameters is generally more expressive and can store more information, but this also causes information overload. By introducing an attention mechanism, key targets in a feature map are given more weight, so that the model focuses on them, pays less attention to other information, and may even filter out irrelevant information. This alleviates information overload and improves task-processing efficiency and accuracy.
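To make the idea concrete, a minimal channel-attention sketch in NumPy is given below. This is a hypothetical squeeze-and-excitation-style weighting, not the specific attention module of the application (which the description does not detail); the projection matrix `w` stands in for learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feature_map, w):
    """Reweight the channels of a (C, H, W) feature map.

    Global average pooling produces one descriptor per channel, a
    learned projection `w` (a C x C matrix here) scores each channel,
    and a sigmoid turns the scores into weights in (0, 1).
    """
    squeezed = feature_map.mean(axis=(1, 2))     # (C,) global average pool
    weights = sigmoid(w @ squeezed)              # (C,) channel weights
    return feature_map * weights[:, None, None]  # broadcast over H and W
```

Channels that receive small weights contribute less to downstream layers, which is how "key targets get more weight information" in the text above.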
Currently, the commonly used target detection methods are one-stage and two-stage detectors based on deep learning; for example, the YOLO series is a one-stage method and the Fast RCNN series is a two-stage method. However, these methods are only suitable for simple natural scenes; in remote sensing images with complex backgrounds, small target scale, dense arrangement, and high resolution, their recognition accuracy for cooling towers is low.
On this basis, the present application provides a remote sensing image recognition method whose basic principle is as follows: acquire a remote sensing image; input the remote sensing image into a trained feature extraction network to determine its feature map; input the feature map into a region candidate network to determine candidate regions and correction position information; and finally correct the candidate regions with the correction position information to determine the target cooling tower. In this way, the target cooling tower in the remote sensing image can be identified and its recognition accuracy improved.
The following describes in detail the implementation of the embodiment of the present application with reference to the drawings.
The scheme provided by the application can be applied to the remote sensing image identification system 100 shown in fig. 1, and the system comprises: an image acquisition module 101, a feature extraction network 102, a region candidate network 103, a correction module 104 and a display module 105.
The image acquisition module 101 is configured to acquire a high-resolution remote sensing image, and input the acquired remote sensing image into the feature extraction network 102.
It should be noted that the image acquisition module 101 may be a camera, a video camera, a scanner, an airborne three-dimensional imager, or another device with an image acquisition function; the present application is not limited in this respect. The image acquisition module 101 may also be deployed on an unmanned aerial vehicle for acquiring remote sensing images.
The feature extraction network 102 is configured to receive the remote sensing image input by the image acquisition module 101, and determine a feature map of the remote sensing image according to the method provided by the present application.
The region candidate network 103 is configured to process the feature map determined by the feature extraction network 102, and determine a candidate region and corrected location information of the remote sensing image.
The correction module 104 is configured to correct the candidate region of the remote sensing image by using the correction position information, so as to obtain a recognition result of the target cooling tower of the remote sensing image.
The display module 105 is used for displaying the identification result of the target cooling tower to a user.
It should be noted that, the display module 105 may be a device such as a client, a display, or the like, which may provide a picture playing interface for a user, which is not limited by the present application.
It should be noted that, the remote sensing image recognition system 100 illustrated in fig. 1 is merely illustrative of the application scenario of the present application, and is not limited to the application scenario of the present application.
Embodiments of the present application will be described in detail below with reference to the attached drawings.
In one aspect, the application discloses a remote sensing image identification method. As shown in fig. 2, the method includes:
s201, acquiring a remote sensing image.
Specifically, the remote sensing image may be obtained by one or more of photographic imaging, scanning imaging, or radar imaging; the present application is not limited in this respect.
Photographic imaging produces an image on photographic film made by uniformly coating a film base with silver halide; scanning imaging obtains a two-dimensional image point by point and line by line; radar imaging emits a narrow pulse sideways from a transmitter and forms a remote sensing image from the echo received by a receiver.
S202, inputting the remote sensing image into a trained feature extraction network, and determining a feature map of the remote sensing image.
Specifically, the remote sensing image is input into a residual network to determine at least one shallow feature map of the remote sensing image, and feature fusion is performed on the at least one shallow feature map to obtain at least one deep feature map.
As shown in fig. 3, fig. 3 is a schematic structural diagram of the feature extraction network. The residual network comprises: a first residual block, a second residual block, a third residual block, a fourth residual block, a fifth residual block, and an attention module.
Illustratively, a remote sensing image is input to a first residual block, and a first shallow feature map of the remote sensing image is obtained through convolution feature extraction. The first shallow feature map is input to a second residual block, a feature map 1 of the first shallow feature map is obtained through convolution feature extraction, and the feature map 1 is input to an attention module to obtain a second shallow feature map. And inputting the second shallow feature map to a third residual block, obtaining a feature map 2 of the second shallow feature map through convolution feature extraction, and inputting the feature map 2 to an attention module to obtain a third shallow feature map. And inputting the third shallow feature map into a fourth residual block, obtaining a feature map 3 of the third shallow feature map through convolution feature extraction, and inputting the feature map 3 into an attention module to obtain a fourth shallow feature map. And inputting the fourth shallow feature map into a fifth residual block, obtaining a feature map 4 of the fourth shallow feature map through convolution feature extraction, and inputting the feature map 4 into an attention module to obtain a fifth shallow feature map.
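The residual blocks referred to above follow the standard residual pattern out = ReLU(F(x) + x). A minimal NumPy sketch is given below, with 1×1 convolutions standing in for the real 3×3 convolution stacks (batch normalisation and stride are omitted; the weight matrices are illustrative stand-ins for learned parameters):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, w):
    """Pointwise (1x1) convolution on a (C, H, W) tensor: mixes channels
    at each spatial position. `w` has shape (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

def residual_block(x, w1, w2):
    """Minimal residual block: out = ReLU(F(x) + x).

    F here is two 1x1 convolutions with a ReLU in between. The identity
    shortcut (+ x) lets gradients flow past F, which is what makes deep
    residual networks trainable.
    """
    f = conv1x1(relu(conv1x1(x, w1)), w2)
    return relu(f + x)
```

In the pipeline described above, each such block's output would additionally pass through the attention module before becoming the next "shallow feature map".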
The convolution feature extraction may be a convolution operation performed on each image region of the feature map. The attention module adjusts the weights of the feature map with an attention-mechanism algorithm so that key targets in the feature map (such as a target cooling tower) carry more weight information.
Further, after the first shallow feature map, the second shallow feature map, the third shallow feature map, the fourth shallow feature map and the fifth shallow feature map are obtained, feature fusion is performed on at least one shallow feature map, and at least one deep feature map of the at least one shallow feature map is obtained.
Illustratively, a convolution operation is performed on the fifth shallow feature map to obtain a first deep feature map. The first deep feature map is upsampled and pixel-added to the fourth shallow feature map to obtain a second deep feature map; the second deep feature map is upsampled and pixel-added to the third shallow feature map to obtain a third deep feature map; and the third deep feature map is upsampled and pixel-added to the second shallow feature map to obtain a fourth deep feature map.
For example, a 1×1 convolution is performed on the fifth shallow feature map to obtain a feature map N5, and a 3×3 convolution on N5 yields the first deep feature map. A 1×1 convolution is performed on the fourth shallow feature map, N5 is upsampled by a factor of two, the two results are pixel-added to obtain a feature map N4, and a 3×3 convolution on N4 yields the second deep feature map. A 1×1 convolution is performed on the third shallow feature map, N4 is upsampled by a factor of two, the results are pixel-added to obtain a feature map N3, and a 3×3 convolution on N3 yields the third deep feature map. A 1×1 convolution is performed on the second shallow feature map, N3 is upsampled by a factor of two, the results are pixel-added to obtain a feature map N2, and a 3×3 convolution on N2 yields the fourth deep feature map.
And S203, inputting the feature map into a region candidate network, and determining candidate regions and correction position information of the remote sensing image.
Specifically, the feature map is input to a region candidate network, and candidate suggested regions of the feature map are determined through an anchor frame. And inputting the feature map and the candidate suggested region into a pooling layer and a full-connection layer, and determining the candidate region and the correction position information of the remote sensing image.
As shown in fig. 4, fig. 4 is a schematic structural diagram of a region candidate network, where the region candidate network includes a candidate region generating module and a selecting module. The selection module includes sliding window, convolution layer, softmax classification, and boundary regression.
The first deep feature map, the second deep feature map, the third deep feature map and the fourth deep feature map are each input to the candidate region generation module of the region candidate network to obtain the anchor frames corresponding to the feature map, and the anchor frames are screened by intersection-over-union (IoU) filtering and non-maximum suppression (NMS) to obtain candidate suggestion frames, where the region corresponding to a candidate suggestion frame is a candidate region of the deep feature map. The obtained candidate regions and the deep feature map are input to the selection module; the candidate regions are fixed to a uniform size by the ROI pooling layer and classified by softmax to obtain the candidate regions of the feature map, and the corrected position information of the feature map is obtained through a regression operation.
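The IoU filtering and NMS screening mentioned above can be illustrated with a minimal NumPy sketch; the boxes, scores, and 0.5 threshold below are assumptions for illustration, not values from the application.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]      # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # drop every remaining box that overlaps the kept one too much
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] — the near-duplicate of box 0 is suppressed
```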
It should be noted that the training process of the detection model comprising the feature extraction network and the region candidate network may be as shown in fig. 5, and includes: first acquiring a high-resolution remote sensing image dataset, then performing data enhancement on the high-resolution remote sensing image dataset through preprocessing operations to obtain remote sensing image datasets of various types. The preprocessing operations may include geometric transformations such as flipping, rotation, cropping, scaling, translation and jittering, and may also include pixel transformations such as adding salt-and-pepper noise or Gaussian noise, applying Gaussian blur, adjusting HSV contrast, brightness, saturation and white balance, and histogram equalization; the present application does not limit the data preprocessing method.
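A few of the listed preprocessing transforms can be sketched as follows; the noise level and salt-and-pepper ratio are illustrative assumptions, not values from the application.

```python
import numpy as np

def augment(img, rng):
    """Produce a few augmented variants of an (H, W, 3) uint8 image:
    horizontal flip, 90-degree rotation, Gaussian noise, salt-and-pepper noise."""
    out = []
    out.append(np.fliplr(img))                                  # horizontal flip
    out.append(np.rot90(img))                                   # 90-degree rotation
    noisy = img.astype(float) + rng.normal(0, 10, img.shape)    # Gaussian noise (sigma=10)
    out.append(np.clip(noisy, 0, 255).astype(img.dtype))
    salted = img.copy()
    mask = rng.random(img.shape[:2]) < 0.01                     # 1% of pixels hit
    salted[mask] = rng.choice([0, 255], size=int(mask.sum()))[:, None]
    out.append(salted)
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
augmented = augment(image, rng)
print(len(augmented))  # 4
```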
Furthermore, a feature pyramid and a convolutional attention mechanism are integrated, a selection module is added, and the recognition network is optimized to build the model. The preprocessed high-resolution remote sensing image dataset is then input to train the built model, yielding a mature detection model with strong generalization capability. Finally, the detection model is verified on a remote sensing image test set to obtain the trained target detection model.
S204, correcting the candidate region by using the correction position information, and determining a target prediction region.
For example, the correction position information may include x and y offsets of the candidate region and scaling factors for the height and width of the candidate region. The candidate region is then adjusted using the x and y offsets and the height and width scaling factors to obtain the target prediction region containing the target cooling tower.
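Assuming the standard box-delta parameterization (center offsets plus log-scale width/height factors, as in Faster R-CNN; the application does not spell out the exact formulas), the correction step might look like this sketch:

```python
import numpy as np

def apply_deltas(box, dx, dy, dw, dh):
    """Refine a candidate box (x1, y1, x2, y2) with predicted offsets.
    dx, dy shift the center as fractions of width/height; dw, dh scale
    the width and height via exp (assumed parameterization)."""
    w, h = box[2] - box[0], box[3] - box[1]
    cx, cy = box[0] + w / 2, box[1] + h / 2
    cx, cy = cx + dx * w, cy + dy * h        # shift the center
    w, h = w * np.exp(dw), h * np.exp(dh)    # rescale width and height
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])

box = np.array([10.0, 10.0, 30.0, 50.0])     # candidate region (illustrative)
refined = apply_deltas(box, 0.1, 0.0, np.log(1.5), 0.0)
print(refined)
```

With the example values above (center shifted 10% of the width, width scaled 1.5×), the refined box is [7, 10, 37, 50].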
According to the identification method of the remote sensing image, the remote sensing image is acquired and is input into the trained feature extraction network, the feature map of the remote sensing image is determined, the feature map is further input into the region candidate network, the candidate region and the correction position information in the feature map are extracted, and finally the candidate region is corrected by utilizing the correction position information, so that the target prediction region with higher accuracy can be obtained.
The schemes illustrated in figs. 2 to 5 above are described in detail below by way of a specific example.
As shown in fig. 6, an image acquisition module acquires a high-resolution remote sensing image. The remote sensing image is input to the first residual block (CONV1) of the feature extraction network to obtain the first shallow feature map; the first shallow feature map is input to CONV2 and passed through a convolutional block attention module (CBAM) to obtain the second shallow feature map; the second shallow feature map is input to CONV3 and passed through the CBAM to obtain the third shallow feature map; the third shallow feature map is input to CONV4 and passed through the CBAM to obtain the fourth shallow feature map; and the fourth shallow feature map is input to CONV5 and passed through the CBAM to obtain the fifth shallow feature map.
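The channel-then-spatial attention applied by a CBAM-style module can be sketched minimally as follows. This sketch omits the shared MLP and the 7×7 convolution of the full module, so it illustrates only the data flow, not the trained module.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cbam_sketch(x):
    """Minimal convolutional-block-attention sketch for a (C, H, W) map:
    channel attention from pooled statistics, then spatial attention."""
    # channel attention: weight each channel by its avg+max pooled response
    ca = sigmoid(x.mean(axis=(1, 2)) + x.max(axis=(1, 2)))   # shape (C,)
    x = x * ca[:, None, None]
    # spatial attention: weight each location by its avg+max over channels
    sa = sigmoid(x.mean(axis=0) + x.max(axis=0))             # shape (H, W)
    return x * sa[None, :, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8, 8))   # illustrative feature map
out = cbam_sketch(feat)
print(out.shape)  # (16, 8, 8) — same shape, reweighted values
```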
Further, a 1×1 convolution is applied to the fifth shallow feature map to obtain feature map N5, and a 3×3 convolution is applied to N5 to obtain feature map P5. A 1×1 convolution is applied to the fourth shallow feature map; feature map N5 is upsampled by a factor of two and added pixel-wise to the result to obtain feature map N4, and a 3×3 convolution then yields feature map P4. A 1×1 convolution is applied to the third shallow feature map; feature map N4 is upsampled by a factor of two and added pixel-wise to the result to obtain feature map N3, and a 3×3 convolution then yields feature map P3. A 1×1 convolution is applied to the second shallow feature map; feature map N3 is upsampled by a factor of two and added pixel-wise to the result to obtain feature map N2, and a 3×3 convolution then yields feature map P2. The feature maps P2, P3, P4 and P5 form the feature map set.
Still further, as shown in fig. 7, the feature map set is input to the region candidate network: the feature map P2 is input to region candidate network 2, the feature map P3 to region candidate network 3, the feature map P4 to region candidate network 4, and the feature map P5 to region candidate network 5. Finally, the target detection region is obtained through the sliding window, convolution calculation, softmax classification and continuous boundary-regression correction in the selection module, where the target detection region is the position of the target cooling tower.
The solution provided by the embodiment of the present application has been described above mainly from the point of view of the working principle of the device. It is to be appreciated that the computing device, in order to implement the functionality described above, includes corresponding hardware structures and/or software modules that perform the various functions. Those of skill in the art will readily appreciate that the various illustrative algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer-software-driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application can divide the functional modules of the computing device according to the method example, for example, each functional module can be divided corresponding to each function, or two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
Fig. 8 shows a possible schematic diagram of the identification device of a remote sensing image in the above embodiment in the case of dividing the respective functional modules with the respective functions. As shown in fig. 8, the remote sensing image recognition apparatus 800 may include: acquisition module 801, determination module 802.
The acquiring module 801 is configured to enable the remote sensing image recognition device 800 to execute S201 of the remote sensing image recognition method shown in fig. 2.
A determining module 802, configured to support the remote sensing image recognition apparatus 800 to execute S202 or S203 or S204 of the remote sensing image recognition method shown in fig. 2.
It should be noted that, all relevant contents of each step related to the above method embodiment may be cited to the functional description of the corresponding functional module, which is not described herein.
The remote sensing image recognition device 800 provided in the embodiment of the present application is used for executing the above remote sensing image recognition method, so that the same effect as the above remote sensing image recognition method can be achieved.
Further, as shown in fig. 9, the apparatus 800 for identifying a remote sensing image according to the embodiment of the present application may further include: fusion module 803.
The fusion module 803 is configured to perform feature fusion on at least one shallow feature map in the remote sensing image recognition method by using the remote sensing image recognition device 800 to obtain at least one deep feature map of the at least one shallow feature map.
The embodiment of the present application further provides a device for identifying a remote sensing image. As shown in fig. 10, the device 1000 for identifying a remote sensing image may include a memory 1001, a processor 1002, and a transceiver 1003, where the memory 1001 and the processor 1002 may be connected by a bus, a network, or other means; in fig. 10, a bus connection is taken as an example.
The processor 1002 may be a central processing unit (CPU). The processor 1002 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The processor 1002 is configured to execute the remote sensing image identification method provided by the present application.
The memory 1001 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or a combination of the above types of memory, for storing application code, configuration files, data information, or other content by which the methods of the application may be implemented.
The memory 1001 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as metadata collection modules and the like in the embodiments of the present application. The processor 1002 executes various functional applications of the processor and data processing by running non-transitory software programs, instructions, and modules stored in the memory 1001.
The memory 1001 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created by the processor 1002, etc. In addition, memory 1001 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 1001 optionally includes memory remotely located with respect to processor 1002, which may be connected to processor 1002 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transceiver 1003 is used for information interaction between the identification device 1000 of the remote sensing image and other devices.
The one or more modules are stored in the memory 1001 and when executed by the processor 1002 perform the functions of the method of identifying a remote sensing image in the embodiment shown in fig. 2.
The embodiment of the application also provides a computer readable storage medium, wherein instructions are stored on the computer readable storage medium, and the instructions are executed to execute the remote sensing image identification method and the related steps in the method embodiment.
It will be apparent to those skilled in the art from this description that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, or a contributing part or all or part of the technical solution, may be embodied in the form of a software product, where the software product is stored in a storage medium, and includes several instructions for causing a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read Only Memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A method for identifying a remote sensing image, comprising:
acquiring a remote sensing image;
inputting the remote sensing image into a trained feature extraction network, and determining a feature map of the remote sensing image;
inputting the feature map to a region candidate network, and determining candidate regions and correction position information of the remote sensing image;
and correcting the candidate region by using the correction position information to determine a target prediction region.
2. The method of claim 1, wherein inputting the remote sensing image into a trained feature extraction network, determining a feature map of the remote sensing image, comprises:
inputting the remote sensing image into a residual error network, and determining at least one shallow feature map of the remote sensing image;
and carrying out feature fusion on the at least one shallow feature map to obtain at least one deep feature map of the at least one shallow feature map.
3. The method of claim 2, wherein the residual network comprises: a first residual block, a second residual block, a third residual block, a fourth residual block, a fifth residual block and an attention module; the inputting the remote sensing image into a residual error network, and determining at least one shallow feature map of the remote sensing image comprises the following steps:
inputting the remote sensing image into the first residual block, and determining a first shallow feature map;
inputting the first shallow feature map into the second residual block and the attention module, and determining a second shallow feature map;
inputting the second shallow feature map into the third residual block and the attention module, and determining a third shallow feature map;
inputting the third shallow feature map into the fourth residual block and the attention module, and determining a fourth shallow feature map;
and inputting the fourth shallow feature map into the fifth residual block and the attention module to determine a fifth shallow feature map.
4. The method of claim 2, wherein the at least one shallow feature map comprises: a second shallow feature map, a third shallow feature map, a fourth shallow feature map, and a fifth shallow feature map; the feature fusion is performed on the at least one shallow feature map to obtain at least one deep feature map of the at least one shallow feature map, which comprises the following steps:
convolving the fifth shallow feature map to determine a first deep feature map;
upsampling and pixel addition are carried out on the fourth shallow layer feature map and the first deep layer feature map, and a second deep layer feature map is determined;
upsampling and pixel addition are carried out on the third shallow layer feature map and the second deep layer feature map, and a third deep layer feature map is determined;
and carrying out up-sampling and pixel addition on the second shallow layer characteristic map and the third deep layer characteristic map, and determining a fourth deep layer characteristic map.
5. The method of claim 1, wherein said inputting the feature map into a region candidate network, determining candidate regions and corrected location information for the remote sensing image, comprises:
inputting the feature map to the region candidate network, and determining candidate suggested regions of the feature map through an anchor frame;
and inputting the feature map and the candidate suggested area into a pooling layer and a full-connection layer, and determining the candidate area and the correction position information of the remote sensing image.
6. A remote sensing image recognition device, comprising:
the acquisition module is used for acquiring the remote sensing image;
the determining module is used for inputting the remote sensing image into the trained feature extraction network and determining a feature map of the remote sensing image; inputting the feature map to a region candidate network, and determining candidate regions and correction position information of the remote sensing image; and correcting the candidate region by using the correction position information to determine a target prediction region.
7. The apparatus of claim 6, wherein:
the determining module is specifically configured to input the remote sensing image to a residual network, and determine at least one shallow feature map of the remote sensing image;
the device further comprises:
and the fusion module is used for carrying out feature fusion on the at least one shallow feature map to obtain at least one deep feature map of the at least one shallow feature map.
8. The apparatus of claim 7, wherein the residual network comprises: a first residual block, a second residual block, a third residual block, a fourth residual block, a fifth residual block and an attention module;
the determining module is specifically configured to input the remote sensing image into the first residual block, and determine a first shallow feature map; inputting the first shallow feature map into the second residual block and the attention module, and determining a second shallow feature map; inputting the second shallow feature map into the third residual block and the attention module, and determining a third shallow feature map; inputting the third shallow feature map into the fourth residual block and the attention module, and determining a fourth shallow feature map; and inputting the fourth shallow feature map into the fifth residual block and the attention module to determine a fifth shallow feature map.
9. The apparatus of claim 7, wherein the at least one shallow feature map comprises: a second shallow feature map, a third shallow feature map, a fourth shallow feature map, and a fifth shallow feature map;
the fusion module is specifically configured to convolve the fifth shallow feature map, and determine a first deep feature map; upsampling and pixel addition are carried out on the fourth shallow layer feature map and the first deep layer feature map, and a second deep layer feature map is determined; upsampling and pixel addition are carried out on the third shallow layer feature map and the second deep layer feature map, and a third deep layer feature map is determined; and carrying out up-sampling and pixel addition on the second shallow layer characteristic map and the third deep layer characteristic map, and determining a fourth deep layer characteristic map.
10. The apparatus of claim 6, wherein:
the determining module is specifically configured to input the feature map to the area candidate network, and determine a candidate suggestion area of the feature map through an anchor frame; and inputting the feature map and the candidate suggested area into a pooling layer and a full-connection layer, and determining the candidate area and the correction position information of the remote sensing image.
11. An apparatus for identifying a remote sensing image, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of identifying a remote sensing image as claimed in any one of claims 1 to 5.
12. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the method of identifying a remote sensing image according to any of claims 1-5.
CN202310672124.3A 2023-06-07 2023-06-07 Remote sensing image recognition method and device, equipment and readable storage medium Pending CN116597316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310672124.3A CN116597316A (en) 2023-06-07 2023-06-07 Remote sensing image recognition method and device, equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN116597316A true CN116597316A (en) 2023-08-15

Family

ID=87593711



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination