CN114913428A - Remote sensing image target detection system based on deep learning

Info

Publication number: CN114913428A
Application number: CN202210446366.6A
Authority: CN
Inventors: 孟庆松; 张海
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2022-04-26
Filing date: 2022-04-26
Publication date: 2022-08-16

Abstract

The invention discloses a remote sensing image target detection system based on deep learning and relates to the technical field of image target detection. Several modules are fused into a whole and used to replace the backbone network ResNet-101 of the original detection algorithm. The detection method comprises: an input image; an HRNet high-resolution network; feature fusion with an FPN-like structure; a channel-domain attention mechanism; a spatial-domain attention mechanism; a feature map; key-point prediction together with prediction of the polar radius and polar angle in a polar coordinate system; obtaining the pole coordinates (x, y), the polar radius and the polar angle; and finally outputting the target prediction box. By addressing several detection difficulties of remote sensing images together, the system is suited to remote sensing target detection: detection is fast, the detection workload is reduced and the detection time is shortened. Accurate target detection is achieved, and the detection precision of the network is higher.

Description

Remote sensing image target detection system based on deep learning

Technical Field

The invention belongs to the technical field of image target detection, and particularly relates to a remote sensing image target detection system based on deep learning.

Background

The task of target detection in remote sensing images is to identify and locate specific objects in an image. It plays an important role in maritime vessel control, environmental quality monitoring, land-use planning and layout, and similar applications. How to improve detection performance so as to obtain more accurate localization and finer-grained classification has therefore become a key research topic in the field. The first requirement for localization and recognition in remote sensing target detection is to extract features from the image. Traditional detection methods include template matching, region-prior-based methods, image analysis methods and the like. All of them require manually designed features, and the resulting features are highly class-specific and do not generalize across classes.

Aerospace technology is now developing rapidly in China, and remote sensing technology continues to improve and mature. Remote sensing images are more convenient and efficient to acquire than before, and they are characterized by large data volume, rich variety, clear imaging and high timeliness, which fits the current trend toward big data. This lays a solid foundation for the development of remote sensing image detection technology and provides more valuable, abundant and diverse data for deep learning methods. Deep learning target detection methods can currently be migrated to the field of remote sensing detection. However, remote sensing images differ from natural scene images: because of the different imaging mode, they have distinctive characteristics such as high resolution, small and dense targets, and complex backgrounds. The migrated methods therefore need targeted improvement to solve the difficult problems in remote sensing image detection, and overcoming these key detection difficulties has become an important research direction for many scholars in the field. The challenges of target detection in visible-light remote sensing images are as follows:

1. Complex backgrounds: as resolution improves and the imaged area expands, visible-light remote sensing images contain a large number of complex natural and man-made backgrounds, which seriously interfere with target detection.

2. Large direction changes: remote sensing images are captured from an aerial viewpoint, so the scene is a top view and targets are distributed in it at arbitrary angles. Most existing algorithms adapt poorly to angle and are not robust enough when handling multi-directional targets. In addition, when a multi-directional target is localized with a classical horizontal box, the bounding box is not compact enough and the localization is not fine enough.

3. Large scale variation: the scale of remote sensing targets varies widely. To cope with this, an anchor-based detector must be configured with anchors of many scales, and a large number of low-quality positive-sample anchors are introduced during anchor matching, which degrades the detection precision of the detector.

4. Dependence on hand-designed anchors: in anchor-based detection methods the anchor design relies heavily on human experience and is strongly tailored to a particular data set. The anchors must be redesigned when the data set changes, so generalization is lacking.
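The point about horizontal boxes not being compact can be illustrated with a small worked calculation. The sketch below is not part of the patent; the function name and the 100 × 20 example target are hypothetical. It computes what fraction of the tightest horizontal (axis-aligned) box is actually covered by a rotated rectangular target, using the standard extent formula for a rotated rectangle.

```python
import math


def horizontal_box_fill_ratio(width, height, angle_deg):
    """Fraction of the tightest horizontal box covered by a rotated rectangle.

    width, height: side lengths of the rectangular target (hypothetical values).
    angle_deg: rotation of the target relative to the image axes, in degrees.
    """
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    # Extents of the rotated rectangle along the image axes.
    box_w = width * c + height * s
    box_h = width * s + height * c
    return (width * height) / (box_w * box_h)


# A long, thin target (for example a ship seen from above) at several angles.
for angle in (0, 15, 30, 45):
    ratio = horizontal_box_fill_ratio(100, 20, angle)
    print(f"{angle:>2} degrees: target fills {ratio:.1%} of the horizontal box")
```

For the elongated example target the covered fraction falls well below one half as the rotation approaches 45 degrees, so most of the horizontal box is background. This is the motivation for an oriented box representation.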

Disclosure of Invention

To solve the problems described in the background art, the invention aims to provide a remote sensing image target detection system based on deep learning.

The invention relates to a remote sensing image target detection system based on deep learning. Several modules are fused into a whole and used to replace the backbone network ResNet-101 of the original detection algorithm. The detection method comprises the following steps:

(1) an input image;
(2) an HRNet high-resolution network;
(3) feature fusion with an FPN-like structure;
(4) a channel-domain attention mechanism;
(5) a spatial-domain attention mechanism;
(6) a feature map;
(7) (7.1) key-point prediction; (7.2) prediction of the polar radius and polar angle in the polar coordinate system;
(8) (8.1) obtaining the pole coordinates (x, y); (8.2) obtaining the polar radius and polar angle;
(9) finally outputting the target prediction box.
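The final steps, which turn the pole coordinates (x, y) and the predicted radial distance and polar angles into a prediction box, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. It assumes the pole is the box centre, that all four corners lie at the same radial distance `rho` from the pole, and that two polar angles `theta1` and `theta2` locate two adjacent corners, with the other two obtained by symmetry through the pole. The function name `decode_polar_box`, the use of two angles, and the angle direction convention are all hypothetical.

```python
import math


def decode_polar_box(x, y, rho, theta1, theta2):
    """Convert a polar box description into four corner points.

    Assumptions (not stated in the source): (x, y) is the box centre, every
    corner is at distance rho from it, and theta1 < theta2 < theta1 + pi are
    the polar angles (radians) of two adjacent corners.
    """
    corners = []
    # Opposite corners are mirror images through the pole, i.e. angle + pi.
    for theta in (theta1, theta2, theta1 + math.pi, theta2 + math.pi):
        corners.append((x + rho * math.cos(theta), y + rho * math.sin(theta)))
    return corners


# Example: a box centred at (50, 40) with corners 10 units from the centre.
box = decode_polar_box(50.0, 40.0, 10.0, math.radians(20), math.radians(160))
for cx, cy in box:
    print(f"({cx:.2f}, {cy:.2f})")
```

Because the two diagonals of the resulting quadrilateral have equal length and bisect each other at the pole, the four points always form a rectangle, and its orientation is set by the two angles. Only one distance and two angles need to be regressed per target under these assumptions.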

Compared with the prior art, the invention has the beneficial effects that:

First, by addressing several detection difficulties of remote sensing images together, the system is suited to target detection in remote sensing images, so that detection is fast, the detection workload is reduced and the detection time is shortened.

Secondly, accurate target detection can be realized, and the network detection precision is higher.

Drawings

For ease of illustration, the invention is described in detail by the following detailed description and the accompanying drawings.

FIG. 1 is a schematic diagram of the original P-RSDet framework;

FIG. 2 is a schematic flow chart of the improved network structure of the present invention;

FIG. 3 is a schematic diagram of the overall network architecture of the present invention;

FIG. 4 is a schematic diagram of the structures of BottleNeck and BasicBlock in the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the invention clearer, the invention is described below by way of example with reference to the accompanying drawings. This description is given only as an example and does not limit the scope of the invention. The structures, proportions, sizes and the like shown in the drawings serve only to accompany the content disclosed in the specification so that those skilled in the art can read and understand it; they do not limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change of proportion or adjustment of size that does not affect the effect and the achievable purpose of the invention still falls within the scope covered by the technical content disclosed herein. In addition, descriptions of well-known structures and techniques are omitted below so as not to unnecessarily obscure the concepts of the invention.

It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.

This embodiment adopts the following technical scheme. For the original framework shown in FIG. 1, the P-RSDet key-point detection method, a scheme for improving the detection accuracy is provided: the modules are fused into a whole and used to replace the backbone network ResNet-101 of the original detection algorithm (P-RSDet). The network structure before and after the improvement is shown in FIG. 2 and FIG. 3.

In the feature extraction network shown in FIG. 3, BottleNeck is used to transform the internal feature maps in block 1 (see the left half of FIG. 4), and BasicBlock is used to transform the internal feature maps in blocks 2, 3 and 4 (see the right half of FIG. 4). Up-sampling uses bilinear interpolation followed by a 1×1 convolution to adjust the number of channels, and down-sampling uses a 3×3 convolution with a stride of 2.

In the FPN-like (feature pyramid) feature fusion network, the middle and lower pyramid levels, which contain more low-level image information and detail, often give better detection results for small targets; these levels have higher resolution, and their perception of the overall information of the image is also higher. It can be inferred that these lower pyramid levels have more room to contribute in remote sensing images. The way each level of the classical feature pyramid network is generated is therefore retained, and branches are added on top of it: information from the upper pyramid levels is added to the middle and lower levels, that is, the up-sampled upper-level feature maps are fused in to supply more abstract semantic information. Adding high-level features on top of the low-level information is considered necessary, and for the complex scenes and large target scale spans of remote sensing images, fusing high-level semantic information works well. As a result, the low-level feature maps not only contain the original low-level image information and detail features, but their ability to perceive targets is also greatly enhanced by the injected high-level semantic features. Through this multi-level pyramid fusion structure, the whole network adapts better to small targets and to targets with large scale variation.
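The resampling and top-down fusion described above can be sketched in plain Python on single-channel feature maps. This is an illustrative sketch, not the patent's network. The padding of 1 for the 3×3 stride-2 convolution, the align-corners style of bilinear interpolation, and the scalar `weight` that stands in for a 1×1 convolution on a single channel are all assumptions, and the function names are hypothetical.

```python
def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Spatial size after a convolution (padding=1 is an assumption)."""
    return (n + 2 * padding - kernel) // stride + 1


def bilinear_upsample(fmap, factor=2):
    """Bilinear interpolation of a 2-D list, align-corners style mapping."""
    h, w = len(fmap), len(fmap[0])
    out_h, out_w = h * factor, w * factor
    out = []
    for i in range(out_h):
        sy = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for j in range(out_w):
            sx = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
            bottom = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def fuse_top_down(low, high, weight=1.0):
    """Add an up-sampled higher-level map into a lower-level map.

    weight plays the role of a 1x1 convolution on one channel (assumption).
    """
    up = bilinear_upsample(high, factor=len(low) // len(high))
    return [[l + weight * u for l, u in zip(low_row, up_row)]
            for low_row, up_row in zip(low, up)]


# A 64-pixel-wide map is halved by each stride-2 down-sampling step.
print([conv_out_size(n) for n in (64, 32, 16)])

low_level = [[0.0] * 4 for _ in range(4)]   # high resolution, fine detail
high_level = [[1.0, 1.0], [1.0, 1.0]]       # low resolution, semantic
print(fuse_top_down(low_level, high_level, weight=0.5))
```

Element-wise addition is used here as the fusion operator because it is the operator of the classical feature pyramid network; the source does not state which operator the improved branches use.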

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Furthermore, it should be understood that although this specification is organized by embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way only for clarity; those skilled in the art should treat it as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims

1. A remote sensing image target detection system based on deep learning, characterized in that modules are fused into a whole and used to replace the backbone network ResNet-101 of the original detection algorithm, and the detection method comprises the following steps: (1) an input image; (2) an HRNet high-resolution network; (3) feature fusion with an FPN-like structure; (4) a channel-domain attention mechanism; (5) a spatial-domain attention mechanism; (6) a feature map; (7) (7.1) key-point prediction, and (7.2) prediction of the polar radius and polar angle in the polar coordinate system; (8) (8.1) obtaining the pole coordinates (x, y), and (8.2) obtaining the polar radius and polar angle; (9) finally outputting the target prediction box.