CN116935155A - Multi-stage remote sensing image target detection method, device, equipment and medium


Info

Publication number
CN116935155A
CN116935155A (Application CN202310766856.9A)
Authority
CN
China
Prior art keywords
remote sensing
target detection
sensing image
detection result
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310766856.9A
Other languages
Chinese (zh)
Inventor
许婷婷
邱吉冰
王颖
王原原
毛旷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310766856.9A priority Critical patent/CN116935155A/en
Publication of CN116935155A publication Critical patent/CN116935155A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 - Image mosaicing, e.g. composing plane images from plane sub-images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 - Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/60 - Rotation of whole images or parts thereof
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects, using rules for classification or partitioning the feature space
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 - Indexing scheme for image data processing or generation, in general
    • G06T2200/32 - Indexing scheme for image data processing or generation, in general, involving image mosaicing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10032 - Satellite or aerial image; Remote sensing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a multi-stage remote sensing image target detection method and apparatus, a computer device, and a computer-readable storage medium. The method comprises the following steps: inputting a remote sensing image into a pre-trained first target detection network and outputting a plurality of prediction frames and corresponding prediction information; obtaining a first target detection result when the confidence corresponding to a prediction frame is greater than a second threshold; when the confidence corresponding to a prediction frame is between a first threshold and the second threshold, cropping, rotating and scaling the remote sensing image according to the position information of the prediction frame and inputting the result into a pre-trained second target detection network to obtain a second target detection result; and obtaining a final target detection result based on the first target detection result and the second target detection result. This solves the problem of low accuracy of remote sensing image target detection results and improves the accuracy of those results.

Description

Multi-stage remote sensing image target detection method, device, equipment and medium
Technical Field
The present application relates to the field of information processing technologies, and in particular to a multi-stage remote sensing image target detection method and apparatus, a computer device, and a computer-readable storage medium.
Background
Object detection is one of the most important and challenging branches of computer vision. It is widely used in everyday life, for example in security monitoring and autonomous driving. The task of object detection is to locate instances of a given class of semantic objects.
In high-tech military confrontation, remote sensing image target detection is used to obtain timely and accurate battlefield information, capture strategic strike targets, and provide accurate positioning information. It also plays a role in civil fields such as resource exploration, environmental monitoring, and urban planning. By performing target recognition on remote sensing images acquired by satellites or by aviation and aerospace craft, situational information such as the terrain, equipment, and force deployment of the photographed area can be obtained.
Unlike ordinary ground images, a remote sensing image is an overhead (nadir-view) image, and because the attitude and altitude of satellites and aerospace craft change during flight, the acquired targets exhibit multiple angles and multiple scales. For example, a vessel may point up, down, left, or right, and the resolution may be 1 m per pixel or 5 m per pixel. Current ground target detection algorithms mainly handle a single angle, such as pedestrian and vehicle detection under a surveillance camera, where the targets are upright people and vehicles with similar orientations, roughly perpendicular to the horizontal axis. The current target regression box is a horizontal rectangle, and applying horizontal-rectangle prediction to remote sensing image target detection leads to large feature variation inside the box when the target appears at different angles.
The YOLO algorithm is widely used in industry because of its real-time performance. YOLO is a typical single-stage target detection algorithm: through multi-scale (generally three-scale) feature layers, targets of different sizes are detected separately; each feature point is responsible for detecting objects whose center falls at its position, and each feature point outputs a corresponding prediction result.
A target detection algorithm requires confidence-threshold screening in the post-processing stage, and because the distribution of a deep learning dataset is not perfectly uniform, the confidence distribution of detection results varies across scenes in actual use. For example, for a noisy image the confidences are low overall and the confidence distribution of the detection results is concentrated, whereas a high-definition image allows targets to be detected well, so the confidences of true targets are high, the confidences of false detections are low, and the distribution is dispersed. To keep the precision of detection results high, a high confidence screening threshold is usually set, which results in low target recall. In addition, because remote sensing targets are multi-angle and multi-scale, the data distribution is wide, and it is difficult for a single model to handle both large and small targets well.
For the problem of low accuracy of remote sensing image target detection results in the related art, no effective solution has yet been proposed.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a multi-stage remote sensing image target detection method, apparatus, computer device, and computer readable storage medium.
In a first aspect, an embodiment of the present application provides a method for detecting a target of a multi-stage remote sensing image, where the method includes:
inputting a remote sensing image into a pre-trained first target detection network, and outputting a plurality of prediction frames and corresponding prediction information, where the prediction information includes the confidence, classification probability, and position information of each prediction frame;
obtaining a first target detection result when the confidence corresponding to a prediction frame is greater than a second threshold;
when the confidence corresponding to a prediction frame is between a first threshold and the second threshold, cropping the remote sensing image according to the position information of the prediction frame to obtain at least one first remote sensing image, rotating and scaling each first remote sensing image, and inputting it into a pre-trained second target detection network to obtain a second target detection result, where the first threshold is less than the second threshold;
and obtaining a final target detection result based on the first target detection result and the second target detection result.
In one embodiment, the position information includes the length and width of the prediction frame, the angle between the long side of the prediction frame and the positive x-axis of a rectangular coordinate system, and the center coordinates of the prediction frame; the rectangular coordinate system is established with the top-left vertex of the remote sensing image as the origin, horizontal right as the positive x-axis, and vertical down as the positive y-axis.
In one embodiment, the rotating and scaling each of the first remote sensing images includes:
and rotating each first remote sensing image to enable the included angle between the long side of each first remote sensing image and the x-axis direction to be zero, and scaling each first remote sensing image according to the same preset size.
In one embodiment, rotating and scaling the first remote sensing images, inputting them into a pre-trained second target detection network, and obtaining a second target detection result includes:
generating a corresponding scaling information map with the scaling ratios of each first remote sensing image as pixel values and the preset size as its size;
respectively splicing the scaled first remote sensing image and the corresponding scaled information graph to obtain at least one second remote sensing image;
and inputting at least one second remote sensing image into a pre-trained second target detection network to obtain a second target detection result.
In one embodiment, inputting the at least one second remote sensing image into a pre-trained second target detection network and obtaining a second target detection result includes:
inputting at least one second remote sensing image into a pre-trained second target detection network, and outputting at least one prediction frame and corresponding prediction information;
and obtaining a second target detection result when the confidence corresponding to the prediction frame is greater than a third threshold.
In one embodiment, before inputting the remote sensing image into the pre-trained first target detection network, the method comprises:
acquiring a first training sample, wherein the first training sample is a remote sensing image augmented by rotation, cropping, splicing, brightness and contrast change, blurring, and scaling;
inputting the first training sample into a convolutional neural network, extracting a feature map by utilizing a feature extraction network, further fusing multi-scale features by utilizing a pyramid network structure to obtain a final feature map, and acquiring a detection result based on the feature map;
calculating a loss function based on the detection result and the real result;
and updating parameters of the convolutional neural network based on the loss function to obtain the first target detection network trained in advance.
In one embodiment, before the rotated and scaled first remote sensing images are input into the pre-trained second target detection network, the method includes:
acquiring a second training sample, wherein the second training sample includes cropped remote sensing images with a target object and remote sensing images without a target object, and the cropped remote sensing images are rotated and scaled;
inputting the second training sample into a convolutional neural network to extract a feature map, and acquiring a detection result based on the feature map;
calculating a loss function based on the detection result and the real result;
and updating parameters of the convolutional neural network based on the loss function to obtain the pre-trained second target detection network.
In one embodiment, the rotation and scaling of the remote sensing image includes:
and disturbing the remote sensing image, wherein the disturbance comprises at least one of central position disturbance, length-width disturbance and angle disturbance.
In a second aspect, an embodiment of the present application further provides a multi-stage remote sensing image target detection apparatus, where the apparatus includes:
an input module, configured to input a remote sensing image into a pre-trained first target detection network and output a plurality of prediction frames and corresponding prediction information, where the prediction information includes the confidence, classification probability, and position information of each prediction frame;
a first target detection module, configured to obtain a first target detection result when the confidence corresponding to a prediction frame is greater than a second threshold;
a second target detection module, configured to, when the confidence corresponding to a prediction frame is between a first threshold and the second threshold, crop the remote sensing image according to the position information of the prediction frame to obtain at least one first remote sensing image, rotate and scale each first remote sensing image, and input it into a pre-trained second target detection network to obtain a second target detection result, where the first threshold is less than the second threshold;
and an obtaining module, configured to obtain a final target detection result based on the first target detection result and the second target detection result.
In a third aspect, embodiments of the present application also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method described in the first aspect above.
With the multi-stage remote sensing image target detection method and apparatus, the computer device, and the computer-readable storage medium described above, a remote sensing image is input into a pre-trained first target detection network, and a plurality of prediction frames and corresponding prediction information are output, where the prediction information includes the confidence, classification probability, and position information of each prediction frame; a first target detection result is obtained when the confidence corresponding to a prediction frame is greater than a second threshold; when the confidence corresponding to a prediction frame is between a first threshold and the second threshold, the remote sensing image is cropped according to the position information of the prediction frame to obtain at least one first remote sensing image, and each first remote sensing image is rotated, scaled, and input into a pre-trained second target detection network to obtain a second target detection result, where the first threshold is less than the second threshold; and a final target detection result is obtained based on the first target detection result and the second target detection result. This solves the problem of low accuracy of remote sensing image target detection results and improves the accuracy of those results.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a terminal for the multi-stage remote sensing image target detection method in one embodiment;
FIG. 2 is a flow diagram of a multi-stage remote sensing image target detection method in one embodiment;
FIG. 3 is a diagram of the position information of a prediction frame in one embodiment;
FIG. 4 is a schematic diagram of rotating and scaling a first remote sensing image in one embodiment;
FIG. 5 is a flow chart of obtaining a second target detection result in one embodiment;
FIG. 6 is a flow chart of the steps performed in step S503 in one embodiment;
FIG. 7 is a flow diagram of obtaining the pre-trained first target detection network in one embodiment;
FIG. 8 is a flow diagram of first target detection network training in one embodiment;
FIG. 9 is a flow diagram of obtaining the pre-trained second target detection network in one embodiment;
FIG. 10 is a flow diagram of second target detection network training in one embodiment;
FIG. 11 is a schematic diagram of perturbing a remote sensing image with a target in one embodiment;
FIG. 12 is a flow chart of a multi-stage remote sensing image target detection method according to a preferred embodiment;
FIG. 13 is a block diagram of a multi-stage remote sensing image target detection apparatus in one embodiment.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person of ordinary skill in the art based on the embodiments provided by the present application without making any inventive effort, are intended to fall within the scope of the present application.
It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and those of ordinary skill in the art can apply the present application to other similar situations according to these drawings without inventive effort. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents in the context of the application are not to be construed as limiting the quantity, but rather as singular or plural. The terms "comprising," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in connection with the present application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship. The terms "first," "second," "third," and the like, as used herein, are merely distinguishing between similar objects and not representing a particular ordering of objects.
The method embodiments provided in the present embodiment may be executed in a terminal, a computer, or similar computing device. For example, the method runs on a terminal, and fig. 1 is a block diagram of a hardware structure of the terminal of the multi-stage remote sensing image target detection method of the present embodiment. As shown in fig. 1, the terminal may include one or more (only one is shown in fig. 1) processors 102 and a memory 104 for storing data, wherein the processors 102 may include, but are not limited to, a microprocessor MCU, a programmable logic device FPGA, or the like. The terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and is not intended to limit the structure of the terminal. For example, the terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to the multi-stage remote sensing image object detection method in the present embodiment, and the processor 102 executes the computer program stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The network includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a network adapter (NIC) that may be connected to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
The embodiment of the application provides a multi-stage remote sensing image target detection method, as shown in fig. 2, comprising the following steps:
step S201, inputting a remote sensing image into a first target detection network trained in advance, and outputting a plurality of prediction frames and corresponding prediction information; the prediction information comprises confidence level, classification probability and position information of the prediction frame;
specifically, the embodiment of the application can adopt any target detection algorithm to construct a pre-trained first target detection network, including but not limited to a Yolov5 algorithm, wherein the first target detection network has a pyramid structure, and can detect target objects under multiple scales simultaneously. In the embodiment of the application, the Yolov5 algorithm is taken as an example, and target detection is carried out on 3 scales. Specifically, the remote sensing image is input to a first target detection network trained in advance, and a plurality of prediction frames and prediction information corresponding to each prediction frame are output. Wherein the output plurality of prediction frames comprises 3 scales of different sizes. The prediction information includes confidence, classification probability and position information of the prediction frame, and the prediction information can be expressed as x c ,y c ,w,h,conf,θ,c 0 ,…,c n-1 ]Wherein conf is confidence, [ c ] 0 ,…,c n-1 ]For the classification probability of each category, [ x ] c ,y c ,w,h,θ]Is the position information of the prediction frame.
As shown in FIG. 3, the position information [x_c, y_c, w, h, θ] of one of the prediction frames is illustrated. Specifically, a rectangular coordinate system is established with the top-left vertex of the remote sensing image as the origin, horizontal right as the positive x-axis, and vertical down as the positive y-axis; w and h are the length and width of the prediction frame, θ is the angle between the long side of the prediction frame and the positive x-axis, and (x_c, y_c) are the center coordinates of the prediction frame. Considering the periodicity of the angle, θ ∈ [0, π].
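For concreteness, the small sketch below shows how the [x_c, y_c, w, h, θ] representation maps to the four corner points of the rotated frame; the sign convention of the rotation is an illustrative assumption consistent with the y-down coordinate system, not something the patent specifies.

```python
import numpy as np

def box_corners(x_c, y_c, w, h, theta):
    """Corner points of a rotated prediction frame [x_c, y_c, w, h, theta]
    in the coordinate system of FIG. 3 (origin at the top-left vertex,
    x to the right, y downward, theta in [0, pi])."""
    c, s = np.cos(theta), np.sin(theta)
    half = np.array([[ w / 2,  h / 2], [-w / 2,  h / 2],
                     [-w / 2, -h / 2], [ w / 2, -h / 2]])
    rot = np.array([[c, -s], [s, c]])      # rotate the axis-aligned corners
    return half @ rot.T + np.array([x_c, y_c])
```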
Further, the output prediction frames are preprocessed. Specifically, a first threshold σ_1 is set: prediction frames with confidence conf < σ_1 are deleted and frames with conf ≥ σ_1 are retained. Non-maximum suppression (NMS) is then performed on the retained prediction frames to obtain N prediction frames, and the subsequent target detection operations are performed on these N prediction frames.
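As an illustration of this preprocessing step, the following is a minimal Python sketch of σ_1 filtering followed by a greedy NMS. It is not the patent's implementation: the rotated_iou helper is a hypothetical stand-in that approximates rotated-box overlap via axis-aligned bounding rectangles, whereas a faithful version would intersect the box polygons.

```python
import numpy as np

def rotated_iou(box_a, box_b):
    # Hypothetical helper: approximate IoU of two rotated boxes
    # [x_c, y_c, w, h, theta] via their axis-aligned bounding rectangles.
    def aabb(b):
        x, y, w, h, t = b
        ex = (abs(w * np.cos(t)) + abs(h * np.sin(t))) / 2
        ey = (abs(w * np.sin(t)) + abs(h * np.cos(t))) / 2
        return x - ex, y - ey, x + ex, y + ey
    ax1, ay1, ax2, ay2 = aabb(box_a)
    bx1, by1, bx2, by2 = aabb(box_b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def preprocess(preds, sigma1, iou_thr=0.5):
    """preds: rows [x_c, y_c, w, h, conf, theta, c_0, ..., c_{n-1}]."""
    kept = preds[preds[:, 4] >= sigma1]     # delete frames with conf < sigma_1
    order = kept[:, 4].argsort()[::-1]      # highest confidence first
    out = []
    for i in order:                         # greedy non-maximum suppression
        box_i = kept[i, [0, 1, 2, 3, 5]]
        if all(rotated_iou(box_i, kept[j, [0, 1, 2, 3, 5]]) < iou_thr
               for j in out):
            out.append(i)
    return kept[out]                        # the N surviving prediction frames
```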
Step S202, obtaining a first target detection result when the confidence corresponding to a prediction frame is greater than a second threshold;
specifically, a second threshold sigma is set 221 ) Based on the second threshold sigma 2 Screening out conf is more than or equal to sigma 2 N ' prediction frames (N '. Ltoreq.n), the N ' prediction frames and corresponding prediction information, i.e. the first target detection result.
Step S203, when the confidence corresponding to a prediction frame is between the first threshold and the second threshold, cropping the remote sensing image according to the position information of the prediction frame, and inputting the rotated and scaled first remote sensing image into a pre-trained second target detection network to obtain a second target detection result, where the first threshold is less than the second threshold;
step S204, obtaining a final target detection result based on the first target detection result and the second target detection result.
Specifically, the first target detection result and the second target detection result together constitute the final target detection result.
In steps S201 to S204, a remote sensing image is input into a pre-trained first target detection network, and a plurality of prediction frames and corresponding prediction information are output, where the prediction information includes the confidence, classification probability, and position information of each prediction frame; a first target detection result is obtained when the confidence corresponding to a prediction frame is greater than a second threshold; when the confidence corresponding to a prediction frame is between a first threshold and the second threshold, the remote sensing image is cropped according to the position information of the prediction frame to obtain at least one first remote sensing image, and each first remote sensing image is rotated, scaled, and input into a pre-trained second target detection network to obtain a second target detection result, where the first threshold is less than the second threshold; and a final target detection result is obtained based on the first target detection result and the second target detection result. This solves the problem of low accuracy of remote sensing image target detection models in the related art, and the multi-stage target detection network improves the accuracy of the target detection result.
In one embodiment, the rotating and scaling each of the first remote sensing images includes:
and rotating each first remote sensing image to enable the included angle between the long side of each first remote sensing image and the x-axis direction to be zero, and scaling each first remote sensing image according to the same preset size.
Specifically, FIG. 4 is a schematic diagram of rotating one of the first remote sensing images so that the angle between its long side and the x-axis is zero, and scaling the first remote sensing image to a preset size (size×size), where size×size is the scaled length and width of the first remote sensing image. Illustratively, size×size is set to 384×384 in this embodiment.
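A minimal OpenCV sketch of this crop-rotate-scale step follows, under the assumption that rotating the whole image about the box center by θ and then cutting out a w×h patch matches the operation in FIG. 4; the function and parameter names are illustrative.

```python
import cv2
import numpy as np

SIZE = 384  # preset size from the embodiment (size x size = 384 x 384)

def crop_rotate_scale(image, box):
    """box = (x_c, y_c, w, h, theta), theta in radians as in FIG. 3."""
    x_c, y_c, w, h, theta = box
    # Rotate the whole image about the box center so the box's long side
    # becomes parallel to the x-axis (angle between long side and x-axis -> 0).
    M = cv2.getRotationMatrix2D((x_c, y_c), np.degrees(theta), 1.0)
    rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
    # Cut out the now axis-aligned w x h patch: the "first remote sensing image".
    patch = cv2.getRectSubPix(rotated, (int(round(w)), int(round(h))),
                              (x_c, y_c))
    # Scale every patch to the same preset size.
    return cv2.resize(patch, (SIZE, SIZE))
```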
In one embodiment, as shown in FIG. 5, rotating and scaling the first remote sensing images, inputting them into a pre-trained second target detection network, and obtaining the second target detection result includes the following steps:
step S501, taking the scaling ratio of each first remote sensing image as a pixel value, and taking the preset size as a size, to generate a corresponding scaling information map;
specifically, the scaling of each first remote sensing image is calculated, namelyLong side scaling ratio_w and short side scaling ratio_h, whereinAnd generating a scaling information graph with the size of size multiplied by size by taking the ratio_w and the ratio_h as pixel values which alternate in turn.
Step S502, respectively performing splicing processing on the scaled first remote sensing image and the corresponding scaled information graph to obtain at least one second remote sensing image;
specifically, the sizes of the scaled first remote sensing image and the corresponding scaled information graph are size multiplied by size, the scaled first remote sensing image and the corresponding scaled information graph are respectively subjected to splicing processing, specifically, a concat function is adopted to fuse the features, the increase of the channel number of the scaled first remote sensing image is realized, and the corresponding second remote sensing image is obtained.
Step S503, inputting at least one second remote sensing image into a pre-trained second target detection network to obtain a second target detection result.
In this embodiment, the scaled first remote sensing images may be blurred, and different images end up with different degrees of blur. Therefore, by combining each first remote sensing image with its corresponding scaling information map, a scaling feature is added on top of the reflectivity features of the remote sensing image, which improves the accuracy of the second target detection result.
In one embodiment, as shown in FIG. 6, inputting at least one second remote sensing image into the pre-trained second target detection network and obtaining the second target detection result includes the following steps:
step S601, inputting at least one second remote sensing image into a pre-trained second target detection network, and outputting at least one prediction frame and corresponding prediction information;
specifically, one of the second remote sensing images is input to a pre-trained second target detection network, a plurality of prediction frames are obtained, and one prediction frame with the highest confidence coefficient and corresponding prediction information are taken as output. Therefore, if M second remote sensing images are obtained in step S502, the M second remote sensing images are input to the second target detection network trained in advance, and M prediction frames and corresponding prediction information are output.
Step S602, obtaining a second target detection result when the confidence coefficient corresponding to the prediction frame is greater than a third threshold.
Specifically, the confidence corresponding to an output prediction frame is denoted conf', and a third threshold σ_3 is set: prediction frames with conf' < σ_3 are deleted and frames with conf' ≥ σ_3 are retained. The retained prediction frames and their corresponding prediction information constitute the second target detection result.
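A minimal sketch of this second-stage selection (steps S601 and S602) follows, assuming each per-image prediction array uses the same row layout as in the first stage; names are illustrative.

```python
import numpy as np

def second_stage_result(pred_batches, sigma3):
    """pred_batches: one prediction array per second remote sensing image,
    rows [x_c, y_c, w, h, conf, theta, c_0, ...]. For each image, keep only
    the highest-confidence frame, then screen it with sigma_3."""
    results = []
    for preds in pred_batches:
        if len(preds) == 0:
            continue
        best = preds[preds[:, 4].argmax()]   # frame with the highest conf'
        if best[4] >= sigma3:                # delete frames with conf' < sigma_3
            results.append(best)
    return results                           # the second target detection result
```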
In one embodiment, as shown in FIG. 7, before the remote sensing image is input into the pre-trained first target detection network, the method includes the following steps:
step S701, a first training sample is obtained, wherein the first training sample is a remote sensing image which is subjected to data augmentation after rotation, shearing, splicing, brightness contrast change, blurring and scaling;
specifically, the first training sample is a remote sensing image with a rotating target and a label, and the data is amplified by means of rotation, shearing, splicing, brightness contrast change, blurring, zooming (zooming in or zooming out) and the like.
Step S702, inputting the first training sample into a convolutional neural network, extracting a feature map by using a feature extraction network, further fusing multi-scale features by using a pyramid network structure to obtain a final feature map, and obtaining a detection result based on the feature map;
specifically, as shown in fig. 8, in this embodiment, taking the yolov5 algorithm as an example, a first training sample is input into a convolutional neural network, a feature map F1 is obtained through a convolutional block, a feature map F2 after downsampling is obtained through convolution with a step length of 2 and the convolutional block, and a feature map F3, a feature map F4 and a feature map F5 are obtained repeatedly for multiple times (the network may be deepened or the network may be cut according to requirements). And (3) up-sampling the low-level feature C5, combining the low-level feature C5 with the feature of the previous stage, obtaining a feature C4 (repeated multiple times) through a convolution block, and predicting the feature (C3/C4/C5) under each resolution by using a target detection frame, wherein H3 detects a small target, H4 detects a medium-size target and H5 detects a large target, and obtaining a detection result.
Step S703, calculating a loss function based on the detection result and the real result;
step S704, updating parameters of the convolutional neural network based on the loss function, to obtain the first target detection network trained in advance. Targets at multiple scales can be detected simultaneously.
In one embodiment, as shown in fig. 9, before the first remote sensing images are rotated and scaled and input to the pre-trained second target detection network, the method includes the following steps:
step S801, a second training sample is obtained, wherein the second training sample comprises a remote sensing image with a target object after shearing and a remote sensing image without a target object, and rotation and scaling processing are carried out on the sheared remote sensing image;
specifically, the second training sample comprises a positive sample and a negative sample, the positive sample is a remote sensing image with a target object after shearing, the negative sample is a remote sensing image without the target object after shearing, and rotation and scaling are carried out on the sheared remote sensing image, so that the included angle between the long side of the sheared remote sensing image and the x-axis direction is zero, and the long side of the sheared remote sensing image is scaled to the same preset size.
Step S802, inputting the second training sample into a convolutional neural network to extract a feature map, and acquiring a detection result based on the feature map;
specifically, as shown in fig. 10, taking the yolov5 algorithm as an example, a first training sample is input into a convolutional neural network, a feature map F1 is obtained through a convolutional block, a feature map F2 after downsampling is obtained through convolution with a step length of 2 and the convolutional block, the steps are repeated to obtain a feature map F3, the feature map F3 is obtained through the convolutional block, and a target detection frame prediction is performed on the feature C3 to obtain a target detection result.
Step S803, calculating a loss function based on the detection result and the real result;
step S804, updating parameters of the convolutional neural network based on the loss function, to obtain the pre-trained second target detection network.
Specifically, the pre-trained second target detection network predicts targets at a single scale and only outputs the target detection result of the feature map at one scale.
In one embodiment, the cropped remote sensing image with a target object is obtained by perturbing the remote sensing image with the target object and cropping it after the perturbation, where the perturbation includes at least one of a center position perturbation, a length-width perturbation, and an angle perturbation.
In this embodiment, in order to better fit situations that may occur in real application scenarios, cropping is not performed strictly according to the true position of the target object: before cropping, the center position, length and width, and angle of the region are perturbed to a certain degree, and the three kinds of perturbation may be randomly combined. As shown in FIG. 11, the solid-line box is the true position before perturbation, and the dashed-line box is the position after perturbation and cropping.
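A sketch of such a randomly combined perturbation follows; the jitter magnitudes and the 50% selection probabilities are illustrative assumptions, since the patent only names the three perturbation types.

```python
import numpy as np

def perturb_box(box, rng=None, pos_jitter=0.1, size_jitter=0.1,
                angle_jitter=np.pi / 18):
    """Randomly perturb a ground-truth box before cropping a training patch.
    Magnitudes and probabilities are illustrative; the patent only states
    that the three perturbations may be combined at random."""
    rng = rng or np.random.default_rng()
    x_c, y_c, w, h, theta = box
    if rng.random() < 0.5:                    # center position perturbation
        x_c += rng.uniform(-pos_jitter, pos_jitter) * w
        y_c += rng.uniform(-pos_jitter, pos_jitter) * h
    if rng.random() < 0.5:                    # length-width perturbation
        w *= 1 + rng.uniform(-size_jitter, size_jitter)
        h *= 1 + rng.uniform(-size_jitter, size_jitter)
    if rng.random() < 0.5:                    # angle perturbation
        theta = (theta + rng.uniform(-angle_jitter, angle_jitter)) % np.pi
    return x_c, y_c, w, h, theta
```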
The present embodiment is described and illustrated below by way of preferred embodiments.
Fig. 12 is a flowchart showing a multi-stage remote sensing image target detection method according to the present embodiment, and as shown in fig. 12, the multi-stage remote sensing image target detection method includes the following steps:
step S901, acquiring a remote sensing image;
step S902, inputting a remote sensing image into a trained first target detection network, and outputting a plurality of prediction frames and corresponding prediction information;
step S903, judging whether the confidence level conf of the predicted frame is greater than or equal to a first threshold sigma 1 If yes, go to step S904;
step S904, performing non-maximum suppression on the prediction frame;
step S905, judging whether the confidence level conf of the predicted frame is greater than or equal to a second threshold sigma 2 If yes, executing S906; if not, executing S907;
step S906, obtaining a target detection result 1;
step S907, cutting, rotating and scaling the remote sensing image based on the position information of the prediction frame;
step S908, inputting the remote sensing image processed in the step S907 to a trained second target detection network, and outputting a corresponding prediction frame and prediction information;
step S909, judging whether the confidence level conf' of the predicted frame is greater than or equal to a third threshold sigma 3 If yes, executing S910;
step S910, obtaining a target detection result 2;
in step S911, a target detection result is obtained based on the target detection result 1 and the target detection result 2.
The embodiment of the application also provides a multi-stage remote sensing image target detection device, as shown in fig. 13, which comprises:
the input module 91 is configured to input the remote sensing image into a pre-trained first target detection network and output a plurality of prediction frames and corresponding prediction information, where the prediction information includes the confidence, classification probability, and position information of each prediction frame;
the first target detection module 92 is configured to obtain a first target detection result when the confidence corresponding to a prediction frame is greater than a second threshold;
the second target detection module 93 is configured to, when the confidence corresponding to a prediction frame is between a first threshold and the second threshold, crop the remote sensing image according to the position information of the prediction frame to obtain at least one first remote sensing image, rotate and scale each first remote sensing image, and input it into a pre-trained second target detection network to obtain a second target detection result, where the first threshold is less than the second threshold;
an obtaining module 94 is configured to obtain a final target detection result based on the first target detection result and the second target detection result.
In one embodiment, the position information includes the length and width of the prediction frame, the angle between the long side of the prediction frame and the positive x-axis of a rectangular coordinate system, and the center coordinates of the prediction frame; the rectangular coordinate system is established with the top-left vertex of the remote sensing image as the origin, horizontal right as the positive x-axis, and vertical down as the positive y-axis.
In one embodiment, the second object detection module 93 is further configured to:
and rotating each first remote sensing image to enable the included angle between the long side of each first remote sensing image and the x-axis direction to be zero, and scaling each first remote sensing image according to the same preset size.
In one embodiment, the second object detection module 93 is further configured to:
generate a corresponding scaling information map with the scaling ratios of each first remote sensing image as pixel values and the preset size as its size;
respectively splicing the scaled first remote sensing image and the corresponding scaled information graph to obtain at least one second remote sensing image;
and inputting at least one second remote sensing image into a pre-trained second target detection network to obtain a second target detection result.
In one embodiment, the second object detection module 93 is further configured to:
inputting at least one second remote sensing image into a pre-trained second target detection network, and outputting at least one prediction frame and corresponding prediction information;
and obtain a second target detection result when the confidence corresponding to the prediction frame is greater than a third threshold.
In one embodiment, the input module 91 is further configured to:
acquire a first training sample, wherein the first training sample is a remote sensing image augmented by rotation, cropping, splicing, brightness and contrast change, blurring, and scaling;
inputting the first training sample into a convolutional neural network, extracting a feature map by utilizing a feature extraction network, further fusing multi-scale features by utilizing a pyramid network structure to obtain a final feature map, and acquiring a detection result based on the feature map;
calculating a loss function based on the detection result and the real result;
and updating parameters of the convolutional neural network based on the loss function to obtain the first target detection network trained in advance.
In one embodiment, the second object detection module 93 is further configured to:
acquire a second training sample, wherein the second training sample includes cropped remote sensing images with a target object and remote sensing images without a target object, and the cropped remote sensing images are rotated and scaled;
inputting the second training sample into a convolutional neural network to extract a feature map, and acquiring a detection result based on the feature map;
calculating a loss function based on the detection result and the real result;
and updating parameters of the convolutional neural network based on the loss function to obtain the pre-trained second target detection network.
In one embodiment, the second object detection module 93 is further configured to:
perturb the remote sensing image with the target object, and crop the perturbed remote sensing image; wherein the perturbation includes at least one of a center position perturbation, a length-width perturbation, and an angle perturbation.
The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.
An embodiment of the present application further provides a computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor performs the steps of any of the above-described multi-stage remote sensing image object detection embodiments.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration, and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of technical features, it should be considered to be within the scope of this description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A multi-stage remote sensing image target detection method, the method comprising:
inputting a remote sensing image into a pre-trained first target detection network, and outputting a plurality of prediction frames and corresponding prediction information, where the prediction information includes the confidence, classification probability, and position information of each prediction frame;
obtaining a first target detection result when the confidence corresponding to a prediction frame is greater than a second threshold;
when the confidence corresponding to a prediction frame is between a first threshold and the second threshold, cropping the remote sensing image according to the position information of the prediction frame to obtain at least one first remote sensing image, rotating and scaling each first remote sensing image, and inputting it into a pre-trained second target detection network to obtain a second target detection result, where the first threshold is less than the second threshold;
and obtaining a final target detection result based on the first target detection result and the second target detection result.
2. The method of claim 1, wherein the position information includes the length and width of the prediction frame, the angle between the long side of the prediction frame and the positive x-axis of a rectangular coordinate system, and the center coordinates of the prediction frame; the rectangular coordinate system is established with the top-left vertex of the remote sensing image as the origin, horizontal right as the positive x-axis, and vertical down as the positive y-axis.
3. The method of claim 1, wherein the rotating and scaling each of the first remote sensing images comprises:
rotating each first remote sensing image so that the angle between its long side and the x-axis is zero, and scaling each first remote sensing image to the same preset size.
4. The method of claim 3, wherein rotating and scaling each first remote sensing image, inputting it into a pre-trained second target detection network, and obtaining a second target detection result comprises:
generating a corresponding scaling information map with the scaling ratios of each first remote sensing image as pixel values and the preset size as its size;
respectively splicing the scaled first remote sensing image and the corresponding scaled information graph to obtain at least one second remote sensing image;
and inputting at least one second remote sensing image into a pre-trained second target detection network to obtain a second target detection result.
5. The method of claim 4, wherein the inputting the at least one second remote sensing image into the pre-trained second target detection network to obtain the second target detection result comprises:
inputting the at least one second remote sensing image into the pre-trained second target detection network, and outputting at least one prediction frame and corresponding prediction information;
and obtaining the second target detection result when the confidence corresponding to the prediction frame is greater than a third threshold.
6. The method of claim 1, wherein, before inputting the remote sensing image into the pre-trained first target detection network, the method comprises:
acquiring a first training sample, wherein the first training sample is a remote sensing image augmented by rotation, cropping, stitching, brightness and contrast changes, blurring, and scaling;
inputting the first training sample into a convolutional neural network, extracting feature maps with a feature extraction network, fusing multi-scale features with a pyramid network structure to obtain a final feature map, and obtaining a detection result based on the final feature map;
calculating a loss function based on the detection result and the ground-truth result;
and updating parameters of the convolutional neural network based on the loss function to obtain the pre-trained first target detection network.
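A schematic PyTorch training step consistent with this claim; `backbone_fpn`, `head`, and `detection_loss` are hypothetical stand-ins for the feature extraction network with pyramid fusion, the detection head, and the loss that the claim leaves unspecified.

```python
import torch

def train_step(backbone_fpn, head, detection_loss, optimizer, images, targets):
    features = backbone_fpn(images)  # feature extraction + pyramid fusion
    predictions = head(features)     # detection result from the feature map
    loss = detection_loss(predictions, targets)  # compare with ground truth
    optimizer.zero_grad()
    loss.backward()                  # update the CNN parameters via the loss
    optimizer.step()
    return loss.item()
```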
7. The method of claim 1, wherein, before rotating and scaling each first remote sensing image and inputting it into the pre-trained second target detection network, the method comprises:
acquiring a second training sample, wherein the second training sample comprises cropped remote sensing images containing a target object and remote sensing images without a target object, and the cropped remote sensing images are rotated and scaled;
inputting the second training sample into a convolutional neural network to extract a feature map, and obtaining a detection result based on the feature map;
calculating a loss function based on the detection result and the ground-truth result;
and updating parameters of the convolutional neural network based on the loss function to obtain the pre-trained second target detection network.
8. The method of claim 7, wherein, before cropping the remote sensing image containing the target object, the method comprises:
perturbing the remote sensing image containing the target object, and cropping the perturbed remote sensing image; wherein the perturbation comprises at least one of a center-position perturbation, a length-width perturbation, and an angle perturbation.
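An illustrative version of the perturbation in this claim, jittering the center position, length and width, and angle of an annotated frame before cropping; the uniform noise ranges are assumed, not taken from the patent.

```python
import random

def perturb_box(cx, cy, length, width, theta_deg,
                pos_jitter=0.1, size_jitter=0.1, angle_jitter=10.0):
    cx += random.uniform(-pos_jitter, pos_jitter) * length     # center position
    cy += random.uniform(-pos_jitter, pos_jitter) * width
    length *= 1.0 + random.uniform(-size_jitter, size_jitter)  # length-width
    width *= 1.0 + random.uniform(-size_jitter, size_jitter)
    theta_deg += random.uniform(-angle_jitter, angle_jitter)   # angle
    return cx, cy, length, width, theta_deg
```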
9. A multi-stage remote sensing image target detection apparatus, the apparatus comprising:
an input module, configured to input the remote sensing image into a pre-trained first target detection network and output a plurality of prediction frames and corresponding prediction information; wherein the prediction information comprises a confidence, a classification probability, and position information of the prediction frame;
a first target detection module, configured to obtain a first target detection result when the confidence corresponding to a prediction frame is greater than a second threshold;
a second target detection module, configured to crop the remote sensing image according to the position information of each prediction frame whose confidence lies between a first threshold and the second threshold to obtain at least one first remote sensing image, rotate and scale each first remote sensing image, and input the result into a pre-trained second target detection network to obtain a second target detection result, wherein the first threshold is less than the second threshold;
and an obtaining module, configured to obtain a final target detection result based on the first target detection result and the second target detection result.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 8.
CN202310766856.9A 2023-06-27 2023-06-27 Multi-stage remote sensing image target detection method, device, equipment and medium Pending CN116935155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310766856.9A CN116935155A (en) 2023-06-27 2023-06-27 Multi-stage remote sensing image target detection method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310766856.9A CN116935155A (en) 2023-06-27 2023-06-27 Multi-stage remote sensing image target detection method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116935155A true CN116935155A (en) 2023-10-24

Family

ID=88387092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310766856.9A Pending CN116935155A (en) 2023-06-27 2023-06-27 Multi-stage remote sensing image target detection method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116935155A (en)

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
CN111860695B (en) Data fusion and target detection method, device and equipment
CN109086668B (en) Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network
CN110909642A (en) Remote sensing image target detection method based on multi-scale semantic feature fusion
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN106897681B (en) Remote sensing image contrast analysis method and system
CN113139543B (en) Training method of target object detection model, target object detection method and equipment
CN113936256A (en) Image target detection method, device, equipment and storage medium
CN111652181B (en) Target tracking method and device and electronic equipment
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN112464930A (en) Target detection network construction method, target detection method, device and storage medium
CN114419570A (en) Point cloud data identification method and device, electronic equipment and storage medium
CN114170438A (en) Neural network training method, electronic device and computer storage medium
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN116452810A (en) Multi-level semantic segmentation method and device, electronic equipment and storage medium
CN110991414A (en) High-precision traffic element segmentation method, electronic equipment and storage medium
CN114648709A (en) Method and equipment for determining image difference information
CN111339808A (en) Vehicle collision probability prediction method and device, electronic equipment and storage medium
CN113569911A (en) Vehicle identification method and device, electronic equipment and storage medium
CN116543333A (en) Target recognition method, training method, device, equipment and medium of power system
CN116935155A (en) Multi-stage remote sensing image target detection method, device, equipment and medium
CN114036971B (en) Oil tank information generation method, oil tank information generation device, electronic device, and computer-readable medium
CN115861755A (en) Feature fusion method and device, electronic equipment and automatic driving vehicle
CN112651351B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination