CN112800955A - Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid - Google Patents

Info

Publication number
CN112800955A
Authority
CN
China
Prior art keywords
feature
remote sensing
sensing image
layer
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110113543.4A
Other languages
Chinese (zh)
Inventor
张永生
张磊
戴晨光
王涛
纪松
于英
张振超
李力
李磊
吕可枫
王自全
闵杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202110113543.4A
Publication of CN112800955A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention belongs to the technical field of remote sensing image interpretation, and in particular relates to a remote sensing image rotating target detection method and system based on a weighted bidirectional feature pyramid. The remote sensing image is preprocessed by normalization and random data augmentation; features are extracted from the preprocessed image data by a deep convolutional neural network to obtain a plurality of feature layers of different sizes; the feature layers are enhanced with the weighted bidirectional feature pyramid to obtain enhanced feature layers; for the enhanced feature layers of the image data, a region proposal network generates horizontal candidate regions of different sizes and aspect ratios; and, using the original annotations as ground truth, category classification and bounding-box regression are performed on the horizontal candidate regions, with an angle regression parameter added to obtain the final rotated rectangle detection result. The method can detect rotating targets in remote sensing images that have complex and varied backgrounds, very small targets, and varied orientations, and improves the detection accuracy for small targets in remote sensing images.

Description

Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
Technical Field
The invention belongs to the technical field of remote sensing image interpretation, and particularly relates to a remote sensing image rotating target detection method and system based on a weighted bidirectional feature pyramid.
Background
Target detection is a basic but challenging task in computer vision, and plays an important role not only in civil fields such as resource surveying, environmental monitoring, and city planning, but also in military fields such as battlefield target information acquisition and target capture. With the development of deep learning, natural image target detection algorithms based on deep convolutional neural networks (DCNN) have advanced considerably. However, compared with a natural image, a remote sensing image has a more complex background, and its targets show large scale differences, dense distribution, and varied orientations; if a target detection algorithm designed for natural images is applied directly to remote sensing images, satisfactory results cannot be achieved.
Horizontal bounding box detection algorithms based on R-CNN, such as Fast R-CNN and Mask R-CNN, have been widely applied to target detection in remote sensing images, but these algorithms have three main defects: (1) for slender targets in a remote sensing image (such as ports and ships), if the inclination angle of the target is too large, a horizontal candidate region introduces a large amount of background noise, which interferes with the classification of the target; (2) these methods generally use non-maximum suppression (NMS) as a post-processing operation; the intersection over union (IoU) of adjacent slender targets can be large, so one of the adjacent targets may be suppressed when NMS is executed, causing missed detections; (3) they cannot provide accurate orientation and scale information for the target, which creates difficulties in practical applications (such as target change detection in remote sensing images and character recognition in multi-oriented natural-scene text). To address the problems of horizontal bounding boxes in multi-directional target detection, and inspired by text detection algorithms for natural scenes (such as RRPN and R2CNN), many researchers have proposed a series of inclined-box target detection algorithms for remote sensing images.
Compared with a horizontal bounding box, detecting multi-directional targets in a remote sensing image with an inclined bounding box has the following main advantages: (1) the inclined bounding box wraps the target tightly, which avoids introducing excessive noise and improves classification accuracy; (2) the inclined bounding box uses skew non-maximum suppression (Skew-NMS) as the post-processing operation, which avoids the over-suppression problem of the conventional NMS method; (3) an inclined bounding box detection algorithm preserves the orientation information of the target, which is very useful for detecting certain targets (such as ships) in remote sensing images. The multi-scale problem of targets in natural images was addressed by the Feature Pyramid Network (FPN), which fuses deep and shallow features so that the fused feature map carries both rich semantic information and accurate target position information. However, FPN has some defects: when fusing features it only up-samples the high-level features and does not process the shallow features, and its fusion operation is only an undifferentiated (unweighted) pixel-wise addition of the feature maps. In summary, it is necessary to find a rotating target detection algorithm that can effectively handle the complex and varied backgrounds, very small targets, and varied orientations found in remote sensing images.
Disclosure of Invention
Therefore, the invention provides a remote sensing image rotating target detection method and system based on a weighted bidirectional feature pyramid, which take FPN as the basic framework and borrow ideas from BiFPN, and which can detect rotating targets in remote sensing images that have complex and varied backgrounds, very small targets, and varied orientations.
According to the design scheme provided by the invention, the remote sensing image rotating target detection method based on the weighted bidirectional feature pyramid comprises the following steps:
preprocessing the remote sensing image by normalization and random data augmentation;
extracting features from the preprocessed image data through a deep convolutional neural network to obtain a plurality of feature layers of different sizes;
enhancing the feature layers with the weighted bidirectional feature pyramid to obtain enhanced feature layers;
for the enhanced feature layers of the image data, generating horizontal candidate regions of different sizes and aspect ratios with a region proposal network;
and, using the original annotations as ground truth, performing category classification and bounding-box regression on the horizontal candidate regions, and adding an angle regression parameter to obtain the final rotated rectangle detection result.
Further, in the remote sensing image rotating target detection method based on the weighted bidirectional feature pyramid, input images of arbitrary size are uniformly scaled to a fixed size, and the images are then normalized and augmented in sequence.
Further, in the method, during uniform scaling the short side of the image is scaled to a preset size, and the long side is scaled correspondingly so that the aspect ratio of the image remains unchanged.
Further, in the method, during normalization the mean and standard deviation of the three RGB channels of the image are first computed; the mean is then subtracted from each pixel value of the original image and the result is divided by the standard deviation to obtain the normalized pixel value.
Further, in the method, during augmentation, color augmentation and/or geometric augmentation is applied to the image with a preset probability, where color augmentation includes but is not limited to color jittering, gamma correction, histogram correction, and HSV transformation, and geometric augmentation includes but is not limited to horizontal flipping, vertical flipping, and rotation by a random angle.
Further, in the method, the weighted bidirectional feature pyramid comprises an input module that performs a convolution operation on the input data, a top-down convolution module that up-samples the results of the feature-layer convolution operation, and a bottom-up convolution module that down-samples the results of the feature-layer up-sampling.
Further, in the method, in the convolution operation, the feature maps C2-C5 of different sizes are first convolved with a two-dimensional convolution kernel to obtain input feature layers of uniform dimension, and average pooling is then applied to the highest input feature layer; in the up-sampling, the feature maps produced by the convolution and average pooling operations are up-sampled, multiplied by weights and added from top to bottom, and each summed feature layer is convolved with a convolution kernel; in the down-sampling, the feature layer obtained in the up-sampling is down-sampled, multiplied by a weight and added to the up-sampling result of the layer above and to the input feature layer, the summed feature layer is convolved with a convolution kernel to obtain the enhanced feature layer output of the layer above, and this is repeated from bottom to top to obtain the enhanced feature layer output of every layer.
Further, in the method, a region proposal network is used: each point of the enhanced feature map is mapped onto the original image according to the scaling ratio applied during feature extraction, and a series of anchor boxes with preset sizes and aspect ratios is generated on the original image centered on the mapped point; the anchor boxes are then screened by computing the intersection over union (IoU) between the candidate regions and the ground-truth annotation boxes and by removing anchor boxes that extend beyond the image boundary; finally, the horizontal candidate regions are obtained through non-maximum suppression.
Further, in the method, a regression model is optimized so that the predicted values regress toward the original annotations, which are taken as ground truth; the predicted values comprise a category prediction and a position coordinate prediction, and the corresponding loss function is a multi-task loss in which the classification loss is a cross-entropy loss and the position regression loss is a Smooth-L1 loss with an angle parameter θ; the loss is reduced step by step by stochastic gradient descent, and the rotated rectangle detection result is output once the optimized model has converged.
Further, the present invention provides a remote sensing image rotating target detection system based on the weighted bidirectional feature pyramid, including: a preprocessing module, a data optimization module, a region acquisition module and a target detection module, wherein,
the preprocessing module is used for preprocessing the remote sensing image by normalization and random data augmentation;
the data optimization module is used for extracting features from the preprocessed image data through a deep convolutional neural network to obtain a plurality of feature layers of different sizes, and for enhancing the feature layers with the weighted bidirectional feature pyramid to obtain enhanced feature layers;
the region acquisition module is used for generating, for the enhanced feature layers of the image data, horizontal candidate regions of different sizes and aspect ratios with a region proposal network;
and the target detection module is used for performing category classification and bounding-box regression on the horizontal candidate regions, using the original annotations as ground truth, and for obtaining the final rotated rectangle detection result by adding an angle regression parameter.
The invention has the beneficial effects that:
The invention changes the fusion of different feature layers from the original undifferentiated pixel-wise addition to linear weighting, and adds a "bottom-up" structure so that the extracted features are further enhanced; these two improvements effectively address the large number of small targets and the wide range of target scales in remote sensing images. In addition, the detection result is changed from a horizontal rectangular box to a rotated rectangular box, so that targets are located more accurately and the varied target orientations in remote sensing images are handled. The method thus provides improvements aimed at specific defects of the conventional FPN: it fully extracts the features of multi-scale targets in the remote sensing image through a weighted bidirectional feature pyramid, locates targets more accurately by regressing rotated rectangular boxes, and overcomes the localization difficulties caused by numerous small targets, varied target scales, and varied target orientations in remote sensing images.
Description of the drawings:
FIG. 1 is a schematic flow chart of remote sensing image rotating target detection based on a weighted bidirectional feature pyramid in an embodiment;
FIG. 2 is a structural diagram of the overall algorithm in the embodiment;
FIG. 3 is a schematic diagram of an embodiment of a bi-directional feature pyramid structure;
FIG. 4 is a schematic diagram of the data augmentation effect in the embodiment;
FIG. 5 is a schematic comparison of the detection results of the horizontal frame and the rotating rectangular frame in the embodiment.
Detailed description:
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and technical solutions.
In order to meet the requirement of detecting a small target in a remote sensing image, in the embodiment of the present invention, referring to fig. 1, a method for detecting a rotating target of a remote sensing image based on a weighted bidirectional feature pyramid is provided, which specifically includes:
S1, preprocessing the remote sensing image by normalization and random data augmentation;
S2, extracting features from the preprocessed image data through a deep convolutional neural network to obtain a plurality of feature layers of different sizes;
S3, enhancing the feature layers with the weighted bidirectional feature pyramid to obtain enhanced feature layers;
S4, for the enhanced feature layers of the image data, generating horizontal candidate regions of different sizes and aspect ratios with a region proposal network;
S5, using the original annotations as ground truth, performing category classification and bounding-box regression on the horizontal candidate regions, and adding an angle regression parameter to obtain the final rotated rectangle detection result.
The fusion of different feature layers is changed from the original undifferentiated pixel-wise addition to linear weighting, and a "bottom-up" structure is added so that the extracted features are further enhanced; these two improvements effectively address the large number of small targets and the wide range of target scales in remote sensing images. In addition, the detection result is changed from a horizontal rectangular box to a rotated rectangular box, so that targets are located more accurately and the varied target orientations in remote sensing images are handled.
Further, in the remote sensing image rotating target detection method based on the weighted bidirectional feature pyramid in the embodiment of the invention, input images of arbitrary size are uniformly scaled to a fixed size, and the images are then normalized and augmented in sequence. Further, during uniform scaling the short side of the image is scaled to a preset size, and the long side is scaled correspondingly so that the aspect ratio of the image remains unchanged. Further, during normalization the mean and standard deviation of the three RGB channels of the image are first computed; the mean is then subtracted from each pixel value of the original image and the result is divided by the standard deviation to obtain the normalized pixel value. Further, during augmentation, color augmentation and/or geometric augmentation is applied to the image with a preset probability, where color augmentation includes but is not limited to color jittering, gamma correction, histogram correction, and HSV transformation, and geometric augmentation includes but is not limited to horizontal flipping, vertical flipping, and rotation by a random angle.
Referring to fig. 2, an image of arbitrary size is scaled to a uniform size as follows: the short side of the image is first scaled to a specified size (any multiple of 100 between 600 and 1200), and the long side is then scaled correspondingly so that the aspect ratio of the original image remains unchanged. The image is normalized as follows: the mean and standard deviation of the three RGB channels of the data set are first computed; the mean is then subtracted from each pixel value of the original image and the result is divided by the standard deviation to obtain the normalized pixel value. The image is randomly augmented as follows: color augmentation and geometric augmentation are applied to the input image with a preset probability (for example 0.5), where the color augmentation modes include color jittering, gamma correction, histogram correction, and HSV transformation, and the geometric augmentation modes include horizontal flipping, vertical flipping, and rotation by a random angle. The feature layers C2, C3, C4 and C5 extracted by the ResNet-101 feature extraction network are 1/4, 1/8, 1/16 and 1/32 of the size of the original image respectively, and have 256, 512, 1024 and 2048 channels respectively.
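The scaling and normalization steps described above can be illustrated with a minimal Python sketch. The function names, the default short side of 800, and the channel statistics are assumptions chosen for illustration; the text only fixes the rule (scale the short side, keep the aspect ratio, then subtract the mean and divide by the standard deviation).

```python
def scaled_size(width, height, short_side=800):
    """Return (new_width, new_height) with the shorter side scaled to short_side.

    The aspect ratio is preserved; short_side=800 is an illustrative default.
    """
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)


def normalize_pixel(rgb, mean, std):
    """Subtract the per-channel mean and divide by the per-channel std."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


# A 1200 x 900 image: the short side (900) is scaled to 800.
new_w, new_h = scaled_size(1200, 900)

# Hypothetical data-set statistics, for illustration only.
pixel = normalize_pixel((120.0, 100.0, 80.0),
                        mean=(100.0, 100.0, 100.0),
                        std=(20.0, 25.0, 40.0))
```

In practice the mean and standard deviation would be computed once over the whole data set, as the text states, and reused for every image.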
Further, in the remote sensing image rotating target detection method based on the weighted bidirectional feature pyramid in the embodiment of the invention, the weighted bidirectional feature pyramid comprises an input module that performs a convolution operation on the input data, a top-down convolution module that up-samples the results of the feature-layer convolution operation, and a bottom-up convolution module that down-samples the results of the feature-layer up-sampling. Further, in the convolution operation, the feature maps C2-C5 of different sizes are first convolved with a two-dimensional convolution kernel to obtain input feature layers of uniform dimension, and average pooling is then applied to the highest input feature layer; in the up-sampling, the feature maps produced by the convolution and average pooling operations are up-sampled, multiplied by weights and added from top to bottom, and each summed feature layer is convolved with a convolution kernel; in the down-sampling, the feature layer obtained in the up-sampling is down-sampled, multiplied by a weight and added to the up-sampling result of the layer above and to the input feature layer, the summed feature layer is convolved with a convolution kernel to obtain the enhanced feature layer output of the layer above, and this is repeated from bottom to top to obtain the enhanced feature layer output of every layer.
Referring to fig. 3, the construction of the weighted bidirectional feature pyramid module mainly includes three parts: constructing the input, constructing the top-down structure, and constructing the bottom-up structure. The input is constructed as follows: the number of channels of the feature layers C2-C5 is reduced to a uniform 256 dimensions by a two-dimensional convolution with a 1 × 1 kernel, the resulting feature layers are denoted P2_in-P5_in, and an average pooling operation is applied to the feature layer P5_in to obtain P6_in. The top-down structure is constructed as follows: each feature layer Pi_in (i = 6, 5, 4, 3) obtained above is up-sampled, multiplied by a weight and added to P(i-1)_in (i = 6, 5, 4, 3), and the summed feature layer is convolved with a 3 × 3 kernel to eliminate the aliasing effect, yielding Pi_td (i = 5, 4, 3) and P2_out. The bottom-up structure is constructed as follows: the feature layer P2_out is down-sampled, multiplied by a weight and added to P3_td and P3_in (the addition of P3_in is called a skip connection), and the summed feature layer is convolved with a 3 × 3 kernel to eliminate the aliasing effect, yielding P3_out; the same step is repeated to obtain P4_out-P6_out in turn, where P6_out is calculated by down-sampling P5_out, multiplying by a weight, adding P6_in, and then eliminating the aliasing effect through convolution.
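As a rough illustration of the weighted fusion in the two pathways, the following Python sketch operates on tiny single-channel maps stored as nested lists. It is a simplification under stated assumptions: the division by the weight sum plus a small eps follows BiFPN-style normalized fusion and is assumed here, the weight values are arbitrary, the helper names are hypothetical, and the 3 × 3 convolution applied after each fusion is omitted.

```python
def upsample_nearest(fm):
    """Double the height and width of a 2-D map by repeating each value."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample_nearest(fm):
    """Halve the height and width by keeping every second row and column."""
    return [row[::2] for row in fm[::2]]


def weighted_sum(maps, weights, eps=1e-4):
    """Element-wise sum(w_k * map_k) / (sum(w_k) + eps) over same-size maps."""
    norm = sum(weights) + eps  # assumed normalization, see lead-in
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(w * m[r][c] for w, m in zip(weights, maps)) / norm
             for c in range(cols)]
            for r in range(rows)]


# Top-down step at the lowest level: P3_in (1x1) is up-sampled and fused
# with P2_in (2x2); at this level the result is P2_out.
p3_in = [[4.0]]
p2_in = [[1.0, 1.0], [1.0, 1.0]]
p2_out = weighted_sum([p2_in, upsample_nearest(p3_in)], weights=[0.5, 0.5])

# Bottom-up step with a skip connection: P3_in, P3_td and the down-sampled
# P2_out are fused to give P3_out. The P3_td value is made up for the demo.
p3_td = [[2.0]]
p3_out = weighted_sum([p3_in, p3_td, downsample_nearest(p2_out)],
                      weights=[0.4, 0.3, 0.3])
```

A real implementation would apply the same pattern to multi-channel tensors and follow each fusion with the 3 × 3 convolution described in the text.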
Further, in the remote sensing image rotating target detection method based on the weighted bidirectional feature pyramid in the embodiment of the invention, candidate regions are generated with an RPN (region proposal network): each point of the enhanced feature map is mapped onto the original image according to the scaling ratio applied during feature extraction, and a series of anchor boxes with preset sizes and aspect ratios is generated on the original image centered on the mapped point; the anchor boxes are then screened by computing the intersection over union (IoU) between the candidate regions and the ground-truth annotation boxes and by removing anchor boxes that extend beyond the image boundary; finally, the horizontal candidate regions are obtained through non-maximum suppression.
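The anchor generation step can be sketched in Python as follows. The mapping from a feature point to the image (cell center times stride), the anchor size of 32, and the aspect ratios 0.5, 1 and 2 are illustrative assumptions; the text only states that sizes and aspect ratios are preset and that out-of-boundary anchors are removed.

```python
import math


def feature_point_to_image(col, row, stride):
    """Map a feature-map cell to image coordinates (cell center, assumed)."""
    return (col + 0.5) * stride, (row + 0.5) * stride


def anchors_at(cx, cy, sizes, ratios):
    """Boxes (x1, y1, x2, y2) centered at (cx, cy).

    Each box has area size**2 and aspect ratio height / width = ratio.
    """
    boxes = []
    for size in sizes:
        for ratio in ratios:
            w = size / math.sqrt(ratio)
            h = size * math.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def inside_image(box, width, height):
    """True when the box lies entirely within the image."""
    x1, y1, x2, y2 = box
    return x1 >= 0 and y1 >= 0 and x2 <= width and y2 <= height


cx, cy = feature_point_to_image(col=2, row=3, stride=8)
anchors = anchors_at(cx, cy, sizes=(32,), ratios=(0.5, 1.0, 2.0))
kept = [b for b in anchors if inside_image(b, width=64, height=64)]
```

Here the wide anchor (ratio 0.5) crosses the left image edge and is dropped, while the other two are kept for the IoU-based screening.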
Further, in the remote sensing image rotating target detection method based on the weighted bidirectional feature pyramid in the embodiment of the invention, when the final rotated rectangle detection result is obtained by regression from the horizontal candidate regions, a regression model is optimized so that the predicted values regress toward the original annotations, which are taken as ground truth; the predicted values comprise a category prediction and a position coordinate prediction, and the corresponding loss function is a multi-task loss in which the classification loss is a cross-entropy loss and the position regression loss is a Smooth-L1 loss with an angle parameter θ; the loss is reduced step by step by stochastic gradient descent, and the rotated rectangle detection result is output once the optimized model has converged.
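A minimal Python sketch of such a multi-task loss is given below. The five-parameter box (x, y, w, h, θ), the Smooth-L1 transition point beta = 1, and the equal weighting of the two terms are assumptions made for illustration; the text does not specify the box encoding or the balance between classification and regression.

```python
import math


def smooth_l1(x, beta=1.0):
    """Smooth-L1: quadratic for |x| < beta, linear otherwise."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def multitask_loss(probs, label, pred_box, true_box, reg_weight=1.0):
    """Classification loss plus Smooth-L1 over (x, y, w, h, theta) offsets."""
    reg = sum(smooth_l1(p - t) for p, t in zip(pred_box, true_box))
    return cross_entropy(probs, label) + reg_weight * reg


# Only the angle term differs from the target by 0.5 in this toy example.
loss = multitask_loss(probs=[0.5, 0.5], label=0,
                      pred_box=(0.0, 0.0, 0.0, 0.0, 0.5),
                      true_box=(0.0, 0.0, 0.0, 0.0, 0.0))
```

Treating θ as a fifth regression target is what lets the same Smooth-L1 term drive the horizontal candidate region toward a rotated rectangle.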
Further, based on the above method, the present invention also provides a remote sensing image rotating target detection system based on the weighted bidirectional feature pyramid, including: a preprocessing module, a data optimization module, a region acquisition module and a target detection module, wherein,
the preprocessing module is used for preprocessing the remote sensing image by normalization and random data augmentation;
the data optimization module is used for extracting features from the preprocessed image data through a deep convolutional neural network to obtain a plurality of feature layers of different sizes, and for enhancing the feature layers with the weighted bidirectional feature pyramid to obtain enhanced feature layers;
the region acquisition module is used for generating, for the enhanced feature layers of the image data, horizontal candidate regions of different sizes and aspect ratios with a region proposal network;
and the target detection module is used for performing category classification and bounding-box regression on the horizontal candidate regions, using the original annotations as ground truth, and for obtaining the final rotated rectangle detection result by adding an angle regression parameter.
To verify the validity of the scheme of the present invention, it is further explained below in combination with experimental data:
The image to be detected is scaled to a fixed size, and the scaled image is normalized and randomly augmented. The acquired data set is divided into a training set, a validation set and a test set in the proportion 7:1:2, keeping the sample classes as balanced as possible during division; the validation set is used to tune the hyper-parameters of the model during training, and the test set is used to evaluate the final performance of the model. All images of arbitrary size in the training set are scaled to a uniform size. In this embodiment, the width and height of the image are first compared and the smaller of the two is scaled to 800; the other side is then scaled so that the original aspect ratio is unchanged. The scaled image is then normalized: the mean and standard deviation of the RGB three-channel pixel values of all images in the data set are first computed, and the mean is then subtracted from each pixel value of the original image and the result is divided by the standard deviation to obtain the normalized pixel value. Normalization limits the data to a certain range, which simplifies data processing and speeds up convergence at run time. Random data augmentation is then applied to the normalized image. A probability threshold is preset, and if a random value is smaller than the threshold, the image is randomly transformed. The random transformations comprise color transformations and geometric transformations; more specifically, the color transformations are color jittering, gamma correction, histogram correction, and HSV transformation, and the geometric transformations are horizontal flipping, vertical flipping, and rotation by a random angle (the rotation range is -15 degrees). The data augmentation effect is shown in fig. 4.
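The 7:1:2 split and the threshold-gated augmentation can be sketched in Python as below. This is a simplified illustration: the function names and the fixed random seed are assumptions, the split does not attempt the class balancing mentioned in the text, and a horizontal flip stands in for the full set of color and geometric transformations.

```python
import random


def split_dataset(items, ratios=(7, 1, 2), seed=0):
    """Shuffle items and split them into train/validation/test by ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def horizontal_flip(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def maybe_augment(image, transform, threshold=0.5, rng=random):
    """Apply transform only when a random draw falls below the threshold."""
    return transform(image) if rng.random() < threshold else image


train_set, val_set, test_set = split_dataset(range(100))
augmented = maybe_augment([[1, 2, 3]], horizontal_flip,
                          threshold=0.5, rng=random.Random(0))
```

A stratified split (sampling each class separately with the same ratios) would be one way to honor the class-balance requirement; it is left out here to keep the sketch short.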
And extracting the features of the preprocessed image by using a ResNet-101 deep convolution neural network, and taking the four feature layers C2, C3, C4 and C5 to perform the next operation. Specifically, the sizes of the extracted feature layers C2, C3, C4 and C5 are 1/4, 1/8, 1/16 and 1/32 of the original image, and the number of channels is 256, 512, 1024 and 2048.
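To make the stated ratios concrete, the short Python sketch below computes the feature-layer shapes for one input size. The input of 800 × 1024 pixels is an assumption chosen so that both sides are divisible by 32; for other sizes the exact rounding depends on the network's padding and is not covered here.

```python
# Down-sampling factors and channel counts as stated in the text.
STRIDES = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}
CHANNELS = {"C2": 256, "C3": 512, "C4": 1024, "C5": 2048}


def feature_shapes(height, width):
    """Return {layer: (channels, height, width)} for an input image size.

    Assumes height and width are divisible by 32.
    """
    return {name: (CHANNELS[name], height // stride, width // stride)
            for name, stride in STRIDES.items()}


shapes = feature_shapes(800, 1024)
```

For this input, C2 is 200 × 256 with 256 channels and C5 is 25 × 32 with 2048 channels, which shows why C2 carries the fine spatial detail needed for small targets.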
The four feature layers C2-C5 are used to construct the weighted bidirectional feature pyramid module; the construction process is shown in FIG. 2, and the module further enhances the extracted image features. The number of channels of the feature layers C2-C5 is reduced to a uniform 256 dimensions by a two-dimensional convolution with a 1 × 1 kernel, and the resulting feature layers are denoted P2_in-P5_in; an average pooling operation with a 3 × 3 window and a stride of 2 is applied to the feature layer P5_in to obtain P6_in, which is used to detect large targets in the image. Each feature layer Pi_in (i = 6, 5, 4, 3) is up-sampled by the nearest-neighbor method, multiplied by a weight and added to P(i-1)_in (i = 6, 5, 4, 3); the summed feature layer then passes through one two-dimensional convolution layer to eliminate the aliasing effect, finally yielding Pi_td (i = 5, 4, 3) and P2_out. The process can be formulated as:
$$P_{i-1}^{td} = \mathrm{conv}_{3\times3}\left(\frac{w_1 \cdot P_{i-1}^{in} + w_2 \cdot \mathrm{up\_sample}\left(P_i^{in}\right)}{w_1 + w_2 + \epsilon}\right), \quad i = 6, 5, 4, 3 \tag{1}$$
where $\mathrm{conv}_{3\times3}$ denotes a two-dimensional convolution with a 3 × 3 kernel, a stride of 1 and a padding of 1; $w_1$ and $w_2$ are preset weights in the range 0-1; $\mathrm{up\_sample}(\cdot)$ is the nearest-neighbor up-sampling operation; and $\epsilon$ is a small positive constant, typically $1 \times 10^{-4}$. When i = 3, the output of the formula is P2_out.
And the P2_ out is downsampled, multiplied by the weight value and added with the P3_ td and the P3_ in, and then the ghost effect is eliminated through one layer of two-dimensional convolution to obtain a characteristic layer P3_ out. The process can be expressed by formula (2):
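$$P_3^{out} = \mathrm{conv}_{3\times3}\left(\frac{w_1 \cdot P_3^{in} + w_2 \cdot P_3^{td} + w_3 \cdot \mathrm{down\_sample}\left(P_2^{out}\right)}{w_1 + w_2 + w_3 + \epsilon}\right) \tag{2}$$
where $w_1$, $w_2$ and $w_3$ are preset weights in the range 0-1; $\mathrm{down\_sample}(\cdot)$ is the nearest-neighbor down-sampling operation; the remaining definitions are the same as in formula (1).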
With only the inputs changed, the same step is repeated to obtain P4_out and P5_out in turn. The feature layer P6_out is calculated by formula (3), in which each symbol is defined as in formulas (1) and (2):
$$P_6^{out} = \mathrm{conv}_{3\times3}\left(\frac{w_1 \cdot P_6^{in} + w_2 \cdot \mathrm{down\_sample}\left(P_5^{out}\right)}{w_1 + w_2 + \epsilon}\right) \tag{3}$$
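The order of operations in the top-down and bottom-up paths can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: features are one-dimensional lists whose length halves at each level, the 3 × 3 convolution after each fusion is omitted, all weights default to 0.5, and the function names are hypothetical. Following the description, the top-down path up-samples Pi_in.

```python
EPS = 1e-4  # small positive constant from the description

def upsample_nearest(x):
    """Double the length of a 1D feature by repeating every element."""
    return [v for v in x for _ in range(2)]

def downsample_nearest(x):
    """Halve the length of a 1D feature by keeping every second element."""
    return x[::2]

def fuse(inputs, weights, eps=EPS):
    """Normalized weighted sum of equally sized features."""
    total = sum(weights) + eps
    return [sum(w * f[k] for w, f in zip(weights, inputs)) / total
            for k in range(len(inputs[0]))]

def bifpn_pass(p_in, w2=(0.5, 0.5), w3=(0.5, 0.5, 0.5), eps=EPS):
    """p_in maps level (2..6) to a 1D feature; returns the enhanced layers."""
    td = {}
    for i in (6, 5, 4, 3):  # top-down path: P(i-1)_td from P(i-1)_in and Pi_in
        td[i - 1] = fuse([p_in[i - 1], upsample_nearest(p_in[i])], w2, eps)
    out = {2: td[2]}  # the lowest top-down result is P2_out
    for i in (3, 4, 5):  # bottom-up path with three inputs per level
        out[i] = fuse([p_in[i], td[i], downsample_nearest(out[i - 1])], w3, eps)
    out[6] = fuse([p_in[6], downsample_nearest(out[5])], w2, eps)
    return out
```

Dividing by the sum of the weights plus the small constant keeps each fused feature on the same scale as its inputs, whatever weight values are chosen.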
Taking the five enhanced feature layers as a reference, a region proposal network generates horizontal candidate regions of different sizes and aspect ratios on the original image. Each feature point of an enhanced feature map is mapped back to the original image according to the down-scaling ratio used during feature extraction, and a series of anchor boxes with preset sizes and aspect ratios is generated on the original image, centered on the mapped point. The anchor boxes are then screened as follows: (1) anchor boxes that extend beyond the image boundary are removed; (2) the intersection over union (IoU) between each remaining anchor box and the ground-truth annotation boxes is calculated. First, the anchor box with the largest IoU with an annotation box is kept as a positive sample. Second, an anchor box whose IoU with any annotation box is greater than 0.7 is also kept as a positive sample, an anchor box whose IoU with every annotation box is less than 0.3 is a negative sample, and an anchor box whose IoU lies between 0.3 and 0.7 is discarded. Finally, 512 samples (256 positive and 256 negative) are selected and input to the bounding-box classification and position regression network.
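The screening rules above can be sketched in a few lines. The box format `(x1, y1, x2, y2)`, the label encoding (1 positive, 0 negative, -1 discarded) and the function names are assumptions for illustration; the sketch also assumes at least one annotation box and leaves out the final sampling of 512 boxes.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, image_size, pos_thr=0.7, neg_thr=0.3):
    """Return one label per anchor: 1 positive, 0 negative, -1 discarded."""
    width, height = image_size
    labels = [-1] * len(anchors)
    # Rule (1): drop anchors that extend beyond the image boundary.
    inside = [k for k, (x1, y1, x2, y2) in enumerate(anchors)
              if x1 >= 0 and y1 >= 0 and x2 <= width and y2 <= height]
    overlaps = {k: [iou(anchors[k], g) for g in gt_boxes] for k in inside}
    # Rule (2): threshold the best IoU of every remaining anchor.
    for k in inside:
        best = max(overlaps[k])
        if best > pos_thr:
            labels[k] = 1
        elif best < neg_thr:
            labels[k] = 0
    # The anchor with the largest IoU for each annotation box is positive.
    for j in range(len(gt_boxes)):
        if inside:
            best_k = max(inside, key=lambda k: overlaps[k][j])
            if overlaps[best_k][j] > 0:
                labels[best_k] = 1
    return labels
```

Anchors labeled -1 take no part in training, which matches the description of discarding anchors whose IoU falls between the two thresholds.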
Taking the original annotation values as ground truth, category classification and bounding-box regression are performed on the horizontal candidate regions, and an angle regression parameter is added to obtain the final rotated-rectangle detection result. The candidate regions are assigned to different feature layers according to their sizes, using the following formula:
$$k = \left\lfloor k_0 + \log_2\left(\frac{\sqrt{wh}}{224}\right) \right\rfloor$$
where $k_0$ is taken as 4, and $w$ and $h$ are the width and height of the candidate region, respectively. That is, a candidate region whose width and height are both 224 is assigned to the P4_out feature layer. RoI Pooling is applied to the candidate regions of each feature layer to extract fixed-length features, which are input to two fully connected layers that perform classification of the candidate-region category and regression of the coordinate position, respectively. The classification loss is the cross-entropy loss, which has the following form:
$$L_{cls} = -\sum_i p_i^* \log\left(p_i\right)$$
where $p_i$ is the predicted probability that the candidate region belongs to class i, and $p_i^*$ is the corresponding true value, which is generally 1.
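The two computations described above, assigning a candidate region to a feature layer and scoring its predicted class with the cross-entropy loss, can be sketched as follows. Clamping the level to the range 2-6 (the five enhanced layers) and treating the true label as one-hot are assumptions; the function names are hypothetical.

```python
import math

def assign_level(w, h, k0=4, k_min=2, k_max=6, canonical=224):
    """Pick the feature layer index k for a candidate region of size w x h."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    # Clamp to the available layers P2_out..P6_out (an assumption).
    return max(k_min, min(k_max, k))

def cross_entropy(probs, true_class):
    """Cross-entropy for a one-hot true label: -log of the true-class probability."""
    return -math.log(probs[true_class])
```

With these defaults a 224 × 224 region maps to level 4 (P4_out), as stated in the text, while a 448 × 448 region maps to level 5 and a 112 × 112 region to level 3.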
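The coordinate position regression loss is a Smooth-L1 loss extended with an angle parameter, of the form:
$$L_{reg}\left(v, v^*\right) = \sum_{j \in \{x, y, w, h, \theta\}} \mathrm{smooth}_{L1}\left(v_j - v_j^*\right), \qquad \mathrm{smooth}_{L1}(z) = \begin{cases} 0.5\,z^2, & |z| < 1 \\ |z| - 0.5, & \text{otherwise} \end{cases}$$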
where $v$ denotes the coordinate difference between the prediction box and the candidate region, and $v^*$ denotes the difference between the annotation box and the candidate region. Their specific forms are:
$$v_x = \frac{x - x_a}{w_a}, \quad v_y = \frac{y - y_a}{h_a}, \quad v_w = \log\frac{w}{w_a}, \quad v_h = \log\frac{h}{h_a}, \quad v_\theta = \theta - \theta_a$$
$$v_x^* = \frac{x^* - x_a}{w_a}, \quad v_y^* = \frac{y^* - y_a}{h_a}, \quad v_w^* = \log\frac{w^*}{w_a}, \quad v_h^* = \log\frac{h^*}{h_a}, \quad v_\theta^* = \theta^* - \theta_a$$
where $x$, $x_a$ and $x^*$ denote the horizontal coordinates of the center points of the prediction box, the candidate region and the annotation box, respectively; the same applies to $y$, $w$, $h$ and $\theta$. The meaning of the position regression loss is as follows: the difference between the prediction box and the candidate region is driven as close as possible to the difference between the annotation box and the candidate region, that is, the prediction box is driven as close as possible to the ground-truth annotation box, as shown in fig. 5, so that the target is accurately located.
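A minimal sketch of the position regression loss follows. The box format `(x, y, w, h, theta)`, the standard center/log-size offset encoding, the plain angle difference and the angle unit (radians) are assumptions chosen for illustration; the function names are hypothetical.

```python
import math

def encode(box, ref):
    """Offsets of `box` relative to the candidate region `ref`.

    Both boxes are (x, y, w, h, theta), with (x, y) the center point.
    """
    x, y, w, h, t = box
    xa, ya, wa, ha, ta = ref
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha), t - ta)

def smooth_l1(z):
    """Quadratic near zero, linear for |z| >= 1."""
    z = abs(z)
    return 0.5 * z * z if z < 1.0 else z - 0.5

def regression_loss(pred_box, gt_box, candidate):
    """Sum of Smooth-L1 terms over the five encoded offsets."""
    v = encode(pred_box, candidate)
    v_star = encode(gt_box, candidate)
    return sum(smooth_l1(a - b) for a, b in zip(v, v_star))
```

The loss is zero exactly when the prediction box and the annotation box have the same offsets from the candidate region, which is the behavior the paragraph above describes.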
Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing system embodiments, and are not described herein again.
In all examples shown and described herein, any particular value should be construed as merely exemplary, and not as a limitation, and thus other examples of example embodiments may have different values.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the system according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, any person skilled in the art can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or substitute equivalents for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A remote sensing image rotating target detection method based on a weighted bidirectional feature pyramid is characterized by comprising the following contents:
carrying out normalization and random data augmentation preprocessing on the remote sensing image;
performing feature extraction on the preprocessed image data through a deep convolutional neural network to obtain a plurality of feature layers with different sizes;
reinforcing the characteristic layer by using the weighted bidirectional characteristic pyramid to obtain a reinforced characteristic layer;
for the enhanced feature layer of the image data, generating target horizontal candidate regions with different sizes and aspect ratios by using a region generation network;
and performing class classification and regression of a boundary box on the target horizontal candidate region by taking the original annotation value as a true value, and adding an angle regression parameter to obtain a final rotating rectangle detection result.
2. The remote sensing image rotating target detection method based on the weighted bidirectional feature pyramid is characterized in that input images of any size are uniformly scaled to a fixed size, and normalization and augmentation processing are then performed on the images in sequence.
3. The remote sensing image rotating target detection method based on the weighted bidirectional feature pyramid is characterized in that in the image unified scaling process, the short sides are scaled to a preset size, the aspect ratio of the image is kept unchanged, and then the corresponding long sides are scaled.
4. The remote sensing image rotating target detection method based on the weighted bidirectional feature pyramid as claimed in claim 2, wherein in the normalization processing, a mean value and a standard deviation of RGB three channels of the image are obtained first, and then the mean value is subtracted from a pixel value of an original image and divided by the standard deviation to obtain a pixel value after the normalization processing.
5. The method for detecting the remote sensing image rotating target based on the weighted bidirectional feature pyramid as claimed in claim 2, wherein in the augmentation process, color augmentation and/or geometric augmentation are performed on the image with a preset probability, wherein the color augmentation process includes but is not limited to: color jittering, gamma correction, histogram correction, and HSV transformation, and the geometric augmentation process includes but is not limited to: horizontal flipping, vertical flipping and random angle rotation.
6. The method for detecting the remote sensing image rotating target based on the weighted bidirectional feature pyramid is characterized in that the weighted bidirectional feature pyramid comprises an input module for performing convolution operation on input data, a top-down convolution module for performing up-sampling on a feature layer convolution operation result and a bottom-up convolution module for performing down-sampling on the feature layer up-sampling result.
7. The method for detecting the remote sensing image rotating target based on the weighted bidirectional feature pyramid is characterized in that, in the convolution operation, a convolution kernel is first used to perform two-dimensional convolution on the C2-C5 feature maps of different sizes to obtain input feature layers of a uniform dimension, and an average pooling operation is then performed on the highest input feature layer; in the up-sampling, the feature maps obtained from the convolution and average pooling operations are up-sampled, multiplied by weights and added from top to bottom, and each summed feature layer is convolved with a convolution kernel; in the down-sampling, the feature layer obtained in the up-sampling is down-sampled, multiplied by a weight and added to the up-sampling result of the layer above and the input feature layer, the summed feature layer is convolved with a convolution kernel to obtain the enhanced feature layer output of the layer above, and the down-sampling is repeated from bottom to top to obtain the enhanced feature layer output of every layer.
8. The method for detecting the remote sensing image rotating target based on the weighted bidirectional feature pyramid is characterized in that, using a region generation network, each point of the enhanced feature map is mapped to the original image of the image data according to the scaling ratio used when the features were extracted, and a series of anchor boxes with preset sizes and aspect ratios is generated on the original image, centered on the mapped point; the anchor boxes are then screened by calculating the intersection over union between the candidate regions and the ground-truth annotation boxes and by removing anchor boxes that extend beyond the boundary; and finally, the target horizontal candidate regions are obtained by a non-maximum suppression operation.
9. The method for detecting the remote sensing image rotating target based on the weighted bidirectional feature pyramid is characterized in that a regression model is used for optimization learning, and the predicted values are regressed with the original annotation values as ground truth; the predicted values comprise a category prediction and a position coordinate prediction, the corresponding loss function is a multitask loss function, the classification loss function is a cross-entropy loss function, and the position regression loss function is a Smooth-L1 loss function with an angle parameter theta; the loss function is gradually reduced by stochastic gradient descent, and the rotated-rectangle detection result is obtained as output from the optimized, converged model.
10. A remote sensing image rotating target detection system based on a weighted bidirectional feature pyramid is characterized by comprising: a preprocessing module, a data optimizing module, a region acquiring module and a target detecting module, wherein,
the preprocessing module is used for carrying out normalization and random data augmentation preprocessing on the remote sensing image;
the data optimization module is used for extracting the features of the preprocessed image data through a deep convolutional neural network to obtain a plurality of feature layers with different sizes; reinforcing the characteristic layer by using the weighted bidirectional characteristic pyramid to obtain a reinforced characteristic layer;
the region acquisition module is used for generating target horizontal candidate regions with different sizes and aspect ratios by using a region generation network aiming at the enhanced feature layer of the image data;
and the target detection module is used for carrying out category classification and regression of the bounding box on the target horizontal candidate region by taking the original marked value as a true value, and obtaining a final rotating rectangle detection result by adding an angle regression parameter.
CN202110113543.4A 2021-01-27 2021-01-27 Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid Pending CN112800955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110113543.4A CN112800955A (en) 2021-01-27 2021-01-27 Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110113543.4A CN112800955A (en) 2021-01-27 2021-01-27 Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid

Publications (1)

Publication Number Publication Date
CN112800955A true CN112800955A (en) 2021-05-14

Family

ID=75812308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110113543.4A Pending CN112800955A (en) 2021-01-27 2021-01-27 Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid

Country Status (1)

Country Link
CN (1) CN112800955A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network
CN111259758A (en) * 2020-01-13 2020-06-09 中国矿业大学 Two-stage remote sensing image target detection method for dense area
CN111738112A (en) * 2020-06-10 2020-10-02 杭州电子科技大学 Remote sensing ship image target detection method based on deep neural network and self-attention mechanism
CN111783590A (en) * 2020-06-24 2020-10-16 西北工业大学 Multi-class small target detection method based on metric learning
CN111881918A (en) * 2020-06-11 2020-11-03 中国人民解放军战略支援部队信息工程大学 Multi-scale rotating ship target detection algorithm

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
SADRA NADDAF-SH 等: "An Efficient and Scalable Deep Learning Approach for Road Damage Detection", 《ARXIV:2011.09577V3 [CS.CV]》 *
SEYED MAJID AZIMI等: "Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery", 《ARXIV:1807.02700V3 [CS.CV]》 *
SIHWAN KIM 等: "Learning Robust Feature Representations for Scene Text Detection", 《ARXIV:2005.12466V1 [CS.CV]》 *
XUE YANG等: "Automatic Ship Detection of Remote Sensing Images from Google Earth in Complex Scenes Based on Multi-Scale Rotation Dense Feature Pyramid Networks", 《ARXIV:1806.04331 V1》 *
魏松杰 等: "深度神经网络下的SAR舰船目标检测与区分模型", 《西北工业大学学报》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486821A (en) * 2021-07-12 2021-10-08 西安电子科技大学 No-reference video quality evaluation method based on time domain pyramid
CN113486821B (en) * 2021-07-12 2023-07-04 西安电子科技大学 No-reference video quality evaluation method based on time domain pyramid
CN114266771A (en) * 2022-03-02 2022-04-01 深圳市智源空间创新科技有限公司 Pipeline defect detection method and device based on improved extended feature pyramid model
CN114266771B (en) * 2022-03-02 2022-06-03 深圳市智源空间创新科技有限公司 Pipeline defect detection method and device based on improved extended feature pyramid model
CN115294452A (en) * 2022-08-08 2022-11-04 中国人民解放军火箭军工程大学 Rotary SAR ship target detection method based on bidirectional characteristic pyramid network
CN115311626A (en) * 2022-08-30 2022-11-08 金锋馥(滁州)科技股份有限公司 Express package detection and identification algorithm based on deep learning
CN116229437A (en) * 2023-03-14 2023-06-06 北京道达天际科技股份有限公司 Remote sensing image rotation target detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210514