CN117636172A - Target detection method and system for weak and small targets of remote sensing images

Info

Publication number: CN117636172A
Application number: CN202311669036.4A
Authority: CN (China)
Prior art keywords: boundary, feature, texture, target
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN117636172B
Inventors: Wang Yongcheng (王永成), Li Zheng (李征)
Assignee: Changchun Institute of Optics, Fine Mechanics and Physics of CAS

Abstract

The invention relates to the technical field of intelligent interpretation of remote sensing images, and in particular to a perceived texture and boundary target detection method and system for weak and small targets of remote sensing images. The method comprises the following steps: inputting the remote sensing image into a feature extraction network to extract basic features, and then extracting texture perception features; inputting the remote sensing image into a boundary map extraction module to extract a binary boundary map, and extracting boundary perception features from it; fusing the boundary perception features with the basic features through a boundary guiding feature module to obtain boundary guiding features; and outputting the texture perception features and the boundary guiding features in decoupled form into a task-decoupled RCNN network for dual-branch decoupled prediction, obtaining the classification and localization results and completing the perceived texture and boundary target detection. The method and system can mine information that weak and small targets in remote sensing imagery struggle to express and strengthen the representation of such targets, thereby improving detection performance on weak and small targets.

Description

Target detection method and system for weak and small target of remote sensing image
Technical Field
The invention relates to the technical fields of computer vision, deep learning and intelligent interpretation of remote sensing images, and in particular provides a perceived texture and boundary target detection method and system for weak and small targets of remote sensing images.
Background
In recent years, heavy investment in civil and commercial satellites and the rapid development of imaging technology have improved the quality of remote sensing images and reduced their acquisition cost, creating favorable conditions for target detection in remote sensing images. However, weak and small targets in remote sensing images have always been a difficult problem for the target detection task. These targets are widely distributed in remote sensing images and have the following properties: (1) they occupy only a tiny fraction of the pixels in the image; (2) their high-frequency details (e.g., texture details, boundary cues, colors) are not salient, their features vary widely, and they are easily disturbed by the imaging conditions at acquisition time and by complex backgrounds. Yet weak and small targets often carry important information, and detecting them effectively is of great significance in real-world scenarios. In addition, in object detection, classification and regression are two highly correlated yet contradictory tasks; most detectors share the same features for both, even though each task has its own information emphasis. The classification task tends to be more sensitive to the most semantic parts in the interior of the target, whereas the regression task is more sensitive to the target boundary. Coupled features therefore cannot yield optimal performance, and the model needs to extract features appropriate to each specific task.
Disclosure of Invention
To solve the above problems, the invention provides a perceived texture and boundary feature target detection method for weak and small targets of remote sensing images, which improves the information expression of weak and small targets by exploring high-frequency detail information of the target such as key textures and boundaries, and enhances the features of weak and small targets, thereby improving their detection accuracy. The invention further provides a decoupled detection head structure that separates the classification and localization tasks, so that the decoupled network can be trained in a targeted manner for each specific task.
The invention provides a perceived texture and boundary target detection method for weak and small targets of remote sensing images, which comprises the following steps:
step 1: inputting the remote sensing image into a feature extraction network to extract basic features, inputting the basic features into a texture perception enhancement module, and extracting texture perception features;
step 2: inputting a remote sensing image into a boundary map extraction module to extract a binary boundary map, inputting the boundary map into a boundary feature extraction module to extract boundary perception features; fusing the boundary perception feature and the basic feature through a boundary guiding feature module to obtain a boundary guiding feature;
Step 3: decoupling and outputting the texture perception features and the boundary guiding features, inputting them into a task-decoupled RCNN network for dual-branch decoupled prediction, obtaining the classification and localization results, and completing the perceived texture and boundary target detection.
Preferably, in step 1, inputting the basic features into the texture perception enhancement module and extracting the texture perception features comprises: inputting the basic features into a feature pyramid to obtain fusion features, inputting the fusion features into the texture perception enhancement module, and extracting the texture perception features.
Preferably, the step 1 comprises:

inputting the remote sensing image into the feature extraction network to extract basic features $C_i$, and inputting the basic features into a feature pyramid to obtain fusion features $f_i \in \mathbb{R}^{c \times h \times w}$,

where $c$ denotes the channel dimension of the fusion features, and $h$ and $w$ denote the length and width of the fusion features, respectively.
Preferably, inputting the fusion features into the texture perception enhancement module and extracting the texture perception features comprises:

passing the fused feature map through 1×1 convolutions on three parallel branches to integrate features and reduce the channel dimension to $c_1$, obtaining features $f_\theta, f_\phi, f_g \in \mathbb{R}^{c_1 \times h \times w}$, and reshaping the features $f_\theta, f_\phi$ to $\mathbb{R}^{c_1 \times hw}$;

computing the covariance matrix $\Sigma$ between the transposed $f_\theta$ and $f_\phi$, which captures the correlation among pixels at different positions in the feature-dimension space, with the calculation formula:

$$\Sigma(x,y) = f_\theta^{\top} \otimes f_\phi$$

where $f_\theta, f_\phi, f_g$ denote the intermediate features after convolution processing, $(x,y)$ denotes the position of the pixel, and $\otimes$ denotes multiplication;

introducing a spatial attention mechanism to strengthen the position of the target in the remote sensing image and highlight the pixel relationships inside the target;

performing global average pooling along the channel dimension on the input fusion feature $f_i$ and applying a 3×3 convolution to remove unwanted noise;

normalizing the features with a Sigmoid activation function to generate the attention weights;

converting the weight dimensions to $\mathbb{R}^{hw \times 1 \times 1}$, generating an attention weight map $M(x,y)$ aligned with the channel dimension of the covariance matrix $\Sigma$, expressed as:

$$M(x,y) = \mathrm{Reshape}(\sigma(\mathrm{Conv}_{3\times3}(f_{gap}(x,y))))$$

where GAP ($f_{gap}$) denotes the global average pooling operation and $\sigma$ denotes Sigmoid activation;

multiplying the attention weight map $M(x,y)$ with the covariance matrix $\Sigma(x,y)$ to obtain the target-enhanced covariance matrix $\hat{\Sigma}$:

$$\hat{\Sigma}(x,y) = M(x,y) \odot \Sigma(x,y)$$

where $\odot$ denotes the dot product of matrices; the computed target-enhanced covariance matrix has dimensions $\mathbb{R}^{hw \times hw}$;

regularizing the target-enhanced covariance matrix and converting its dimensions to $\mathbb{R}^{wh \times w \times h}$, where the first dimension $wh$ denotes the number of target-enhanced covariance maps, which is equal to the feature size, and the second and third dimensions $w$ and $h$ denote the length and width of each map; mathematically, $\hat{\Sigma}$ thus represents the relationship between one pixel and all pixels at other locations;

integrating the relation matrices of pixels across different channels with a 1×1 convolution operation, converting the channels into a relation map $A$;

multiplying $A$ with $f_g$ and restoring the feature channel to $c$ with a 1×1 convolution to obtain the texture features;

adding the captured texture features to the fusion feature $f_i$ to obtain the texture perception features.
Preferably, adding the captured texture features to the fusion feature $f_i$ obtains the texture perception features through the following formula:

$$\tilde{f}_i = f_i \oplus f_{tex}$$

where $f_{tex}$ denotes the captured texture features and $\oplus$ denotes matrix element-wise addition.
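To make the above pipeline concrete, the following is a minimal PyTorch sketch of how such a texture perception enhancement module could be assembled. It is an illustrative reading of the steps above, not the patent's reference implementation: the class name, the fixed construction-time feature size, the SoftMax dimension and the single-channel relation map A are all assumptions.

```python
import torch
import torch.nn as nn


class TexturePerceptionEnhancement(nn.Module):
    """Illustrative sketch of the texture perception enhancement module."""

    def __init__(self, c: int = 256, c1: int = 64, h: int = 100, w: int = 100):
        super().__init__()
        # Three parallel 1x1 convolutions: integrate features, reduce c -> c1.
        self.theta = nn.Conv2d(c, c1, 1)
        self.phi = nn.Conv2d(c, c1, 1)
        self.g = nn.Conv2d(c, c1, 1)
        self.att = nn.Conv2d(1, 1, 3, padding=1)   # 3x3 conv after channel GAP
        # 1x1 conv integrating the hw relation channels into one relation map;
        # the single output channel is an assumption, the text leaves it open.
        self.relate = nn.Conv2d(h * w, 1, 1)
        self.restore = nn.Conv2d(c1, c, 1)         # restore channels to c

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # Assumes inputs at the construction-time resolution (h, w).
        b, _, h, w = f.shape
        f_theta = self.theta(f).flatten(2)          # (b, c1, hw)
        f_phi = self.phi(f).flatten(2)              # (b, c1, hw)
        f_g = self.g(f)                             # (b, c1, h, w)
        # Covariance of transposed f_theta with f_phi: pixel-pair correlations.
        sigma = torch.bmm(f_theta.transpose(1, 2), f_phi)         # (b, hw, hw)
        # Spatial attention: GAP over channels -> 3x3 conv -> Sigmoid -> reshape.
        m = torch.sigmoid(self.att(f.mean(dim=1, keepdim=True)))  # (b, 1, h, w)
        m = m.flatten(2).transpose(1, 2)                          # (b, hw, 1)
        # Target-enhanced covariance, SoftMax regularisation (dim assumed),
        # reshaped so each pixel owns an (h, w) relation map.
        sigma_hat = torch.softmax(m * sigma, dim=-1).view(b, h * w, h, w)
        rel = self.relate(sigma_hat)                # (b, 1, h, w) relation map A
        tex = self.restore(rel * f_g)               # texture features, channels c
        return f + tex                              # element-wise addition to f_i
```

The covariance is realised as a batched matrix product over flattened pixels, which is the usual way to compute all pixel-pair correlations at once.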
Preferably, inputting the remote sensing image into the boundary map extraction module to extract the binary boundary map comprises: performing sliding extraction on the input remote sensing image with an edge extraction operator to generate a gradient map of the remote sensing image.
Preferably, the gradient map is subjected to gated filtering: a threshold is set, and pixel values smaller than the threshold are set to zero, so as to remove noise and useless texture information.
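A compact sketch of this extractor follows, assuming a Sobel kernel (as in the embodiment described later) and a threshold taken relative to the maximum response of the gradient map; the function name and tensor layout are our own.

```python
import torch
import torch.nn.functional as F


def binary_boundary_map(img: torch.Tensor, lam: float = 0.15) -> torch.Tensor:
    """img: (b, 1, h, w) grayscale tensor; returns the gated gradient map."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    gx = F.conv2d(img, kx.view(1, 1, 3, 3), padding=1)       # horizontal gradients
    gy = F.conv2d(img, kx.t().reshape(1, 1, 3, 3), padding=1)  # vertical gradients
    g = gx.abs() + gy.abs()                  # superpose the two gradient maps
    # Gated filtering: zero out pixels below lam times the maximum gray value.
    thresh = lam * g.amax(dim=(-2, -1), keepdim=True)
    return torch.where(g >= thresh, g, torch.zeros_like(g))
```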
Preferably, inputting the boundary map into the boundary feature extraction module and extracting the boundary perception features comprises:

extracting bottom features with a 3×3 convolution, BN, ReLU, and a 3×3 convolution;

reducing the dimension of the bottom features with 1×1 convolution, BN and ReLU operations, extracting edge information in the bottom features with three parallel branches comprising a 1×3 convolution, a 3×1 convolution and a 3×3 convolution, and summarizing and fusing the edge information to obtain potential edge features;

restoring the feature channel with 1×1 convolution, BN and ReLU operations;

transferring the bottom features through a skip connection to be fused with the potential edge features, and generating the boundary perception features through a 1×1 convolution and Sigmoid activation.
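Read as a network, the four items above could be organised as follows; the sketch assumes a single-channel boundary map input and illustrative channel widths.

```python
import torch
import torch.nn as nn


class BoundaryFeatureExtractor(nn.Module):
    """Illustrative sketch of the boundary feature extraction module."""

    def __init__(self, ch: int = 64, mid: int = 16):
        super().__init__()
        self.stem = nn.Sequential(                  # bottom features
            nn.Conv2d(1, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
        self.reduce = nn.Sequential(
            nn.Conv2d(ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU())
        # Three parallel branches: 1x3, 3x1 and 3x3 convolutions.
        self.b13 = nn.Conv2d(mid, mid, (1, 3), padding=(0, 1))
        self.b31 = nn.Conv2d(mid, mid, (3, 1), padding=(1, 0))
        self.b33 = nn.Conv2d(mid, mid, 3, padding=1)
        self.restore = nn.Sequential(
            nn.Conv2d(mid, ch, 1), nn.BatchNorm2d(ch), nn.ReLU())
        self.out = nn.Conv2d(ch, ch, 1)

    def forward(self, bmap: torch.Tensor) -> torch.Tensor:
        base = self.stem(bmap)                      # bottom features
        x = self.reduce(base)
        edges = self.b13(x) + self.b31(x) + self.b33(x)  # fuse branch outputs
        x = self.restore(edges)
        x = x + base                  # skip connection with the bottom features
        return torch.sigmoid(self.out(x))           # boundary perception features
```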
Preferably, fusing the boundary perception features with the basic features through the boundary guiding feature module to obtain the boundary guiding features comprises:

downsampling the boundary perception features and adding them to the basic features to fuse the features, amplifying the edge structure with a skip connection structure and a 3×3 convolution, and integrating the fused information;

highlighting the important feature channels in the fused features with a channel attention mechanism, and adding the result to the fusion features output by the feature pyramid to obtain the boundary guiding features.
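A sketch of this fusion step, assuming the boundary perception features have already been projected to the same channel width as the backbone features; module and argument names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BoundaryGuidedFusion(nn.Module):
    """Illustrative sketch of the boundary guiding feature module."""

    def __init__(self, ch: int):
        super().__init__()
        self.integrate = nn.Conv2d(ch, ch, 3, padding=1)
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1),   # channel attention
                                nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, b_f, c_i, p_i):
        # Downsample boundary features to the backbone resolution and fuse.
        b = F.interpolate(b_f, size=c_i.shape[-2:], mode="bilinear",
                          align_corners=False)
        fused = c_i + b                             # element-wise sum
        fused = self.integrate(fused + c_i)         # skip connection, 3x3 conv
        fused = fused * self.ca(fused)              # emphasise important channels
        return p_i + fused                          # boundary guiding features
```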
Preferably, in step 3, decoupling and outputting the texture perception features and the boundary guiding features, inputting them into the task-decoupled RCNN network for dual-branch decoupled prediction, obtaining the classification and localization results, and completing the perceived texture and boundary target detection comprises:

inputting the texture perception features generated in step 1 into the classification branch of the task-decoupled RCNN, and inputting the boundary guiding features generated in step 2 into the localization branch of the task-decoupled RCNN;

setting different thresholds to decouple the non-maximum suppression (NMS) of the classification branch and the localization branch, thereby generating proposal regions of different densities;

performing decoupled sampling on the proposal regions to obtain decoupled regions of interest: for the classification branch, sampling a fixed quantity of positive and negative samples at a fixed ratio; for the localization branch, sampling a fixed quantity of positive samples (see the sketch after this list);

expanding the region-of-interest range for the classification branch, and synthesizing high-quality regions of interest with the Weiszfeld algorithm for the localization branch to assist localization prediction.
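The decoupled NMS and sampling can be pictured with the following sketch. The thresholds and sample counts (0.7/0.85 NMS, 256 classification samples at 1:3, 128 positive localization samples) are taken from the embodiment described later; the `is_pos` masks are assumed to come from an ordinary IoU matcher, and the helper names are ours.

```python
import torch
from torchvision.ops import nms


def decoupled_nms(boxes, scores):
    # Different IoU thresholds yield proposal sets of different densities:
    # 0.7 for the classification branch, 0.85 for the localization branch.
    return nms(boxes, scores, 0.70), nms(boxes, scores, 0.85)


def decoupled_sampling(cls_boxes, cls_is_pos, loc_boxes, loc_is_pos):
    # Classification branch: 256 RoIs at a fixed 1:3 positive/negative ratio
    # (64 positives, 192 negatives -- our reading of the embodiment).
    pos = torch.nonzero(cls_is_pos).flatten()
    neg = torch.nonzero(~cls_is_pos).flatten()
    pos = pos[torch.randperm(pos.numel())[:64]]
    neg = neg[torch.randperm(neg.numel())[:192]]
    cls_rois = cls_boxes[torch.cat([pos, neg])]
    # Localization branch: positives only (128 in the embodiment).
    lpos = torch.nonzero(loc_is_pos).flatten()
    loc_rois = loc_boxes[lpos[torch.randperm(lpos.numel())[:128]]]
    return cls_rois, loc_rois
```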
Preferably, the upper-left and lower-right corners of all proposal regions and of the ground-truth region assigned to each target are synthesized, and the geometric center of the discrete sample points is found by iteratively re-weighted least squares;

assuming the reference points to be fitted are $x_j$, the algorithm iteratively searches for a center point $y$ that minimizes the distance between this point and the reference points, expressed as:

$$y = \arg\min_y D(y) = \arg\min_y \sum_{j} \lVert x_j - y \rVert_2$$

where $\lVert \cdot \rVert_2$ denotes the L2 norm, $j$ indexes the reference points, $D$ denotes the sum of distances between the synthesized center point $y$ and the reference points $x_j$, and $\arg\min$ denotes the value of $y$ that minimizes this sum;

solving in continuous space, by optimization theory the above expression attains its minimum at an extremum; taking the partial derivative with respect to $y$:

$$\frac{\partial D}{\partial y} = \sum_{j} \frac{y - x_j}{\lVert x_j - y \rVert_2} = 0$$

solving the above yields the iterative update of the center point:

$$y^{(k+1)} = \left( \sum_{j} \frac{x_j}{\lVert x_j - y^{(k)} \rVert_2} \right) \Bigg/ \left( \sum_{j} \frac{1}{\lVert x_j - y^{(k)} \rVert_2} \right)$$

where $k$ denotes the number of iterations; iteration continues until $D$ falls below a specified value, at which point the obtained point is the fitted geometric center, and this center point does not coincide with any reference point;

a proposal region close to the ground-truth box is synthesized by fitting its upper-left and lower-right corners and used to train the localization branch network, accelerating model convergence and improving model stability.
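The iteration in the last formula is the classical Weiszfeld update; below is a compact sketch under the assumption of 2-D corner points (the function name and stopping tolerances are illustrative).

```python
import torch


def weiszfeld(points: torch.Tensor, iters: int = 50, eps: float = 1e-6) -> torch.Tensor:
    """points: (n, 2) corner coordinates; returns the fitted geometric center."""
    y = points.mean(dim=0)                           # initialise at the centroid
    for _ in range(iters):
        d = (points - y).norm(dim=1).clamp_min(eps)  # distances to current center
        y_new = (points / d[:, None]).sum(0) / (1.0 / d).sum()
        if (y_new - y).norm() < eps:                 # stop once the update is tiny
            return y_new
        y = y_new
    return y
```

One guard is needed in practice: if the current estimate lands exactly on a reference point the weights blow up, which the `clamp_min` above sidesteps.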
The invention also provides a perceived texture and boundary target detection system for weak and small targets of remote sensing images, which comprises:
the feature extraction network is used for extracting basic features of the remote sensing image;
the texture perception enhancement module is used for extracting texture perception features through the basic features;
the boundary map extraction module is used for extracting a binary boundary map of the remote sensing image;
The boundary feature extraction module is used for extracting boundary perception features of the boundary map;
the boundary guiding feature module is used for fusing the boundary perception feature and the basic feature to obtain a boundary guiding feature;
and the decoupled transmission module is used for decoupling and outputting the texture perception features and the boundary guiding features, inputting them into a task-decoupled RCNN network for dual-branch decoupled prediction, obtaining the classification and localization results, and completing the perceived texture and boundary target detection.
Compared with the prior art, the invention has the following beneficial effects:

the invention provides a perceived texture and boundary feature target detection method for weak and small targets of remote sensing images, and in particular a dedicated texture and boundary perception network for the weak and small target problem in remote sensing images. To mine deep information, a texture perception enhancement module and a boundary perception fusion module are provided: the former fully explores the texture details inside the target, builds relationships between features through a computed covariance matrix, and highlights the texture differences between target and background; the latter introduces additional boundary cues to highlight the spatial position of the target, supplementing the key features of weak and small targets by exploring high-frequency detail information such as key textures and boundaries. The combination of the two modules enhances the network's perception of weak and small targets. Meanwhile, to relieve the lack of task focus caused by the entanglement between classification and regression, a task-decoupled RCNN is provided to train the two branch networks independently, realizing fine-grained classification and localization prediction. The detection method provided by the invention can mine information that weak and small remote sensing targets struggle to express and strengthen their representation, thereby improving detection performance on weak and small targets.
Drawings
FIG. 1 is a flow chart of a remote sensing image target detection method according to an embodiment of the present invention;
FIG. 2 is an exemplary diagram of a remote sensing image with a weak target present in accordance with an embodiment of the present invention;
FIG. 3 is an overall schematic diagram of an object detection network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a texture perception enhancement module in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of boundary map extraction in a boundary aware fusion module according to an embodiment of the invention;
FIG. 6 is a schematic diagram of boundary-aware feature extraction in a boundary-aware fusion module according to an embodiment of the invention;
FIG. 7 is a schematic diagram of boundary-aware feature fusion in a boundary-aware fusion module according to an embodiment of the invention;
FIG. 8 is a task decoupling RCNN network schematic diagram in accordance with an embodiment of the invention;
FIG. 9 is a schematic diagram of a proposed synthetic strategy according to an embodiment of the invention;
FIG. 10 is an exemplary diagram of the result of detecting weak and small targets in a remote sensing image with the target detection method according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the following description, like modules are denoted by like reference numerals. In the case of the same reference numerals, their names and functions are also the same. Therefore, a detailed description thereof will not be repeated.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limiting the invention.
In a specific embodiment of the invention, a method for detecting a perceived texture and boundary feature target of a weak and small target of a remote sensing image is provided, which comprises the following steps:
step 1: inputting the remote sensing image into a feature extraction network to extract basic features, inputting the basic features into a texture perception enhancement module to further explore the texture patterns of the targets in the remote sensing image, and extracting texture perception features;
in a specific embodiment, inputting the basic feature into a texture perception enhancement module, and extracting the texture perception feature includes: and inputting the basic features into a feature pyramid to obtain fusion features, inputting the fusion features into a texture perception enhancement module, and extracting texture perception features.
In a specific embodiment, the step 1 specifically comprises: inputting a remote sensing image containing weak and small targets into the feature extraction network to obtain basic features $C_i$, and inputting the basic features into a feature pyramid to obtain fusion features $f_i \in \mathbb{R}^{c \times h \times w}$, where $c$ denotes the channel dimension of the fusion features, and $h$ and $w$ denote their length and width, respectively.

The fused features are input into the texture perception enhancement module to extract texture features. Specifically, the fused feature map first passes through 1×1 convolutions on three parallel branches to integrate features and reduce the channel dimension to $c_1$, obtaining features $f_\theta, f_\phi, f_g \in \mathbb{R}^{c_1 \times h \times w}$; the features $f_\theta, f_\phi$ are reshaped to $\mathbb{R}^{c_1 \times hw}$. Next, the covariance matrix $\Sigma$ between the transposed $f_\theta$ and $f_\phi$ is computed to capture the correlation between pixels at different positions in the feature-dimension space:

$$\Sigma(x,y) = f_\theta^{\top} \otimes f_\phi$$

where $f_\theta, f_\phi, f_g$ denote the intermediate features after convolution processing, $(x,y)$ denotes the position of the pixel, and $\otimes$ denotes multiplication. Meanwhile, a spatial attention mechanism is introduced to strengthen the position of the target in the remote sensing image and highlight the pixel relationships inside the target. First, global average pooling along the channel dimension is performed on the input fusion feature $f_i$, and a 3×3 convolution is applied to remove unwanted noise; then, a Sigmoid activation function normalizes the features to generate the attention weights; finally, the weight dimensions are converted to $\mathbb{R}^{hw \times 1 \times 1}$, producing an attention weight map $M(x,y)$ aligned with the channel dimension of the covariance matrix $\Sigma$, which can be expressed as:

$$M(x,y) = \mathrm{Reshape}(\sigma(\mathrm{Conv}_{3\times3}(f_{gap}(x,y)))) \qquad (3)$$

where GAP ($f_{gap}$) denotes the global average pooling operation and $\sigma$ denotes Sigmoid activation. Subsequently, the attention map $M(x,y)$ is multiplied with the covariance matrix $\Sigma(x,y)$ to obtain the target-enhanced covariance matrix $\hat{\Sigma}$, whose computed dimensions are $\mathbb{R}^{hw \times hw}$. These values are regularized with SoftMax, i.e., the computed target-enhanced covariance matrix is regularized as a whole, and its dimensions are converted to $\mathbb{R}^{wh \times w \times h}$: before processing, the target-enhanced covariance matrix is two-dimensional, and after conversion it is three-dimensional, where the first dimension $wh$ denotes the number of target-enhanced covariance maps, equal to the feature size, and the second and third dimensions $w$ and $h$ denote the length and width of each map. Mathematically, $\hat{\Sigma}$ thus represents the relationship between one pixel and all pixels at other locations, and can be written as:

$$\hat{\Sigma}(x,y) = \mathrm{SoftMax}(M(x,y) \odot \Sigma(x,y))$$

where $\odot$ denotes the dot product of matrices. Meanwhile, a 1×1 convolution operation integrates the relation matrices of pixels across different channels and converts the channels into a relation map $A$; this generates a relation mapping that amplifies the similar positions of homogeneous pixels and the target, which can be used to emphasize similar texture representations of the target at the global level and further enhance valuable texture features. Finally, $A$ is multiplied with $f_g$ to obtain the valuable texture features, and a 1×1 convolution restores the feature channel to $c$; the captured texture features are added to the original feature $f_i$ to obtain the texture perception features. Specifically, the processing is:

$$\tilde{f}_i = f_i \oplus f_{tex}$$

where $f_{tex}$ denotes the captured texture features and $\oplus$ denotes matrix element-wise addition.
Step 2: inputting the remote sensing image into the boundary map extraction module to extract a binary boundary map, and inputting the boundary map into the boundary feature extraction module to further extract the boundary perception features of the target; finally, fusing the extracted features with the basic features extracted in step 1 to obtain the boundary guiding features. Inputting the remote sensing image into the boundary map extraction module to extract the binary boundary map comprises: performing sliding extraction on the input remote sensing image with an edge extraction operator to generate a gradient map of the remote sensing image; preferably, the gradient map is subjected to gated filtering, where a threshold is set and pixel values smaller than the threshold are set to zero, so as to remove noise and useless texture information.
In a specific embodiment of the present invention, the step 2 includes:
the proposed boundary sensing fusion module mainly comprises three parts: boundary map extraction, boundary perception feature extraction and boundary guidance feature fusion. And extracting the boundary map by using a boundary map extraction module. Specifically, the input image is subjected to sliding extraction by utilizing an edge extraction operator, and a gradient map of the image is generated. At this point, a lot of noise and useless detail texture may be contained in the gradient map. A hard threshold based gating filtering method is used to reject these disturbances. Specifically, by defining a set threshold, pixels in the image with pixel values smaller than the threshold are set to 0, so that edges of the image are preserved as much as possible.
The boundary perception features are extracted by the boundary feature extraction module. Specifically, for the generated boundary map, bottom features are extracted by a 3×3 convolution layer, a BN layer, a ReLU layer, and a 3×3 convolution layer. A bottleneck structure with asymmetric convolutions extracts useful edge features and further suppresses noise. First, a 1×1 convolution, a BN layer and a ReLU layer reduce the dimension of the feature map, and three parallel branches respectively comprising a 1×3 convolution, a 3×1 convolution and a 3×3 convolution extract potential edge features, which are then fused; next, a 1×1 convolution, a BN layer and a ReLU layer are added to restore the feature channel; then, the underlying features are transferred through a skip connection and fused with the extracted potential edge features, and the boundary perception features, which contain rich edge information, are generated through a 1×1 convolution and Sigmoid activation.
The boundary guidance feature module is then used to fuse the boundary features. Specifically, a downsampling operation first reduces the resolution of the boundary perception features to match the basic features of the feature extraction network in step 1, and the two are fused as an element-wise sum; then, the edge-fused features generated in the previous step are added to the basic features of the feature extraction network through a skip connection to amplify the edge structure in the basic features, and the important information is integrated through a 3×3 convolution layer; next, a channel attention mechanism, comprising a global average pooling layer, a 1×1 convolution layer and a Sigmoid activation layer, emphasizes the important feature channels by generating channel attention weights that are multiplied with the edge-fused features through a skip connection; finally, the result is added to the fusion features output by the feature pyramid to generate the boundary guiding features.
Step 3: the model decouples and outputs the extracted texture perception features and boundary guiding features, inputs them into the task-decoupled RCNN for dual-branch decoupled prediction, and obtains the classification and localization results;
in a specific embodiment of the present invention, the step 3 includes:
The texture perception features and the boundary guiding features generated in step 1 and step 2 are input independently into the task-decoupled RCNN network for decoupled category and position prediction: the texture perception features are input into the classification branch, and the boundary guiding features into the localization branch.
Specifically, the texture perception features generated in step 1 are input into the classification branch of the task-decoupled RCNN, and the boundary guiding features generated in step 2 are input into the localization branch of the task-decoupled RCNN;
setting different thresholds to decouple the non-maximum suppression (NMS) of the classification branch and the localization branch, thereby generating proposal regions of different densities;
performing decoupled sampling on the proposal regions to obtain decoupled regions of interest: for the classification branch, sampling a fixed quantity of positive and negative samples at a fixed ratio, and for the localization branch, sampling a fixed quantity of positive samples;

expanding the region-of-interest range for the classification branch, and synthesizing high-quality regions of interest with the Weiszfeld algorithm for the localization branch to assist localization prediction.
In a specific embodiment, the NMS of the two branch networks is first decoupled by setting different thresholds, generating proposal regions of different densities. The proposal regions are then sampled in a decoupled manner to obtain decoupled regions of interest: for the classification branch, a fixed ratio is used to sample fixed quantities of positive and negative samples; for the localization branch, only a fixed quantity of positive samples is sampled. Next, targeted enhancements are made to the two branch networks. Classification is aided by introducing more context information through expanding the range of the region of interest; the expansion range requires a trade-off, since too small a range introduces insufficient information while too large a range introduces excessive noise interference. For the localization branch, a proposal synthesis strategy is adopted to synthesize high-quality regions of interest to assist localization. The proposal synthesis strategy is based on the Weiszfeld algorithm: the upper-left and lower-right corners of all proposal regions and of the ground-truth region assigned to each target are synthesized, and the geometric center of the discrete sample points is found by iteratively re-weighted least squares. Specifically, assume the reference points to be fitted are $x_j$; the algorithm iterates continuously to find a center point $y$ that minimizes the distance between this point and the reference points:

$$y = \arg\min_y D(y) = \arg\min_y \sum_j \lVert x_j - y \rVert_2$$

where $\lVert \cdot \rVert_2$ denotes the $L_2$ norm, $j$ indexes the reference points, $D$ denotes the sum of distances between the synthesized center point $y$ and the reference points, and $\arg\min$ denotes the value of $y$ minimizing this sum. Solving in continuous space, by optimization theory the above expression attains its minimum at an extremum; taking the partial derivative with respect to $y$:

$$\frac{\partial D}{\partial y} = \sum_j \frac{y - x_j}{\lVert x_j - y \rVert_2} = 0$$

Solving the above yields a center point that does not coincide with any reference point:

$$y^{(k+1)} = \left( \sum_j \frac{x_j}{\lVert x_j - y^{(k)} \rVert_2} \right) \Bigg/ \left( \sum_j \frac{1}{\lVert x_j - y^{(k)} \rVert_2} \right)$$

where $k$ denotes the number of iterations. Through continuous iteration, $D$ falls below a specified value and the obtained center point is the fitted geometric center. A proposal region close to the ground-truth box is synthesized by fitting its upper-left and lower-right corners and used to train the localization branch network, accelerating model convergence and improving model stability.
In a specific embodiment of the present invention, a perceived texture and boundary target detection system for weak and small targets of remote sensing images is further provided, comprising:
the feature extraction network is used for extracting basic features of the remote sensing image;
the texture perception enhancement module is used for extracting texture perception features through the basic features;
the boundary map extraction module is used for extracting a binary boundary map of the remote sensing image;
the boundary feature extraction module is used for extracting boundary perception features of the boundary map;
the boundary guiding feature module is used for fusing the boundary perception feature and the basic feature to obtain a boundary guiding feature;
and the decoupled transmission module is used for decoupling and outputting the texture perception features and the boundary guiding features, inputting them into a task-decoupled RCNN network for dual-branch decoupled prediction, obtaining the classification and localization results, and completing the perceived texture and boundary target detection.
Compared with the prior art, the invention discloses a dedicated texture and boundary perception network to solve the weak and small target problem in remote sensing images. To mine deep information, a texture perception enhancement module and a boundary perception fusion module are provided. The former fully explores the texture details inside the target, building relationships between features through a computed covariance matrix while highlighting the texture differences between target and background. The latter introduces additional boundary cues to highlight the spatial position of the target. By exploring high-frequency detail information of key targets such as textures and boundaries, the key features of weak and small targets are supplemented, and the combination of the two modules enhances the network's perception of weak and small targets. To relieve the lack of task focus caused by the entanglement between classification and regression, a task-decoupled RCNN is proposed to train the two branch networks independently, realizing fine-grained classification and localization prediction. The method provided by the invention can mine information that weak and small remote sensing targets struggle to express and strengthen their representation, thereby improving detection performance on weak and small targets.
The present invention will be further described with reference to specific examples and drawings.
In a specific embodiment of the invention, the flow of the perceived texture and boundary target detection method for weak and small targets of remote sensing images is shown in FIG. 1. Specifically, a remote sensing image containing weak and small targets is input into the feature extraction network to extract basic features, and the basic features are input into the texture perception enhancement module to extract texture perception features; the remote sensing image is also input into the boundary map extraction module to extract the boundary map of the image, and the boundary perception fusion module extracts boundary guiding features from the boundary map. The texture perception features and the boundary guiding features are input independently into the task-decoupled RCNN network, which acquires task-specific regions of interest with a decoupled sampling strategy and strengthens the training of the classification and regression networks with a decoupled proposal expansion strategy and a proposal synthesis strategy. The trained network can effectively handle the detection of weak and small targets in remote sensing images.
FIG. 2 shows a remote sensing image containing weak and small targets in an embodiment of the present invention. As can be seen from the figure, the targets appear small because of the very high imaging altitude and the bird's-eye imaging view angle; meanwhile, uneven illumination and cloud occlusion greatly suppress the targets' information, leading to weak feature expression. These targets are easily submerged in the broad background, losing critical information such as texture, boundaries and colors; after further compression by the network, such weak targets are easily ignored, causing large-scale missed detections. Aiming at the key problem of missing features of weak and small targets, the invention provides key techniques for mining texture and boundary features, effectively improving the detection accuracy of weak and small targets.
FIG. 3 is an overall schematic diagram of the weak and small target detection method for remote sensing images according to a specific embodiment of the present invention. As the figure shows, the method mainly comprises three parts. First, the texture perception enhancement module, responsible for exploring the texture patterns of the image and extracting texture perception features. Second, the boundary perception fusion module, responsible for fusing the extracted boundary perception features with the basic features and generating the boundary guiding features. Third, the task-decoupled RCNN network, responsible for decoupling the texture perception features and the boundary guiding features, and using the decoupled features, together with the decoupled sampling strategy, the decoupled proposal expansion strategy and the proposal synthesis strategy, to train the classification and regression networks in a targeted manner, realizing the detection of weak and small targets in remote sensing images; the composition is sketched below.
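As a schematic, the three parts compose as follows; every callable here refers to the illustrative sketches elsewhere in this description (e.g. `binary_boundary_map` above), and none of the names are the patent's own.

```python
def detect(image, backbone, fpn, tpem, bfe, bgf, decoupled_rcnn):
    """Schematic forward pass; all callables refer to sketches in this text."""
    c_feats = backbone(image)                    # basic features C_i
    p_feats = fpn(c_feats)                       # fused features f_i
    tex_feats = [tpem(p) for p in p_feats]       # texture perception features
    gray = image.mean(dim=1, keepdim=True)       # grayscale view of the input
    b_f = bfe(binary_boundary_map(gray))         # boundary perception features
    bnd_feats = [bgf(b_f, c, p) for c, p in zip(c_feats, p_feats)]
    # Texture features drive the classification branch, boundary-guided
    # features drive the localization branch of the task-decoupled RCNN.
    return decoupled_rcnn(tex_feats, bnd_feats)
```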
FIG. 4 is a schematic diagram of the texture perception enhancement module according to an embodiment of the present invention. Specifically, take the fusion feature of resolution 256×100×100 output by the feature pyramid as an example (i.e., the fusion feature $f_i$ mentioned above, $i \in \{3,4,5\}$; the case $i=3$ is taken here). First, the feature map passes through 1×1 convolutions on three parallel branches to integrate features and reduce the channel dimension to 64, obtaining features $f_\theta, f_\phi, f_g$; the features $f_\theta, f_\phi$ are reshaped to $\mathbb{R}^{64 \times 10000}$. Next, the covariance matrix $\Sigma$ between the transposed $f_\theta$ and $f_\phi$ is computed to capture the correlation between pixels at different positions in the feature-dimension space:

$$\Sigma(x,y) = f_\theta^{\top} \otimes f_\phi$$

where $(x,y)$ denotes the position of the pixel and $\otimes$ denotes multiplication. However, the above steps can only compute the correlation of all pixels without distinction and cannot focus on the pixel correlations inside the target. For this purpose, a spatial attention mechanism is introduced to strengthen the position of the target and highlight the pixel relationships inside it. First, global average pooling along the channel dimension is performed on the input feature $f$, and a 3×3 convolution is applied to remove unwanted noise; subsequently, a Sigmoid activation function normalizes the features to generate the attention weights, and finally the weight dimensions are converted to $\mathbb{R}^{10000 \times 1 \times 1}$, producing an attention weight map aligned with the channel dimension of the covariance matrix $\Sigma$, which can be expressed as follows (in this specific embodiment, the dimensions of the general formula (2) are instantiated with concrete values):

$$M(x,y) = \mathrm{Reshape}(\sigma(\mathrm{Conv}_{3\times3}(f_{gap}(x,y)))) \qquad (3)$$

where GAP ($f_{gap}$) denotes the global average pooling operation and $\sigma$ denotes Sigmoid activation. Subsequently, the attention map $M(x,y)$ is multiplied with the covariance matrix $\Sigma(x,y)$ to obtain the target-enhanced covariance matrix $\hat{\Sigma}$, whose computed dimensions are $\mathbb{R}^{10000 \times 10000}$. These values are regularized with SoftMax, i.e., the computed target-enhanced covariance matrix is regularized as a whole, and its dimensions are converted to $\mathbb{R}^{10000 \times 100 \times 100}$: before processing, the target-enhanced covariance matrix is two-dimensional, and after conversion it is three-dimensional, where the first dimension 10000 denotes the number of target-enhanced covariance maps, equal to the feature size, and the second and third dimensions 100 denote the length and width of each map. Mathematically, $\hat{\Sigma}$ thus represents the relationship between one pixel and all pixels at other locations, and can be written as:

$$\hat{\Sigma}(x,y) = \mathrm{SoftMax}(M(x,y) \odot \Sigma(x,y))$$

where $\odot$ denotes the dot product of matrices. Meanwhile, a 1×1 convolution operation integrates the relation matrices of pixels across different channels and converts the channels into a relation map $A$, obtaining a relation mapping that amplifies the similar positions of homogeneous pixels and the target, which can be used to emphasize the globally similar texture representations of the target and further enhance the beneficial texture features. Finally, $A$ is multiplied with $f_g$ to obtain the valuable texture features, and a 1×1 convolution restores the feature channel to 256. The captured texture features are added to the original feature $f$ to obtain the texture perception features. The processing is:

$$\tilde{f} = f \oplus f_{tex}$$

where $f_{tex}$ denotes the captured texture features and $\oplus$ denotes matrix element-wise addition.
FIG. 5 to FIG. 7 are, respectively, a schematic diagram of boundary map extraction, a schematic diagram of boundary perception feature extraction, and a schematic diagram of boundary perception feature fusion in the boundary perception fusion module according to an embodiment of the invention. Specifically, assume the input image is $I(x,y)$; as an alternative scheme, the embodiment of the invention takes the Sobel kernel as an example to extract directional gradient maps in a sliding manner along the horizontal and vertical directions of $I(x,y)$. The gradient is the difference between adjacent pixels, and regions with severe gradient changes have large pixel differences, which generally appear as edges inside the image. The horizontal and vertical gradient maps are superposed to obtain the boundary map $G(x,y)$ of the image. However, $G(x,y)$ contains many discrete noise points and interfering texture details, which are removed by gated filtering to obtain a fine boundary map $\tilde{G}(x,y)$ that supports the extraction of boundary perception features, which can be expressed as:

$$\tilde{G}(x,y) = \begin{cases} G(x,y), & G(x,y) \ge \lambda \cdot \max G \\ 0, & \text{otherwise} \end{cases} \qquad (11)$$

where $\lambda$ is the set threshold, set to 0.15 in the embodiment of the invention: when a pixel value of $G(x,y)$ is smaller than 0.15 times the maximum gray value, it is set to 0. The boundary feature extraction module then further extracts the boundary perception features: the bottom features $f_{base}$ are extracted by a 3×3 convolution layer, a BN layer, a ReLU layer and a 3×3 convolution layer. A bottleneck structure with asymmetric and symmetric convolutions is used to extract useful edge features and further suppress noise. Specifically, a 1×1 convolution layer, a BN layer and a ReLU layer first reduce the dimension of the feature map; three parallel branches comprising a 1×3 convolution, a 3×1 convolution and a 3×3 convolution extract edge features, which are fused; then a 1×1 convolution layer, a BN layer and a ReLU layer restore the feature channel; finally, the bottom features $f_{base}$ are transferred through a skip connection and fused with the fused edge features $f_{bou}$, and the boundary perception features $B_f$, which contain rich edge information, are generated through a 1×1 convolution layer and a Sigmoid activation layer:

$$B_f = \sigma(\mathrm{Conv}_{1\times1}(f_{bou})) \qquad (12)$$

where Bottleneck denotes the front and back 1×1 convolution layers with BN and ReLU, and $f_{1\times3}, f_{3\times1}, f_{3\times3}$ denote the edge features extracted by the asymmetric and symmetric convolutions, respectively. The boundary guidance feature module is further used to fuse the boundary features: as information compensation, the extracted edge perception features $B_f$ are fused with the basic features $C_i$, $i \in \{2,3,4,5\}$, to effectively emphasize the edges, and are integrated into the FPN outputs $P_i$ to further highlight the edge expression of the features. Specifically, a downsampling operation reduces the resolution of the boundary perception features to match the basic features of the feature extraction network, and the two are fused as an element-wise sum, which can be expressed as:

$$f_i = C_i \oplus D(B_f) \qquad (13)$$

where $f_i$ denotes the fused features and $D$ denotes the downsampling operation. Then, the fused features are added to the basic features of the feature extraction network through a skip connection to amplify the edge structure in the features, and the important information is integrated by a 3×3 convolution; next, a channel attention mechanism comprising a global average pooling layer, a 1×1 convolution layer and a Sigmoid activation layer emphasizes the important feature channels: the generated channel attention weights are multiplied with the fused features $f_i$ through a skip connection to amplify the features of the important channels; finally, the result is added to the feature pyramid features $P_i$ to generate the boundary guiding features.
FIG. 8 is a schematic diagram of the task-decoupled RCNN network according to an embodiment of the present invention, which comprises two separate classification and localization branches. After obtaining the decoupled features, the task-decoupled RCNN network first performs decoupled sampling to generate task-specific proposals for the different tasks, preventing information interaction between them. Specifically, the network first executes NMS with different thresholds to obtain proposal regions of different distributions; the embodiment of the invention sets the classification NMS threshold to 0.7 and the localization NMS threshold to 0.85. After obtaining proposal regions of different densities, decoupled sampling is carried out on them to obtain the task-specific regions of interest. For the classification branch, the embodiment randomly samples 256 training samples while maintaining a 1:3 positive-to-negative sample ratio, so that the network can fully learn foreground and background information. For the localization branch, the embodiment randomly samples only 128 positive samples from the positive sample set to train the regression branch. Next, optimization is performed for each specific task. For the classification branch, a decoupled proposal expansion strategy is implemented to absorb more context information into the region of interest to assist classification; the embodiment expands the region of interest to a 1.3-fold range, as sketched below. For the localization branch, the proposal synthesis strategy adds more high-quality samples to the network to facilitate the training of the localization network. Finally, after independent RoI Align operations unify the sizes, the decoupled regions of interest are input into independent fully connected layers to realize the classification and localization prediction of the targets.
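The 1.3-fold expansion mentioned above amounts to scaling each RoI about its centre; a small sketch follows (clamping to the image bounds is an assumed practical detail, and the function name is ours).

```python
import torch


def expand_rois(rois: torch.Tensor, img_h: int, img_w: int, scale: float = 1.3):
    """rois: (n, 4) boxes as (x1, y1, x2, y2); returns context-expanded boxes."""
    cx = (rois[:, 0] + rois[:, 2]) / 2
    cy = (rois[:, 1] + rois[:, 3]) / 2
    w = (rois[:, 2] - rois[:, 0]) * scale
    h = (rois[:, 3] - rois[:, 1]) * scale
    out = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)
    out[:, 0::2] = out[:, 0::2].clamp(0, img_w)   # keep boxes inside the image
    out[:, 1::2] = out[:, 1::2].clamp(0, img_h)
    return out
```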
FIG. 9 is a schematic diagram of the proposal synthesis strategy according to an embodiment of the present invention. The strategy is based on the Weiszfeld algorithm: it synthesizes the upper-left and lower-right corners of all proposal regions assigned to each target together with those of the ground-truth region, and finds the geometric center of the discrete sample points by iteratively re-weighted least squares. Specifically, this embodiment assumes there are 4 reference points to be fitted, $x_i$, $i \in \{1,2,3,4\}$; the algorithm iterates continuously to find a center point $y$ that minimizes the distance between this point and the 4 reference points:

$$y = \arg\min_y D(y) = \arg\min_y \sum_{i=1}^{4} \lVert x_i - y \rVert_2 \qquad (15)$$

where $\lVert \cdot \rVert_2$ denotes the L2 norm and $\arg\min$ denotes the value of $y$ minimizing the sum. Solving in continuous space, by optimization theory the above expression attains its minimum at an extremum; taking the partial derivative with respect to $y$:

$$\frac{\partial D}{\partial y} = \sum_{i=1}^{4} \frac{y - x_i}{\lVert x_i - y \rVert_2} = 0 \qquad (16)$$

Solving the above yields a geometric median point that does not coincide with any reference point:

$$y^{(k+1)} = \left( \sum_{i=1}^{4} \frac{x_i}{\lVert x_i - y^{(k)} \rVert_2} \right) \Bigg/ \left( \sum_{i=1}^{4} \frac{1}{\lVert x_i - y^{(k)} \rVert_2} \right) \qquad (17)$$

where $k$ denotes the number of iterations. Through continuous iteration, the objective of formula (15) falls below a specified value, and the obtained median point is the fitted geometric median. A high-quality region of interest close to the ground-truth box is synthesized by fitting the upper-left and lower-right corners of the box to train the localization branch network, accelerating model convergence and improving model stability.
FIG. 10 is an exemplary diagram of the results of detecting weak and small targets in remote sensing images with the target detection method according to the embodiment of the present invention. As shown in the figure, the targets exhibit weak details because of illumination changes and shadow occlusion, compounded by their small size, so that they are difficult to predict effectively, especially the buildings, vessels, cable towers and self-constructed houses in the figure. The remote sensing image target detection method for weak and small targets provided by the invention can still effectively recall these weak and small targets and accurately identify their categories and positions, fully demonstrating the method's detection capability for weak and small targets.
While embodiments of the present invention have been illustrated and described above, it will be appreciated that the above described embodiments are illustrative and should not be construed as limiting the invention. Variations, modifications, alternatives and variations of the above-described embodiments may be made by those of ordinary skill in the art within the scope of the present invention.
The above embodiments of the present invention do not limit the scope of the present invention. Any other corresponding changes and modifications made in accordance with the technical idea of the present invention shall be included in the scope of the claims of the present invention.

Claims (11)

1. A method for detecting perceived texture and boundary targets for weak and small targets of remote sensing images, characterized by comprising the following steps:
step 1: inputting the remote sensing image into a feature extraction network to extract basic features, inputting the basic features into a texture perception enhancement module, and extracting texture perception features;
step 2: inputting a remote sensing image into a boundary map extraction module to extract a binary boundary map, inputting the boundary map into a boundary feature extraction module to extract boundary perception features; fusing the boundary perception feature and the basic feature through a boundary guiding feature module to obtain a boundary guiding feature;
step 3: decoupling and outputting the texture perception features and the boundary guiding features, inputting them into a task-decoupled RCNN network for dual-branch decoupled prediction, obtaining the classification and localization results, and completing the perceived texture and boundary target detection.
2. The method for detecting perceived texture and boundary targets for weak and small targets of remote sensing images according to claim 1, wherein in step 1, inputting the basic features into the texture perception enhancement module and extracting the texture perception features comprises: inputting the basic features into a feature pyramid to obtain fusion features, inputting the fusion features into the texture perception enhancement module, and extracting the texture perception features.
3. The method for detecting perceived textures and boundary targets for weak targets of remote sensing images according to claim 2, wherein the step 1 comprises:
inputting the remote sensing image into the feature extraction network to extract basic features $C_i$, and inputting the basic features into a feature pyramid to obtain fusion features $f_i \in \mathbb{R}^{c \times h \times w}$,

where $c$ denotes the channel dimension of the fusion features, and $h$ and $w$ denote the length and width of the fusion features, respectively.
4. The method for detecting a perceived texture and a boundary target for a weak and small target of a remote sensing image according to claim 3, wherein the inputting the fusion feature into the texture perception enhancement module, the extracting texture perception features comprises:
subjecting the fused feature map to 1×1 convolution of three parallel branches to integrate features and reduce the dimension to c 1 Obtaining the processed characteristic f θ ,f g And feature f θ ,/>Reduce the blood-lipid level to->
Calculating f θ Features with transpositionThe covariance matrix sigma between the two pixels captures the correlation among pixels at different positions in a characteristic dimension space, and a calculation formula is as follows:
wherein f θ ,f g Representing the intermediate features after convolution processing, (x, y) representing the position of the pixel point, +.>Multiplying the representative elements;
introducing a spatial attention mechanism to strengthen the position of a target in the remote sensing image and highlight the pixel relationship in the target;
For the fusion feature f of the input i Global average pooling of channel dimensions is performed and 3 x 3 is employed to remove unwanted noise;
the Sigmoid activation function is used for feature normalization to generate attention weight;
converting weight dimensions toGenerating an attention weight map M (x, y) and aligned with the channel dimension of the covariance matrix Σ, expressed as:
M(x,y)=Reshape(σ(Conv 3×3 (f gap (x,y))))
wherein GAP represents global average pooling operation, and sigma represents Sigmoid activation;
multiplying the attention weight map M(x, y) with the covariance matrix Σ(x, y) to obtain the target-enhanced covariance matrix Σ̂:
Σ̂(x, y) = M(x, y) ⊙ Σ(x, y)
wherein ⊙ represents the dot product of the matrices;
the calculated target-enhanced covariance matrix Σ̂ has dimensions R^(wh×wh); regularizing the target-enhanced covariance matrix and converting its dimensions to R^(wh×w×h),
wherein the first dimension wh represents the number of target-enhanced covariance matrices, which is equal to the feature size; the second dimension w represents the length of each target-enhanced covariance matrix Σ̂; and the third dimension h represents the width of each target-enhanced covariance matrix Σ̂;
integrating the relation matrices of pixels between different channels by adopting a 1×1 convolution operation, and converting the number of channels to c_1;
multiplying the result with f_g and restoring the feature channels to c by using a 1×1 convolution, to obtain the texture features;
adding the captured texture features to the fusion feature f_i to obtain the texture perception features.
5. The method for detecting perceived texture and boundary targets for weak and small targets of remote sensing images according to claim 4, wherein, in the adding of the captured texture features f_t to the fusion feature f_i, the texture perception features are obtained through the following formula:
f_i' = f_i ⊕ f_t
wherein ⊕ represents matrix element-wise addition.
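
For illustration only, a minimal PyTorch sketch of the texture perception enhancement module of claims 4-5 is given below. The softmax used as the regularization step, the fixed input resolution (h, w) required by the 1×1 integration convolution, and all channel sizes are assumptions rather than values stated in the claims.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TexturePerceptionEnhancement(nn.Module):
    """Sketch of claims 4-5: position covariance + spatial attention + fusion."""
    def __init__(self, c, c1, h, w):
        super().__init__()
        self.theta = nn.Conv2d(c, c1, 1)            # f_theta branch
        self.phi = nn.Conv2d(c, c1, 1)              # f_phi branch
        self.g = nn.Conv2d(c, c1, 1)                # f_g branch
        self.attn = nn.Conv2d(1, 1, 3, padding=1)   # 3x3 conv after GAP
        self.integrate = nn.Conv2d(h * w, c1, 1)    # integrate pixel-relation maps
        self.restore = nn.Conv2d(c1, c, 1)          # restore channels to c

    def forward(self, f):
        b, _, h, w = f.shape                        # assumes the (h, w) given at init
        theta = self.theta(f).flatten(2)            # (b, c1, hw)
        phi = self.phi(f).flatten(2)                # (b, c1, hw)
        g = self.g(f)                               # (b, c1, h, w)
        # covariance between pixel positions: (b, hw, hw)
        sigma = torch.bmm(theta.transpose(1, 2), phi)
        # spatial attention map from channel-wise global average pooling
        m = torch.sigmoid(self.attn(f.mean(dim=1, keepdim=True)))  # (b, 1, h, w)
        sigma = sigma * m.flatten(2)                # target-enhanced covariance
        sigma = F.softmax(sigma, dim=-1)            # regularization step (assumed)
        sigma = sigma.view(b, h * w, h, w)          # one spatial relation map per pixel
        out = self.integrate(sigma) * g             # fuse with f_g element-wise
        return f + self.restore(out)                # residual texture enhancement

# usage: TexturePerceptionEnhancement(c=256, c1=64, h=32, w=32)(torch.randn(2, 256, 32, 32))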
6. The method for detecting perceived texture and boundary targets for weak and small targets of remote sensing images according to claim 1, wherein inputting the remote sensing image into the boundary map extraction module and extracting the binary boundary map comprises: performing sliding extraction on the input remote sensing image by utilizing an edge extraction operator to generate a gradient map of the remote sensing image.
7. The method for detecting perceived texture and boundary targets for weak and small targets of remote sensing images according to claim 6, wherein the gradient map is subjected to gated filtering: a threshold is set, and pixel values smaller than the threshold are set to zero, so as to remove noise and useless texture information.
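
A hedged sketch of claims 6-7 follows. The claims do not name the edge extraction operator, so a Sobel operator is assumed here, and the threshold value is purely illustrative.

import torch
import torch.nn.functional as F

def binary_boundary_map(img, thresh=0.1):
    """Gradient map via a sliding edge operator, then gated filtering.
    img: (b, 3, h, w) float tensor, assumed normalized to [0, 1]."""
    sobel_x = torch.tensor([[-1., 0., 1.],
                            [-2., 0., 2.],
                            [-1., 0., 1.]]).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gray = img.mean(dim=1, keepdim=True)              # collapse channels
    gx = F.conv2d(gray, sobel_x, padding=1)
    gy = F.conv2d(gray, sobel_y, padding=1)
    grad = torch.sqrt(gx ** 2 + gy ** 2)              # gradient magnitude map
    grad = torch.where(grad < thresh,                 # gated filtering: zero out
                       torch.zeros_like(grad), grad)  # weak, noisy responses
    return (grad > 0).float()                         # binary boundary map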
8. The method for detecting perceived texture and boundary targets for weak and small targets of remote sensing images according to claim 1, wherein inputting the boundary map into the boundary feature extraction module and extracting the boundary perception features comprises:
extracting bottom-layer features by adopting a 3×3 convolution, BN, ReLU and a further 3×3 convolution;
reducing the dimension of the bottom-layer features by adopting 1×1 convolution, BN and ReLU operations, extracting edge information in the bottom-layer features by utilizing three parallel branches comprising a 1×3 convolution, a 3×1 convolution and a 3×3 convolution, and summing the branches to fuse the edge information into potential edge features;
restoring the feature channels by adopting 1×1 convolution, BN and ReLU operations;
and transferring the bottom-layer features through a skip connection to be fused with the potential edge features, and generating the boundary perception features through a 1×1 convolution and Sigmoid activation.
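
For illustration only, a minimal PyTorch sketch of the boundary feature extraction module of claim 8 is given below; the channel widths c and c_mid are assumptions, as the claim does not specify them.

import torch
import torch.nn as nn

class BoundaryFeatureExtraction(nn.Module):
    """Sketch of claim 8: bottom features, parallel edge branches, skip fusion."""
    def __init__(self, c=64, c_mid=16):
        super().__init__()
        self.bottom = nn.Sequential(                  # 3x3 conv, BN, ReLU, 3x3 conv
            nn.Conv2d(1, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))
        self.reduce = nn.Sequential(                  # 1x1 conv, BN, ReLU
            nn.Conv2d(c, c_mid, 1), nn.BatchNorm2d(c_mid), nn.ReLU(inplace=True))
        self.b13 = nn.Conv2d(c_mid, c_mid, (1, 3), padding=(0, 1))  # 1x3 branch
        self.b31 = nn.Conv2d(c_mid, c_mid, (3, 1), padding=(1, 0))  # 3x1 branch
        self.b33 = nn.Conv2d(c_mid, c_mid, 3, padding=1)            # 3x3 branch
        self.expand = nn.Sequential(                  # 1x1 conv, BN, ReLU, restore channels
            nn.Conv2d(c_mid, c, 1), nn.BatchNorm2d(c), nn.ReLU(inplace=True))
        self.head = nn.Conv2d(c, c, 1)                # final 1x1 conv before Sigmoid

    def forward(self, boundary_map):                  # (b, 1, h, w) binary boundary map
        bottom = self.bottom(boundary_map)            # bottom-layer features
        x = self.reduce(bottom)
        edges = self.b13(x) + self.b31(x) + self.b33(x)  # sum the parallel edge branches
        x = self.expand(edges) + bottom               # skip connection with bottom features
        return torch.sigmoid(self.head(x))            # boundary perception features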
9. The method for detecting perceived texture and boundary targets for weak and small targets of remote sensing images according to claim 1, wherein fusing the boundary perception features and the basic features by the boundary guiding feature module to obtain the boundary guiding features comprises:
adding the downsampled boundary perception features to the basic features to fuse them, adopting a skip-connection structure and a 3×3 convolution to amplify the edge structure, and integrating the fused information;
and highlighting important feature channels in the fused features by adopting a channel attention mechanism, and adding the result to the fusion features output by the feature pyramid to obtain the boundary guiding features.
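
For illustration only, a minimal PyTorch sketch of the boundary guiding feature module of claim 9 follows; the squeeze-and-excitation form of the channel attention and the reduction ratio of 4 are assumptions, since the claim names a channel attention mechanism without specifying its form.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryGuidedFusion(nn.Module):
    """Sketch of claim 9: boundary/basic fusion with channel attention."""
    def __init__(self, c):
        super().__init__()
        self.edge_conv = nn.Conv2d(c, c, 3, padding=1)  # 3x3 conv amplifying edges
        self.fc = nn.Sequential(nn.Linear(c, c // 4), nn.ReLU(inplace=True),
                                nn.Linear(c // 4, c), nn.Sigmoid())

    def forward(self, boundary_feat, basic_feat, fpn_feat):
        b, c, h, w = basic_feat.shape
        # downsample the boundary perception features and add to the basic features
        bf = F.interpolate(boundary_feat, size=(h, w), mode="bilinear",
                           align_corners=False)
        fused = basic_feat + bf
        fused = fused + self.edge_conv(fused)           # skip connection + 3x3 conv
        wgt = self.fc(fused.mean(dim=(2, 3))).view(b, c, 1, 1)  # channel weights
        return fused * wgt + fpn_feat                   # add the FPN fusion features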
10. The method for detecting perceived texture and boundary targets for weak and small targets of remote sensing images according to claim 1, wherein the step 3 comprises:
inputting the texture perception features generated in step 1 into the classification branch of the task-decoupled RCNN, and inputting the boundary guiding features generated in step 2 into the positioning branch of the task-decoupled RCNN;
setting different thresholds to decouple the non-maximum suppression (NMS) of the classification branch and the positioning branch, so as to generate proposal regions of different densities;
performing decoupled sampling on the proposal regions to obtain decoupled regions of interest: for the classification branch, sampling a fixed number of positive and negative samples at a fixed ratio; for the positioning branch, sampling a fixed number of positive samples;
and expanding the region-of-interest range for the classification branch, and synthesizing a high-quality region of interest for the positioning branch by utilizing the Weiszfeld algorithm to assist the positioning prediction.
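
The Weiszfeld algorithm named in claim 10 computes a geometric median by iteratively re-weighted averaging. The sketch below applies it to candidate box coordinates, which is one plausible reading of how the high-quality region of interest is synthesized; the exact inputs and iteration settings are not specified in the claim.

import torch

def weiszfeld_roi(boxes, iters=50, eps=1e-6):
    """Geometric median of candidate boxes (N, 4) in (x1, y1, x2, y2) form."""
    y = boxes.mean(dim=0)                                  # start from the centroid
    for _ in range(iters):
        d = torch.norm(boxes - y, dim=1).clamp_min(eps)    # distances to estimate
        w = 1.0 / d                                        # Weiszfeld weights
        y_new = (boxes * w[:, None]).sum(dim=0) / w.sum()  # re-weighted mean
        if torch.norm(y_new - y) < eps:                    # converged
            return y_new
        y = y_new
    return y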
11. A perceived texture and boundary target detection system for weak and small targets of remote sensing images, characterized by comprising:
The feature extraction network is used for extracting basic features of the remote sensing image;
the texture perception enhancement module is used for extracting texture perception features through the basic features;
the boundary map extraction module is used for extracting a binary boundary map of the remote sensing image;
the boundary feature extraction module is used for extracting boundary perception features of the boundary map;
the boundary guiding feature module is used for fusing the boundary perception feature and the basic feature to obtain a boundary guiding feature;
and the decoupling transmission module is used for decoupling and outputting the texture perception features and the boundary guiding features, inputting them into the task-decoupled RCNN network to perform double-branch decoupled prediction, obtaining the classification and positioning results, and finishing the detection of perceived texture and boundary targets.
CN202311669036.4A 2023-12-06 2023-12-06 Target detection method and system for weak and small target of remote sensing image Active CN117636172B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311669036.4A CN117636172B (en) 2023-12-06 2023-12-06 Target detection method and system for weak and small target of remote sensing image

Publications (2)

Publication Number Publication Date
CN117636172A true CN117636172A (en) 2024-03-01
CN117636172B CN117636172B (en) 2024-06-21

Family

ID=90031892



Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633661A (en) * 2019-08-31 2019-12-31 南京理工大学 Semantic segmentation fused remote sensing image target detection method
CN112200045A (en) * 2020-09-30 2021-01-08 华中科技大学 Remote sensing image target detection model establishing method based on context enhancement and application
CN112766184A (en) * 2021-01-22 2021-05-07 东南大学 Remote sensing target detection method based on multi-level feature selection convolutional neural network
CN113723172A (en) * 2021-06-11 2021-11-30 南京航空航天大学 Fusion multi-level feature target detection method for weak and small targets of remote sensing images
CN115019201A (en) * 2022-05-20 2022-09-06 西北工业大学 Weak and small target detection method based on feature refined depth network
CN115497005A (en) * 2022-09-05 2022-12-20 重庆邮电大学 YOLOV4 remote sensing target detection method integrating feature transfer and attention mechanism
GB202217717D0 (en) * 2022-05-23 2023-01-11 Univ Zhengzhou Light Ind Object detection method based on attention-enhanced bidirectional feature pyramid network (a-bifpn)
CN116051984A (en) * 2022-12-20 2023-05-02 中国科学院空天信息创新研究院 Weak and small target detection method based on Transformer
WO2023077816A1 (en) * 2021-11-03 2023-05-11 中国华能集团清洁能源技术研究院有限公司 Boundary-optimized remote sensing image semantic segmentation method and apparatus, and device and medium
CN116310850A (en) * 2023-05-25 2023-06-23 南京信息工程大学 Remote sensing image target detection method based on improved RetinaNet
CN116385896A (en) * 2023-03-20 2023-07-04 西安电子科技大学 Remote sensing small target detection method, system, equipment and medium based on fusion cascade attention mechanism
CN116704373A (en) * 2023-06-21 2023-09-05 电子科技大学 Remote sensing target detection method based on centrality and mutual exclusion constraint
CN117036980A (en) * 2023-08-31 2023-11-10 中国科学院长春光学精密机械与物理研究所 Satellite remote sensing image small target detection method based on high-resolution characteristic self-attention

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YUXI ZHANG ET AL.: "Orientation-First Strategy With Angle Attention Module for Rotated Object Detection in Remote", IEEE, 28 September 2022 *
ZHANG Ning et al.: "Research Progress on Single-Image Super-Resolution Reconstruction Based on Deep Learning", Acta Automatica Sinica, 24 June 2019 *
LI Xi; XU Xiang; LI Jun: "Small Target Detection in Remote Sensing Images for Aviation Flight Safety", Aero Weaponry, no. 03, 15 June 2020 *
SHEN Xibing; WEI Rong; YANG Yi: "Multi-scale Classification and Detection of Small Targets in Network-Transmitted Remote Sensing Images", Control Engineering of China, no. 05, 20 May 2017 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant