CN114255385B - Optical remote sensing image ship detection method and system based on sensing vector - Google Patents

Optical remote sensing image ship detection method and system based on sensing vector

Info

Publication number: CN114255385B
Application number: CN202111557818.XA
Authority: CN (China)
Other versions: CN114255385A (Chinese)
Inventors: Li Runsheng (李润生), Pan Chaofan (潘超凡), Hu Qing (胡庆), Niu Chaoyang (牛朝阳), Liu Wei (刘伟), Xu Yan (许岩)
Applicant and current assignee: Information Engineering University of PLA Strategic Support Force
Legal status: Active (application granted)

Classifications

    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253 — Fusion techniques applied to extracted features
    • G06N3/045 — Neural network architectures; combinations of networks
    • G06T7/11 — Image analysis; segmentation; region-based segmentation


Abstract

The invention belongs to the technical field of target recognition and relates to a perception-vector-based ship detection method and system for optical remote sensing images. A rotated-target detection network model is constructed, comprising: a feature extraction module for extracting backbone features from the input image, a feature fusion module for fusing feature maps, a plurality of box-boundary-aware-vector detection heads for learning the object bounding boxes of the input image, and an attention mechanism module arranged between the feature fusion module and the detection heads to guide bounding-box learning. The model is trained and optimized with optical remote sensing image sample data, and the trained and optimized model is then used to extract targets from the optical remote sensing image to be detected. The method alleviates problems such as heavy interference in near-shore scenes, uses the attention mechanism to guide bounding-box learning, improves the robustness of the detection model, and facilitates application in practical scenes.

Description

Optical remote sensing image ship detection method and system based on sensing vector
Technical Field
The invention belongs to the technical field of target recognition, and particularly relates to a perception-vector-based ship detection method and system for optical remote sensing images.
Background
The optical remote sensing image ship target detection task plays an important role in obtaining information about adversary ships and providing decision references for our commanders. Current deep-learning-based target detection techniques can be divided into horizontal-box detection and rotated-box detection according to whether target angle information is considered. Horizontal-box detection represents a target by predicting its minimum horizontal bounding rectangle, and its development has roughly gone through four stages: the sliding window method, the selective search method, the Region Proposal Network (RPN), and anchor-free detection. Horizontal-box detection suits targets with small aspect ratios and sparse arrangements. For scenes with large-aspect-ratio, densely arranged targets, the horizontal boxes may heavily overlap one another, and some prediction boxes may be wrongly removed during non-maximum suppression, causing missed detections. In addition, when predicting a large-aspect-ratio target in an arbitrary direction, the bounding box contains a large background area, and the interference of this noise reduces network detection performance. Rotated boxes take the target's angle information into account and solve these problems well. Current rotated-target detectors can be broadly divided into two categories according to whether predefined candidate boxes are employed: anchor-based and anchor-free detectors. Anchor-based rotation detectors extract candidate target regions from prior anchor boxes and then perform classification and regression, while anchor-free detectors model targets through key-point prediction.
Because optical remote sensing images are captured from an overhead viewing angle, ship targets in them appear in arbitrary directions and at multiple scales. For rotated ship targets with large aspect ratios, a rotated-box detection method fits the target better and avoids interference from the background area. Addressing these characteristics, one approach is the key-point-based fully convolutional ship detection network DiamondNet: its basic idea is to design, considering the shape characteristics of a ship, a center point, two head/tail points and two side-wing points to derive an effective ship representation, and to group the key points into their corresponding target instances through a clustering algorithm to complete ship target detection. Another approach models a ship target as a center point and a head point: a backbone network with an orientation-invariant model (OIM) extracts the feature map, the key points are predicted to obtain the two-point representation of the ship target, and prior information is used to refine the prediction results. In addition, ship target detection in optical remote sensing images still faces the problem of background noise; in particular, buildings, wharves and the like in near-shore scenes bring great interference to detection and reduce the target detection rate in such scenes.
Disclosure of Invention
Therefore, the invention provides a perception-vector-based ship detection method and system for optical remote sensing images, which alleviate problems such as heavy near-shore interference, use an attention mechanism to guide bounding-box learning, improve the robustness of the detection model, and facilitate application in practical scenes.
According to the design scheme provided by the invention, the optical remote sensing image ship detection method based on the perception vector comprises the following contents:
constructing a rotating target detection network model, wherein the rotating target detection model comprises: the system comprises a feature extraction module, a feature fusion module, a plurality of edge perception vector detection heads and an attention mechanism module, wherein the feature extraction module is used for extracting trunk features of an input optical remote sensing image, the feature fusion module is used for fusing a trunk feature map, the edge perception vector detection heads are used for learning a target bounding box of the input optical remote sensing image, and the attention mechanism module is arranged between the feature fusion module and the edge perception vector detection heads and used for guiding the target bounding box;
and training and optimizing the rotating target detection network model by using the sample data of the optical remote sensing image, and extracting the target in the optical remote sensing image to be detected by using the rotating target detection network model after training and optimizing.
As the ship detection method based on the optical remote sensing image of the perception vector, further, the attention mechanism module comprises a space attention mechanism unit for extracting space dimension attention and a channel attention mechanism unit for extracting channel dimension attention weight, and the space attention mechanism unit and the channel attention mechanism unit are combined in parallel.
As the optical remote sensing image ship detection method based on the perception vector, further, the attention mechanism module re-weights the features by multiplying the weight tensor and the fused feature map pixel by pixel to obtain an information continuous feature map.
As the ship detection method based on the optical remote sensing image of the sensing vector, further, the channel attention mechanism unit is composed of an SE attention module for obtaining the attention weight of each channel, wherein the SE attention module obtains the attention weight of each channel by learning the correlation among the channels of the channel domain and scoring the importance of the channels.
As the optical remote sensing image ship detection method based on the perception vector, further, the channel attention weight obtaining process of the SE attention module comprises: firstly, carrying out global average pooling operation on a spatial domain to obtain a one-dimensional vector consisting of the number of characteristic channels; and then, compressing and expanding the one-dimensional vector by two stages of fully connected layers for information interaction between channels, and acquiring the attention weight of the channel by a sigmoid activation function.
As the optical remote sensing image ship detection method based on the perception vector, further, for the fused feature map, the spatial attention mechanism unit obtains a two-channel saliency map by performing maximum pooling and average pooling in the channel dimension, and selects the max-pooled channel saliency map as the weight of the spatial attention mechanism unit. Meanwhile, the two-channel saliency map is fused through a convolution kernel to obtain a single-channel feature map, which, after activation by a sigmoid function, is used together with the truth mask map to compute a cross entropy loss function serving as the attention loss.
As the optical remote sensing image ship detection method based on the perception vector, further, the cross entropy loss function is expressed as:

$$L_{att} = -\frac{1}{h \times w}\sum_{i=1}^{h}\sum_{j=1}^{w}\left[\hat{u}_{ij}\log u_{ij} + \left(1-\hat{u}_{ij}\right)\log\left(1-u_{ij}\right)\right]$$

where w and h are the width and height of the feature map and the truth mask map, $\hat{u}_{ij}$ is the truth mask pixel value, and $u_{ij}$ is the single-channel saliency map pixel value.
As the optical remote sensing image ship detection method based on the perception vector, a weight factor for balancing the distribution of positive and negative samples is further added to a cross entropy loss function to ensure the learning of a target area.
As the optical remote sensing image ship detection method based on the perception vector, the invention further generates a true value mask diagram based on an eight parameter system, and the method specifically comprises the following steps: firstly, establishing a two-dimensional plane rectangular coordinate system, and marking an origin of coordinates and a vertex coordinate of a marking frame; secondly, taking the vertex coordinates of the labeling frame as a unit, establishing two linear equations by using a first group of opposite sides to obtain a region enclosed by two straight lines, establishing two linear equations by using a second group of opposite sides to obtain the enclosed region, and taking the intersection of the two regions as a labeled target region; and assigning the pixel value of the target area in the intersection of the two areas as 1, and assigning the pixel values of the other areas as 0 to obtain a true value mask diagram.
Further, the invention also provides a sensing vector-based optical remote sensing image ship detection system, which comprises: a model acquisition module and an object detection module, wherein,
the model acquisition module is used for constructing a rotating target detection network model, wherein the rotating target detection model comprises the following components: the system comprises a feature extraction module, a feature fusion module, a plurality of edge perception vector detection heads and an attention mechanism module, wherein the feature extraction module is used for extracting trunk features of an input optical remote sensing image, the feature fusion module is used for fusing a trunk feature map, the edge perception vector detection heads are used for learning a target bounding box of the input optical remote sensing image, and the attention mechanism module is arranged between the feature fusion module and the edge perception vector detection heads and used for guiding the target bounding box;
and the target detection module is used for training and optimizing the rotating target detection network model by using the optical remote sensing image sample data and extracting a target in the optical remote sensing image to be detected by using the rotating target detection network model after training and optimization.
The invention has the beneficial effects that:
according to the method, an attention module is added after a feature fusion network of a detection model is used for enhancing target area information and weakening interference of irrelevant background information; and the geometric relation among the boundary sensing vectors is utilized to strengthen the coupling relation among the vectors through an automatic supervision loss function, the situation that the surrounding frame has irregular shapes due to vector independence is prevented, the situation that buildings, wharfs and the like in a near-shore scene bring large interference to detection is solved, and the target detection efficiency is improved. By combining the experimental result, in the HRSC2016 data set L2 level detection task, the average precision of the detection result in the scheme is improved by 6.91% compared with the existing network detection method, the interference of background noise is effectively inhibited, the false alarm rate of the target detection of the near-shore ship is reduced, and the method has a good application prospect.
Description of the drawings:
FIG. 1 is a flow chart of an optical remote sensing image ship detection method based on a perception vector in an embodiment;
FIG. 2 is a schematic diagram of a conventional network for detecting a rotating object in an embodiment;
FIG. 3 is a schematic diagram of an improved network structure according to the present disclosure in the embodiment;
FIG. 4 is a schematic diagram of a generation process of a ground truth mask diagram in the embodiment;
FIG. 5 is a schematic diagram of the generation effect of a ground truth mask in the embodiment;
FIG. 6 is a schematic diagram of a potential problem of BBAVectors in the embodiment;
FIG. 7 is a schematic diagram of vector supervision loss calculation in an embodiment;
FIG. 8 is a schematic diagram of vector supervision loss calculation in an embodiment;
FIG. 9 is a graphical illustration of the precision-recall curves of 4 targets in an embodiment;
FIG. 10 is a comparative illustration of the detection results in the examples;
FIG. 11 is a schematic diagram showing the comparison of the detection results of a port at a certain place in the embodiment;
fig. 12 is a schematic diagram of spatial point distribution in the embodiment.
The specific embodiments are as follows:
in order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described in detail below with reference to the accompanying drawings and technical solutions.
Aiming at problems such as heavy interference and high false alarm rates in near-shore ship target detection in optical remote sensing images, an embodiment of the invention provides a perception-vector-based ship detection method for optical remote sensing images; as shown in FIG. 1, it comprises the following steps:
s101, constructing a rotating target detection network model, wherein the rotating target detection model comprises the following components: the system comprises a feature extraction module, a feature fusion module, a plurality of edge perception vector detection heads and an attention mechanism module, wherein the feature extraction module is used for extracting trunk features of an input optical remote sensing image, the feature fusion module is used for fusing a trunk feature map, the edge perception vector detection heads are used for learning a target enclosure frame of the input optical remote sensing image, and the attention mechanism module is arranged between the feature fusion module and the edge perception vector detection heads and is used for guiding the target enclosure frame;
s102, training and optimizing the rotating target detection network model by using the optical remote sensing image sample data, and extracting the target in the optical remote sensing image to be detected by using the rotating target detection network model after training and optimizing.
An attention mechanism module is added on the basis of an anchor-free rotation detection network based on box-boundary-aware vectors (BBAVectors) to enhance target information and address the heavy interference of near-shore scenes.
In order to avoid the positive/negative sample imbalance of anchor-based two-stage rotated-target detection networks, the method can be extended from CenterNet; the resulting single-stage rotated-target detection network is shown in FIG. 2. ResNet101 can be adopted as the backbone feature extraction network; the feature pyramid network obtains a 4× down-sampled feature map C2 through three rounds of feature fusion, which is then fed into four detection heads to learn the target bounding box. The number of channels of the center-point heatmap equals the number of target categories and is used to locate object center points; the value of a pixel on the heatmap is the confidence of that point for the corresponding channel (category). The loss function for training the heatmap is the variant focal loss, as shown in equation (1):
$$L_h = -\frac{1}{N}\sum_{i}\begin{cases}\left(1-p_i\right)^{\alpha}\log p_i, & \hat{p}_i = 1\\ \left(1-\hat{p}_i\right)^{\beta}\, p_i^{\alpha}\log\left(1-p_i\right), & \text{otherwise}\end{cases}\tag{1}$$

where N is the number of targets; $\hat{p}_i$ is the true value; $p_i$ is the predicted value; and the hyper-parameters α and β control the weight of each pixel.
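As a concrete illustration of equation (1), the following NumPy sketch (an illustrative re-implementation, not code from the patent) computes the variant focal loss over a heatmap; the default values alpha=2, beta=4 are the common CenterNet choices and are an assumption here, since the patent does not state them:

```python
import numpy as np

def heatmap_focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """Variant focal loss of Eq. (1) for center-point heatmaps.

    pred and gt are heatmaps in (0, 1); gt == 1 marks a true center point,
    while values in (0, 1) come from the soft splat around each center.
    alpha/beta control the per-pixel weights (assumed defaults).
    """
    pred = np.clip(pred, 1e-6, 1.0 - 1e-6)
    pos = gt == 1.0
    n = max(int(pos.sum()), 1)  # number of targets N
    # positive (true center) pixels
    pos_loss = ((1.0 - pred[pos]) ** alpha * np.log(pred[pos])).sum()
    # all other pixels, down-weighted near true centers by (1 - gt)^beta
    neg = ~pos
    neg_loss = ((1.0 - gt[neg]) ** beta * pred[neg] ** alpha
                * np.log(1.0 - pred[neg])).sum()
    return -(pos_loss + neg_loss) / n
```

A confident prediction at a true center and a low response elsewhere both contribute only a small loss, which is the intended focusing behavior.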
The central point offset map is used for central point regression, and the training loss is as follows:
$$L_o = \frac{1}{N}\sum_{k}\operatorname{SmoothL1}\left(o_k - \hat{o}_k\right)\tag{2}$$

where $\hat{o}_k$ is the true offset and $o_k$ is the predicted offset.
The bounding box parameters include the four BBAVectors and the width and height of the horizontal circumscribed rectangle: $b = [t, r, b, l, w_e, h_e]$. The training loss is:

$$L_b = \frac{1}{N}\sum_{k}\operatorname{SmoothL1}\left(b_k - \hat{b}_k\right)\tag{3}$$
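The smooth-L1 term used in the box-parameter loss can be sketched as follows (an illustrative NumPy version; reading equation (3) as a per-target sum over the parameter components averaged over the N targets is our assumption, and the 10-dimensional layout assumes each of the vectors t, r, b, l is two-dimensional):

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth-L1: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def box_param_loss(b_pred, b_true):
    """Mean smooth-L1 over the box parameters b = [t, r, b, l, w_e, h_e].

    b_pred, b_true: shape (N, 10) — four 2-D boundary-aware vectors plus
    the width/height of the horizontal circumscribed rectangle.
    """
    return smooth_l1(b_pred - b_true).sum(axis=1).mean()
```

The quadratic region keeps gradients small for nearly correct parameters, while the linear region bounds the influence of outlier boxes.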
when the rotating frame is close to the coordinate axis, vector types are difficult to distinguish, and the rotating direction characteristic diagram can better solve the problem. When the intersection ratio of the rotating frame and the horizontal circumscribed rectangle reaches a certain threshold value, the predicting frame is the circumscribed rectangle, otherwise, the predicting frame is the rotating frame. The rotation direction profile is defined as:
Figure BDA0003417509700000052
the loss function is as in equation (5):
Figure BDA0003417509700000053
in the scheme, a supervised attention mechanism module is added between the feature fusion network and the detection head, and the network learning is guided through a truth value mask diagram, so that more attention is focused on a ship target area, and the interference of background noise is weakened. Further, the attention mechanism module includes a spatial attention mechanism unit for spatial dimension attention extraction and a channel attention mechanism unit for channel dimension attention weight extraction, both of which are combined in parallel.
For near-shore ship target detection, the main challenge is heavy interference: onshore objects such as houses, docks and wharves are easily detected by mistake. The baseline model is improved to raise detection accuracy. Referring to FIG. 3, the input optical remote sensing image is scaled to 608 × 608, giving a tensor of dimension 608 × 608 × 3. The backbone feature extraction network ResNet101 applies five two-dimensional convolution (down-sampling) stages to the scaled image; the feature pyramid structure is connected to the backbone through lateral paths, and the up-sampled result of each high-level feature map is fused with the corresponding backbone feature map, yielding feature maps that carry both low-level positional information and high-level semantic information. The output C2 has dimension 152 × 152 × 256. C2 is fed into the supervised multi-dimensional attention module, which has two branches: a channel attention branch and a spatial attention branch. The channel attention branch performs global average pooling on C2 over the spatial dimensions to obtain a 1 × 1 × 256 feature map, passes it through a first fully connected layer to obtain a 1 × 1 × 16 feature map, applies ReLU activation, then a second fully connected layer to recover a 1 × 1 × 256 feature map, which is activated by a sigmoid function to give the channel attention branch weights. The spatial attention branch performs maximum pooling and average pooling on C2 in the channel dimension, and the max-pooled saliency map, passed through a softmax function, is selected as the spatial attention branch weight. The two branch weights are multiplied pixel by pixel with C2 to obtain a feature map F in which the target region is relatively enhanced. F is fed into the four detection heads, i.e., four branches.
In each branch, F passes through a two-dimensional convolution to obtain a 256-channel feature map, which is activated with ReLU; a second convolution layer applied to the activated features then produces the parameter prediction feature layer.
The attention mechanism refers to the function of focusing the interested target by the human brain and aims to make the neural network model have different attention degrees to different areas. For the target area, the attention mechanism gives higher characteristic weight to the target area, and meanwhile, the weight value of the background area is suppressed, so that more attention resources are invested, more target information is obtained, and the target identification capability of the network is improved. And combining the space attention mechanism and the channel attention mechanism in parallel, respectively extracting attention weights of a space dimension and a channel dimension, and multiplying the weight tensor and the input feature map C2 pixel by pixel to realize the re-weighting of the features to obtain the feature map with continuous information. Different from a serial connection mode, the parallel combination can avoid spatial information loss caused by global average pooling of a channel attention mechanism, and effectively enhance the learning capability of the network on target characteristics.
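As an illustrative sketch (not the patent's code), the parallel re-weighting described above amounts to broadcasting the channel weight vector and the spatial weight map over the fused feature map C2:

```python
import numpy as np

def reweight(feat, channel_w, spatial_w):
    """Re-weight a fused feature map of shape (C, H, W) by multiplying,
    pixel by pixel, the channel attention weights (C,) and the spatial
    attention weight map (H, W), as in the parallel combination scheme."""
    return feat * channel_w[:, None, None] * spatial_w[None, :, :]
```

Because the two weight tensors are applied independently, neither branch's pooling can erase the information the other branch relies on — the motivation the text gives for parallel rather than serial combination.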
As the optical remote sensing image ship detection method based on the perception vector in the embodiment of the invention, further, the attention mechanism module re-weights the features by multiplying the weight tensor and the fused feature map pixel by pixel to obtain the information continuous feature map. Further, the channel attention mechanism unit is composed of an SE attention module for obtaining an attention weight of each channel, wherein the SE attention module obtains the attention weight of each channel by learning the correlation among the channels of the channel domain and scoring the importance of the channels. Further, the channel attention weight acquisition process of the SE attention module includes: firstly, performing global average pooling operation on a spatial domain to obtain a one-dimensional vector consisting of the number of characteristic channels; and then, compressing and expanding the one-dimensional vector by using two stages of fully connected layers for information interaction between channels, and acquiring the attention weight of the channel by using a sigmoid activation function.
The channel attention branch consists of an SE attention module, which learns the correlation between channels in the channel domain and scores channel importance to obtain the attention weight of each channel. Specifically, the SE module has two parts. The first part performs a global average pooling operation over the spatial domain to obtain a 1 × 1 × C one-dimensional vector, where C is the number of feature channels; because the channel attention weight is applied across the whole spatial dimension, it must be computed from the whole spatial-domain information, which is compressed to 1 × 1 to obtain a global receptive field and synthesize the global information. The second part is the excitation operation: the one-dimensional vector is compressed and then expanded back through two fully connected layers with a reduction ratio of 16, and the channel attention weights are finally obtained through a sigmoid activation function. The purpose of the two fully connected layers is to allow information interaction between channels, so that the weights are calibrated according to the learned channel correlations, avoiding the one-sidedness of weight learning that channel independence would cause.
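The squeeze-and-excitation steps above can be sketched in NumPy (an illustrative version with explicit weight matrices; the two matrices stand in for the learned fully connected layers and are assumptions of this sketch):

```python
import numpy as np

def se_channel_attention(feat, w1, w2):
    """SE-style channel attention weights.

    feat: (C, H, W) feature map.
    w1: (C // r, C) first FC layer (compress, reduction ratio r).
    w2: (C, C // r) second FC layer (expand back to C channels).
    Returns per-channel weights in (0, 1) of shape (C,).
    """
    z = feat.mean(axis=(1, 2))           # squeeze: global average pool -> (C,)
    s = np.maximum(w1 @ z, 0.0)          # excitation: FC + ReLU (compress)
    s = w2 @ s                           # FC (expand)
    return 1.0 / (1.0 + np.exp(-s))      # sigmoid -> channel attention weights
```

The text's reduction ratio of 16 corresponds to r = 16, i.e. w1 of shape (C/16, C); the sketch leaves r implicit in the matrix shapes.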
As an optical remote sensing image ship detection method based on perception vectors in the embodiment of the invention, further, for the fused feature map, a spatial attention mechanism unit obtains a dual-channel saliency map by performing maximum pooling and average pooling in channel dimensions, selects the maximum pooled channel saliency map as the weight of the spatial attention mechanism unit, performs feature fusion on the dual-channel saliency map through a convolution kernel to obtain a single-channel feature map, and calculates a cross entropy loss function used as attention loss through a sigmoid function after activation and a true value mask map. Further, a weight factor for balancing the distribution of positive and negative samples is added to the cross entropy loss function to ensure the learning of the target region.
The spatial attention branch performs maximum pooling and average pooling on the input feature map C2 in the channel dimension to obtain a two-channel saliency map; after a softmax function, the target information of the saliency map is effectively enhanced relative to noise regions. Unlike average pooling, maximum pooling discards the secondary information of the channel dimension and keeps only the most representative features; it has high contrast, strongly reinforcing features and suppressing background noise. Considering this representation capability, the max-pooled channel saliency map is selected as the weight of the spatial attention branch. Meanwhile, the two-channel saliency map is fused through a 7 × 7 convolution kernel into a single-channel feature map, which, after sigmoid activation, is used together with the truth mask map to compute a cross entropy loss serving as the attention loss. This loss guides the network to focus on the target region in a targeted manner and makes the supervised training effective.
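The max-pooling-plus-softmax weight of the spatial branch can be sketched as follows (an illustrative NumPy version; taking the softmax over all spatial positions jointly is our reading of the text):

```python
import numpy as np

def spatial_attention_weight(feat):
    """Spatial attention weight map for a (C, H, W) feature map.

    Channel-wise maximum pooling gives an (H, W) saliency map; a softmax
    over the whole spatial grid turns it into weights that sum to 1,
    sharpening the contrast between target and background regions.
    """
    m = feat.max(axis=0)          # (H, W) channel-max saliency map
    e = np.exp(m - m.max())       # numerically stable softmax
    return e / e.sum()
```

Pixels whose strongest channel response is large receive most of the weight, which is the suppression-of-background behavior described above.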
For a single sample, the cross entropy loss function is defined as follows:
$$L = -\left[\hat{u}\log u + \left(1-\hat{u}\right)\log\left(1-u\right)\right]\tag{6}$$

where $\hat{u}$ is the true label and $u$ is the prediction probability. The cross entropy loss function for all samples is then:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[\hat{u}_i\log u_i + \left(1-\hat{u}_i\right)\log\left(1-u_i\right)\right]\tag{7}$$

where N is the number of samples.
When calculating attention loss, the sample space is a two-dimensional tensor region, and the number of samples is h × w, so the cross entropy loss function is:
$$L = -\frac{1}{h \times w}\sum_{i=1}^{h}\sum_{j=1}^{w}\left[\hat{u}_{ij}\log u_{ij} + \left(1-\hat{u}_{ij}\right)\log\left(1-u_{ij}\right)\right]\tag{8}$$

where w and h are the width and height of the feature map and the truth mask map; $\hat{u}_{ij}$ is the truth mask pixel value; and $u_{ij}$ is the single-channel saliency map pixel value.
Because the target region occupies a relatively small area of the image, the positive and negative samples are unbalanced: directly computing the cross entropy loss biases the loss toward the background region, so the trained network pays little attention to the target region and detection performance suffers. To ensure effective learning of the target region, weight factors are added to the loss function to raise the loss weight of the target region and balance the distribution of positive and negative samples. The attention loss function is finally defined as:
Figure BDA0003417509700000073
in the formula: lambda [ alpha ] 1 Losing weight for pixel in true value area; lambda [ alpha ] 0 Weight is lost for background area pixels.
As an optical remote sensing image ship detection method based on a sensing vector in the embodiment of the present invention, further, a true value mask map is generated based on an eight-parameter system, which specifically includes: firstly, establishing a two-dimensional plane rectangular coordinate system, and marking an origin of coordinates and a vertex coordinate of a marking frame; secondly, taking the vertex coordinates of the labeling frame as a unit, establishing two linear equations by using a first group of opposite sides to obtain a region enclosed by two straight lines, establishing two linear equations by using a second group of opposite sides to obtain the enclosed region, and taking the intersection of the two regions as a labeled target region; and assigning the pixel value of the target area in the intersection of the two areas as 1, and assigning the pixel values of the other areas as 0 to obtain a true value mask diagram.
A truth mask map is generated by a mask-map generation algorithm based on the eight-parameter system, for supervised training of the attention module. A two-dimensional rectangular plane coordinate system is established with the origin at the upper-left point, as shown in FIG. 4(a); the vertex coordinates of the labeling frame are (x1, y1), (x2, y2), (x3, y3), (x4, y4). First, two linear equations are established from the first pair of opposite sides to obtain the region enclosed by the two lines; then two linear equations are established from the second pair of opposite sides to obtain its enclosed region; the intersection of the two regions is the labeled target region, as shown in FIG. 4(d). This region is assigned the value 1 and all other regions 0, giving the true-value mask map of the picture; the effect is shown in FIG. 5. The specific implementation algorithm can be designed as follows:
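Since the original algorithm listing survives only as an image reference, the two-strip construction described above can be sketched in NumPy as follows; the patent's exact implementation may differ, and the helper names are ours.

```python
import numpy as np

def truth_mask(h, w, verts):
    """Rasterise the true-value mask from the four labelled vertices.

    verts: [(x1, y1), (x2, y2), (x3, y3), (x4, y4)] in drawing order,
    origin at the upper-left point.  The region between the first pair
    of opposite sides is intersected with the region between the second
    pair; pixels in the intersection are assigned 1, all others 0.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    v = [np.asarray(p, dtype=float) for p in verts]

    def side(a, b, px, py):
        # sign of the cross product: which side of line a->b the point lies on
        return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

    def strip(a, b, c, d):
        # pixels between line(a, b) and line(c, d): same side of the first
        # line as vertex c, and same side of the second line as vertex a
        s1 = side(a, b, xs, ys) * side(a, b, c[0], c[1]) >= 0
        s2 = side(c, d, xs, ys) * side(c, d, a[0], a[1]) >= 0
        return s1 & s2

    inter = strip(v[0], v[1], v[2], v[3]) & strip(v[1], v[2], v[3], v[0])
    return inter.astype(np.uint8)

mask = truth_mask(10, 10, [(2, 2), (6, 2), (6, 6), (2, 6)])
```

For an axis-aligned 5 × 5 box the mask contains exactly the 25 enclosed pixels; for a rotated labeling frame the same two-strip intersection traces the tilted rectangle.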
by learning BBAVectors, the network can unify the target location information to the same cartesian coordinate system, avoiding the inconsistency of regression of each parameter. However, this method also has a potential problem, as shown in fig. 6. Since the four vectors are individually learned, there is no constraint relation between them, and the condition for constituting the rectangular frame, i.e., the facies, may not be satisfiedThe neighboring vectors are not necessarily perpendicular, and the inverted vectors are not necessarily collinear, resulting in the finally learned bounding box being a trapezoid. In order to solve the problems, in the scheme, vector supervision loss can be utilized, and the normalization of bounding box learning is ensured through the constraint relation among vectors. The loss function calculation is divided into three parts, namely: l is t·r 、L l·b 、L t·b 。L t·r Vector combination calculation from fig. 7 (a) for ensuring the one-and two-quadrant boundary vector vertical relationship; l is a radical of an alcohol l·b The vector combination calculation in FIG. 7 (b) ensures the vertical relationship of the three-quadrant and four-quadrant boundary vectors; l is t·b The vector combination calculation of FIG. 7 (c) ensures the collinear relationship of the one-quadrant and three-quadrant boundary vectors.
The combined action of these three groups of losses realizes the constraint relations between adjacent perpendicular vectors, and the final vector supervision loss expression is:

L_vec = L_{t·r} + L_{l·b} + L_{t·b}   (10)

where L_{t·r}, L_{l·b} and L_{t·b} are respectively:

L_{t·r} = \frac{1}{N} \sum_{i=1}^{N} |e_t^{(i)} \cdot e_r^{(i)}|,   L_{l·b} = \frac{1}{N} \sum_{i=1}^{N} |e_l^{(i)} \cdot e_b^{(i)}|,   L_{t·b} = \frac{1}{N} \sum_{i=1}^{N} |e_t^{(i)} \cdot e_b^{(i)} + 1|   (11)

where N is the total number of targets and t, l, b, r are the BBAVectors of the first, second, third and fourth quadrants respectively. When calculating the loss, the vectors are normalized and the loss is measured by the inner product of unit vectors; the inner product is then the cosine of the angle between two vectors and represents their angular relation. As shown in FIG. 7, when e_1 and e_2 are perpendicular, d = e_1 · e_2 = 0; when e_1 and e_2 are reverse collinear, d = e_1 · e_2 = -1.
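A per-target sketch of this constraint loss in NumPy. The patent specifies only the target inner-product values (d = 0 for perpendicular pairs, d = -1 for the reverse-collinear pair); the absolute-deviation penalty used here is one plausible form, not necessarily the patent's exact expression.

```python
import numpy as np

def unit(v):
    """Normalize a 2-D vector."""
    return np.asarray(v, dtype=float) / np.linalg.norm(v)

def vec_loss(t, l, b, r):
    """Per-target BBAVector constraint penalty.

    t, l, b, r are the four box-boundary-aware vectors.  Adjacent
    vectors should be perpendicular (unit inner product d = 0) and
    t, b should be reverse collinear (d = -1); deviations from these
    targets are penalized.
    """
    et, el, eb, er = unit(t), unit(l), unit(b), unit(r)
    L_tr = abs(np.dot(et, er))         # perpendicular: want d = 0
    L_lb = abs(np.dot(el, eb))         # perpendicular: want d = 0
    L_tb = abs(np.dot(et, eb) + 1.0)   # reverse collinear: want d = -1
    return L_tr + L_lb + L_tb

perfect = vec_loss((0, -2), (-2, 0), (0, 2), (2, 0))    # exact rectangle
skewed  = vec_loss((0.5, -2), (-2, 0), (0, 2), (2, 0))  # trapezoid tendency
```

A perfect rectangle yields zero penalty, while a skewed vector set is penalized, which is the self-supervised signal that pushes the learned bounding box back toward a rectangle.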
The vector constraint loss and the spatial attention loss of the scheme are added to the baseline network loss function to form an improved loss function, which can be expressed as:

L = L_h + L_o + L_b + L_α + weight_1 · L_att + weight_2 · L_vec   (12)

where weight_1 and weight_2 are the coordination coefficients of the attention loss and the vector constraint loss, respectively.
Further, based on the above method, an embodiment of the present invention further provides a ship detection system based on an optical remote sensing image of a sensing vector, including: a model acquisition module and an object detection module, wherein,
the model acquisition module is used for constructing a rotating target detection network model, wherein the rotating target detection model comprises: a feature extraction module, a feature fusion module, a plurality of edge perception vector detection heads and an attention mechanism module, wherein the feature extraction module is used for extracting trunk features of an input optical remote sensing image, the feature fusion module is used for fusing the trunk feature map, the edge perception vector detection heads are used for learning a target bounding box of the input optical remote sensing image, and the attention mechanism module is arranged between the feature fusion module and the edge perception vector detection heads and used for guiding the learning of the target bounding box;
and the target detection module is used for training and optimizing the rotating target detection network model by using the optical remote sensing image sample data and extracting a target in the optical remote sensing image to be detected by using the rotating target detection network model after training and optimization.
To verify the validity of the scheme, the following further explanation is made by combining experimental data:
The remote sensing ship target dataset used for experimental verification is HRSC2016, which covers near-shore and sea-surface scenes and comprises 1061 images ranging in size from 300 × 300 to 1500 × 900, with 2976 targets in total. The training set contains 436 images, the validation set 181 images, and the test set 444 images. The dataset detection task consists of 3 levels: L1, L2 and L3. The L1-level target category is simply "ship"; the L2-level targets consist of aircraft carrier, battleship, commercial ship and submarine; the L3-level targets subdivide the L2 targets into 27 classes. Because the ship targets in the dataset are relatively large, the dataset is not cropped during preprocessing, in order to preserve target integrity and the training effect. During training, the input images undergo data enhancement by random flipping, random rotation and random cropping, to prevent overfitting and improve the robustness of model training. The experimental environment is a Windows 10 operating system with an Intel(R) Xeon(R) Gold 5218 CPU @ 2.30 GHz and an NVIDIA RTX 5000 GPU with 16 GB of video memory; the development platform is PyTorch 1.7.1 + CUDA 10.2. The training parameters are epoch = 100, batch size = 8, an initial learning rate of 0.000125, an exponential learning-rate adjustment strategy with decay factor 0.96, and the adaptive moment estimation optimizer Adam; the backbone feature extraction network is trained from ResNet101 pre-training weights. During training the input image is scaled to 608 × 608, and the output 4× down-sampled feature map C2 has a size of 152 × 152. The hyper-parameters in equation (9) are set to λ_1 = 3, λ_0 = 1, and those in equation (12) to weight_1 = 1, weight_2 = 1.
TABLE 1 improved method vs. base line network Performance
(07 and 12 denote the 2007 and 2012 evaluation metrics respectively)
Table 1 compares the detection results of the scheme, the baseline network and other rotation detection networks in the same field on the L1-level detection task. It can be seen that the method shows competitive detection accuracy: the two accuracy metrics are improved by 1.23% and 3.75% respectively over the baseline network. In inference speed, the method reaches 12.68 fps, only 0.72 fps lower than the baseline network. The slight decrease in inference speed is because the improved network adds attention modules to the baseline, whose convolutional and fully connected layers increase the network parameters and computation. Overall, the improved network of the scheme has clear advantages over the other detection networks in both detection accuracy and inference speed, and improves accuracy over the baseline network, which demonstrates the effectiveness of the improved method.
The precision-recall curves of the improved model for the 4-class L2-level detection task on the HRSC2016 test set are shown in FIG. 9. The detection effect for aircraft carriers and battleships is better, while that for commercial ships is the worst. The main reason is that the characteristics of aircraft carriers and battleships are relatively fixed, whereas commercial ship characteristics are diverse, making detection relatively difficult and prone to false and missed detections.
To visually compare the detection effects of the baseline network and the detection model of the scheme, partial near-shore and sea-surface test results are extracted. As shown in FIG. 10(a) and FIG. 10(b), for scenes with strong onshore interference and submarine targets, the improved model of the scheme gives a certain improvement; it also performs well on some small targets, for example correctly identifying submarines that the original network missed or falsely detected. For some sea-surface targets the scheme detects and classifies more accurately, whereas the original network may produce partial false alarms and misclassifications; the comparison results are shown in FIG. 10(c) and FIG. 10(d).
TABLE 2 improved network to baseline network Performance comparison
Besides the tests on the HRSC2016 dataset, the scheme also selects a remote sensing image of a certain port of a certain country for comparison detection. The image size is 5112 × 6352, and it is cropped into 608 × 608 sub-images. Considering the target truncation that cropping may cause, the cropping overlap is set to 300 to preserve target integrity, edge sub-images are bounded by the original image, and 357 sub-images are finally obtained. Since the overlap regions can cause the same target to correspond to multiple detection boxes, the sub-image detection results are first merged, and redundant detection boxes are then filtered by non-maximum suppression to obtain the full-image detection result. As can be seen from FIG. 11, for the near-shore scenario the anti-interference ability of the improved model has a significant advantage. Comparing the three red enlarged areas shows that the baseline network falsely detects rows of elongated buildings and houses on shore as ship targets, while the improved network model of the scheme better avoids building interference, with a false-alarm rate far lower than the baseline network. However, both methods show a certain degree of missed detection: in the yellow enlarged area, ships parked close to the wharf are not accurately identified.
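The overlap-cropping step above can be sketched as follows. This is one plausible reading of the scheme: tiles advance by (tile - overlap) and the last tile is clamped to the image border; the patent reports 357 sub-images for this scene but does not spell out the exact tiling layout, so the function and its parameters are illustrative.

```python
def tile_origins(size, tile=608, overlap=300):
    """Top-left offsets of crops along one image axis.

    Tiles advance by (tile - overlap); the last tile is clamped to the
    image border so that edge sub-images stay inside the original image.
    """
    stride = tile - overlap
    origins, pos = [], 0
    while pos + tile < size:
        origins.append(pos)
        pos += stride
    origins.append(max(size - tile, 0))  # clamp final tile to the border
    return origins

xs = tile_origins(5112)  # horizontal offsets for the 5112 x 6352 port image
ys = tile_origins(6352)  # vertical offsets
```

After detection, boxes from each tile are shifted by their tile origin, merged, and filtered with non-maximum suppression, as described above.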
To study the contribution of each part of the improved model, ablation experiments are performed on the supervised attention module and the BBAVectors loss of the embodiment, and two groups of experiments are designed on the hyper-parameters in the loss function and the foreground/background weights in supervised training to explore the optimal parameter combination. The ablation experiments are performed on the L2-level detection task.
Table 3 attention Module impact on network Performance
The experimental results are shown in Table 3. It can be seen that the attention module contributes about 4 percentage points to the detection performance, and the detection results of the three target classes other than the commercial ship improve to different degrees. Commercial ship features are diverse and the intra-class variance is large; adding the attention module when training samples are insufficient can lead to over-fitting some detail features and insufficient feature learning for the class as a whole, hence the poorer detection performance. The characteristics of aircraft carriers and submarines are relatively uniform, with smaller intra-class variance than commercial ships, so the attention module helps the network learn the target features better and improves detection performance.
Taking whether L_vec is added as the control variable, the performance of the loss function and its influence on the detection results are studied through an ablation experiment. To visualize the learning effect of the bounding box, the inner products of the unit vectors along the three groups of BBAVector directions, e_t · e_r, e_l · e_b and e_t · e_b, are converted into the spatial coordinates of a point; the conversion relation is as in equation (13):

(x, y, z) = (e_t · e_r, e_l · e_b, e_t · e_b)   (13)

where x, y and z are the three-dimensional coordinates of the space point.
The distribution of the spatial points obtained by coordinate conversion of the test-set detection results is shown in FIG. 12. It can be seen that after adding L_vec, the unit-vector inner-product distribution is closer to the direction of the origin, indicating that the learned BBAVectors have a stronger pairwise perpendicular relation than those of the baseline network, the bounding box is closer to a rectangle, and irregularity is reduced to a certain degree. This proves that designing the loss function with the constraint relations among the vectors, and thereby training the four BBAVectors in a self-supervised manner, can effectively improve the bounding-box learning effect.
TABLE 4 Impact of the L_vec loss on network performance
Table 4 shows the impact of L_vec on network detection performance. It can be seen that the accuracy on the L2-level detection task improves by 5.66%. The aircraft carrier target improves by 12.95% and the submarine target by 9.62%, while the improvements for the battleship and commercial ship targets are insignificant, only 0.32% and 0.18% respectively. Analysis suggests that the angled deck of the aircraft carrier causes certain interference to BBAVector learning, and constraining with the geometric relation of adjacent vectors better weakens this influence; the submarine target has a large aspect ratio, so the side vectors are difficult to learn while the head and tail vectors are simpler, and guiding the regression process with the perpendicular relation between them markedly improves the learning effect; the appearance characteristics of battleship targets are distinct and their learning difficulty is relatively low, so the baseline network already achieves a good result and adding the L_vec loss does not greatly improve detection performance; commercial ship targets vary greatly in scale, the few training samples cannot effectively learn the target shapes, and the added constraint loss has little influence on the detection result.
In the embodiment of the scheme, during model training the coordination coefficients of the attention loss and the BBAVectors loss in the multi-task loss are set to 1 by default, and the foreground and background loss weights in the attention loss default to λ_1 = 3 and λ_0 = 1 respectively. To explore the influence of the coordination coefficients and the foreground/background loss weights on the model, several different combinations are designed by the control-variable method for detection performance comparison; the experimental results are shown in Tables 5 and 6.
TABLE 5 Influence of the coordination coefficients (λ_1 = 3, λ_0 = 1)
TABLE 6 Influence of the attention-loss foreground and background weights (weight_1 = 1, weight_2 = 1)
As can be seen from Table 5, with λ_1 = 3 and λ_0 = 1 fixed, when the coordination coefficients of the attention loss and the vector loss are 1 and 0.5, the average detection accuracy is 84.59%, an improvement of 0.04% over the default combination weight_1 = 1, weight_2 = 1.
As can be seen from Table 6, with weight_1 = 1 and weight_2 = 1 fixed, the detection performance is optimal, 84.55%, when the foreground and background weights are λ_1 = 3 and λ_0 = 1. Too low a foreground weight leads to insufficient attention to the target region; too high a foreground weight greatly reduces the proportion of background noise during training, and since a certain amount of noise prevents the network from over-fitting certain features, a high foreground weight weakens the anti-interference ability and robustness of the detector and reduces detection performance to a certain extent.
The experimental results further show that the scheme can effectively improve the detection performance of the detection network, obviously improve the detection effect of the aircraft carrier and submarine targets, and has better application prospect.
Unless specifically stated otherwise, the relative steps, numerical expressions and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
Based on the foregoing method and/or system, an embodiment of the present invention further provides a server, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
Based on the above method and/or system, the embodiment of the invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the above method.
In all examples shown and described herein, any particular value should be construed as merely exemplary, and not as a limitation, and thus other examples of example embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that the following descriptions are only illustrative and not restrictive, and that the scope of the present invention is not limited to the above embodiments: those skilled in the art can still make modifications or changes to the embodiments described in the foregoing embodiments, or make equivalent substitutions for some features, within the scope of the disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. An optical remote sensing image ship detection method based on a perception vector is characterized by comprising the following contents:
constructing a rotating target detection network model, wherein the rotating target detection model comprises: the system comprises a feature extraction module, a feature fusion module, a plurality of edge perception vector detection heads and an attention mechanism module, wherein the feature extraction module is used for extracting trunk features of an input optical remote sensing image, the feature fusion module is used for fusing a trunk feature map, the edge perception vector detection heads are used for learning a target enclosure frame of the input optical remote sensing image, and the attention mechanism module is arranged between the feature fusion module and the edge perception vector detection heads and used for guiding the learning of the target enclosure frame;
training and optimizing the rotating target detection network model by using the optical remote sensing image sample data, and extracting a target in the optical remote sensing image to be detected by using the rotating target detection network model after training and optimizing;
the attention mechanism module comprises a spatial attention mechanism unit for spatial dimension attention extraction and a channel attention mechanism unit for channel dimension attention weight extraction, wherein the spatial attention mechanism unit and the channel attention mechanism unit are combined in parallel;
aiming at the fused feature map, a spatial attention mechanism unit acquires a double-channel saliency map by performing maximum pooling and average pooling in channel dimensions, selects the maximum pooled channel saliency map as the weight of the spatial attention mechanism unit, performs feature fusion on the double-channel saliency map through a convolution kernel to acquire a single-channel feature map, and calculates a cross entropy loss function used as attention loss through a sigmoid function after activation and a truth-value mask map;
generating a truth mask diagram based on an eight-parameter system, specifically comprising: firstly, establishing a two-dimensional plane rectangular coordinate system, and marking an origin of coordinates and a vertex coordinate of a marking frame; secondly, taking the vertex coordinates of the labeling frame as a unit, establishing two linear equations by using a first group of opposite sides to obtain a region enclosed by two straight lines, establishing two linear equations by using a second group of opposite sides to obtain the enclosed region, and taking the intersection of the two regions as a labeled target region; assigning the pixel value of a target area in the intersection of the two areas to be 1, and assigning the pixel values of the other areas to be 0 to obtain a true value mask diagram;
the geometric relation among the edge perception vectors is utilized, and the coupling constraint relation among the vectors is learned through an automatic supervision loss function, so that the irregular shape of the bounding box caused by the independence of the edge perception vectors is prevented.
2. The method for detecting the ship based on the optical remote sensing image of the perception vector of claim 1, wherein the attention mechanism module re-weights the features by multiplying the weight tensor and the fused feature map pixel by pixel to obtain an information-enhanced feature map.
3. The ship detection method based on the optical remote sensing image of the perception vector of claim 1, wherein the channel attention mechanism unit is composed of an SE attention module for obtaining the attention weight of each channel, wherein the SE attention module obtains the attention weight of each channel by learning the correlation among the channels of the channel domain and scoring the importance of the channel.
4. The ship detection method based on the optical remote sensing image of the perception vector of claim 3, wherein the channel attention weight obtaining process of the SE attention module comprises the following steps: firstly, performing global average pooling operation on a spatial domain to obtain a one-dimensional vector consisting of the number of characteristic channels; and then, compressing and expanding the one-dimensional vector by using two stages of fully connected layers for information interaction between channels, and acquiring the attention weight of the channel by using a sigmoid activation function.
5. The method for detecting ships based on optical remote sensing images of perceptual vectors as claimed in claim 1, wherein the cross entropy loss function is expressed as:

L = -\frac{1}{wh} \sum_{i=1}^{h} \sum_{j=1}^{w} [\hat{l}_{ij} \log u_{ij} + (1 - \hat{l}_{ij}) \log(1 - u_{ij})]

wherein w and h represent the width and height of the feature map and the true-value mask map, \hat{l}_{ij} represents the true-value mask image pixel value, and u_{ij} represents the single-channel saliency map pixel value.
6. The method for detecting ships based on perception vectors according to claim 1 or 5, wherein weight factors for balancing the distribution of positive and negative samples are added to the cross entropy loss function to ensure the learning of the target area.
7. An optical remote sensing image ship detection system based on a perception vector, which is realized based on the method of claim 1 and comprises the following steps: a model acquisition module and an object detection module, wherein,
the model acquisition module is used for constructing a rotating target detection network model, wherein the rotating target detection model comprises: the system comprises a feature extraction module, a feature fusion module, a plurality of edge perception vector detection heads and an attention mechanism module, wherein the feature extraction module is used for extracting trunk features of an input optical remote sensing image, the feature fusion module is used for fusing a trunk feature map, the edge perception vector detection heads are used for learning a target enclosure frame of the input optical remote sensing image, and the attention mechanism module is arranged between the feature fusion module and the edge perception vector detection heads and used for guiding the learning of the target enclosure frame;
the target detection module is used for training and optimizing the rotating target detection network model by using the optical remote sensing image sample data; and extracting the target in the optical remote sensing image to be detected by using the trained and optimized rotating target detection network model.
CN202111557818.XA 2021-12-17 2021-12-17 Optical remote sensing image ship detection method and system based on sensing vector Active CN114255385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111557818.XA CN114255385B (en) 2021-12-17 2021-12-17 Optical remote sensing image ship detection method and system based on sensing vector


Publications (2)

Publication Number Publication Date
CN114255385A CN114255385A (en) 2022-03-29
CN114255385B true CN114255385B (en) 2022-10-04

Family

ID=80795791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111557818.XA Active CN114255385B (en) 2021-12-17 2021-12-17 Optical remote sensing image ship detection method and system based on sensing vector

Country Status (1)

Country Link
CN (1) CN114255385B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272856B (en) * 2022-07-28 2023-04-04 北京卫星信息工程研究所 Ship target fine-grained identification method and equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11070813B2 (en) * 2018-06-29 2021-07-20 Intel Corporation Global motion estimation and modeling for accurate global motion compensation for efficient video processing or coding
CN111563473B (en) * 2020-05-18 2022-03-18 电子科技大学 Remote sensing ship identification method based on dense feature fusion and pixel level attention
CN113239953B (en) * 2021-03-30 2024-02-09 西安电子科技大学 SAR image rotation ship detection method based on directed Gaussian function
CN113191372B (en) * 2021-04-29 2022-05-20 华中科技大学 Construction method and application of ship target directional detection model
CN113469088B (en) * 2021-07-08 2023-05-12 西安电子科技大学 SAR image ship target detection method and system under passive interference scene



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant