CN111476159A - Method and device for training and detecting detection model based on double-angle regression

Info

Publication number: CN111476159A (application CN202010264623.5A)
Authority: CN (China)
Prior art keywords: layer, detection, predicted, feature, image
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN111476159B (granted publication)
Inventors: 屈桢深, 赵鹏博, 关秋雨, 谢伟男
Assignee (original and current): Harbin Institute of Technology
Application filed by Harbin Institute of Technology; priority to CN202010264623.5A

Classifications

    • G06V20/13: Scenes; terrestrial scenes; satellite images
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253: Pattern recognition; fusion techniques of extracted features
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06V10/267: Image preprocessing; segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/40: Extraction of image or video features

Abstract

The invention provides a method and a device for training and detecting a detection model based on double-angle regression, relating to the field of ship target detection. The method comprises the following steps: acquiring a training set image containing annotation information; inputting the training set image into a detection model and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, and the predicted coordinate data comprise the predicted center point, predicted first diagonal angle, predicted second diagonal angle, predicted short-side length, predicted short-side-to-diagonal length ratio and predicted offset point of the detection frame; determining the value of a loss function according to the actual coordinate data and the predicted coordinate data; and adjusting the parameters of the detection model according to the value of the loss function until a convergence condition is met, completing the training of the detection model. By extracting the predicted coordinate data through the detection model, the invention captures the key points of the target, solves the missed-detection and false-detection problems caused by large variation in ship target size and ship density, and meets real-time requirements.

Description

Method and device for training and detecting detection model based on double-angle regression
Technical Field
The invention relates to the field of ship target detection, in particular to a method and a device for training and detecting a detection model based on double-angle regression.
Background
Rapid and accurate detection and identification of specific ship targets in remote sensing images is of great significance in fields such as shipwreck rescue, military vessel monitoring and wartime warship targeting.
At present, a common approach in the field of ship detection is electromagnetic-wave echo positioning, such as shore-based and airborne radar. Ship target detection based on radar signals obtains ship positions by analyzing, enhancing and otherwise processing radar echoes. However, this approach is highly susceptible to complex weather and sea-surface hydrological conditions; it also cannot perform long-range or ultra-long-range ship detection because of the curvature of the earth, and so cannot be used for ship rescue or routine long-range monitoring tasks.
Another common approach is ship target detection based on optical remote sensing images. It offers long range and high detection precision, is a passive detection mode and needs no additional equipment, so it is widely applied. A common technique here is to process the optical remote sensing image with feature extraction to achieve target detection. Such methods depend largely on the accuracy of the feature-processing stage; because application scenes and targets differ, accurate target features are difficult to obtain, so the accuracy suffers considerably and the robustness of these methods is not high enough.
In recent years, deep learning algorithms have gradually been applied to ship target detection based on optical remote sensing images, an emerging detection approach. Its principle is to train on existing labeled image data and to use the trained target detection model to detect captured images, thereby finding the targets of interest. Deep learning offers a series of advantages: small susceptibility to interference, a wide range of applicable scenes, low cost and so on. However, existing deep learning algorithms for optical remote sensing images often merely reproduce Faster R-CNN as the detection network without improving the network structure or the loss function of the original method; they cannot fundamentally solve the missed-detection, false-detection and latency problems of ship target detection, and the trained models are prone to overfitting.
Disclosure of Invention
In view of the above, the present invention is directed to solving, at least to some extent, the technical problems in the related art. In order to achieve the above object, the present invention provides a method for training a detection model based on dual-angle regression, comprising the following steps:
acquiring a training set image containing annotation information, wherein the annotation information comprises actual coordinate data of a plurality of ship targets;
inputting the training set image into the detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, and the predicted coordinate data comprises the predicted center point, predicted first diagonal angle, predicted second diagonal angle, predicted short-side length, predicted short-side-to-diagonal length ratio and predicted offset point of the detection frame;
determining a value of a loss function according to the actual coordinate data and the predicted coordinate data;
and adjusting parameters of the detection model according to the value of the loss function until a convergence condition is met, and finishing the training of the detection model.
In this way, the method extracts the predicted coordinate data of the detection frame through the detection model to obtain the predicted center point, first diagonal angle, second diagonal angle, short-side length, short-side-to-diagonal length ratio and offset point. These quantities capture the key-point information of the target, and extracting several kinds of detection-frame data solves the inaccurate detection-frame prediction caused by the long, narrow shape, inclination and large size variation of ship targets. Meanwhile, computing the predicted coordinate data with the detection model avoids the drawback of slow detection. In summary, by extracting multiple kinds of detection-frame data through the detection model, the training method based on double-angle regression greatly improves detection accuracy while keeping the detection speed high enough for real-time ship detection, meeting the requirements of high accuracy and strong real-time performance.
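To make the training procedure concrete, the following is a minimal training-loop sketch in Python, assuming a PyTorch-style model that maps a batch of training-set images to the six predicted quantities and a loss function that compares them with the annotated actual coordinate data; the names model, loader, loss_fn and the convergence tolerance are illustrative assumptions, not prescribed by the patent.

    import torch

    def train_detection_model(model, loader, loss_fn, epochs=50, lr=1e-4, tol=1e-4):
        # Steps S101-S104: iterate over annotated training images, compute the
        # loss between predicted and actual coordinate data, and adjust the
        # parameters until the change in loss falls below a tolerance.
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        prev_total = float("inf")
        for _ in range(epochs):
            total = 0.0
            for images, actual in loader:      # training-set images + annotation info
                predicted = model(images)      # center, two diagonal angles, short side, ratio, offset
                loss = loss_fn(predicted, actual)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                total += loss.item()
            if abs(prev_total - total) < tol:  # convergence condition met
                break
            prev_total = total
        return model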
Further, the training set image comprises a plurality of image sub-blocks, and acquiring the training set image containing the annotation information comprises the following steps:
dividing a remote sensing satellite acquired image into a plurality of image sub-blocks with fixed resolution, and converting the annotation information of the remote sensing satellite acquired image to the corresponding image sub-blocks.
Therefore, the image sub-blocks with fixed resolution are obtained through image segmentation, and the accuracy and the speed of model training are improved.
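As an illustration of the segmentation step, the following Python sketch cuts a large remote-sensing image into fixed-resolution sub-blocks; the 1024-pixel tile size is taken from the detailed description below, while the overlap width and the NumPy array input are assumptions made for the example.

    import numpy as np

    def split_into_subblocks(image, tile=1024, overlap=128):
        # Cut an (H, W, C) remote-sensing image into fixed-resolution sub-blocks.
        # The overlap keeps a ship lying on a cut line intact in at least one
        # tile, and the (x, y) origin of each tile is what allows the annotation
        # information to be converted into sub-block coordinates.
        h_img, w_img = image.shape[:2]
        stride = tile - overlap
        blocks = []
        for y in range(0, max(h_img - overlap, 1), stride):
            for x in range(0, max(w_img - overlap, 1), stride):
                blocks.append(((x, y), image[y:y + tile, x:x + tile]))
        return blocks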
Further, in the step of dividing the remote sensing satellite acquisition image into a plurality of image sub-blocks with fixed resolution, the step of determining the fixed resolution includes the following steps:
determining the fixed resolution according to the number of the segmentation parts of the image acquired by the remote sensing satellite;
or determining the fixed resolution according to the target integrity of the ship target in the remote sensing satellite acquisition image;
or determining the fixed resolution according to the sparsity of the ship target.
In this regard, the resolution of optical photographs containing ship targets varies greatly, with side lengths ranging from roughly 1000 to 30000 pixels. To deal with this, the invention adopts the three methods above to determine the fixed resolution of the image sub-blocks, so that target sparsity is better unified and detection accuracy is improved.
Further, the detection model comprises a feature extraction network and a feature restoration network; the method for inputting the training set images into the detection model comprises the following steps:
inputting the training set images into the feature extraction network, and determining a feature extraction graph, wherein the feature extraction graph comprises initial feature data of the detection frame;
inputting the feature extraction map into the feature restoration network, and determining a feature restoration map, wherein the feature restoration map includes the predicted coordinate data of the detection frame.
Therefore, by setting up a two-stage network, the drawback of an overly complex network structure is avoided; the extraction and restoration of features effectively produce the final features and ensure detection accuracy.
Further, the feature extraction network sequentially includes: the system comprises a down-sampling convolutional layer and a fusion convolutional layer, wherein the down-sampling convolutional layer is used for performing down-sampling processing on the training set image, and the fusion convolutional layer is used for performing down-sampling processing on an output characteristic diagram of the down-sampling convolutional layer and performing characteristic mixing.
Therefore, all the feature extraction operations are completed by using convolution kernel operation, a multilayer convolution network is designed to effectively carry out down-sampling processing on the image, and the fusion convolution layer is arranged to extract key feature information, so that accurate target detection is facilitated.
Further, the fusion convolutional layer sequentially comprises: a first fused convolutional layer, a second fused convolutional layer, a third fused convolutional layer, and a fourth fused convolutional layer, wherein:
the first fusion convolutional layer comprises a first fusion sublayer and a second fusion sublayer, and the first fusion sublayer and the second fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; determining an output feature map of the first fused convolutional layer according to the output feature map of the downsampled convolutional layer and the output feature map of the second fused sublayer;
the second fusion convolutional layer comprises a third fusion sublayer, a fourth fusion sublayer and a fifth fusion sublayer, and the third fusion sublayer and the fourth fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the fifth fusion sublayer is used for performing down-sampling processing on the output feature map of the first fusion convolutional layer by using convolution operation; determining an output characteristic diagram of the second fused convolutional layer according to the output characteristic diagram of the fourth fused sublayer and the output characteristic diagram of the fifth fused sublayer;
the third fusion convolutional layer comprises a sixth fusion sublayer, a seventh fusion sublayer and an eighth fusion sublayer, and the sixth fusion sublayer and the seventh fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the eighth fusion sublayer is configured to perform downsampling on the output feature map of the second fusion convolutional layer by using a convolution operation; determining an output characteristic diagram of the third fused convolutional layer according to the output characteristic diagram of the seventh fused sublayer and the output characteristic diagram of the eighth fused sublayer;
the fourth fused convolutional layer comprises a ninth fused sublayer, a tenth fused sublayer and an eleventh fused sublayer, wherein the ninth fused sublayer and the tenth fused sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the eleventh fusion sublayer is used for performing downsampling processing on the output feature map of the third fusion convolutional layer by using convolution operation; and determining the output characteristic diagram of the fourth fused convolutional layer according to the output characteristic diagram of the tenth fused sublayer and the output characteristic diagram of the eleventh fused sublayer.
Therefore, each fused convolutional layer is provided with several fusion sublayers that halve the height and width of the feature map while increasing its channel count, and features are mixed by a by-pass method so that correct feature information can be extracted.
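A minimal sketch of one fused convolutional layer follows, assuming a PyTorch-style implementation in which the by-pass mixing is a residual-style sum; kernel sizes, batch normalization and channel widths are illustrative assumptions rather than the patent's exact configuration.

    import torch
    import torch.nn as nn

    class FusedConvLayer(nn.Module):
        # Halves the feature-map height and width, increases the channel count,
        # and mixes features by adding a by-pass branch to the sublayer output.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            # two fusion sublayers applied in sequence for down-sampling
            self.sub1 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                                      nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            self.sub2 = nn.Sequential(nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),
                                      nn.BatchNorm2d(out_ch))
            # by-pass sublayer down-samples the layer input directly
            self.bypass = nn.Conv2d(in_ch, out_ch, 1, stride=2)

        def forward(self, x):
            return torch.relu(self.sub2(self.sub1(x)) + self.bypass(x))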
Further, the feature restoration network sequentially includes an up-sampling convolutional layer, a fused feature layer and an output feature layer, wherein the up-sampling convolutional layer performs up-sampling, the fused feature layer performs down-sampling and feature mixing, and the output feature layer performs different convolution operations on the output feature map of the fused feature layer to output different feature restoration maps.
Therefore, all feature restoration operations are completed with convolution-kernel operations; a multilayer convolutional network effectively up-samples the image and restores the key feature information on the basis of the feature extraction map, which facilitates accurate target detection.
Further, the up-sampling convolutional layer includes a first up-sampling layer, a first interpolation layer, a second up-sampling layer, a second interpolation layer and a third up-sampling layer, wherein: the first up-sampling layer up-samples the feature extraction map using a deconvolution operation; the first interpolation layer performs a four-fold bilinear-interpolation up-sampling operation on the output feature map of the first up-sampling layer; the second up-sampling layer up-samples the output feature map of the first up-sampling layer using a deconvolution operation; the second interpolation layer performs a two-fold bilinear-interpolation up-sampling operation on the output feature map of the second up-sampling layer; and the third up-sampling layer up-samples the output feature map of the second up-sampling layer using a deconvolution operation.
Therefore, arranging the first up-sampling layer, first interpolation layer, second up-sampling layer, second interpolation layer and third up-sampling layer effectively restores the feature extraction map and its key feature information, which facilitates accurate target detection.
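A sketch of the up-sampling convolutional layer, assuming transposed convolutions for the deconvolution up-sampling and PyTorch's bilinear interpolation for the interpolation layers; channel widths are assumptions chosen so the three outputs share one resolution.

    import torch.nn as nn
    import torch.nn.functional as F

    class UpsamplingConvLayer(nn.Module):
        # Restores resolution with three deconvolutions; the four-fold and
        # two-fold bilinear-interpolation branches bring all three outputs to
        # the same spatial size for the subsequent fusion.
        def __init__(self, ch):
            super().__init__()
            self.up1 = nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1)
            self.up2 = nn.ConvTranspose2d(ch // 2, ch // 4, 4, stride=2, padding=1)
            self.up3 = nn.ConvTranspose2d(ch // 4, ch // 8, 4, stride=2, padding=1)

        def forward(self, x):
            u1 = self.up1(x)
            b1 = F.interpolate(u1, scale_factor=4, mode="bilinear", align_corners=False)
            u2 = self.up2(u1)
            b2 = F.interpolate(u2, scale_factor=2, mode="bilinear", align_corners=False)
            u3 = self.up3(u2)
            return b1, b2, u3   # equal resolution, differing channel counts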
Further, the fused feature layer comprises a first fused feature layer, a first down-sampling feature layer, a second down-sampling feature layer and a second fused feature layer. The first fused feature layer fuses the output feature map of the first interpolation layer, the output feature map of the second interpolation layer and the output feature map of the third up-sampling layer by an end-to-end method; the first down-sampling feature layer down-samples the output feature map of the first fused feature layer using a convolution operation; the second down-sampling feature layer down-samples the output feature map of the first down-sampling feature layer using a convolution operation; and the second fused feature layer outputs the sum of the output feature map of the first down-sampling feature layer and the output feature map of the second down-sampling feature layer.
Therefore, arranging the first fused feature layer, first down-sampling feature layer, second down-sampling feature layer and second fused feature layer effectively fuses the key feature information of the feature extraction map, which facilitates accurate target detection.
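A sketch of the fused feature layer, under two assumptions made for the example: the end-to-end fusion is channel-wise concatenation, and the two down-sampling feature layers keep the spatial size (stride 1) so that their outputs can be added element-wise.

    import torch
    import torch.nn as nn

    class FusedFeatureLayer(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.down1 = nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1)   # first down-sampling feature layer
            self.down2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)  # second down-sampling feature layer

        def forward(self, b1, b2, u3):
            fused = torch.cat([b1, b2, u3], dim=1)  # first fused feature layer: end-to-end fusion
            d1 = self.down1(fused)
            d2 = self.down2(d1)
            return d1 + d2                          # second fused feature layer: element-wise sum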
The output of the feature restoration maps comprises the output feature layer performing five different convolution operations on the output feature map of the fused feature layer to obtain five feature restoration maps: a Heatmap output feature map, an Angle1Angle2 output feature map, a ShortSide output feature map, a ShortLongRatio output feature map and a PointReg output feature map. The Heatmap output feature map contains the data of the predicted center point; the Angle1Angle2 output feature map contains the data of the predicted first diagonal angle and of the predicted second diagonal angle; the ShortSide output feature map contains the data of the predicted short-side length; the ShortLongRatio output feature map contains the data of the predicted short-side-to-diagonal length ratio; and the PointReg output feature map contains the data of the predicted offset point.
In this way, five different output feature maps are obtained, each representing different detection-frame data, which effectively improves detection accuracy.
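The five heads can be sketched as parallel 1x1 convolutions over the fused feature map; the output channel counts (1 for Heatmap, ShortSide and ShortLongRatio, 2 for Angle1Angle2 and PointReg) and the sigmoid on the heatmap are plausible assumptions, not values taken from the patent.

    import torch.nn as nn

    class OutputFeatureLayer(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.heatmap = nn.Conv2d(ch, 1, 1)      # center-point confidence
            self.angles = nn.Conv2d(ch, 2, 1)       # first and second diagonal angles
            self.short_side = nn.Conv2d(ch, 1, 1)   # short-side length
            self.ratio = nn.Conv2d(ch, 1, 1)        # short-side-to-diagonal ratio
            self.point_reg = nn.Conv2d(ch, 2, 1)    # center-point offset

        def forward(self, x):
            return {"Heatmap": self.heatmap(x).sigmoid(),
                    "Angle1Angle2": self.angles(x),
                    "ShortSide": self.short_side(x),
                    "ShortLongRatio": self.ratio(x),
                    "PointReg": self.point_reg(x)}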
The second purpose of the invention is to provide a detection model training device based on double-angle regression, which extracts detection-frame data through a detection model, captures the key points of the target, solves the missed-detection and false-detection problems caused by large variation in ship target size and ship density, and meets real-time requirements.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a test model training device based on biangular regression comprises:
the acquisition unit is used for acquiring a training set image containing marking information, wherein the marking information comprises actual coordinate data of a plurality of ship targets;
the processing unit is used for inputting the training set images into the detection model and determining predicted coordinate data of each detection frame, wherein the predicted coordinate data comprise a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-diagonal length ratio and a predicted offset point of the detection frame; the system is also used for determining the value of a loss function according to the actual coordinate data and the predicted coordinate data;
and the training unit is used for adjusting the parameters of the detection model according to the value of the loss function until a convergence condition is met, and finishing the training of the detection model.
Compared with the prior art, the detection model training device based on double-angle regression has the same beneficial effects as the detection model training method based on double-angle regression, which are not repeated herein.
The third purpose of the invention is to provide a detection method based on double-angle regression, which extracts detection-frame data through the detection model, captures the key points of the target, solves the missed-detection and false-detection problems caused by large variation in ship target size and ship density, and meets real-time requirements.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a detection method based on dual-angle regression comprises the following steps:
acquiring an image to be detected, and preprocessing the image to be detected to obtain a plurality of image sub-blocks;
inputting the plurality of image sub-blocks into a detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, the predicted coordinate data comprises the predicted center point, predicted first diagonal angle, predicted second diagonal angle, predicted short-side length, predicted short-side-to-diagonal length ratio and predicted offset point of the detection frame, and the detection model is trained with the above detection model training method based on double-angle regression;
determining the collection position of the detection frame on each image sub-block according to the predicted coordinate data;
and determining an optimal detection frame according to the acquisition position.
In this way, the method extracts the predicted coordinate data of the detection frame through the detection model, capturing the key-point information of the target. Extracting several kinds of detection-frame data solves the inaccurate detection-frame prediction caused by the long, narrow shape, inclination and large size variation of ship targets, while computing the predicted coordinate data with the detection model avoids the drawback of slow detection. In addition, screening the acquisition positions further increases precision. In summary, by extracting multiple kinds of detection-frame data through the detection model, the detection method based on double-angle regression greatly improves detection accuracy while keeping the detection speed high enough for real-time ship detection, meeting the requirements of high accuracy and strong real-time performance.
Further, the determining the acquisition position of the detection frame on each image sub-block according to the predicted coordinate data comprises: determining the detection position coordinates of the detection frame in the output feature map of the fusion feature layer of the detection model according to the predicted coordinate data; and determining the acquisition position coordinate of the detection frame on each image sub-block according to the detection position coordinate.
Therefore, the positions of ship targets on the image sub-blocks are effectively determined from the Heatmap, Angle1Angle2, ShortSide, ShortLongRatio and PointReg output feature maps, which provide the predicted center point, the angles formed by the target detection frame with the positive x-axis of the image, the short-side length of the target detection frame, the ratio of its short side to its diagonal length, and the offset of the target center point.
Further, the detecting position coordinates include coordinates of a first detecting vertex, coordinates of a second detecting vertex, coordinates of a third detecting vertex, and coordinates of a fourth detecting vertex, and the determining, according to the predicted coordinate data, the detecting position coordinates at which the detecting frame is located in the output feature map of the fused feature layer of the detection model includes:
judging whether the predicted first diagonal angle and the predicted second diagonal angle meet preset angle conditions or not;
and if so, determining the coordinates of the first detection vertex, the second detection vertex, the third detection vertex and the fourth detection vertex according to the predicted coordinate data.
Therefore, after the center point of the ship target is determined, the coordinates of the four corner points A, B, C, D of the predicted inclined rectangle can be obtained from the angles formed by the target detection frame with the positive x-axis of the image, the short-side length of the target detection frame, the ratio of its short side to its diagonal length, and the offset of the target center point.
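For illustration, here is a sketch of that reconstruction under one plausible reading of the geometry: the two predicted angles are the angles of the two diagonals with the positive x-axis, the ratio recovers the diagonal length from the short side, and the offset shifts the center point; this interpretation is an assumption of the example, not the patent's exact formulas.

    import math

    def corners_from_prediction(cx, cy, alpha, beta, short_side, ratio, dx=0.0, dy=0.0):
        # Recover the four corner points A, B, C, D of the inclined rectangle.
        cx, cy = cx + dx, cy + dy                 # apply the predicted offset
        half_d = (short_side / ratio) / 2.0       # half the diagonal length
        a = (cx + half_d * math.cos(alpha), cy + half_d * math.sin(alpha))
        c = (cx - half_d * math.cos(alpha), cy - half_d * math.sin(alpha))
        b = (cx + half_d * math.cos(beta),  cy + half_d * math.sin(beta))
        d = (cx - half_d * math.cos(beta),  cy - half_d * math.sin(beta))
        return a, b, c, d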
Further, the detection position coordinates comprise coordinates of a first detection vertex, coordinates of a second detection vertex, coordinates of a third detection vertex and coordinates of a fourth detection vertex, and the acquisition position coordinates comprise coordinates of a first acquisition vertex, coordinates of a second acquisition vertex, coordinates of a third acquisition vertex and coordinates of a fourth acquisition vertex; the determining, according to the detection position coordinates, the acquisition position coordinates where the detection frame is located on each of the image sub-blocks includes:
determining a first constant according to the ratio of the resolution of each image sub-block to the resolution of the output feature map of the fused feature layer;
and respectively multiplying the coordinates of the first detection vertex, the second detection vertex, the third detection vertex and the fourth detection vertex by the first constant to determine the corresponding coordinates of the first acquisition vertex, the second acquisition vertex, the third acquisition vertex and the fourth acquisition vertex.
Therefore, the ship target is predicted on the grid of the fused feature layer; multiplying all coordinates by the first constant yields the coordinates of the ship target on the image sub-blocks cut from the remote sensing image.
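A small sketch of that scaling step; the sub-block and fused-feature-map resolutions are illustrative assumptions (a 1024-pixel tile and a 256-pixel feature grid, giving a first constant of 4).

    def to_subblock_coords(corners, tile_res=1024, feature_res=256):
        # Scale detection coordinates from the fused-feature-layer grid back to
        # the image sub-block by multiplying with the first constant.
        k = tile_res / feature_res
        return [(x * k, y * k) for x, y in corners]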
Further, the determining the optimal detection frame according to the acquisition position includes:
determining coordinates of the detection frame on an original remote sensing satellite acquired image to which each image subblock belongs according to the name of each image subblock, wherein the name of each image subblock comprises image subblock coordinate information of each image subblock on the original remote sensing satellite acquired image;
and screening with a non-maximum suppression method to select the optimal detection frame, wherein the non-maximum suppression method selects the optimal detection frame according to the confidence scores and the intersection-over-union of the predicted frames.
Therefore, to find the detection frame with the best detection effect, after the original remote sensing image carrying all prediction results is obtained by synthesis, all detection frames corresponding to the image are analyzed and screened. In this step, the invention uses non-maximum suppression as the analysis and screening method, effectively selecting an optimal detection frame.
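A sketch of the screening step with plain non-maximum suppression; for brevity it scores axis-aligned (x1, y1, x2, y2) boxes, whereas the patent's detection frames are inclined rectangles, so the IoU computation here is a deliberate simplification.

    def iou(a, b):
        # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def non_max_suppression(boxes, scores, iou_thresh=0.5):
        # Keep the highest-scoring frame, drop frames that overlap it too much,
        # and repeat; the survivors are the optimal detection frames.
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            order = [j for j in order if iou(boxes[best], boxes[j]) < iou_thresh]
        return keep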
The fourth purpose of the invention is to provide a detection device based on double-angle regression, which extracts detection-frame data through a detection model, captures the key points of the target, solves the missed-detection and false-detection problems caused by large variation in ship target size and ship density, and meets real-time requirements.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
an acquisition unit: the method comprises the steps of obtaining an image to be detected;
a processing unit: the image preprocessing module is used for preprocessing the image to be detected to obtain an image subblock; the image subblocks are further used for inputting the image subblocks into a detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, the predicted coordinate data comprises a predicted central point of the detection frame, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-diagonal length ratio and a predicted bias point, and the detection model is obtained by training by adopting a detection model training method based on double-angle regression; the detection frame is used for acquiring the image sub-block, and the acquisition position of the detection frame on the image sub-block is determined according to the predicted coordinate data;
screening unit: for determining an optimal detection frame from the acquisition position
Compared with the prior art, the detection device based on the double-angle regression and the detection method based on the double-angle regression have the same beneficial effects, and are not repeated herein.
The fifth purpose of the invention is to provide a non-transitory computer-readable storage medium, which extracts detection-frame data through a detection model, captures the key points of the target, solves the missed-detection and false-detection problems caused by large variation in ship target size and ship density, and meets real-time requirements.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method for training a bi-angular regression-based detection model as described above or carries out the method for bi-angular regression-based detection as described above.
The beneficial effects of the computer-readable storage medium and the detection method based on the dual-angle regression are the same as those of the prior art, and are not described herein again.
Drawings
FIG. 1 is a schematic flow chart of a detection model training method based on double-angle regression according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the diagonal angles according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of the network input according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the detection model according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a detection model training device based on double-angle regression according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart of a detection method based on double-angle regression according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart of determining the model according to an embodiment of the present invention;
FIG. 8 is a schematic flow chart of determining the acquisition position according to an embodiment of the present invention;
FIG. 9 is a schematic flow chart of determining the acquisition vertices according to an embodiment of the present invention;
FIG. 10 is a schematic flow chart of screening detection frames according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the relationship between image sub-blocks and the remote sensing image according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a detection device based on double-angle regression according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will now be described in detail with reference to the drawings, in which like reference numerals refer to the same or similar elements throughout the different views unless otherwise specified. The embodiments described below do not represent all embodiments of the present invention; they are merely examples of apparatus and methods consistent with certain aspects of the invention as detailed in the claims, and the scope of the invention is not limited in these respects. Features of the various embodiments of the invention may be combined with each other without departing from the scope of the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the detection of a ship target, the identification and detection by using an optical remote sensing image is a trend. In the prior art, there are two main methods for detecting a ship target by using an optical remote sensing image. One is to perform the algorithm processing of feature extraction on the optical remote sensing image, and most of the methods rely on the accuracy of the feature processing part. Due to the difference between the application scene and the target, the accurate characteristics of the target are difficult to obtain, so the detection accuracy of the method is greatly influenced, and meanwhile, the robustness of the method is not high enough. And the other method is that the optical remote sensing image is processed by using a deep learning algorithm, the existing labeled image data is used for training, and the shot image is detected by using a target detection model generated by training, so that the target of interest in the shot image is found. The deep learning method has a series of advantages of small interference influence, wide scene application range, low cost and the like.
However, existing deep learning detection based on optical remote sensing images suffers from missed detection and false detection due to complex network structures and inaccurate feature extraction, and real-time detection is difficult. To achieve ship target detection with high accuracy and strong real-time performance, a detection method based on double-angle regression is therefore urgently needed.
The embodiment of the invention provides a detection model training method based on double-angle regression. Fig. 1 is a schematic flow chart of a method for training a detection model based on bi-angle regression according to an embodiment of the present invention, including steps S101 to S104, where:
in step S101, a training set image including annotation information is obtained, where the annotation information includes actual coordinate data of a plurality of ship targets. Therefore, the training sample data is effectively acquired.
In step S102, the training set image is input into the detection model, and the predicted coordinate data of the detection box is determined, where the detection box is used to select the predicted ship target, and the predicted coordinate data includes the predicted central point of the detection box, the predicted first diagonal angle, the predicted second diagonal angle, the predicted short edge length, the predicted short edge-diagonal length ratio, and the predicted offset point. Thus, by extracting a plurality of kinds of detection frame data, the key point information of the target is reflected.
In step S103, the value of the loss function is determined from the actual coordinate data and the predicted coordinate data; a proper loss function ensures training accuracy.
In step S104, the parameters of the detection model are adjusted according to the value of the loss function until the convergence condition is satisfied, completing the training of the detection model. In this way, the method extracts the predicted coordinate data of the detection frame through the detection model to obtain the predicted center point, first diagonal angle, second diagonal angle, short-side length, short-side-to-diagonal length ratio and offset point, capturing the key-point information of the target. Extracting several kinds of detection-frame data solves the inaccurate detection-frame prediction caused by the long, narrow shape, inclination and large size variation of ship targets, while computing the predicted coordinate data with the detection model avoids the drawback of slow detection. In summary, by extracting multiple kinds of detection-frame data through the detection model, the training method based on double-angle regression greatly improves detection accuracy while keeping the detection speed high enough for real-time ship detection, meeting the requirements of high accuracy and strong real-time performance.
In the following description, the predicted center point is the center point of the predicted detection frame; the predicted first diagonal angle is the angle formed by one diagonal of the predicted detection frame with the x-axis; the predicted second diagonal angle is the angle formed by the other diagonal of the predicted detection frame with the x-axis; the predicted short-side length is the predicted length of the short side of the detection frame; and the predicted offset point is the deviation between the predicted center point and the actual mapped center point. Referring to FIG. 2, a schematic diagram of the diagonal angles, α is the predicted first diagonal angle and β is the predicted second diagonal angle; in general, the predicted second diagonal angle is greater than the predicted first diagonal angle.
Optionally, the training set image includes a plurality of image sub-blocks, and acquiring the plurality of image sub-blocks includes: the method comprises the steps of dividing an image acquired by the remote sensing satellite into a plurality of image sub-blocks with fixed resolution, and converting annotation information of the image acquired by the remote sensing satellite into the corresponding image sub-blocks. Therefore, the image subblocks are divided from the multiple remote sensing satellite collected images, so that the detection accuracy and the detection speed of the network to the images are conveniently improved, and the subsequent high-accuracy and strong-real-time detection target is facilitated.
Optionally, determining the fixed resolution includes three methods:
in the first method, a fixed resolution is determined based on the number of segments of the image acquired by the remote sensing satellite.
The convolutional neural network processes images faster at lower resolutions, but if the segmentation size is too small, one satellite remote sensing image is cut into a large number of parts. Considering only square segmentation, the number of parts relates to the resolution of the segmented images as:

n = ceil((H - S)/(h - S)) × ceil((W - S)/(w - S))

where H and W denote the height and width of the satellite remote sensing image, h and w denote the height and width of the segmented sub-images, S denotes the overlap width between adjacent segmented images, and n denotes the number of parts. According to this relationship and experiments, the image segmentation resolution that best balances detection precision and detection speed is 1024 × 1024.
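A quick numerical check of this count relationship (the ceilings, which cover partial tiles at the image border, are an assumption of this sketch):

    import math

    def tile_count(H, W, h, w, S):
        # Number of sub-blocks when tiling an H x W image into h x w tiles
        # that overlap by S pixels.
        return math.ceil((H - S) / (h - S)) * math.ceil((W - S) / (w - S))

    # e.g. a 20000 x 20000 scene cut into 1024 x 1024 tiles with S = 128
    print(tile_count(20000, 20000, 1024, 1024, 128))   # 23 * 23 = 529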
In the second method, the fixed resolution is determined according to the target integrity of the ship target in the remote sensing satellite acquisition image.
Ship targets on remote sensing satellite images may form an angle with the image boundary, vary greatly in size, and appear with very different sparsity in different images. If a small segmentation resolution is adopted, a ship target lying exactly on a segmentation edge may be cut into two parts, easily causing false detection. Experiments with the invention show that target integrity is best maintained with an image segmentation resolution of 1024 × 1024 or more.
In the third method, the fixed resolution is determined according to the sparsity of the ship target.
The sparsity of ship targets differs greatly across different parts of a satellite image: for example, region A of an image may contain on the order of 3 × 10² targets, region B only a few ship targets, and region C none at all. Such non-uniform sparsity easily reduces network robustness, especially when large-resolution segmentation is used. Experiments with the invention show that sparsity is better unified with an image segmentation resolution of 1024 × 1024 or less.
In the embodiment of the invention, the detection model comprises a feature extraction network and a feature restoration network; fig. 3 is a flowchart illustrating a network input according to an embodiment of the present invention, wherein step S102 includes step S1021 and step S1022.
In step S1021, the training set image is input to a feature extraction network to obtain a feature extraction graph, where the feature extraction graph includes the initial feature data of the detection frame. Therefore, the features of the image are effectively extracted, and data redundancy is avoided.
In step S1022, the feature extraction map is input to a feature restoration network to obtain a feature restoration map, where the feature restoration map includes the predicted coordinate data of the detection frame. Therefore, by setting the two-stage network, the defect of complex network structure is avoided, the final characteristics are effectively extracted by utilizing the extraction and reduction of the characteristics, and the detection accuracy is ensured.
Optionally, the feature extraction network sequentially includes: the system comprises a down-sampling convolutional layer and a fusion convolutional layer, wherein the down-sampling convolutional layer is used for performing down-sampling processing on a training set image, and the fusion convolutional layer is used for performing down-sampling processing on an output characteristic diagram of the down-sampling convolutional layer and performing characteristic mixing. Therefore, all the feature extraction operations are completed by using convolution kernel operation, a multilayer convolution network is designed to effectively carry out down-sampling processing on the image, and the fusion convolution layer is arranged to extract key feature information, so that accurate target detection is facilitated.
Optionally, the fusion convolutional layer comprises in sequence: a first fused convolutional layer, a second fused convolutional layer, a third fused convolutional layer, and a fourth fused convolutional layer, wherein:
the first fusion convolutional layer comprises a first fusion sublayer and a second fusion sublayer, and the first fusion sublayer and the second fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; and determining the output characteristic diagram of the first fused convolutional layer according to the output characteristic diagram of the downsampled convolutional layer and the output characteristic diagram of the second fused sublayer. Therefore, a plurality of fusion sublayers are arranged for reducing the height and width of the image by one time and increasing the channel number of the image, and the characteristics are mixed by using a by-pass method, so that the correct characteristic information can be extracted.
The second fusion convolutional layer comprises a third fusion sublayer, a fourth fusion sublayer and a fifth fusion sublayer, and the third fusion sublayer and the fourth fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the fifth fusion sublayer is used for performing down-sampling processing on the output characteristic diagram of the first fusion convolutional layer by utilizing convolution operation; and determining the output characteristic diagram of the second fused convolutional layer according to the output characteristic diagram of the fourth fused sublayer and the output characteristic diagram of the fifth fused sublayer. Therefore, a plurality of fusion sublayers are arranged for reducing the height and width of the image by one time and increasing the channel number of the image, and the characteristics are mixed by using a by-pass method, so that the correct characteristic information can be extracted.
The third fusion convolutional layer comprises a sixth fusion sublayer, a seventh fusion sublayer and an eighth fusion sublayer, and the sixth fusion sublayer and the seventh fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the eighth fusion sublayer is used for performing downsampling processing on the output characteristic diagram of the second fusion convolutional layer by utilizing convolution operation; and determining the output characteristic diagram of the third fused convolutional layer according to the output characteristic diagram of the seventh fused sublayer and the output characteristic diagram of the eighth fused sublayer. Therefore, a plurality of fusion sublayers are arranged for reducing the height and width of the image by one time and increasing the channel number of the image, and the characteristics are mixed by using a by-pass method, so that the correct characteristic information can be extracted.
The fourth fused convolutional layer comprises a ninth fused sublayer, a tenth fused sublayer and an eleventh fused sublayer, wherein the ninth fused sublayer and the tenth fused sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the eleventh fusion sublayer is used for performing down-sampling processing on the output characteristic diagram of the third fusion convolutional layer by utilizing convolution operation; and determining the output characteristic diagram of the fourth fused convolutional layer according to the output characteristic diagram of the tenth fused sublayer and the output characteristic diagram of the eleventh fused sublayer. Therefore, a plurality of fusion sublayers are arranged for reducing the height and width of the image by one time and increasing the channel number of the image, and the characteristics are mixed by using a by-pass method, so that the correct characteristic information can be extracted.
Optionally, the downsampled convolutional layer sequentially comprises a first downsampling layer and a second downsampling layer, thereby sequentially performing effective downsampling operation.
In the embodiment of the present invention, step S1021 includes the following specific steps:
inputting the training set image into the first down-sampling layer, which down-samples it by convolution with a first convolution kernel to obtain the output feature map of the first down-sampling layer;
inputting the output feature map of the first down-sampling layer into the second down-sampling layer, which down-samples it by convolution with a second convolution kernel to obtain the output feature map of the second down-sampling layer;
inputting the output feature map of the second down-sampling layer into the first fused convolutional layer, which down-samples it by convolution with a third convolution kernel and performs feature mixing to obtain the output feature map of the first fused convolutional layer;
inputting the output feature map of the first fused convolutional layer into the second fused convolutional layer, which down-samples it by convolution with a fourth convolution kernel and performs feature mixing to obtain the output feature map of the second fused convolutional layer;
inputting the output feature map of the second fused convolutional layer into the third fused convolutional layer, which down-samples it by convolution with a fifth convolution kernel and performs feature mixing to obtain the output feature map of the third fused convolutional layer;
and inputting the output feature map of the third fused convolutional layer into the fourth fused convolutional layer, which down-samples it by convolution with a sixth convolution kernel and performs feature mixing; the output feature map of the fourth fused convolutional layer is the feature extraction map. In this way, all feature extraction operations are completed with convolution-kernel operations, and the multilayer convolutional network effectively down-samples the image and extracts key feature information, which facilitates accurate target detection.
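Putting the steps above together, a compact sketch of the feature extraction network; channel widths, kernel sizes and normalization are chosen for the example rather than taken from the patent.

    import torch
    import torch.nn as nn

    def conv_block(in_ch, out_ch, stride):
        return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
                             nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    class FeatureExtractionNet(nn.Module):
        # First and second down-sampling layers followed by four fused
        # convolutional layers, each stage halving the spatial size.
        def __init__(self):
            super().__init__()
            self.down1 = conv_block(3, 32, 2)    # first down-sampling layer
            self.down2 = conv_block(32, 64, 2)   # second down-sampling layer
            chans = [64, 128, 256, 512, 512]
            self.fused = nn.ModuleList()
            self.bypass = nn.ModuleList()
            for cin, cout in zip(chans, chans[1:]):
                # two fusion sublayers in sequence, mixed with a by-pass branch
                self.fused.append(nn.Sequential(conv_block(cin, cout, 2),
                                                conv_block(cout, cout, 1)))
                self.bypass.append(nn.Conv2d(cin, cout, 1, stride=2))

        def forward(self, x):                    # x: (N, 3, 1024, 1024) sub-block
            x = self.down2(self.down1(x))
            for f, b in zip(self.fused, self.bypass):
                x = torch.relu(f(x) + b(x))      # feature mixing per fused layer
            return x                             # the feature extraction map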
Optionally, the feature reduction network sequentially includes: the device comprises an up-sampling convolutional layer, a fusion characteristic layer and an output characteristic layer, wherein the up-sampling convolutional layer is used for performing up-sampling processing, the fusion characteristic layer is used for performing down-sampling processing and performing characteristic mixing, and the output characteristic layer is used for performing different convolution operations on an output characteristic diagram of the fusion characteristic layer and outputting different characteristic restoration diagrams. Therefore, all the feature reduction operations are completed by using convolution kernel operations, a multilayer convolution network is designed to effectively carry out up-sampling processing on the image, and key feature information is reduced on the basis of a feature extraction image, so that accurate target detection is facilitated.
Optionally, the up-sampling convolutional layer comprises a first up-sampling layer, a first interpolation layer, a second up-sampling layer, a second interpolation layer and a third up-sampling layer, wherein: the first up-sampling layer up-samples the feature extraction map using a deconvolution operation; the first interpolation layer performs a fourfold bilinear interpolation up-sampling operation on the output feature map of the first up-sampling layer; the second up-sampling layer up-samples the output feature map of the first up-sampling layer using a deconvolution operation; the second interpolation layer performs a twofold bilinear interpolation up-sampling operation on the output feature map of the second up-sampling layer; and the third up-sampling layer up-samples the output feature map of the second up-sampling layer using a deconvolution operation. In this way, the first up-sampling layer, first interpolation layer, second up-sampling layer, second interpolation layer and third up-sampling layer effectively restore the feature extraction map and its key feature information, facilitating accurate target detection.
Optionally, the fusion feature layer includes a first fusion feature layer, a first down-sampling feature layer, a second down-sampling feature layer and a second fusion feature layer, wherein: the first fusion feature layer fuses the output feature map of the first interpolation layer, the output feature map of the second interpolation layer and the output feature map of the third up-sampling layer by a head-to-tail concatenation method; the first down-sampling feature layer down-samples the output feature map of the first fusion feature layer using a convolution operation; the second down-sampling feature layer down-samples the output feature map of the first down-sampling feature layer using a convolution operation; and the second fusion feature layer outputs the sum of the output feature map of the first down-sampling feature layer and the output feature map of the second down-sampling feature layer. In this way, the first fusion feature layer, first down-sampling feature layer, second down-sampling feature layer and second fusion feature layer effectively fuse the feature extraction map and its key feature information, facilitating accurate target detection.
In the embodiment of the present invention, step S1022 includes the following specific steps:
inputting the feature extraction image into a first up-sampling layer, setting a twelfth step length by the first up-sampling layer through a seventh convolution kernel, and performing deconvolution up-sampling processing on the feature extraction image to obtain an output feature image of the first up-sampling layer;
inputting the output feature map of the first up-sampling layer into the first interpolation layer, where the first interpolation layer performs a fourfold bilinear interpolation up-sampling operation on the output feature map of the first up-sampling layer to obtain the output feature map of the first interpolation layer;
inputting the output characteristic diagram of the first up-sampling layer into a second up-sampling layer, enabling the second up-sampling layer to pass through an eighth convolution kernel and set a thirteenth step length, and performing deconvolution up-sampling processing on the output characteristic diagram of the first up-sampling layer to obtain an output characteristic diagram of the second up-sampling layer;
inputting the output feature map of the second up-sampling layer into the second interpolation layer, where the second interpolation layer performs a twofold bilinear interpolation up-sampling operation on the output feature map of the second up-sampling layer to obtain the output feature map of the second interpolation layer;
inputting the output characteristic diagram of the second up-sampling layer into a third up-sampling layer, enabling the third up-sampling layer to pass through a ninth convolution kernel and set a fourteenth step length, and performing deconvolution up-sampling processing on the output characteristic diagram of the second up-sampling layer to obtain an output characteristic diagram of the third up-sampling layer;
inputting the output characteristic diagram of the first interpolation layer, the output characteristic diagram of the second interpolation layer and the output characteristic diagram of the third up-sampling layer into a first fusion characteristic layer, and fusing the first fusion characteristic layer by a head-to-tail connection method to obtain the output characteristic diagram of the first fusion characteristic layer;
inputting the output feature map of the first fusion feature layer into a first down-sampling feature layer, wherein the first down-sampling feature layer performs down-sampling processing on the output feature map of the first fusion feature layer by utilizing convolution operation through a tenth convolution kernel and setting a fifteenth step length;
inputting the output feature map of the first down-sampling feature layer into the second down-sampling feature layer, where the second down-sampling feature layer down-samples the output feature map of the first down-sampling feature layer by convolution with an eleventh convolution kernel, setting a sixteenth step length, to obtain the output feature map of the second down-sampling feature layer;
inputting the output characteristic diagram of the first down-sampling characteristic layer and the output characteristic diagram of the second down-sampling characteristic layer into a second fusion characteristic layer, and adding the output characteristic diagram of the first down-sampling characteristic layer and the output characteristic diagram of the second down-sampling characteristic layer by the second fusion characteristic layer and then outputting to obtain an output characteristic diagram of the second fusion characteristic layer;
and performing five convolution operations on the output characteristic diagram of the second fusion characteristic layer to obtain five different final output characteristic diagrams. Therefore, all the feature reduction operations are completed by using convolution kernel operations, a multilayer convolution network is designed to effectively carry out up-sampling processing on the image, and key feature information is reduced on the basis of a feature extraction image, so that accurate target detection is facilitated.
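A corresponding sketch of the feature reduction network is given below, under the same caveats. Note that the embodiment later in this document states step lengths of 1 for the deconvolution layers while the feature map resolutions double; stride-2 transposed convolutions are therefore assumed here so that the stated output sizes are reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureReduction(nn.Module):
    """Hypothetical sketch: deconvolution up-sampling layers (R1-R3),
    bilinear interpolation layers (B1, B2), head-to-tail concatenation (T1),
    two refining convolutions (M1, M2) and a broadcast sum (T2)."""
    def __init__(self):
        super().__init__()
        self.r1 = nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1)  # 32 -> 64
        self.r2 = nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1)  # 64 -> 128
        self.r3 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)   # 128 -> 256
        self.m1 = nn.Conv2d(448, 128, 3, padding=1)  # tenth convolution kernel
        self.m2 = nn.Conv2d(128, 1, 3, padding=1)    # eleventh convolution kernel

    def forward(self, feat):                      # feat: (N, 512, 32, 32)
        r1 = self.r1(feat)                        # (N, 256, 64, 64)
        b1 = F.interpolate(r1, scale_factor=4, mode='bilinear', align_corners=False)
        r2 = self.r2(r1)                          # (N, 128, 128, 128)
        b2 = F.interpolate(r2, scale_factor=2, mode='bilinear', align_corners=False)
        r3 = self.r3(r2)                          # (N, 64, 256, 256)
        t1 = torch.cat([b1, b2, r3], dim=1)       # (N, 448, 256, 256), head-to-tail fusion
        m1 = self.m1(t1)                          # (N, 128, 256, 256)
        m2 = self.m2(m1)                          # (N, 1, 256, 256): target-location map
        return m1 + m2                            # T2: broadcast sum, (N, 128, 256, 256)
```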
Optionally, all convolutional layers are followed by a BN (batch normalization) layer and a ReLU layer. The BN layer applies a translation-and-scaling normalization followed by a learnable re-scaling and re-shifting transformation, ensuring that the expressive power of the model is not degraded by the normalization; the expression of the BN layer is as follows:
$$\hat{x} = \frac{x - \mu}{\sigma}, \qquad y = g \cdot \hat{x} + b$$
where μ and σ are the mean and standard deviation computed from the image feature statistics, and g and b are the re-scaling and re-shifting parameters learned by the network.
The ReLU layer is a nonlinear transformation whose functional expression is shown below; it mimics the nonlinear response of neuronal signal connections in the human brain.
$$\mathrm{ReLU}(x) = \max(0, x)$$
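A minimal sketch of this Conv-BN-ReLU convention in PyTorch (the helper name is invented):

```python
import torch.nn as nn

def conv_bn_relu(c_in, c_out, k=3, stride=1):
    """Convolution followed by batch normalization and ReLU; the learnable
    weight and bias of BatchNorm2d play the roles of g and b above."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2),
        nn.BatchNorm2d(c_out),   # normalizes with mu, sigma; rescales with g, b
        nn.ReLU(inplace=True),   # f(x) = max(0, x)
    )
```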
Optionally, outputting the feature reduction images comprises: the output feature layer performs five different convolution operations on the output feature map of the second fusion feature layer to obtain five feature reduction maps, namely a Heatmap output feature map, an Angle1Angle2 output feature map, a ShortSide output feature map, a ShortLongRatio output feature map and a PointReg output feature map, wherein the Heatmap output feature map comprises the data of the predicted central point, the Angle1Angle2 output feature map comprises the data of the predicted first diagonal angle and of the predicted second diagonal angle, the ShortSide output feature map is the data of the predicted short-side length, the ShortLongRatio output feature map is the data of the predicted ratio of short-side length to diagonal length, and the PointReg output feature map is the data of the predicted bias point. In this way, five different convolution kernels, with different activation functions for the different parts, are applied to the second fusion feature layer to obtain five different outputs, which together constitute the detection result, facilitating accurate target detection.
In the embodiment of the present invention, step S102210 includes the following specific steps:
and performing convolution operation on the output characteristic diagram of the second fusion characteristic layer by using a twelfth convolution kernel, wherein the used activation function is a Sigmoid activation function, and the Heatmap output characteristic diagram is obtained. Wherein the Heatmap output feature map includes data of the predicted center point.
And performing convolution operation on the output characteristic diagram of the second fusion characteristic layer by using a thirteenth convolution kernel, wherein the used activation function is a Relu activation function, and an Angle1Angle2 output characteristic diagram is obtained. Wherein the Angle1Angle2 output feature map comprises data predicting a first diagonal Angle and data predicting a second diagonal Angle.
And performing convolution operation on the output characteristic diagram of the second fusion characteristic layer by using a fourteenth convolution kernel, wherein the used activation function is a Relu activation function, and obtaining a ShortSide output characteristic diagram. Wherein, the ShortSide output characteristic diagram is data of the predicted short edge length.
And performing convolution operation on the output feature map of the second fusion feature layer by using a fifteenth convolution kernel, wherein the used activation function is a Sigmoid activation function, and the ShortLongRatio output feature map is obtained. Wherein, the ShortLongRatio output feature map is data of the predicted ratio of short-side length to diagonal length.
And performing convolution operation on the output characteristic diagram of the second fusion characteristic layer by using a sixteenth convolution kernel, wherein the used activation function is a Relu activation function, and a PointReg output characteristic diagram is obtained. Wherein, the PointReg output characteristic diagram is data of a predicted bias point.
The activation functions used by the Heatmap and ShortLongRatio parts are Sigmoid activation functions, and the activation functions used by the Angle1Angle2, ShortSide and PointReg parts are ReLU activation functions.
The Sigmoid activation function is a mapping function for mapping a real number to a (0,1) interval, and its specific expression is as follows:
$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
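The five heads can be sketched as follows (hypothetical: the 3 × 3 head kernels and the class name are assumptions; the text specifies only the per-head channel counts and activation functions):

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Five convolutional output heads applied to the second fusion
    feature layer, each followed by the activation named in the text."""
    def __init__(self, c_in=128):
        super().__init__()
        self.heatmap   = nn.Conv2d(c_in, 1, 3, padding=1)  # center-point probability
        self.angles    = nn.Conv2d(c_in, 2, 3, padding=1)  # two diagonal angles
        self.shortside = nn.Conv2d(c_in, 1, 3, padding=1)  # short-side length
        self.ratio     = nn.Conv2d(c_in, 1, 3, padding=1)  # short-side / diagonal ratio
        self.pointreg  = nn.Conv2d(c_in, 2, 3, padding=1)  # sub-pixel center offset

    def forward(self, t2):                                 # t2: (N, 128, 256, 256)
        return {
            'Heatmap':        torch.sigmoid(self.heatmap(t2)),
            'Angle1Angle2':   torch.relu(self.angles(t2)),
            'ShortSide':      torch.relu(self.shortside(t2)),
            'ShortLongRatio': torch.sigmoid(self.ratio(t2)),
            'PointReg':       torch.relu(self.pointreg(t2)),
        }
```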
in a specific embodiment of the present invention, referring to fig. 4, fig. 4 is a structure diagram of a detection model according to an embodiment of the present invention, and a specific structure of the detection model is described by changing ResNet18 into an example, the feature extraction network includes an input layer I, C1, a C2 layer, a L a layer L, a L1 a layer L, a L a layer 363 a layer L, a L a layer L, a L5 a layer L, a L a layer L, a sampling rolling layer including a C L layer, a fusion rolling layer including a L a layer L, a L a layer L, a layer L a layer.
The C1 layer is the first down-sampling layer, the C2 layer is the second down-sampling layer, the Layer11 layer is the first fusion sublayer, the Layer12 layer is the second fusion sublayer, the Layer21 layer is the third fusion sublayer, the Layer22 layer is the fourth fusion sublayer, the Layer23 layer is the fifth fusion sublayer, the Layer31 layer is the sixth fusion sublayer, the Layer32 layer is the seventh fusion sublayer, the Layer33 layer is the eighth fusion sublayer, the Layer41 layer is the ninth fusion sublayer, the Layer42 layer is the tenth fusion sublayer, and the Layer43 layer is the eleventh fusion sublayer.
The input layer I takes an input image sub-block; the size of the input image is 3 × 1024 × 1024, where 3 is the number of channels of the image, the first 1024 is the height of the image and the second 1024 is the width of the image. The channels, heights and widths of images and feature maps below follow this order and are not described again.
In the embodiment of the invention, all feature extraction and feature reduction operations are completed using convolution kernel operations. A convolution kernel has the specific form C × H × W, where C is the number of convolution kernels used and H and W are the height and width of the kernel. The contents of the convolution kernels are parameters, updated and determined automatically by back-propagation of the neural network, so the parameter values at different positions within a kernel, and the parameters of different kernels, are entirely unrelated to one another. In the following description of this patent, all convolution kernels are described in the form H × W, and the number of kernels equals the number of channels of the convolutional layer and feature map of the corresponding operation.
The output feature map of the C1 layer is a down-sampled feature map obtained by a down-sampling operation, which halves the height and width of the image and increases the number of channels. The first convolution kernel used in the present invention is a 5 × 5 convolution kernel with the step length set to 2, finally giving the C1 output feature map of size 64 × 512 × 512.
The output feature map of the C2 layer is a feature map obtained by a down-sampling operation, which halves the height and width of the image. The second convolution kernel used in the present invention is a 5 × 5 convolution kernel with the step length set to 2, finally giving the C2 output feature map of size 64 × 256 × 256.
The output feature map of the Layer1 layer is a feature map obtained by ordinary sampling. Layer1 comprises the Layer11 and Layer12 sublayers, which extract features and mix them using a "by-pass" shortcut method. The third convolution kernel applied to the output feature map of C2 is a 3 × 3 convolution kernel with the first step length set to 1, giving the output feature map of Layer11; the third convolution kernel applied to the output feature map of Layer11 is a 3 × 3 convolution kernel with the second step length set to 1, giving the output feature map of Layer12; finally, the output feature map of C2 and the output feature map of Layer12 are added to obtain the output feature map of Layer1. The output feature maps of Layer11, Layer12 and Layer1 all have size 64 × 256 × 256.
The output feature map of the Layer2 layer is a feature map obtained by down-sampling. Layer2 comprises the Layer21, Layer22 and Layer23 sublayers; it halves the height and width of the image, increases the number of channels, and mixes features using the "by-pass" shortcut method. The fourth convolution kernel applied to the output feature map of Layer1 is a 3 × 3 convolution kernel with the third step length set to 2, giving the output feature map of Layer21; the fourth convolution kernel applied to the output feature map of Layer21 is a 3 × 3 convolution kernel with the fourth step length set to 1, giving the output feature map of Layer22; the fourth convolution kernel applied to the output feature map of Layer1 is a 3 × 3 convolution kernel with the fifth step length set to 2, giving the output feature map of Layer23; finally, the output feature map of Layer22 and the output feature map of Layer23 are added to obtain the output feature map of Layer2. The output feature maps of Layer21, Layer22, Layer23 and Layer2 all have size 128 × 128 × 128.
The output feature map of the Layer3 layer is a feature map obtained by down-sampling. Layer3 comprises the Layer31, Layer32 and Layer33 sublayers; it halves the height and width of the image, increases the number of channels, and mixes features using the "by-pass" shortcut method. The fifth convolution kernel applied to the output feature map of Layer2 is a 3 × 3 convolution kernel with the sixth step length set to 2, giving the output feature map of Layer31; the fifth convolution kernel applied to the output feature map of Layer31 is a 3 × 3 convolution kernel with the seventh step length set to 1, giving the output feature map of Layer32; the fifth convolution kernel applied to the output feature map of Layer2 is a 3 × 3 convolution kernel with the eighth step length set to 2, giving the output feature map of Layer33; finally, the output feature map of Layer32 and the output feature map of Layer33 are added to obtain the output feature map of Layer3. The output feature maps of Layer31, Layer32, Layer33 and Layer3 all have size 256 × 64 × 64.
The output feature map of the Layer4 layer is a feature map obtained by down-sampling. Layer4 comprises the Layer41, Layer42 and Layer43 sublayers; it halves the height and width of the image, increases the number of channels, and mixes features using the "by-pass" shortcut method. The sixth convolution kernel applied to the output feature map of Layer3 is a 3 × 3 convolution kernel with the ninth step length set to 2, giving the output feature map of Layer41; the sixth convolution kernel applied to the output feature map of Layer41 is a 3 × 3 convolution kernel with the tenth step length set to 1, giving the output feature map of Layer42; the sixth convolution kernel applied to the output feature map of Layer3 is a 3 × 3 convolution kernel with the eleventh step length set to 2, giving the output feature map of Layer43; finally, the output feature map of Layer42 and the output feature map of Layer43 are added to obtain the output feature map of Layer4. The output feature maps of Layer41, Layer42, Layer43 and Layer4 all have size 512 × 32 × 32, and the output feature map of Layer4 is the feature extraction map.
Referring to fig. 4, the specific structure of the feature reduction network is likewise described taking the ResNet18 variant as an example. The feature reduction network comprises an R1 layer, a B1 layer, an R2 layer, a B2 layer, an R3 layer, a T1 layer, an M1 layer, an M2 layer and a T2 layer. The up-sampling convolutional layer comprises the R1, B1, R2, B2 and R3 layers; the fusion feature layer comprises the T1, M1 and M2 layers; and the output feature layer comprises the T2 layer.
Wherein the first up-sampling layer is the R1 layer, the first interpolation layer is the B1 layer, the second up-sampling layer is the R2 layer, the second interpolation layer is the B2 layer, the third up-sampling layer is the R3 layer, the first fusion feature layer is the T1 layer, the first down-sampling feature layer is the M1 layer, the second down-sampling feature layer is the M2 layer, and the second fusion feature layer is the T2 layer.
The output feature map of the R1 layer is a feature map obtained through an up-sampling operation, used to reduce the depth of the image and increase its resolution. The seventh convolution kernel used in the present invention is a 3 × 3 convolution kernel with the twelfth step length set to 1, and the input feature extraction map is processed by deconvolution, giving the R1 output feature map of size 256 × 64 × 64.
The output feature map of the B1 layer is a feature map obtained by a fourfold bilinear interpolation up-sampling operation, used to increase the resolution of the feature map while keeping its depth. The invention applies fourfold bilinear interpolation to the output feature map of the R1 layer, obtaining the B1 output feature map of size 256 × 256 × 256.
The output feature map of the R2 layer is a feature map obtained through an up-sampling operation, used to reduce the depth of the image and increase its resolution. The eighth convolution kernel used in the present invention is a 3 × 3 convolution kernel with the thirteenth step length set to 1, and the output feature map of the R1 layer is processed by deconvolution, giving the R2 output feature map of size 128 × 128 × 128.
The output feature map of the B2 layer is a feature map obtained by a twofold bilinear interpolation up-sampling operation, used to increase the resolution of the feature map while keeping its depth. The invention applies twofold bilinear interpolation to the output feature map of the R2 layer, obtaining the B2 output feature map of size 128 × 256 × 256.
The output feature map of the R3 layer is a feature map obtained through an up-sampling operation, used to reduce the depth of the image and increase its resolution. The ninth convolution kernel used in the present invention is a 3 × 3 convolution kernel with the fourteenth step length set to 1, and the output feature map of the R2 layer is processed by deconvolution, finally giving the R3 output feature map of size 64 × 256 × 256.
The T1 layer is a fusion layer obtained by value-invariant fusion, used to combine the information of different feature maps while keeping the resolution unchanged. The T1 layer fuses the output feature maps of the B1, B2 and R3 layers by a head-to-tail concatenation method, obtaining the T1 output feature map of size 448 × 256 × 256.
The output feature map of the M1 layer is a feature map obtained through ordinary sampling, used to extract the more important information in the T1 layer and further remove interference. The tenth convolution kernel used in the invention is a 3 × 3 convolution kernel with the fifteenth step length set to 1; convolving the output feature map of the T1 layer finally gives the M1 output feature map of size 128 × 256 × 256.
The output feature map of the M2 layer is a feature map obtained through ordinary sampling, used to extract the position information of ship targets. The eleventh convolution kernel used in the invention is a 3 × 3 convolution kernel with the sixteenth step length set to 1; convolving the output feature map of the M1 layer finally gives the M2 output feature map of size 1 × 256 × 256.
The output feature map of the T2 layer is a position-information-enhanced feature map generated by broadcast summation, which strengthens sensitivity to ship target position information. The invention adds the output feature map of the M2 layer to each channel of the output feature map of the M1 layer, obtaining the T2 output feature map of size 128 × 256 × 256.
The R1, R2, R3, B1, B2 and T1 layers form the sub-pixel multi-scale feature fusion, and the T1, T2, M1 and M2 layers form the ship target positioning strengthening mechanism.
In particular, in the sub-pixel multi-scale feature fusion, bilinear interpolation is the extension of linear interpolation to a two-dimensional rectangular grid, used to interpolate a bivariate function (expressed as height and width in an image). Its core idea is to interpolate linearly in both directions so as to obtain the value at a target point from the values of surrounding points. Since the B1 and B2 layers are obtained directly by bilinear interpolation, B1 and B2 recover, without extra computation, the target features hidden in the sub-pixel parts of the R1 and R2 convolutional layers; such features are mostly edge features of the target. The design of the invention can therefore better judge the edges of the targets to be detected when ship targets are densely packed, and edges are often the most important criterion when detecting targets with neural network methods. Moreover, in a neural network a larger feature map tends to be suitable for detecting smaller targets, and a smaller feature map for detecting larger targets. The invention exploits these properties to construct the network structure: B1 is an edge-enhanced feature map derived from the small feature map, B2 an edge-enhanced feature map derived from the medium feature map, and R3 a large feature map with sufficient edge features. After the three feature maps are fused by the lossless head-to-tail concatenation method, the proposed structure can better handle targets with large size variation.
Specifically, in the ship target positioning strengthening mechanism, M1 is a convolutional layer that extracts the effective features of the fusion layer by convolution, further refining the features and removing interference. The content of M2 is the position information of ship targets, i.e., the regions of the input sub-block corresponding to predicted ship targets: in this feature map, the parts containing a ship target approach 1 and the parts without one approach 0. After M2 is added channel-by-channel to M1, the parts of the convolutional layer related to ship-target position features are emphasized, so the method retains a good detection effect even when the ship targets to be detected are sparsely distributed or unevenly dense. Compared with existing algorithms, this structure, designed for the characteristics of ship targets (large size variation, extremely dense targets and varying sparsity), greatly improves the detection of such targets. Meanwhile, the proposed structure adds no learnable parameters or learning burden, does not affect the network training or detection speed, and does not bloat the network.
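The broadcast summation producing T2 can be illustrated in a few lines (toy tensors rather than real network outputs):

```python
import torch

m1 = torch.randn(1, 128, 256, 256)               # M1: refined features
m2 = torch.sigmoid(torch.randn(1, 1, 256, 256))  # M2: ship-location map in [0, 1]
t2 = m1 + m2  # the single-channel map is broadcast over all 128 channels,
              # raising activations wherever a ship target is predicted
print(t2.shape)  # torch.Size([1, 128, 256, 256])
```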
The activation functions used by the Heatmap output feature map and the ShortLongRatio output feature map are Sigmoid activation functions, and the activation functions used by the Angle1Angle2 output feature map, the ShortSide output feature map and the PointReg output feature map are ReLU activation functions, wherein:
the Heatmap output feature map is used to predict the positions of the centers of targets of different classes; its specific form is 1 × 256 × 256, and the meaning of a predicted value is the probability that the given point is the central point of a ship target;
the Angle1Angle2 output feature map is used to predict the angles between the two diagonals of the oblique rectangle where the labeled target lies and the positive x-axis direction of the image. Its specific form is 2 × 256 × 256, where the first depth channel is always the smaller predicted angle and the second depth channel is always the larger predicted angle;
the ShortSide output feature map is used to predict the short-side length of the oblique rectangle at the labeled target position; its specific form is 1 × 256 × 256, and the contained values are the short-side lengths of ship targets mapped in height and width onto the 256 × 256 feature map;
the ShortLongRatio output feature map is used to predict the ratio of the short-side length to the diagonal length of the oblique rectangular box marking the target position; its specific form is 1 × 256 × 256;
the PointReg output feature map is used to solve the problem that the position of the ship target center is shifted because the Heatmap cannot predict sub-pixels; its predicted values compensate the predicted center coordinates in the Heatmap. Its specific form is 2 × 256 × 256, where the first depth channel is the shift of the target center in the height direction and the second depth channel is the shift in the width direction.
Besides the ResNet18 variant, the invention also provides a ResNet50 variant, a ResNet101 variant, a dla34 variant and an Hourglass variant; their improved structures are similar to that of the ResNet18 variant and are not repeated here. Aiming at tasks with different requirements on detection accuracy and detection speed, the invention thus provides interchangeable feature extraction and feature reduction sub-networks. The invention provides five detection models in total; the correspondence between the five detection models, detection accuracy and detection speed is shown in Table 1.
Table 1 shows the five detection models and their corresponding detection accuracy and detection speed. In the table, the evaluation baseline for detection accuracy is the ResNet18 variant and the evaluation baseline for detection speed is the Hourglass variant; the detection speed and detection accuracy of the method based on each of the five feature extraction/feature reduction variants are all superior to typical two-stage detection algorithms such as Faster R-CNN, FPN and their improved versions.
Table 1 Five detection models and their corresponding detection accuracy and detection speed

|                    | ResNet18 variant | ResNet50 variant | ResNet101 variant | dla34 variant | Hourglass variant |
| Detection accuracy | -- (baseline)    | higher           | very high         | very high     | extremely high    |
| Detection speed    | extremely fast   | very fast        | fairly fast       | fairly fast   | -- (baseline)     |
In the following steps, for convenience of explanation, the method is described using the ResNet18 variant as the feature extraction/feature reduction part; the other variants are handled similarly and are not explained further.
In the embodiment of the present invention, step S103 includes the following specific steps:
when the central point of the detection frame is predicted, the network prediction value is the Heatmap output feature map, the label is a first probability map, and the loss function is the L2 loss function, wherein in the first probability map the positions of ship target central points are 1 and all other positions are 0;
when the first diagonal angle of the detection frame is predicted, the network prediction value is the first channel of the Angle1Angle2 output feature map, the label is the angle formed by one diagonal of the oblique rectangle formed by the ship target coordinates and the positive x-axis direction of the image coordinate system, and the loss function is the L2 loss function;
when the second diagonal angle of the detection frame is predicted, the network prediction value is the second channel of the Angle1Angle2 output feature map, the label is the angle formed by the other diagonal of the oblique rectangle formed by the ship target coordinates and the positive x-axis direction of the image coordinate system, and the loss function is the L2 loss function;
when the short-side length of the detection frame is predicted, the network prediction value is the ShortSide output feature map, the label is the short-side length of the oblique rectangle formed by the ship target coordinates, and the loss function is the L1 loss function;
when the short-side-to-diagonal length ratio of the detection frame is predicted, the network prediction value is the ShortLongRatio output feature map, the label is the ratio of the short-side length to the diagonal length of the oblique rectangle formed by the ship target coordinates, and the loss function is the L2 loss function;
when the bias point of the detection frame is predicted, the network prediction value is the PointReg output feature map, the label is the difference between the ship target center coordinates and the mapped center coordinates on the training set image, and the loss function is the L2 loss function, wherein the mapped center coordinates are the center coordinates obtained after the center coordinates detected at the T2 layer are mapped back onto the training set image.
The Heatmap output feature map is used to predict the positions of the centers of targets of different classes; the Angle1Angle2 output feature map is used to predict the angles formed by the two diagonals of the oblique rectangle at the labeled target position and the positive x-axis direction of the image; the ShortSide output feature map is used to predict the short-side length of that oblique rectangle; the ShortLongRatio output feature map is used to predict the ratio of the short-side length to the diagonal length of the oblique rectangular box at the labeled target position; and the PointReg output feature map is used to solve the problem that the ship target center is displaced because the Heatmap output feature map cannot predict sub-pixels, its predicted values compensating the predicted center coordinates in the Heatmap output feature map.
In the embodiment of the invention, the predicted central point of the detection frame is a probability value ranging from 0 to 1; the predicted detection frame bias point is at the sub-pixel level and also ranges from 0 to 1; the predicted first and second diagonal angles of the detection frame are in radians and range from 0 to π; the predicted short-side length of the detection frame is at the pixel level, with no fixed value range; and the predicted ratio of the short-side length to the diagonal length of the detection frame is unitless, ranging from 0 to 0.707.
According to these value ranges, the invention uses the L1 loss function for the short-side length of the detection frame, and the L2 loss function for the detection frame central point, the sub-pixel bias of the central point, the dual diagonal angles, and the ratio of the short-side length to the diagonal length. The two loss functions take the following forms:
L1 loss function:

$$L_1(y, \hat{y}) = \sum_i \left| y_i - \hat{y}_i \right|$$
L2 loss function:

$$L_2(y, \hat{y}) = \sum_i \left( y_i - \hat{y}_i \right)^2$$
where $y$ and $\hat{y}$ denote the label and the network prediction value, respectively, in both the L1 and L2 loss functions.
And for the length of the short edge of the detection frame, the label is the length of the short edge of an oblique rectangle formed by the target coordinates of the ships in the training set, and the network predicted value is ShortSide.
For the central point of the detection frame, the label is a probability graph, the probability of the central point coordinate part corresponding to the ship target in the probability graph is 1, and the other parts are 0; the predicted value of the network is Heatmap.
For the detection frame center point sub-pixel offset, the label is the difference between the ship target center coordinates in the image sub-block and those center coordinates after the T2 convolutional layer is mapped back to the image sub-block (i.e., the center coordinates multiplied by 4). Specifically, since there is no fractional number of pixels, for a ship target with center point $C(x_c, y_c)$ in the image sub-block, its coordinates in the T2 layer are $(x_c/4,\; y_c/4)$. When $x_c$ and $y_c$ are not multiples of 4, the coordinates on the T2 convolutional layer become $(\lfloor x_c/4 \rfloor,\; \lfloor y_c/4 \rfloor)$. In this case, because the prediction part PointReg has the same resolution as the T2 convolutional layer, the ship target center mapped back to the image sub-block becomes $(4\lfloor x_c/4 \rfloor,\; 4\lfloor y_c/4 \rfloor)$: the precision of the fractional part is lost, and this lost part is the label for the detection frame center point sub-pixel offset.
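A small sketch of this offset-label computation (hypothetical helper; the stride of 4 is the resolution ratio between the image sub-block and the T2 layer):

```python
import math

def subpixel_offset_label(xc, yc, stride=4):
    """Label for the sub-pixel offset: the fractional precision lost when the
    center (xc, yc) in the image sub-block is quantized onto the T2 grid and
    mapped back by multiplying with the stride."""
    gx, gy = math.floor(xc / stride), math.floor(yc / stride)  # T2-grid coordinates
    return xc - stride * gx, yc - stride * gy                  # lost fractional part

print(subpixel_offset_label(514.0, 243.0))  # (2.0, 3.0)
```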
For the dual diagonal angles of the detection frame, the labels are the angles α and β (shown in fig. 3) formed by the two diagonals of the oblique rectangle formed by the ship target coordinates in the training set and the positive x-axis direction of the image coordinate system, and the predicted value is Angle1Angle2.
And for the ratio of the short side of the detection frame to the diagonal length, the label is the ratio of the short-side length of the oblique rectangle formed by the ship target coordinates in the training set to the length of its diagonal, and the predicted value is ShortLongRatio.
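Putting the label/prediction pairs above together, the total training loss can be sketched as follows, assuming an unweighted sum over the five heads (the text does not specify per-head weights):

```python
def l1_loss(pred, target):
    # pred/target: tensors of matching shape
    return (pred - target).abs().sum()

def l2_loss(pred, target):
    return ((pred - target) ** 2).sum()

def total_loss(pred, label):
    """pred/label: dicts of tensors keyed by head name, as in the heads
    sketch above. L1 for the short-side length; L2 for everything else."""
    return (l2_loss(pred['Heatmap'], label['Heatmap'])
            + l2_loss(pred['Angle1Angle2'], label['Angle1Angle2'])
            + l1_loss(pred['ShortSide'], label['ShortSide'])
            + l2_loss(pred['ShortLongRatio'], label['ShortLongRatio'])
            + l2_loss(pred['PointReg'], label['PointReg']))
```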
In the present example, a learning rate of 1.25 × 10⁻⁴ is used, and 64 image sub-blocks are input and trained simultaneously in each training step; the L1 and L2 loss functions described above are adopted as the evaluation criteria, and the Adam optimization method is adopted.
According to the method described above, training iterates through 360 large loops (epochs), one epoch meaning that all images in the training set are trained once. The learning rate is reduced to 1.25 × 10⁻⁵ at the 120th epoch, to 1.25 × 10⁻⁶ at the 200th epoch, and to 1.25 × 10⁻⁷ at the 300th epoch. During training, the generated network model is saved every five epochs, and the model performing best on the test set is finally selected as the final model. The evaluation of performance on the test set is the same as the process described in the embodiment shown below.
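The stated schedule can be sketched as follows (hypothetical skeleton: the stand-in model and file names are invented, and data loading and loss computation are elided):

```python
import torch

model = torch.nn.Conv2d(3, 1, 3, padding=1)  # stand-in for the detection model
optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4)
# Multiply the learning rate by 0.1 at epochs 120, 200 and 300,
# giving 1.25e-5, 1.25e-6 and 1.25e-7 as in the text.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[120, 200, 300], gamma=0.1)

for epoch in range(360):
    # ... iterate over the training set in batches of 64 image sub-blocks,
    # compute the L1/L2 losses, loss.backward(), optimizer.step() ...
    scheduler.step()
    if (epoch + 1) % 5 == 0:  # save the model every five epochs
        torch.save(model.state_dict(), f'model_epoch_{epoch + 1:03d}.pth')
```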
In the embodiment of the present invention, the following computer configuration is adopted in the training process of the detection model based on dual-angle regression: an Intel Xeon Gold 6152 processor, four NVIDIA Tesla V100 16GB graphics cards, and 256GB of memory, thereby effectively completing training.
According to the detection model training method based on dual-angle regression provided by the invention, the detection model is trained to extract detection frame data reflecting the key points of the target; this solves the problems of missed detection and false detection caused by the large size variation and varying density of ship targets, while also meeting real-time requirements.
Fig. 5 is a schematic structural diagram of a detection model training apparatus 800 based on bi-angle regression according to an embodiment of the present invention, which includes an obtaining unit 801, a processing unit 802, and a training unit 803.
An obtaining unit 801, configured to obtain a training set image including annotation information, where the annotation information includes actual coordinate data of a plurality of ship targets;
a processing unit 802, configured to input the training set image into the detection model, and determine predicted coordinate data of each detection frame, where the predicted coordinate data includes a predicted central point of the detection frame, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short side length, a predicted short side-to-diagonal length ratio, and a predicted offset point; the system is also used for determining the value of a loss function according to the actual coordinate data and the predicted coordinate data;
and a training unit 803, configured to adjust parameters of the detection model according to the value of the loss function until a convergence condition is satisfied, and complete training of the detection model.
According to the detection model training device based on dual-angle regression provided by the invention, detection frame data are extracted through the detection model, reflecting the key points of the target; this solves the problems of missed detection and false detection caused by the large size variation and varying density of ship targets, while also meeting real-time requirements.
In another embodiment of the present invention, a detection method based on dual angle regression is provided, and referring to fig. 6, fig. 6 is a schematic flow chart of the detection method based on dual angle regression according to the embodiment of the present invention, which includes steps S201 to S204.
In step S201, an image to be detected is obtained, and the image to be detected is preprocessed to obtain a plurality of image sub-blocks. The image to be detected is thus effectively obtained and preprocessed into image sub-blocks, and the image sub-blocks obtained after segmentation are suitable for input into the detection model.
In step S202, the image sub-blocks are input to the detection model, and the predicted coordinate data of the detection frame is determined, the detection frame is used for framing the predicted ship target, the predicted coordinate data includes the predicted central point of the detection frame, the predicted first diagonal angle, the predicted second diagonal angle, the predicted short edge length, the predicted short edge-diagonal length ratio and the predicted offset point, and the detection model is obtained by training using the detection model training method based on the double-angle regression as described above.
In step S203, the capture position of the detection frame on the image sub-block is determined according to the predicted coordinate data. Therefore, the position of the detection frame on the image sub-block is effectively acquired.
In step S204, an optimal detection frame is determined according to the acquisition position. The method thus extracts the predicted coordinate data of the detection frame through the detection model, reflecting the key-point information of the target. By extracting multiple kinds of detection frame data, it solves the inaccurate detection-frame prediction caused by long, narrow ship targets with large inclination angles and large size variation; by obtaining the predicted coordinate data directly from the detection model, it avoids the drawback of slow detection. In addition, the invention screens the acquisition positions, which further increases accuracy. In conclusion, the detection method based on dual-angle regression extracts multiple kinds of detection frame data through the detection model, greatly improving the accuracy of target detection while reaching real-time ship target detection speed, thereby meeting the requirements of high accuracy and strong real-time performance.
In the embodiment of the present invention, referring to fig. 7, fig. 7 is a schematic flowchart illustrating a process of determining a model according to the embodiment of the present invention, and step S202 includes steps S2021 to S2023.
In step S2021, the requirements of the actual detection task on the detection accuracy and the detection speed are acquired.
In step S2022, according to the requirements of the actual detection task on the detection accuracy and the detection speed, the correspondence relationship between the detection model and the detection accuracy and the correspondence relationship between the detection model and the detection speed are searched, and the detection model that most matches the requirements of the actual detection task is selected.
In step S2023, the image sub-blocks are input to the best matching detection model, and the predicted coordinate data of the detection frame is determined. Therefore, aiming at the requirements of tasks on different detection precision and detection speed, the invention provides a variable feature extraction and feature reduction partial network. Optionally, the expression form of the correspondence between the detection model and the detection accuracy and the correspondence between the detection model and the detection speed is table 1, that is, the relationship between the detection model and the detection accuracy and the detection speed is searched in table 1, and then the selection is performed.
In the embodiment of the present invention, referring to fig. 8, fig. 8 is a schematic flowchart illustrating a process of determining an acquisition position according to the embodiment of the present invention, and step S203 includes step S2031 to step S2032.
In step S2031, detection position coordinates where the detection frame is located in the fusion feature layer of the detection model are determined based on the predicted coordinate data. Optionally, the detection position coordinates include coordinates of a first detection vertex, coordinates of a second detection vertex, coordinates of a third detection vertex, and coordinates of a fourth detection vertex.
Therefore, using the predicted central point in the Heatmap output feature map, the angles formed by the target detection frame and the positive x-axis direction of the image in the Angle1Angle2 output feature map, the short-side length of the target detection frame in the ShortSide output feature map, the ratio of the short-side length to the diagonal length in the ShortLongRatio output feature map, and the offset of the target central point in the PointReg output feature map, the position of the ship target on the image sub-block is effectively determined.
In the embodiment of the present invention, step S2031 specifically includes the following steps:
judging whether the predicted first diagonal angle and the predicted second diagonal angle meet angle preset conditions or not;
and if so, determining the coordinates of the first, second, third and fourth detection vertices according to the predicted coordinate data. In this way, after the center point of the ship target is determined, the coordinates of the four corner points A, B, C, D of the predicted oblique rectangle can be obtained from the angles formed by the target detection frame and the positive x-axis direction of the image, the short-side length of the target detection frame, the ratio of the short-side length to the diagonal length, and the offset of the target center point.
Alternatively, if the first diagonal angle is predicted to be greater than 90 degrees and the second diagonal angle is predicted to be less than 90 degrees, or the first diagonal angle is predicted to be less than 90 degrees and the second diagonal angle is predicted to be greater than 90 degrees, the coordinates of the first detected vertex, the coordinates of the second detected vertex, the coordinates of the third detected vertex, and the coordinates of the fourth detected vertex are calculated by the following formulas:
A_x=(O_x+b_x)-(l/k)/2*cos(α)
A_y=(O_y+b_y)+(l/k)/2*sin(α)
B_x=(O_x+b_x)+(l/k)/2*cos(π-β)
B_y=(O_y+b_y)+(l/k)/2*sin(π-β)
C_x=(O_x+b_x)+(l/k)/2*cos(α)
C_y=(O_y+b_y)-(l/k)/2*sin(α)
D_x=(O_x+b_x)-(l/k)/2*cos(π-β)
D_y=(O_y+b_y)-(l/k)/2*sin(π-β)
if the first diagonal angle is predicted to be greater than 90 degrees and the second diagonal angle is predicted to be greater than 90 degrees, calculating the coordinates of the first detection vertex, the coordinates of the second detection vertex, the coordinates of the third detection vertex and the coordinates of the fourth detection vertex by the following formulas:
A_x=(O_x+b_x)+(l/k)/2*cos(π-α)
A_y=(O_y+b_y)+(l/k)/2*sin(π-α)
B_x=(O_x+b_x)+(l/k)/2*cos(π-β)
B_y=(O_y+b_y)+(l/k)/2*sin(π-β)
C_x=(O_x+b_x)-(l/k)/2*cos(π-α)
C_y=(O_y+b_y)-(l/k)/2*sin(π-α)
D_x=(O_x+b_x)-(l/k)/2*cos(π-β)
D_y=(O_y+b_y)-(l/k)/2*sin(π-β)
if the first diagonal angle is predicted to be smaller than 90 degrees and the second diagonal angle is predicted to be smaller than 90 degrees, calculating the coordinates of the first detection vertex, the coordinates of the second detection vertex, the coordinates of the third detection vertex and the coordinates of the fourth detection vertex by the following formulas:
A_x=(O_x+b_x)-(l/k)/2*cos(β)
A_y=(O_y+b_y)+(l/k)/2*sin(β)
B_x=(O_x+b_x)+(l/k)/2*cos(α)
B_y=(O_y+b_y)-(l/k)/2*sin(α)
C_x=(O_x+b_x)+(l/k)/2*cos(β)
C_y=(O_y+b_y)-(l/k)/2*sin(β)
D_x=(O_x+b_x)-(l/k)/2*cos(α)
D_y=(O_y+b_y)+(l/k)/2*sin(α)
where A_x denotes the abscissa of the first detection vertex and A_y its ordinate, B_x the abscissa of the second detection vertex and B_y its ordinate, C_x the abscissa of the third detection vertex and C_y its ordinate, D_x the abscissa of the fourth detection vertex and D_y its ordinate; O_x and O_y denote the abscissa and ordinate of the predicted central point, and b_x and b_y the abscissa and ordinate of the predicted bias point; α denotes the predicted first diagonal angle, β the predicted second diagonal angle, l the predicted short-side length, and k the predicted short-side-to-diagonal length ratio (so that l/k is the diagonal length).
In this way, whether one of the dual diagonal angles is greater than 90 degrees and the other smaller than 90 degrees, both are greater than 90 degrees, or both are smaller than 90 degrees, the coordinates of the four corner points A, B, C, D of the predicted oblique rectangle can be obtained from the corresponding diagonal angles and related quantities.
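The three cases can be folded into a single decoding routine (hypothetical helper; it operates in fusion-feature-layer coordinates with angles in radians, and the resulting corners are multiplied by the first constant afterwards to map them onto the image sub-block):

```python
import math

def decode_corners(ox, oy, bx, by, alpha, beta, l, k):
    """Recover corners A, B, C, D of the oblique rectangle from the predicted
    center (ox, oy), bias (bx, by), diagonal angles alpha/beta, short-side
    length l and short-side-to-diagonal ratio k (so l / k is the diagonal)."""
    cx, cy = ox + bx, oy + by       # bias-corrected center
    h = (l / k) / 2                 # half of the diagonal length
    if (alpha > math.pi / 2) != (beta > math.pi / 2):    # exactly one angle > 90 deg
        A = (cx - h * math.cos(alpha),          cy + h * math.sin(alpha))
        B = (cx + h * math.cos(math.pi - beta), cy + h * math.sin(math.pi - beta))
        C = (cx + h * math.cos(alpha),          cy - h * math.sin(alpha))
        D = (cx - h * math.cos(math.pi - beta), cy - h * math.sin(math.pi - beta))
    elif alpha > math.pi / 2:                            # both angles > 90 deg
        A = (cx + h * math.cos(math.pi - alpha), cy + h * math.sin(math.pi - alpha))
        B = (cx + h * math.cos(math.pi - beta),  cy + h * math.sin(math.pi - beta))
        C = (cx - h * math.cos(math.pi - alpha), cy - h * math.sin(math.pi - alpha))
        D = (cx - h * math.cos(math.pi - beta),  cy - h * math.sin(math.pi - beta))
    else:                                                # both angles < 90 deg
        A = (cx - h * math.cos(beta),  cy + h * math.sin(beta))
        B = (cx + h * math.cos(alpha), cy - h * math.sin(alpha))
        C = (cx + h * math.cos(beta),  cy - h * math.sin(beta))
        D = (cx - h * math.cos(alpha), cy + h * math.sin(alpha))
    return A, B, C, D
```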
In the embodiment of the present invention, referring to fig. 9, fig. 9 is a schematic flowchart illustrating a process of determining a collection vertex according to the embodiment of the present invention, and step S2032 specifically includes step S20321 to step S20322.
In step S20321, a first constant is determined according to a ratio of the resolution of the image sub-block to the resolution of the output feature map of the fused feature layer in the detection model. Therefore, the accurate coordinate proportion is determined by taking the output characteristic diagram of the fusion characteristic layer as a standard.
In step S20322, the coordinates of the first detection vertex, the coordinates of the second detection vertex, the coordinates of the third detection vertex, and the coordinates of the fourth detection vertex are multiplied by a first constant, respectively, and the corresponding coordinates of the first collection vertex, the second collection vertex, the coordinates of the third collection vertex, and the coordinates of the fourth collection vertex are determined. Therefore, the predicted ship target is all based on the feature reduction graph, and then all coordinates are multiplied by a first constant to obtain the coordinates of the ship target on the image sub-blocks divided by the remote sensing image.
In the embodiment of the present invention, referring to fig. 10, fig. 10 is a schematic flow chart of the screening detection frame in the embodiment of the present invention, and step S204 specifically includes step S2041 to step S2042.
In step S2041, determining coordinates of the detection frame on the original remote sensing satellite acquired image to which the detection frame belongs according to the names of the image sub-blocks, wherein the name of each image sub-block includes image sub-block coordinate information of each image sub-block on the original remote sensing satellite acquired image;
in step S2042, a non-maximum suppression method is used to screen out the optimal detection frames, wherein non-maximum suppression selects the optimal detection frames according to the confidence scores and the intersection-over-union (IoU) of the predicted frames. In order to find the detection frames that best meet the requirements and have the best detection effect, after the original remote sensing image carrying all prediction results is synthesized, all detection frames corresponding to the image are analyzed and screened; in this step, the invention uses non-maximum suppression as the analysis and screening method, effectively screening out the optimal detection frames.
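A simplified sketch of this screening step follows. Greedy non-maximum suppression with axis-aligned IoU is shown for brevity; the oblique detection frames of this method would require a rotated-rectangle intersection in place of iou().

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring detection frames, dropping any frame whose
    overlap with an already kept frame exceeds the threshold."""
    def iou(a, b):  # boxes as (x1, y1, x2, y2)
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```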
Optionally, each image sub-block is named with: the name of the original remote-sensing-satellite acquired image to which it belongs, the x-axis coordinate of the sub-block's upper-left corner point on that original image, and the y-axis coordinate of the sub-block's upper-left corner point on that original image. The acquisition position comprises a first abscissa and a first ordinate of the detection frame on the image sub-block, and step S2041 specifically operates as follows: the first abscissa is added to the x-axis coordinate of the sub-block's upper-left corner point on the original remote-sensing-satellite acquired image to obtain the abscissa of the detection frame on the original image; and the first ordinate is added to the y-axis coordinate of the sub-block's upper-left corner point to obtain the ordinate of the detection frame on the original image. In this way, using a simple naming method, the image sub-blocks belonging to the same satellite remote sensing image are recombined, and the coordinates of the detection frames on the original remote-sensing-satellite acquired image are effectively obtained.
Specifically, referring to fig. 11, which is a schematic diagram of the relationship between an image sub-block and the original remote sensing image according to an embodiment of the present invention: O(0, 0) is the upper-left point of the original remote sensing image and is taken as the relative origin; A(xa, ya) is the upper-left point of the image sub-block, so the sub-block's coordinates on the original remote sensing image are (xa, ya); C(xc, yc) is the center point of the ship target predicted within the image sub-block, with coordinates (xc, yc) relative to the sub-block. The coordinates of the predicted ship target on the original remote sensing image are then:

T(xt, yt) = (xa, ya) + (xc, yc) = (xa + xc, ya + yc)
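A minimal sketch of this T = A + C recovery follows. The naming convention assumed here, "<original-image-name>_<xa>_<ya>.png", is one plausible encoding of the three name components described above and is an assumption for illustration; the patent only requires that the name carry the original image name and the sub-block's upper-left coordinates:

```python
def to_original_coords(sub_block_name, xc, yc):
    """Return the original image name and (xt, yt): the predicted center
    point mapped onto the original remote sensing image, given its
    coordinates (xc, yc) relative to the image sub-block."""
    stem = sub_block_name.rsplit(".", 1)[0]
    original_name, xa, ya = stem.rsplit("_", 2)
    return original_name, (int(xa) + xc, int(ya) + yc)

# Example: a sub-block cut from "scene0031" with upper-left corner (1024, 2048).
name, (xt, yt) = to_original_coords("scene0031_1024_2048.png", 117.0, 63.5)
# -> ("scene0031", (1141.0, 2111.5))
```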
according to the detection method based on the double-angle regression, the detection frame data is extracted through the detection model, the key points of the target are reflected, the problems of missed detection and false detection caused by large size change of the ship target and different ship concentration degrees are solved, the real-time requirement is met, and the detection accuracy is further improved through the optimal selection frame.
Fig. 12 is a schematic structural diagram of a detection apparatus 900 based on double-angle regression according to an embodiment of the present invention, which includes an acquisition unit 901, a processing unit 902, and a screening unit 903, where:
the acquisition unit 901 is configured to obtain an image to be detected;
the processing unit 902 is configured to preprocess the image to be detected to obtain image sub-blocks; to input the image sub-blocks into a detection model and determine predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, the predicted coordinate data includes a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-to-diagonal length ratio, and a predicted offset point of the detection frame, and the detection model is obtained by training with the detection model training method based on double-angle regression; and to determine the acquisition position of the detection frame on each image sub-block according to the predicted coordinate data;
the screening unit 903 is configured to obtain an optimal detection frame according to the acquisition position. A structural sketch of the three units follows.
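The sketch below shows how the three units might cooperate; it assumes a generic `model` callable that returns axis-aligned boxes with confidence scores for each sub-block, reuses the `nms()` sketch above, and treats the image as a NumPy array. The tiling and box layout are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

class DualAngleRegressionDetector:
    """Sketch of apparatus 900: acquisition, processing, and screening units."""

    def __init__(self, model, tile=512):
        self.model = model  # trained detection model: sub-block -> [(box, score)]
        self.tile = tile    # fixed sub-block resolution (assumed value)

    def acquire(self, image):
        """Acquisition unit 901: hand over the image to be detected."""
        return np.asarray(image)

    def process(self, image):
        """Processing unit 902: cut the image into fixed-resolution sub-blocks,
        run the model on each one, and shift every predicted frame by its
        sub-block's upper-left corner (xa, ya)."""
        h, w = image.shape[:2]
        detections = []
        for ya in range(0, h, self.tile):
            for xa in range(0, w, self.tile):
                sub_block = image[ya:ya + self.tile, xa:xa + self.tile]
                for (x1, y1, x2, y2), score in self.model(sub_block):
                    detections.append(
                        ((x1 + xa, y1 + ya, x2 + xa, y2 + ya), score))
        return detections

    def screen(self, detections):
        """Screening unit 903: keep the optimal frames via the nms() above."""
        keep = nms([box for box, _ in detections],
                   [score for _, score in detections])
        return [detections[i] for i in keep]
```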
With the detection device based on double-angle regression, the detection frame data extracted by the detection model reflects the key points of the target, which resolves the missed and false detections caused by large variations in ship target size and differing ship densities, meets the real-time requirement, and further improves detection accuracy through the selection of an optimal detection frame.
Yet another embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the detection model training method based on double-angle regression as described above, or implements the detection method based on double-angle regression as described above.
With the computer-readable storage medium provided by the invention, the detection frame data extracted by the detection model reflects the key points of the target, which resolves the missed and false detections caused by large variations in ship target size and differing ship densities, meets the real-time requirement, and further improves detection accuracy through the selection of an optimal detection frame.
Although the present disclosure has been described above, the scope of the present disclosure is not limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present disclosure, and such changes and modifications will fall within the scope of the present invention.

Claims (18)

1. A detection model training method based on double-angle regression is characterized by comprising the following steps:
acquiring a training set image containing marking information, wherein the marking information comprises actual coordinate data of a plurality of ship targets;
inputting the training set image into a detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, and the predicted coordinate data comprises a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-to-diagonal length ratio, and a predicted offset point of the detection frame;
determining a value of a loss function according to the actual coordinate data and the predicted coordinate data;
and adjusting parameters of the detection model according to the value of the loss function until a convergence condition is met so as to finish the training of the detection model.
2. The method for training a detection model based on double-angle regression of claim 1, wherein the training set image comprises a plurality of image sub-blocks, and acquiring the training set image containing the marking information comprises:
dividing a remote sensing satellite acquired image into a plurality of image sub-blocks of fixed resolution, and converting the marking information of the remote sensing satellite acquired image onto the corresponding plurality of image sub-blocks.
3. The method of claim 2, wherein determining the fixed resolution comprises:
determining the fixed resolution according to the number of the segmentation parts of the image acquired by the remote sensing satellite;
or determining the fixed resolution according to the target integrity of the ship target in the remote sensing satellite acquisition image;
or determining the fixed resolution according to the sparsity of the ship target.
4. The method for training a detection model based on double-angle regression of claim 1, wherein the detection model comprises a feature extraction network and a feature restoration network, and inputting the training set image into the detection model comprises:
inputting the training set image into the feature extraction network and determining a feature extraction map, wherein the feature extraction map comprises initial feature data of the detection frame;
inputting the feature extraction map into the feature restoration network and determining a feature restoration map, wherein the feature restoration map comprises the predicted coordinate data of the detection frame.
5. The method of claim 4, wherein the feature extraction network comprises, in sequence: a down-sampling convolutional layer and a fusion convolutional layer, wherein the down-sampling convolutional layer is used for down-sampling the training set image, and the fusion convolutional layer is used for down-sampling the output feature map of the down-sampling convolutional layer and performing feature mixing.
6. The method of claim 5, wherein the fusion convolutional layer comprises, in sequence: a first fusion convolutional layer, a second fusion convolutional layer, a third fusion convolutional layer, and a fourth fusion convolutional layer, wherein:
the first fusion convolutional layer comprises a first fusion sublayer and a second fusion sublayer, the first fusion sublayer and the second fusion sublayer being used for performing down-sampling processing by sequentially applying convolution operations; an output feature map of the first fusion convolutional layer is determined from the output feature map of the down-sampling convolutional layer and the output feature map of the second fusion sublayer;
the second fusion convolutional layer comprises a third fusion sublayer, a fourth fusion sublayer, and a fifth fusion sublayer, the third fusion sublayer and the fourth fusion sublayer being used for performing down-sampling processing by sequentially applying convolution operations; the fifth fusion sublayer is used for down-sampling the output feature map of the first fusion convolutional layer by a convolution operation; an output feature map of the second fusion convolutional layer is determined from the output feature map of the fourth fusion sublayer and the output feature map of the fifth fusion sublayer;
the third fusion convolutional layer comprises a sixth fusion sublayer, a seventh fusion sublayer, and an eighth fusion sublayer, the sixth fusion sublayer and the seventh fusion sublayer being used for performing down-sampling processing by sequentially applying convolution operations; the eighth fusion sublayer is used for down-sampling the output feature map of the second fusion convolutional layer by a convolution operation; an output feature map of the third fusion convolutional layer is determined from the output feature map of the seventh fusion sublayer and the output feature map of the eighth fusion sublayer;
the fourth fusion convolutional layer comprises a ninth fusion sublayer, a tenth fusion sublayer, and an eleventh fusion sublayer, the ninth fusion sublayer and the tenth fusion sublayer being used for performing down-sampling processing by sequentially applying convolution operations; the eleventh fusion sublayer is used for down-sampling the output feature map of the third fusion convolutional layer by a convolution operation; and an output feature map of the fourth fusion convolutional layer is determined from the output feature map of the tenth fusion sublayer and the output feature map of the eleventh fusion sublayer.
7. The method for training a detection model based on double-angle regression of claim 4, wherein the feature restoration network comprises, in sequence: an up-sampling convolutional layer, a fusion feature layer, and an output feature layer, wherein the up-sampling convolutional layer is used for up-sampling, the fusion feature layer is used for down-sampling and feature mixing, and the output feature layer is used for performing different convolution operations on the output feature map of the fusion feature layer and outputting different feature restoration maps.
8. The method of claim 7, wherein the up-sampling convolutional layer comprises a first up-sampling layer, a first interpolation layer, a second up-sampling layer, a second interpolation layer, and a third up-sampling layer, wherein: the first up-sampling layer is used for up-sampling the feature extraction map by a deconvolution operation; the first interpolation layer is used for performing a four-times bilinear-interpolation up-sampling operation on the output feature map of the first up-sampling layer; the second up-sampling layer is used for up-sampling the output feature map of the first up-sampling layer by a deconvolution operation; the second interpolation layer is used for performing a two-times bilinear-interpolation up-sampling operation on the output feature map of the second up-sampling layer; and the third up-sampling layer is used for up-sampling the output feature map of the second up-sampling layer by a deconvolution operation.
9. The method for training a detection model based on double-angle regression of claim 8, wherein the fusion feature layer comprises a first fusion feature layer, a first down-sampling feature layer, a second down-sampling feature layer, and a second fusion feature layer; the first fusion feature layer is used for fusing the output feature map of the first interpolation layer, the output feature map of the second interpolation layer, and the output feature map of the third up-sampling layer by an end-to-end method; the first down-sampling feature layer down-samples the output feature map of the first fusion feature layer by a convolution operation; the second down-sampling feature layer down-samples the output feature map of the first down-sampling feature layer by a convolution operation; and the second fusion feature layer adds the output feature map of the first down-sampling feature layer to the output feature map of the second down-sampling feature layer and outputs the result.
10. The method of claim 7, wherein outputting the different feature restoration maps comprises: performing, by the output feature layer, five different convolution operations on the output feature map of the fusion feature layer to obtain five feature restoration maps, wherein the five feature restoration maps comprise a Heatmap output feature map, an Angle1Angle2 output feature map, a ShortSide output feature map, an LAspect output feature map, and a PointReg output feature map; the Heatmap output feature map contains the data of the predicted central point, the Angle1Angle2 output feature map contains the data of the predicted first diagonal angle and the data of the predicted second diagonal angle, the ShortSide output feature map contains the data of the predicted short edge length, the LAspect output feature map contains the data of the predicted short edge-to-diagonal length ratio, and the PointReg output feature map contains the data of the predicted offset point.
11. A detection model training device based on double-angle regression, characterized by comprising:
the acquisition unit is used for acquiring a training set image containing marking information, wherein the marking information comprises actual coordinate data of a plurality of ship targets;
the processing unit is used for inputting the training set image into the detection model and determining predicted coordinate data of each detection frame, wherein the predicted coordinate data comprises a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-to-diagonal length ratio, and a predicted offset point of the detection frame; and is further used for determining the value of a loss function according to the actual coordinate data and the predicted coordinate data;
and the training unit is used for adjusting the parameters of the detection model according to the value of the loss function until a convergence condition is met, and finishing the training of the detection model.
12. A detection method based on double-angle regression, characterized by comprising the following steps:
acquiring an image to be detected, and preprocessing the image to be detected to obtain a plurality of image sub-blocks;
inputting the plurality of image sub-blocks into a detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, the predicted coordinate data comprises a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-to-diagonal length ratio, and a predicted offset point of the detection frame, and the detection model is obtained by training with the double-angle regression-based detection model training method according to any one of claims 1-10;
determining the collection position of the detection frame on each image sub-block according to the predicted coordinate data;
and determining an optimal detection frame according to the acquisition position.
13. The detection method based on double-angle regression of claim 12, wherein determining the acquisition position of the detection frame on each of the image sub-blocks according to the predicted coordinate data comprises:
determining the detection position coordinates of the detection frame in the output feature map of the fusion feature layer of the detection model according to the predicted coordinate data;
and determining the acquisition position coordinate of the detection frame on each image sub-block according to the detection position coordinate.
14. The detection method based on double-angle regression of claim 13, wherein the detection position coordinates comprise coordinates of a first detection vertex, coordinates of a second detection vertex, coordinates of a third detection vertex, and coordinates of a fourth detection vertex; and determining, according to the predicted coordinate data, the detection position coordinates of the detection frame in the output feature map of the fusion feature layer of the detection model comprises:
judging whether the predicted first diagonal angle and the predicted second diagonal angle meet preset angle conditions or not;
and if so, determining the coordinates of the first detection vertex, the second detection vertex, the third detection vertex and the fourth detection vertex according to the predicted coordinate data.
15. The detection method based on double-angle regression of claim 13, wherein the detection position coordinates comprise coordinates of a first detection vertex, coordinates of a second detection vertex, coordinates of a third detection vertex, and coordinates of a fourth detection vertex, and the acquisition position coordinates comprise coordinates of a first acquisition vertex, coordinates of a second acquisition vertex, coordinates of a third acquisition vertex, and coordinates of a fourth acquisition vertex; and determining, according to the detection position coordinates, the acquisition position coordinates of the detection frame on each of the image sub-blocks comprises:
determining a first constant according to the ratio of the resolution of each image sub-block to the resolution of the output feature map of the fusion feature layer;
and respectively multiplying the coordinates of the first detection vertex, the second detection vertex, the third detection vertex and the fourth detection vertex by the first constant to determine the corresponding coordinates of the first acquisition vertex, the second acquisition vertex, the third acquisition vertex and the fourth acquisition vertex.
16. The detection method based on double-angle regression of claim 12, wherein determining the optimal detection frame according to the acquisition position comprises:
determining coordinates of the detection frame on the original remote sensing satellite acquired image to which each image sub-block belongs according to the name of each image sub-block, wherein the name of each image sub-block comprises the coordinate information of that sub-block on the original remote sensing satellite acquired image;
and screening by using a non-maximum suppression method and selecting the optimal detection frame, wherein the non-maximum suppression method selects the optimal detection frame according to the confidence scores and the intersection-over-union of the predicted frames.
17. A detection device based on double-angle regression, characterized by comprising:
an acquisition unit, configured to obtain an image to be detected;
a processing unit, configured to preprocess the image to be detected to obtain image sub-blocks; further configured to input the image sub-blocks into a detection model and determine predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, the predicted coordinate data comprises a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-to-diagonal length ratio, and a predicted offset point of the detection frame, and the detection model is obtained by training with the double-angle regression-based detection model training method according to any one of claims 1-10; and further configured to determine the acquisition position of the detection frame on each image sub-block according to the predicted coordinate data;
a screening unit, configured to determine the optimal detection frame according to the acquisition position.
18. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method for training a detection model based on double-angle regression according to any one of claims 1-10, or carries out the detection method based on double-angle regression according to any one of claims 12-16.
CN202010264623.5A 2020-04-07 2020-04-07 Method and device for training and detecting detection model based on double-angle regression Active CN111476159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010264623.5A CN111476159B (en) 2020-04-07 2020-04-07 Method and device for training and detecting detection model based on double-angle regression


Publications (2)

Publication Number Publication Date
CN111476159A true CN111476159A (en) 2020-07-31
CN111476159B CN111476159B (en) 2023-04-07

Family

ID=71749928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010264623.5A Active CN111476159B (en) 2020-04-07 2020-04-07 Method and device for training and detecting detection model based on double-angle regression

Country Status (1)

Country Link
CN (1) CN111476159B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190251398A1 (en) * 2018-02-13 2019-08-15 Slingshot Aerospace, Inc. Conditional loss function modification in a neural network
CN109165585A (en) * 2018-06-15 2019-01-08 沈阳理工大学 A kind of improved ship target detection method based on YOLO V2
CN108921066A (en) * 2018-06-22 2018-11-30 西安电子科技大学 Remote sensing image Ship Detection based on Fusion Features convolutional network
WO2020062998A1 (en) * 2018-09-25 2020-04-02 上海瑾盛通信科技有限公司 Image processing method, storage medium, and electronic device
CN109657541A (en) * 2018-11-09 2019-04-19 南京航空航天大学 A kind of ship detecting method in unmanned plane image based on deep learning
CN110298387A (en) * 2019-06-10 2019-10-01 天津大学 Incorporate the deep neural network object detection method of Pixel-level attention mechanism
CN110942072A (en) * 2019-12-31 2020-03-31 北京迈格威科技有限公司 Quality evaluation-based quality scoring and detecting model training and detecting method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CAI, Z. and N. VASCONCELOS: "Cascade R-CNN: Delving Into High Quality Object Detection", in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
JIAN DING, NAN XUE, YANG LONG, GUI-SONG XIA, QIKAI: "Learning RoI Transformer for Oriented Object Detection in Aerial Images", in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
喻一凡, 曾道建, 李峰, 周书仁: "Research on a precise ferry positioning method based on linear regression" (in Chinese), Journal of Chinese Computer Systems (小型微型计算机系统) *
汤伟, 高涵: "Application of an improved convolutional neural network algorithm to floating garbage detection on water surfaces" (in Chinese), China Sciencepaper (中国科技论文) *
王言鹏, 杨飏, 姚远: "A single shot multibox detector algorithm for inland river ship target detection" (in Chinese), Journal of Harbin Engineering University (哈尔滨工程大学学报) *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738231A (en) * 2020-08-06 2020-10-02 腾讯科技(深圳)有限公司 Target object detection method and device, computer equipment and storage medium
CN112270278A (en) * 2020-11-02 2021-01-26 重庆邮电大学 Key point-based blue top house detection method
CN112381062A (en) * 2020-12-04 2021-02-19 哈尔滨工业大学 Target detection method and device based on convolutional neural network
CN112529095A (en) * 2020-12-22 2021-03-19 合肥市正茂科技有限公司 Single-stage target detection method based on convolution region re-registration
CN112560980B (en) * 2020-12-24 2023-12-15 深圳市优必选科技股份有限公司 Training method and device of target detection model and terminal equipment
CN112560980A (en) * 2020-12-24 2021-03-26 深圳市优必选科技股份有限公司 Training method and device of target detection model and terminal equipment
CN113516013A (en) * 2021-04-09 2021-10-19 阿波罗智联(北京)科技有限公司 Target detection method and device, electronic equipment, road side equipment and cloud control platform
CN113516013B (en) * 2021-04-09 2024-05-14 阿波罗智联(北京)科技有限公司 Target detection method, target detection device, electronic equipment, road side equipment and cloud control platform
CN113688706A (en) * 2021-08-16 2021-11-23 南京信息工程大学 Vehicle detection method, device, equipment and storage medium
CN113688706B (en) * 2021-08-16 2023-12-05 南京信息工程大学 Vehicle detection method, device, equipment and storage medium
CN113838028A (en) * 2021-09-24 2021-12-24 无锡祥生医疗科技股份有限公司 Carotid artery ultrasonic automatic Doppler method, ultrasonic equipment and storage medium
CN113807315A (en) * 2021-10-08 2021-12-17 文思海辉智科科技有限公司 Method, device, equipment and medium for constructing recognition model of object to be recognized
CN113807315B (en) * 2021-10-08 2024-06-04 文思海辉智科科技有限公司 Method, device, equipment and medium for constructing object recognition model to be recognized
CN114663671A (en) * 2022-02-21 2022-06-24 佳都科技集团股份有限公司 Target detection method, device, equipment and storage medium
CN114663671B (en) * 2022-02-21 2023-07-18 佳都科技集团股份有限公司 Target detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111476159B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN111476159B (en) Method and device for training and detecting detection model based on double-angle regression
CN112446327B (en) Remote sensing image target detection method based on non-anchor frame
CN107341488B (en) SAR image target detection and identification integrated method
CN112183432B (en) Building area extraction method and system based on medium-resolution SAR image
CN108460382A (en) Remote sensing image Ship Detection based on deep learning single step detector
CN112766221B (en) Ship direction and position multitasking-based SAR image ship target detection method
CN106056084B (en) Remote sensing image port ship detection method based on multi-resolution hierarchical screening
CN110598613B (en) Expressway agglomerate fog monitoring method
CN113536963B (en) SAR image airplane target detection method based on lightweight YOLO network
CN111008664B (en) Hyperspectral sea ice detection method based on space-spectrum combined characteristics
CN105787950A (en) Infrared image sea-sky-line detection algorithm based on line gradient accumulation
CN110765912A (en) SAR image ship target detection method based on statistical constraint and Mask R-CNN
Yang et al. Evaluation and mitigation of rain effect on wave direction and period estimation from X-band marine radar images
CN114418953A (en) Sonar target detection method and system based on time history accumulated image
Guo et al. An anchor-free network with density map and attention mechanism for multiscale object detection in aerial images
CN111814696A (en) Video ship target detection method based on improved YOLOv3
Hu et al. Supervised multi-scale attention-guided ship detection in optical remote sensing images
CN107369163B (en) Rapid SAR image target detection method based on optimal entropy dual-threshold segmentation
Kim et al. Learning structure for concrete crack detection using robust super-resolution with generative adversarial network
CN116310837B (en) SAR ship target rotation detection method and system
Su et al. Change detection in synthetic aperture radar images based on non-local means with ratio similarity measurement
CN115456957B (en) Method for detecting change of remote sensing image by full-scale feature aggregation
CN116434230A (en) Ship water gauge reading method under complex environment
CN115223033A (en) Synthetic aperture sonar image target classification method and system
CN115410089A (en) Self-adaptive local context embedded optical remote sensing small-scale target detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant