CN111476159B - Method and device for training and detecting detection model based on double-angle regression - Google Patents
- Publication number: CN111476159B
- Application number: CN202010264623.5A
- Authority: CN (China)
- Prior art keywords: detection, layer, predicted, feature, fusion
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06V20/13 — Scenes; terrestrial scenes; satellite images
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Pattern recognition; fusion techniques of extracted features
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06V10/267 — Image preprocessing; segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/40 — Extraction of image or video features
Abstract
The invention provides a method and a device for training and detecting a detection model based on double-angle regression, which relates to the field of ship target detection and comprises the following steps: acquiring a training set image containing labeling information; inputting the training set image into a detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, and the predicted coordinate data comprises a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-diagonal length ratio and a predicted offset point of the detection frame; determining the value of the loss function according to the actual coordinate data and the predicted coordinate data; and adjusting parameters of the detection model according to the value of the loss function until a convergence condition is met, finishing the training of the detection model. The method extracts the predicted coordinate data through the detection model, reflects the key points of the target, solves the missed detection and false detection caused by large size changes of ship targets and varying ship density, and meanwhile meets the real-time requirement.
Description
Technical Field
The invention relates to the field of ship target detection, in particular to a method and a device for training and detecting a detection model based on double-angle regression.
Background
Rapidly and accurately detecting and identifying specific ship targets in remote sensing images is of great significance in fields such as rescuing wrecked ships, monitoring military ship targets, and striking warship targets in wartime.
At present, in the field of ship detection, a common detection method is electromagnetic-wave echo positioning, such as shore-based radar and air-based radar. Ship target detection based on radar signals obtains ship positions after a series of processing steps, such as signal analysis and enhancement, applied to the radar echo. However, this detection method is highly susceptible to complex weather and hydrological conditions at the sea surface; meanwhile, limited by the curvature of the earth, it cannot perform long-distance or ultra-long-distance ship detection, and therefore cannot be used for ship rescue or routine long-distance monitoring tasks.
Another common method is ship target detection based on optical remote sensing images. This method has a long operating distance and high detection precision, is a passive detection mode, and requires no additional equipment, so it is widely applied. Within it, a common technique is to process the optical remote sensing image with feature extraction to achieve target detection. Such feature-extraction methods mostly depend on the accuracy of the feature processing stage; because application scenes and targets differ, accurate target features are difficult to obtain, so the accuracy is greatly affected and the robustness of such methods is not high enough.
In recent years, deep learning algorithms have gradually been applied to ship target detection based on optical remote sensing images; such detection is an emerging ship target detection method developed in recent years. Its principle is to train on existing labeled image data and to detect captured images with the trained target detection model, thereby finding the targets of interest in the captured images. The deep learning method has a series of advantages such as small interference influence, wide scene applicability and low cost. However, existing deep learning algorithms based on optical remote sensing images often simply reproduce Faster R-CNN as the detection network without improving the network structure, loss function, and so on; they cannot fundamentally solve the missed detection, false detection and latency problems in ship target detection, and the trained models are prone to over-fitting.
Disclosure of Invention
In view of the above, the present invention is directed to solving the technical problems in the related art at least to some extent. In order to achieve the above object, the present invention provides a method for training a detection model based on dual-angle regression, comprising the following steps:
acquiring a training set image containing marking information, wherein the marking information comprises actual coordinate data of a plurality of ship targets;
inputting the training set image into the detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, and the predicted coordinate data comprises a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-diagonal length ratio and a predicted offset point of the detection frame;
determining a value of a loss function according to the actual coordinate data and the predicted coordinate data;
and adjusting parameters of the detection model according to the value of the loss function until a convergence condition is met, and finishing the training of the detection model.
Therefore, the method extracts the predicted coordinate data of the detection frame through the detection model to obtain the predicted central point, predicted first diagonal angle, predicted second diagonal angle, predicted short edge length, predicted short edge-diagonal length ratio and predicted offset point, reflecting the key point information of the target; by extracting multiple kinds of detection frame data, it solves the inaccurate detection frame prediction caused by long and narrow ship targets, inclination angles and large size changes. Meanwhile, finding the predicted coordinate data of the detection frame with the detection model avoids the drawback of low detection speed. In conclusion, the method for training a detection model based on double-angle regression extracts multiple kinds of detection frame data through the detection model, greatly improving the accuracy of target detection while keeping the detection speed sufficient for real-time ship target detection, thereby meeting the requirements of high accuracy and strong real-time performance.
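For illustration, the training procedure above can be sketched as follows. This is a minimal sketch assuming a PyTorch-style model; `dual_angle_loss` and the L1 placeholder inside it are hypothetical stand-ins, since the patent does not specify the loss function at this point.

```python
# Minimal sketch of the training procedure above, assuming a PyTorch model.
# `dual_angle_loss` is a hypothetical placeholder for the patent's loss.
import torch
import torch.nn.functional as F

def dual_angle_loss(pred, actual):
    # Placeholder: an L1 regression term over the predicted coordinate data
    # (center, two diagonal angles, short edge, ratio, offset point).
    return F.l1_loss(pred, actual)

def train(model, loader, optimizer, max_epochs=100, tol=1e-4):
    prev = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for images, actual_coords in loader:      # training set with labels
            pred_coords = model(images)           # predicted coordinate data
            loss = dual_angle_loss(pred_coords, actual_coords)
            optimizer.zero_grad()
            loss.backward()                       # adjust model parameters
            optimizer.step()
            total += loss.item()
        if abs(prev - total) < tol:               # convergence condition met
            return
        prev = total
```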
Further, the training set image comprises a plurality of image sub-blocks; the method for acquiring the training set image containing the labeling information comprises the following steps:
dividing a remote sensing satellite acquisition image into a plurality of image sub-blocks with fixed resolution, and converting the labeling information of the remote sensing satellite acquisition image to the corresponding plurality of image sub-blocks.
Therefore, the image sub-blocks with fixed resolution are obtained through image segmentation, and the accuracy and the speed of model training are improved.
Further, in the step of dividing the remote sensing satellite acquisition image into a plurality of image sub-blocks with fixed resolution, the step of determining the fixed resolution includes the following steps:
determining the fixed resolution according to the number of the segmentation parts of the image acquired by the remote sensing satellite;
or determining the fixed resolution according to the target integrity of the ship target in the remote sensing satellite acquisition image;
or determining the fixed resolution according to the sparsity of the ship target.
Therefore, considering that the resolution of captured optical images containing ship targets varies widely, from 1000 × 1000 to 30000 × 30000, the method adopts three approaches to determine the fixed resolution of the image sub-blocks, so that the sparsity is well unified, which helps improve detection accuracy.
Further, the detection model comprises a feature extraction network and a feature restoration network; the method for inputting the training set images into the detection model comprises the following steps:
inputting the training set image to the feature extraction network, and determining a feature extraction graph, wherein the feature extraction graph comprises initial feature data of the detection frame;
and inputting the feature extraction graph into the feature restoration network, and determining a feature restoration graph, wherein the feature restoration graph comprises the predicted coordinate data of the detection frame.
Therefore, by setting the two-stage network, the defect of complex network structure is avoided, the final characteristics are effectively extracted by utilizing the extraction and reduction of the characteristics, and the detection accuracy is ensured.
Further, the feature extraction network sequentially includes: the system comprises a down-sampling convolutional layer and a fusion convolutional layer, wherein the down-sampling convolutional layer is used for performing down-sampling processing on the training set image, and the fusion convolutional layer is used for performing down-sampling processing on an output characteristic diagram of the down-sampling convolutional layer and performing characteristic mixing.
Therefore, all the feature extraction operations are completed by using convolution kernel operations, a multilayer convolution network is designed to effectively perform down-sampling processing on the image, and the fusion convolution layer is arranged to extract key feature information, so that accurate target detection is facilitated.
Further, the fusion convolution layer sequentially includes: a first fused convolutional layer, a second fused convolutional layer, a third fused convolutional layer, and a fourth fused convolutional layer, wherein:
the first fusion convolutional layer comprises a first fusion sublayer and a second fusion sublayer, and the first fusion sublayer and the second fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; determining an output feature map of the first fused convolutional layer according to the output feature map of the downsampled convolutional layer and the output feature map of the second fused sublayer;
the second fusion convolutional layer comprises a third fusion sublayer, a fourth fusion sublayer and a fifth fusion sublayer, and the third fusion sublayer and the fourth fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the fifth fusion sublayer is used for performing down-sampling processing on the output feature map of the first fusion convolutional layer by using convolution operation; determining an output characteristic diagram of the second fused convolutional layer according to the output characteristic diagram of the fourth fused sublayer and the output characteristic diagram of the fifth fused sublayer;
the third fusion convolutional layer comprises a sixth fusion sublayer, a seventh fusion sublayer and an eighth fusion sublayer, and the sixth fusion sublayer and the seventh fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the eighth fusion sublayer is configured to perform downsampling on the output feature map of the second fusion convolutional layer by using a convolution operation; determining an output characteristic diagram of the third fused convolutional layer according to the output characteristic diagram of the seventh fused sublayer and the output characteristic diagram of the eighth fused sublayer;
the fourth fused convolutional layer comprises a ninth fused sublayer, a tenth fused sublayer and an eleventh fused sublayer, wherein the ninth fused sublayer and the tenth fused sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the eleventh fusion sublayer is used for performing downsampling processing on the output feature map of the third fusion convolutional layer by using convolution operation; and determining the output characteristic diagram of the fourth fused convolutional layer according to the output characteristic diagram of the tenth fused sublayer and the output characteristic diagram of the eleventh fused sublayer.
Therefore, each fused convolutional layer is provided with a plurality of fusion sublayers that halve the height and width of the image and increase the number of channels, and the features are mixed by a bypass method, so that the correct feature information can be extracted.
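As a concrete illustration, the sketch below models the pattern shared by the second to fourth fused convolutional layers: two fusion sublayers down-sample sequentially on the main path, a further fusion sublayer down-samples the layer input on a bypass path, and the two results are mixed. Kernel sizes, strides, activation functions and channel counts are assumptions, not taken from the patent.

```python
import torch.nn as nn

class FusedConvLayer(nn.Module):
    # Sketch of one fused convolutional layer: main path of two convolutions
    # (the first halves H and W), plus a bypass convolution that down-samples
    # the input, mixed by addition. Dimensions are illustrative assumptions.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),  # halves H, W
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
        )
        self.bypass = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2)    # bypass sublayer

    def forward(self, x):
        return self.main(x) + self.bypass(x)  # feature mixing via the bypass
```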
Further, the feature reduction network sequentially includes: the device comprises an up-sampling convolutional layer, a fusion characteristic layer and an output characteristic layer, wherein the up-sampling convolutional layer is used for performing up-sampling processing, the fusion characteristic layer is used for performing down-sampling processing and performing characteristic mixing, the output characteristic layer is used for performing different convolution operations on an output characteristic diagram of the fusion characteristic layer and outputting different characteristic restoration diagrams.
Therefore, all the feature restoration operations are completed by using convolution kernel operations, a multilayer convolution network is designed to effectively carry out up-sampling processing on the image, and key feature information is restored on the basis of the feature extraction image, so that accurate target detection is facilitated.
Further, the up-sampling convolutional layer comprises a first up-sampling layer, a first interpolation layer, a second up-sampling layer, a second interpolation layer and a third up-sampling layer, wherein: the first up-sampling layer is used for up-sampling the feature extraction map by a deconvolution operation; the first interpolation layer is used for performing a 4× bilinear-interpolation up-sampling operation on the output feature map of the first up-sampling layer; the second up-sampling layer is used for up-sampling the output feature map of the first up-sampling layer by a deconvolution operation; the second interpolation layer is used for performing a 2× bilinear-interpolation up-sampling operation on the output feature map of the second up-sampling layer; and the third up-sampling layer is used for up-sampling the output feature map of the second up-sampling layer by a deconvolution operation.
Therefore, the first up-sampling layer, the first interpolation layer, the second up-sampling layer, the second interpolation layer and the third up-sampling layer are arranged, the feature extraction graph is effectively restored, key feature information is restored, and accurate target detection is facilitated.
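A sketch of this up-sampling structure is given below; channel widths and kernel sizes are illustrative assumptions. Note that the 4× and 2× interpolation branches bring all three outputs to the same spatial size, which is what allows the later fusion by concatenation.

```python
import torch.nn as nn

class UpsampleConvLayers(nn.Module):
    # Sketch of the up-sampling convolutional layer: three deconvolution
    # (transposed convolution) layers, with 4x and 2x bilinear interpolation
    # branches taken from the first and second up-sampling outputs.
    def __init__(self, ch):
        super().__init__()
        self.up1 = nn.ConvTranspose2d(ch, ch // 2, kernel_size=4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(ch // 2, ch // 4, kernel_size=4, stride=2, padding=1)
        self.up3 = nn.ConvTranspose2d(ch // 4, ch // 8, kernel_size=4, stride=2, padding=1)
        self.interp4x = nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False)
        self.interp2x = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, feat):
        u1 = self.up1(feat)        # first up-sampling layer (deconvolution), 2x
        b1 = self.interp4x(u1)     # first interpolation layer: 4x bilinear -> 8x total
        u2 = self.up2(u1)          # second up-sampling layer, 4x total
        b2 = self.interp2x(u2)     # second interpolation layer: 2x bilinear -> 8x total
        u3 = self.up3(u2)          # third up-sampling layer, 8x total
        return b1, b2, u3          # all three share the same spatial size
```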
Further, the fused feature layer comprises a first fused feature layer, a first down-sampling feature layer, a second down-sampling feature layer and a second fused feature layer. The first fused feature layer is used for fusing the output feature map of the first interpolation layer, the output feature map of the second interpolation layer and the output feature map of the third up-sampling layer by concatenation; the first down-sampling feature layer down-samples the output feature map of the first fused feature layer by a convolution operation; the second down-sampling feature layer down-samples the output feature map of the first down-sampling feature layer by a convolution operation; and the second fused feature layer outputs the sum of the output feature map of the first down-sampling feature layer and the output feature map of the second down-sampling feature layer.
Therefore, the first fusion feature layer, the first down-sampling feature layer, the second down-sampling feature layer and the second fusion feature layer are arranged, feature fusion is effectively carried out on the feature extraction graph, key feature information is fused, and accurate target detection is facilitated.
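The fusion stage might look like the sketch below. Since this passage does not pin down strides or channel counts, the two "down-sampling" convolutions are sketched as stride-1 convolutions so that their outputs remain the same size and can be added; this is an assumption.

```python
import torch
import torch.nn as nn

class FusedFeatureLayer(nn.Module):
    # Sketch of the fused feature layer: the three restored feature maps are
    # concatenated channel-wise (first fused feature layer), passed through
    # two further convolutions (the two down-sampling feature layers), and
    # the two convolution outputs are added (second fused feature layer).
    # Strides and channels are assumptions; the passage does not fix them.
    def __init__(self, c1, c2, c3, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(c1 + c2 + c3, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, b1, b2, u3):
        fused = torch.cat([b1, b2, u3], dim=1)   # concatenation fusion
        d1 = self.conv1(fused)                   # first down-sampling feature layer
        d2 = self.conv2(d1)                      # second down-sampling feature layer
        return d1 + d2                           # second fused feature layer
```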
Further, the outputting of the feature restoration maps includes: the output feature layer performs five different convolution operations on the output feature map of the fused feature layer to obtain five feature restoration maps; the five feature restoration maps comprise a Heatmap output feature map, an Angle1Angle2 output feature map, a ShortSide output feature map, a ShortLongRatio output feature map and a PointReg output feature map, wherein the Heatmap output feature map contains the data of the predicted central point; the Angle1Angle2 output feature map contains the data of the predicted first diagonal angle and the predicted second diagonal angle; the ShortSide output feature map contains the data of the predicted short edge length; the ShortLongRatio output feature map contains the data of the predicted short edge-diagonal length ratio; and the PointReg output feature map contains the data of the predicted offset point.
Thus, the present invention applies five different convolution kernels to the second fused feature layer and uses different activation functions for the different parts to obtain five outputs: Heatmap, Angle1Angle2, ShortSide, ShortLongRatio and PointReg. In this way, five different output feature maps are obtained, each representing different detection frame data, which effectively improves detection accuracy.
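A sketch of the five output heads follows; the per-head channel widths and the choice of activation function for each part are assumptions (e.g. a sigmoid on the Heatmap, as in common keypoint detectors).

```python
import torch
import torch.nn as nn

class OutputHeads(nn.Module):
    # Sketch of the output feature layer: five separate convolutions over the
    # fused feature map, one per predicted quantity. Channel widths and the
    # activation per head are illustrative assumptions.
    def __init__(self, in_ch):
        super().__init__()
        self.heatmap = nn.Conv2d(in_ch, 1, kernel_size=1)      # predicted central point
        self.angles = nn.Conv2d(in_ch, 2, kernel_size=1)       # first/second diagonal angle
        self.short_side = nn.Conv2d(in_ch, 1, kernel_size=1)   # short edge length
        self.ratio = nn.Conv2d(in_ch, 1, kernel_size=1)        # short edge / diagonal ratio
        self.point_reg = nn.Conv2d(in_ch, 2, kernel_size=1)    # offset point (dx, dy)

    def forward(self, x):
        return {
            "Heatmap": torch.sigmoid(self.heatmap(x)),
            "Angle1Angle2": self.angles(x),
            "ShortSide": self.short_side(x),
            "ShortLongRatio": torch.sigmoid(self.ratio(x)),
            "PointReg": self.point_reg(x),
        }
```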
The second purpose of the invention is to provide a detection model training device based on double-angle regression, which extracts detection frame data through a detection model, reflects key points of a target, solves the problem of missed detection and false detection caused by large size change of a ship target and different ship density degrees, and simultaneously meets the real-time requirement.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a test model training device based on biangular regression comprises:
the acquisition unit is used for acquiring a training set image containing marking information, wherein the marking information comprises actual coordinate data of a plurality of ship targets;
the processing unit is used for inputting the training set images into the detection model and determining predicted coordinate data of each detection frame, wherein the predicted coordinate data comprise a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-diagonal length ratio and a predicted offset point of the detection frame; and further configured to determine a value of a loss function from the actual coordinate data and the predicted coordinate data;
and the training unit is used for adjusting the parameters of the detection model according to the value of the loss function until a convergence condition is met, and finishing the training of the detection model.
Compared with the prior art, the detection model training device based on double-angle regression has the same beneficial effects as the detection model training method based on double-angle regression described above, which are not repeated herein.
The third purpose of the invention is to provide a detection method based on double-angle regression, which extracts detection frame data through the detection model, reflects key points of a target, solves the problem of missed detection and false detection caused by large size change of a ship target and different ship density degrees, and simultaneously meets the real-time requirement.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a detection method based on dual-angle regression comprises the following steps:
acquiring an image to be detected, and preprocessing the image to be detected to obtain a plurality of image sub-blocks;
inputting the image sub-blocks into a detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, the predicted coordinate data comprises a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-diagonal length ratio and a predicted offset point of the detection frame, and the detection model is obtained by training with the aforementioned detection model training method based on double-angle regression;
determining the collection position of the detection frame on each image sub-block according to the predicted coordinate data;
and determining an optimal detection frame according to the acquisition position.
Therefore, the method extracts the predicted coordinate data of the detection frame through the detection model, reflecting the key point information of the target; by extracting multiple kinds of detection frame data, it solves the inaccurate detection frame prediction caused by long and narrow ship targets, inclination angles and large size changes, and by finding the predicted coordinate data with the detection model it avoids the drawback of low detection speed. In addition, the invention screens the acquisition positions, further improving accuracy. In conclusion, the detection method based on double-angle regression extracts multiple kinds of detection frame data through the detection model, greatly improving the accuracy of target detection while keeping the detection speed sufficient for real-time ship target detection, thereby meeting the requirements of high accuracy and strong real-time performance.
Further, the determining the acquisition position of the detection frame on each image sub-block according to the predicted coordinate data comprises: determining the detection position coordinates of the detection frame in an output feature map of a fusion feature layer of the detection model according to the predicted coordinate data; and determining the acquisition position coordinate of the detection frame on each image sub-block according to the detection position coordinate.
Therefore, the position of the ship target on the image sub-block is effectively determined using the predicted central point from the Heatmap output feature map, the angles formed by the target detection frame and the positive x-axis of the image from the Angle1Angle2 output feature map, the short edge length of the target detection frame from the ShortSide output feature map, the ratio of the short edge of the target detection frame to its diagonal length from the ShortLongRatio output feature map, and the offset of the target central point from the PointReg output feature map.
Further, the detecting position coordinates include coordinates of a first detecting vertex, coordinates of a second detecting vertex, coordinates of a third detecting vertex, and coordinates of a fourth detecting vertex, and the determining, according to the predicted coordinate data, the detecting position coordinates at which the detecting frame is located in the output feature map of the fused feature layer of the detection model includes:
judging whether the predicted first diagonal angle and the predicted second diagonal angle meet preset angle conditions or not;
and if so, determining the coordinates of the first detection vertex, the second detection vertex, the third detection vertex and the fourth detection vertex according to the predicted coordinate data.
Therefore, after the central point of the ship target is determined, the coordinates of the four corner points A, B, C and D of the predicted inclined rectangle can be obtained from the angles formed by the target detection frame and the positive x-axis direction of the image, the short edge length of the target detection frame, the ratio of the short edge of the target detection frame to its diagonal length, and the target central point offset.
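The corner computation described here can be sketched as follows; the corner labelling and the sign convention of the offset point are assumptions.

```python
import math

def corners_from_dual_angles(center_x, center_y, alpha, beta,
                             short_side, ratio, off_x=0.0, off_y=0.0):
    # Recover the four corners A, B, C, D of the predicted inclined rectangle
    # from the predicted quantities described above.
    px = center_x + off_x                    # apply the predicted offset point
    py = center_y + off_y
    half_diag = (short_side / ratio) / 2.0   # ratio = short edge / diagonal length
    a = (px + half_diag * math.cos(alpha), py + half_diag * math.sin(alpha))
    c = (px - half_diag * math.cos(alpha), py - half_diag * math.sin(alpha))
    b = (px + half_diag * math.cos(beta),  py + half_diag * math.sin(beta))
    d = (px - half_diag * math.cos(beta),  py - half_diag * math.sin(beta))
    # The diagonals AC and BD are equal in length and bisect each other at the
    # center, so A, B, C, D are the corners of a rectangle inclined to the x-axis.
    return a, b, c, d
```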
Further, the detection position coordinates comprise coordinates of a first detection vertex, coordinates of a second detection vertex, coordinates of a third detection vertex and coordinates of a fourth detection vertex, and the acquisition position coordinates comprise coordinates of a first acquisition vertex, coordinates of a second acquisition vertex, coordinates of a third acquisition vertex and coordinates of a fourth acquisition vertex; the determining, according to the detection position coordinates, the acquisition position coordinates where the detection frame is located on each of the image sub-blocks includes:
determining a first constant according to the ratio of the resolution of each image sub-block to the resolution of the output feature map of the fused feature layer;
and respectively multiplying the coordinates of the first detection vertex, the second detection vertex, the third detection vertex and the fourth detection vertex by the first constant to determine the corresponding coordinates of the first acquisition vertex, the second acquisition vertex, the third acquisition vertex and the fourth acquisition vertex.
Therefore, the predicted ship target is based on the fusion characteristic layer, and then all coordinates are multiplied by a first constant to obtain the coordinates of the ship target on the image sub-blocks after the remote sensing image is segmented.
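A sketch of this rescaling follows; the example resolution pair (1024 on the sub-block, 256 on the fused feature map) is illustrative, not taken from the patent.

```python
def to_subblock_coords(corners, subblock_res, fused_map_res):
    # Scale detection-frame corners from the fused feature layer's coordinate
    # system back onto the image sub-block; the "first constant" is the ratio
    # of the two resolutions, e.g. 1024 / 256 = 4 (illustrative values).
    k = subblock_res / fused_map_res     # the first constant
    return [(x * k, y * k) for (x, y) in corners]
```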
Further, the determining the optimal detection frame according to the acquisition position includes:
determining coordinates of the detection frame on an original remote sensing satellite acquired image to which each image subblock belongs according to the name of each image subblock, wherein the name of each image subblock comprises image subblock coordinate information of each image subblock on the original remote sensing satellite acquired image;
and screening by using a non-maximum value inhibition method, and selecting the optimal detection frame, wherein the non-maximum value inhibition method is used for selecting the optimal detection frame according to the confidence score and the intersection ratio of the prediction frame.
Therefore, in order to find the detection frame which meets the best requirement and has the best detection effect, after the remote sensing image original image with all prediction results is obtained through synthesis, all the detection frames corresponding to the image are analyzed and screened. In the step, the invention uses non-maximum value inhibition as an analysis screening method, and effectively screens out an optimal detection frame.
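A sketch of this screening step is given below. The helper `rotated_iou` computes the intersection-over-union of two inclined rectangles; implementing it with shapely polygons is one possible choice, not the patent's.

```python
from shapely.geometry import Polygon

def rotated_iou(box_a, box_b):
    # IoU of two inclined rectangles, each given as four (x, y) corner tuples.
    pa, pb = Polygon(box_a), Polygon(box_b)
    union = pa.union(pb).area
    return pa.intersection(pb).area / union if union > 0 else 0.0

def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    # Keep the detection frame with the highest confidence score, discard
    # frames whose IoU with it exceeds the threshold, and repeat.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if rotated_iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep  # indices of the retained (optimal) detection frames
```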
The fourth purpose of the invention is to provide a detection device based on double-angle regression, which extracts detection frame data through a detection model, reflects key points of a target, solves the problems of missed detection and false detection caused by large size change of a ship target and different ship density degrees, and simultaneously meets the real-time requirement.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
an acquisition unit: the image acquisition module is used for acquiring an image to be detected;
a processing unit: the image preprocessing module is used for preprocessing the image to be detected to obtain an image subblock; the image subblocks are further used for inputting the image subblocks into a detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, the predicted coordinate data comprises a predicted central point of the detection frame, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short edge length, a predicted short edge-diagonal length ratio and a predicted bias point, and the detection model is obtained by training by adopting a detection model training method based on double-angle regression; the detection frame is used for acquiring the image sub-block, and the acquisition position of the detection frame on the image sub-block is determined according to the predicted coordinate data;
screening unit: for determining an optimal detection frame from the acquisition position
Compared with the prior art, the detection device based on the double-angle regression and the detection method based on the double-angle regression have the same beneficial effects, and are not repeated herein.
The fifth purpose of the invention is to provide a non-transitory computer-readable storage medium, which extracts detection frame data through a detection model, reflects key points of a target, solves the problem of missed detection and false detection caused by large size change of a ship target and different ship density degrees, and simultaneously meets the real-time requirement.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for training a bi-angular regression-based detection model as described above, or carries out a method for bi-angular regression-based detection as described above.
The beneficial effects of the computer-readable storage medium and the detection method based on the dual-angle regression are the same as those of the prior art, and are not described herein again.
Drawings
FIG. 1 is a schematic flow chart of a method for training a detection model based on bi-angle regression according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a diagonal angle according to an embodiment of the present invention;
FIG. 3 is a flow chart of network input according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a detection model according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a detection model training device based on double-angle regression according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart of a detection method based on dual angle regression according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart illustrating the determination of a model according to an embodiment of the present invention;
FIG. 8 is a schematic flow chart illustrating the determination of the acquisition location according to an embodiment of the present invention;
FIG. 9 is a schematic flow chart illustrating a process for determining a vertex for collection according to an embodiment of the present invention;
FIG. 10 is a schematic flow chart of a screening test frame according to an embodiment of the present invention;
FIG. 11 is a schematic diagram showing a relationship between an image sub-block and an associated remote sensing image according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a detection apparatus based on dual-angle regression according to an embodiment of the present invention.
Detailed Description
Embodiments in accordance with the present invention will now be described in detail with reference to the drawings, wherein like reference numerals refer to the same or similar elements throughout the different views unless otherwise specified. Note that the implementations described in the following exemplary embodiments do not represent all embodiments of the present invention; they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the claims, and the scope of the invention is not limited in these respects. Features of the various embodiments of the invention may be combined with each other without departing from the scope of the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In ship target detection, recognition and detection using optical remote sensing images is the trend. In the prior art, there are two main methods for detecting ship targets with optical remote sensing images. One applies feature-extraction algorithm processing to the optical remote sensing image; most such methods rely on the accuracy of the feature processing stage. Because application scenes and targets differ, accurate target features are difficult to obtain, so the detection accuracy of this method is greatly affected and its robustness is not high enough. The other processes the optical remote sensing image with a deep learning algorithm: it trains on existing labeled image data and detects captured images with the trained target detection model, thereby finding the targets of interest in the captured images. The deep learning method has a series of advantages such as small interference influence, wide scene applicability and low cost.
However, existing deep-learning detection based on optical remote sensing images suffers from missed and false detections due to problems such as complex network structure and inaccurate feature extraction, and real-time detection is difficult. Therefore, to realize ship target detection with high accuracy and strong real-time performance, a detection method based on double-angle regression is urgently needed.
The embodiment of the invention provides a detection model training method based on double-angle regression. Fig. 1 is a schematic flow chart of a method for training a detection model based on bi-angle regression according to an embodiment of the present invention, including steps S101 to S104, where:
in step S101, a training set image including annotation information is obtained, where the annotation information includes actual coordinate data of a plurality of vessel targets. Therefore, the training sample data is effectively acquired.
In step S102, the training set image is input into the detection model, and the predicted coordinate data of the detection box is determined, where the detection box is used to select the predicted ship target, and the predicted coordinate data includes the predicted central point of the detection box, the predicted first diagonal angle, the predicted second diagonal angle, the predicted short edge length, the predicted short edge-diagonal length ratio, and the predicted offset point. Thus, by extracting a plurality of kinds of detection frame data, the key point information of the target is reflected.
In step S103, the value of the loss function is determined from the actual coordinate data and the predicted coordinate data. And a proper loss function is selected to ensure the training accuracy.
In step S104, parameters of the detection model are adjusted according to the value of the loss function until a convergence condition is satisfied, completing the training of the detection model. Therefore, the method extracts the predicted coordinate data of the detection frame through the detection model to obtain the predicted central point, predicted first diagonal angle, predicted second diagonal angle, predicted short edge length, predicted short edge-diagonal length ratio and predicted offset point, reflecting the key point information of the target; by extracting multiple kinds of detection frame data, it solves the inaccurate detection frame prediction caused by long and narrow ship targets, inclination angles and large size changes. Meanwhile, finding the predicted coordinate data of the detection frame with the detection model avoids the drawback of low detection speed. In conclusion, the method for training a detection model based on double-angle regression extracts multiple kinds of detection frame data through the detection model, greatly improving the accuracy of target detection while keeping the detection speed sufficient for real-time ship target detection, thereby meeting the requirements of high accuracy and strong real-time performance.
The predicted central point is the central point of the predicted detection frame, the predicted first diagonal angle is the angle formed by one diagonal line of the predicted detection frame and the x-axis, the predicted second diagonal angle is the angle formed by the other diagonal line of the predicted detection frame and the x-axis, the predicted short edge length is the length of the short edge of the predicted detection frame, and the predicted offset point is the deviation between the predicted central point and the actual mapped central point. Referring to FIG. 2, which is a schematic diagram of the diagonal angles according to an embodiment of the present invention, α is the predicted first diagonal angle and β is the predicted second diagonal angle.
Optionally, the training set image includes a plurality of image sub-blocks, and acquiring the plurality of image sub-blocks includes: the method comprises the steps of dividing an image acquired by the remote sensing satellite into a plurality of image sub-blocks with fixed resolution, and converting annotation information of the image acquired by the remote sensing satellite into the corresponding image sub-blocks. Therefore, the image subblocks are divided from the multiple remote sensing satellite collected images, so that the detection accuracy and the detection speed of the network to the images are conveniently improved, and the subsequent high-accuracy and strong-real-time detection target is facilitated.
Optionally, determining the fixed resolution includes three methods:
in the first method, a fixed resolution is determined based on the number of segments of the image acquired by the remote sensing satellite.
A convolutional neural network processes a lower-resolution image faster, but if the segmentation size is too small, one satellite remote sensing image is divided into a very large number of parts. The relationship between the number of parts and the resolution of the segmented images is as follows (considering only square segmentation):

N = ⌈(H − S) / (h − S)⌉ × ⌈(W − S) / (w − S)⌉

wherein H and W represent the height and width of the satellite remote sensing image, h and w represent the height and width of the segmented sub-images, S represents the width of the overlap between adjacent segmented images, and N represents the number of parts. According to this formula and specific experiments, an image segmentation resolution of 1024 × 1024 gives the best balance between detection precision and detection speed.
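Assuming the standard sliding-window tiling that the symbol definitions above describe, the sub-block count can be computed as follows; the 64-pixel overlap in the usage example is illustrative, not a value from the patent.

```python
import math

def tile_count(H, W, h, w, S):
    # Number of sub-blocks when an H x W image is cut into h x w tiles whose
    # neighbours overlap by S pixels (standard sliding-window tiling).
    rows = math.ceil((H - S) / (h - S))
    cols = math.ceil((W - S) / (w - S))
    return rows * cols

# e.g. a 30000 x 30000 image in 1024 x 1024 tiles with a 64-pixel overlap:
# tile_count(30000, 30000, 1024, 1024, 64) -> 32 * 32 = 1024 sub-blocks
```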
In the second method, the fixed resolution is determined according to the target integrity of the ship target in the remote sensing satellite acquisition image.
Ship targets in remote sensing satellite images form a certain included angle with the image boundary, vary greatly in size, and vary greatly in sparsity across images. If a small image segmentation resolution is adopted, a ship target may be split: a ship target that happens to lie on a segmentation boundary is divided into two ship targets because the segmentation resolution is too small, easily causing false detections. For this case, experiments by the invention show that target integrity is best maintained with an image segmentation resolution of 1024 × 1024 or larger.
In the third method, the fixed resolution is determined according to the sparsity of the ship target.
The sparsity of ship targets differs greatly across different parts of a satellite image. For example, region A of an image may contain 3 × 10² targets, while region B contains only a few ship targets and region C contains none at all. This non-uniform sparsity distribution easily reduces network robustness, which is particularly obvious when large-resolution segmentation is used. For this case, experiments by the invention show that sparsity can be better unified with an image segmentation resolution of 1024 × 1024 or smaller.
In the embodiment of the invention, the detection model comprises a feature extraction network and a feature restoration network; fig. 3 is a schematic flow chart of the network input according to the embodiment of the present invention, and step S102 includes step S1021 and step S1022.
In step S1021, the training set image is input to a feature extraction network to obtain a feature extraction graph, where the feature extraction graph includes the initial feature data of the detection frame. Therefore, the features of the image are effectively extracted, and data redundancy is avoided.
In step S1022, the feature extraction map is input to a feature restoration network to obtain a feature restoration map, where the feature restoration map includes the predicted coordinate data of the detection frame. Therefore, by setting the two-stage network, the defect of complex network structure is avoided, the final characteristics are effectively extracted by utilizing the extraction and reduction of the characteristics, and the detection accuracy is ensured.
Optionally, the feature extraction network sequentially includes: the system comprises a down-sampling convolutional layer and a fusion convolutional layer, wherein the down-sampling convolutional layer is used for performing down-sampling processing on a training set image, and the fusion convolutional layer is used for performing down-sampling processing on an output characteristic diagram of the down-sampling convolutional layer and performing characteristic mixing. Therefore, all the feature extraction operations are completed by using convolution kernel operations, a multilayer convolution network is designed to effectively perform down-sampling processing on the image, and the fusion convolution layer is arranged to extract key feature information, so that accurate target detection is facilitated.
Optionally, the fusion convolutional layer comprises in sequence: a first fused convolutional layer, a second fused convolutional layer, a third fused convolutional layer, and a fourth fused convolutional layer, wherein:
the first fusion convolutional layer comprises a first fusion sublayer and a second fusion sublayer, and the first fusion sublayer and the second fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; and determining the output characteristic diagram of the first fused convolutional layer according to the output characteristic diagram of the downsampled convolutional layer and the output characteristic diagram of the second fused sublayer. Therefore, a plurality of fusion sublayers are arranged for reducing the height and width of the image by one time and increasing the channel number of the image, and the characteristics are mixed by using a by-pass method, so that the correct characteristic information can be extracted.
The second fusion convolutional layer comprises a third fusion sublayer, a fourth fusion sublayer and a fifth fusion sublayer, and the third fusion sublayer and the fourth fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the fifth fusion sublayer is used for performing down-sampling processing on the output characteristic diagram of the first fusion convolutional layer by utilizing convolution operation; and determining the output characteristic diagram of the second fused convolutional layer according to the output characteristic diagram of the fourth fused sublayer and the output characteristic diagram of the fifth fused sublayer. Therefore, a plurality of fusion sublayers are arranged for reducing the height and width of the image by one time and increasing the channel number of the image, and the characteristics are mixed by using a by-pass method, so that the correct characteristic information can be extracted.
The third fusion convolution layer comprises a sixth fusion sublayer, a seventh fusion sublayer and an eighth fusion sublayer, and the sixth fusion sublayer and the seventh fusion sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the eighth fusion sublayer is used for performing downsampling processing on the output characteristic diagram of the second fusion convolutional layer by utilizing convolution operation; and determining the output characteristic diagram of the third fused convolutional layer according to the output characteristic diagram of the seventh fused sublayer and the output characteristic diagram of the eighth fused sublayer. Therefore, a plurality of fusion sublayers are arranged for reducing the height and width of the image by one time and increasing the channel number of the image, and the characteristics are mixed by using a by-pass method, so that the correct characteristic information can be extracted.
The fourth fused convolutional layer comprises a ninth fused sublayer, a tenth fused sublayer and an eleventh fused sublayer, wherein the ninth fused sublayer and the tenth fused sublayer are used for performing down-sampling processing by sequentially utilizing convolution operation; the eleventh fusion sublayer is used for performing down-sampling processing on the output characteristic diagram of the third fusion convolutional layer by utilizing convolution operation; and determining the output characteristic diagram of the fourth fused convolutional layer according to the output characteristic diagram of the tenth fused sublayer and the output characteristic diagram of the eleventh fused sublayer. Therefore, a plurality of fusion sublayers are arranged and used for reducing the height and the width of the image by one time and increasing the number of channels of the image, and the characteristics are mixed by using a by-pass method, so that the correct characteristic information can be extracted.
Optionally, the down-sampling convolutional layer sequentially comprises a first down-sampling layer and a second down-sampling layer, so that effective down-sampling is performed in sequence.
In the embodiment of the present invention, step S1021 includes the following specific steps:
inputting the training set image into the first down-sampling layer, which down-samples the training set image by convolution with a first convolution kernel to obtain the output feature map of the first down-sampling layer;
inputting the output feature map of the first down-sampling layer into the second down-sampling layer, which down-samples it by convolution with a second convolution kernel to obtain the output feature map of the second down-sampling layer;
inputting the output feature map of the second down-sampling layer into the first fused convolutional layer, which down-samples it by convolution with a third convolution kernel and performs feature mixing to obtain the output feature map of the first fused convolutional layer;
inputting the output feature map of the first fused convolutional layer into the second fused convolutional layer, which down-samples it by convolution with a fourth convolution kernel and performs feature mixing to obtain the output feature map of the second fused convolutional layer;
inputting the output feature map of the second fused convolutional layer into the third fused convolutional layer, which down-samples it by convolution with a fifth convolution kernel and performs feature mixing to obtain the output feature map of the third fused convolutional layer;
and inputting the output feature map of the third fused convolutional layer into the fourth fused convolutional layer, which down-samples it by convolution with a sixth convolution kernel and performs feature mixing to obtain the output feature map of the fourth fused convolutional layer, which is the feature extraction map. Thus, all feature extraction operations are completed with convolution kernels; the multilayer convolutional network effectively down-samples the image and extracts key feature information, facilitating accurate target detection.
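Putting the steps above together, the feature extraction network might be assembled as in the sketch below, reusing the FusedConvLayer sketch from earlier; the channel widths and strides are assumptions, since the passage names the layers and their order but not their dimensions.

```python
import torch.nn as nn

class FeatureExtractionNetwork(nn.Module):
    # Sketch of the feature extraction network: two down-sampling convolution
    # layers followed by the four fused convolutional layers (FusedConvLayer
    # is the earlier sketch). Dimensions are illustrative assumptions.
    def __init__(self):
        super().__init__()
        self.down1 = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)   # first down-sampling layer
        self.down2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)  # second down-sampling layer
        self.fuse = nn.Sequential(
            FusedConvLayer(64, 128),    # first fused convolutional layer
            FusedConvLayer(128, 256),   # second fused convolutional layer
            FusedConvLayer(256, 512),   # third fused convolutional layer
            FusedConvLayer(512, 512),   # fourth fused convolutional layer
        )

    def forward(self, x):
        return self.fuse(self.down2(self.down1(x)))  # the feature extraction map
```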
Optionally, the feature reduction network sequentially includes: the device comprises an up-sampling convolutional layer, a fusion characteristic layer and an output characteristic layer, wherein the up-sampling convolutional layer is used for performing up-sampling processing, the fusion characteristic layer is used for performing down-sampling processing and performing characteristic mixing, and the output characteristic layer is used for performing different convolution operations on an output characteristic diagram of the fusion characteristic layer and outputting different characteristic restoration diagrams. Therefore, all the feature reduction operations are completed by using convolution kernel operations, a multilayer convolution network is designed to effectively carry out up-sampling processing on the image, and key feature information is reduced on the basis of a feature extraction image, so that accurate target detection is facilitated.
Optionally, the up-sampling convolutional layer comprises a first up-sampling layer, a first interpolation layer, a second up-sampling layer, a second interpolation layer and a third up-sampling layer, wherein: the first up-sampling layer is used for up-sampling the feature extraction map by a deconvolution operation; the first interpolation layer is used for performing a 4× bilinear-interpolation up-sampling operation on the output feature map of the first up-sampling layer; the second up-sampling layer is used for up-sampling the output feature map of the first up-sampling layer by a deconvolution operation; the second interpolation layer is used for performing a 2× bilinear-interpolation up-sampling operation on the output feature map of the second up-sampling layer; and the third up-sampling layer is used for up-sampling the output feature map of the second up-sampling layer by a deconvolution operation. Therefore, the first up-sampling layer, the first interpolation layer, the second up-sampling layer, the second interpolation layer and the third up-sampling layer are arranged to effectively restore the feature extraction map and recover key feature information, facilitating accurate target detection.
Optionally, the fused feature layer includes a first fused feature layer, a first down-sampling feature layer, a second down-sampling feature layer and a second fused feature layer, where the first fused feature layer fuses the output feature map of the first interpolation layer, the output feature map of the second interpolation layer and the output feature map of the third up-sampling layer by concatenating them end to end; the first down-sampling feature layer performs down-sampling processing on the output feature map of the first fused feature layer by a convolution operation; the second down-sampling feature layer performs down-sampling processing on the output feature map of the first down-sampling feature layer by a convolution operation; and the second fused feature layer outputs the sum of the output feature maps of the first and second down-sampling feature layers. Arranging these four layers in this way effectively fuses the feature extraction map and the key feature information, which facilitates accurate target detection.
In the embodiment of the present invention, step S1022 includes the following specific steps:
inputting the feature extraction map into a first up-sampling layer, the first up-sampling layer applying a seventh convolution kernel with a twelfth step size to perform deconvolution up-sampling on the feature extraction map, obtaining the output feature map of the first up-sampling layer;
inputting the output feature map of the first up-sampling layer into a first interpolation layer, the first interpolation layer performing a 4x bilinear-interpolation up-sampling operation on it to obtain the output feature map of the first interpolation layer;
inputting the output feature map of the first up-sampling layer into a second up-sampling layer, the second up-sampling layer applying an eighth convolution kernel with a thirteenth step size to perform deconvolution up-sampling, obtaining the output feature map of the second up-sampling layer;
inputting the output feature map of the second up-sampling layer into a second interpolation layer, the second interpolation layer performing a 2x bilinear-interpolation up-sampling operation on it to obtain the output feature map of the second interpolation layer;
inputting the output feature map of the second up-sampling layer into a third up-sampling layer, the third up-sampling layer applying a ninth convolution kernel with a fourteenth step size to perform deconvolution up-sampling, obtaining the output feature map of the third up-sampling layer;
inputting the output feature map of the first interpolation layer, the output feature map of the second interpolation layer and the output feature map of the third up-sampling layer into a first fused feature layer, the first fused feature layer fusing them by end-to-end concatenation to obtain the output feature map of the first fused feature layer;
inputting the output feature map of the first fused feature layer into a first down-sampling feature layer, the first down-sampling feature layer applying a tenth convolution kernel with a fifteenth step size to perform down-sampling processing on it by convolution, obtaining the output feature map of the first down-sampling feature layer;
inputting the output feature map of the first down-sampling feature layer into a second down-sampling feature layer, the second down-sampling feature layer applying an eleventh convolution kernel with a sixteenth step size to perform down-sampling processing on it by convolution, obtaining the output feature map of the second down-sampling feature layer;
inputting the output feature maps of the first and second down-sampling feature layers into a second fused feature layer, the second fused feature layer adding the two feature maps element-wise and outputting the sum as the output feature map of the second fused feature layer;
and performing five convolution operations on the output feature map of the second fused feature layer to obtain five different final output feature maps. In this way, all feature reduction operations are completed with convolution kernel operations: the multilayer convolutional network effectively up-samples the image and restores the key feature information on the basis of the feature extraction map, which facilitates accurate target detection.
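A compact sketch of this feature-reduction path is given below. The deconvolution kernel sizes (4 x 4) and channel counts are assumptions chosen only so that the tensor shapes match the sizes quoted in the specific embodiment later; this is an illustration, not the patented implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureReduction(nn.Module):
    """Illustrative feature-reduction path: three deconvolution (transposed
    convolution) up-sampling layers, two bilinear interpolation layers,
    end-to-end concatenation, two refining convolutions and a final
    broadcast addition, mirroring the structure described in the text."""
    def __init__(self):
        super().__init__()
        self.up1 = nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1)  # R1
        self.up2 = nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1)  # R2
        self.up3 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)   # R3
        self.m1 = nn.Conv2d(448, 128, 3, padding=1)  # first down-sampling feature layer
        self.m2 = nn.Conv2d(128, 1, 3, padding=1)    # second down-sampling feature layer

    def forward(self, feat):                      # feat: (N, 512, 32, 32)
        r1 = self.up1(feat)                       # (N, 256, 64, 64)
        b1 = F.interpolate(r1, scale_factor=4, mode="bilinear", align_corners=False)
        r2 = self.up2(r1)                         # (N, 128, 128, 128)
        b2 = F.interpolate(r2, scale_factor=2, mode="bilinear", align_corners=False)
        r3 = self.up3(r2)                         # (N, 64, 256, 256)
        t1 = torch.cat([b1, b2, r3], dim=1)       # (N, 448, 256, 256), end-to-end fusion
        m1 = self.m1(t1)                          # (N, 128, 256, 256)
        m2 = self.m2(m1)                          # (N, 1, 256, 256)
        return m1 + m2                            # broadcast addition -> T2

print(FeatureReduction()(torch.randn(1, 512, 32, 32)).shape)  # (1, 128, 256, 256)
```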
Optionally, every convolutional layer is followed by a bn (batch normalization) layer and a Relu layer. The bn layer applies a shift-and-scale normalization followed by a learned re-scaling and re-shifting, which ensures that the expressive power of the model is not degraded by the normalization. The bn layer can be expressed as:
x̂ = (x − μ) / σ, y = g · x̂ + b
where μ and σ are the mean and standard deviation computed from the image data features, and g and b are the re-scaling and re-shifting parameters learned by the network.
The Relu layer is a non-linear transformation whose role is to mimic the non-linear behaviour of neuronal signalling connections in the human brain; its functional expression is
Relu(x) = max(0, x).
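As a brief illustration (PyTorch's built-in layers implement both transformations; the helper name is ours), the convolution-bn-Relu pattern prescribed here could be assembled as:

```python
import torch.nn as nn

def conv_bn_relu(in_ch: int, out_ch: int, k: int = 3, stride: int = 1):
    """Convolution followed by batch normalization (shift/scale to zero mean
    and unit variance, then learned re-scaling g and re-shifting b) and a
    ReLU non-linearity, as prescribed for every convolutional layer here."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),  # y = g * (x - mu) / sqrt(sigma^2 + eps) + b
        nn.ReLU(inplace=True),   # Relu(x) = max(0, x)
    )
```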
Optionally, outputting the feature restoration maps comprises: the output feature layer performs five different convolution operations on the output feature map of the second fused feature layer to obtain five feature restoration maps, namely a Heatmap output feature map, an Angle1Angle2 output feature map, a ShortSide output feature map, a ShortLongRatio output feature map and a PointReg output feature map. The Heatmap output feature map contains the data of the predicted center points; the Angle1Angle2 output feature map contains the data of the predicted first diagonal angle and the predicted second diagonal angle; the ShortSide output feature map is the data of the predicted short-side length; the ShortLongRatio output feature map is the data of the predicted ratio of the short-side length to the diagonal length; and the PointReg output feature map is the data of the predicted bias points. Thus, the invention applies five different convolution kernels to the second fused feature layer, with different activation functions for the different parts, to obtain the five output parts: Heatmap, Angle1Angle2, ShortSide, ShortLongRatio and PointReg. In this way, five different output feature maps are obtained, each representing different detection-frame data, which effectively improves detection accuracy.
In the embodiment of the present invention, step S102210 includes the following specific steps:
performing a convolution operation on the output feature map of the second fused feature layer with a twelfth convolution kernel, using a Sigmoid activation function, to obtain the Heatmap output feature map, which contains the data of the predicted center points;
performing a convolution operation on the output feature map of the second fused feature layer with a thirteenth convolution kernel, using a Relu activation function, to obtain the Angle1Angle2 output feature map, which contains the data of the predicted first diagonal angle and the predicted second diagonal angle;
performing a convolution operation on the output feature map of the second fused feature layer with a fourteenth convolution kernel, using a Relu activation function, to obtain the ShortSide output feature map, which is the data of the predicted short-side length;
performing a convolution operation on the output feature map of the second fused feature layer with a fifteenth convolution kernel, using a Sigmoid activation function, to obtain the ShortLongRatio output feature map, which is the data of the predicted ratio of the short-side length to the diagonal length;
and performing a convolution operation on the output feature map of the second fused feature layer with a sixteenth convolution kernel, using a Relu activation function, to obtain the PointReg output feature map, which is the data of the predicted bias points.
Thus, the invention applies five different convolution kernels to the second fused feature layer and different activation functions to the different parts to obtain the five output parts Heatmap, Angle1Angle2, ShortSide, ShortLongRatio and PointReg. The Heatmap and ShortLongRatio parts use the Sigmoid activation function, and the Angle1Angle2, ShortSide and PointReg parts use the Relu activation function. In this way, five different output feature maps are obtained, each representing different detection-frame data, which effectively improves detection accuracy.
The Sigmoid activation function maps a real number into the (0, 1) interval and is expressed as:
Sigmoid(x) = 1 / (1 + e^(−x))
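A hedged sketch of the five-branch output layer follows; the input channel count (128) and the class/module names are assumptions, and only the kernel sizes and activation assignments come from the text:

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Illustrative five-branch output layer over the second fused feature
    map (assumed 128 channels): Sigmoid on Heatmap and ShortLongRatio,
    Relu on Angle1Angle2, ShortSide and PointReg, as described above."""
    def __init__(self, in_ch: int = 128):
        super().__init__()
        self.heatmap = nn.Conv2d(in_ch, 1, 3, padding=1)     # center-point probability
        self.angles = nn.Conv2d(in_ch, 2, 3, padding=1)      # two diagonal angles
        self.short_side = nn.Conv2d(in_ch, 1, 3, padding=1)  # short-side length
        self.ratio = nn.Conv2d(in_ch, 1, 3, padding=1)       # short side / diagonal
        self.point_reg = nn.Conv2d(in_ch, 2, 3, padding=1)   # sub-pixel center offset

    def forward(self, t2):
        return {
            "Heatmap": torch.sigmoid(self.heatmap(t2)),
            "Angle1Angle2": torch.relu(self.angles(t2)),
            "ShortSide": torch.relu(self.short_side(t2)),
            "ShortLongRatio": torch.sigmoid(self.ratio(t2)),
            "PointReg": torch.relu(self.point_reg(t2)),
        }

outs = DetectionHeads()(torch.randn(1, 128, 256, 256))
print({k: tuple(v.shape) for k, v in outs.items()})
```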
In a specific embodiment of the present invention, with reference to fig. 4 (a structural diagram of the detection model in the embodiment of the present invention), the specific structure of the detection model is described taking the modified ResNet18 as an example. The feature extraction network includes: an input Layer I, a C1 Layer, a C2 Layer, and Layer11, Layer12, Layer21, Layer22, Layer23, Layer31, Layer32, Layer33, Layer41, Layer42 and Layer43 Layers. The down-sampling convolutional layer comprises the C1 and C2 Layers, and the fused convolutional layer comprises the Layer11, Layer12, Layer21, Layer22, Layer23, Layer31, Layer32, Layer33, Layer41, Layer42 and Layer43 Layers.
The Layer C1 is a first down-sampling Layer, the Layer C2 is a second down-sampling Layer, the Layer11 is a first fusion sublayer, the Layer12 is a second fusion sublayer, the Layer21 is a third fusion sublayer, the Layer22 is a fourth fusion sublayer, the Layer23 is a fifth fusion sublayer, the Layer31 is a sixth fusion sublayer, the Layer32 is a seventh fusion sublayer, the Layer33 is an eighth fusion sublayer, the Layer41 is a ninth fusion sublayer, the Layer42 is a tenth fusion sublayer, and the Layer43 is an eleventh fusion sublayer.
The input layer I receives an input image sub-block of size 3 × 1024 × 1024, where 3 is the number of channels of the image, the first 1024 is the height of the image and the second 1024 is the width of the image. All channels, heights and widths of images and feature maps below are given in this order and will not be described again.
In the embodiment of the invention, all feature extraction and feature reduction operations are completed with convolution kernel operations. A convolution kernel takes the specific form C × H × W, where C is the number of convolution kernels used and H and W are the kernel sizes. The contents of the kernels are parameters that are updated and determined automatically by back-propagation of the neural network, so the parameter values at different positions within a kernel, and across different kernels, are unrelated. In the following description, all convolution kernels are given in the form H × W; the number of kernels is the same as the number of channels of the convolutional layer and feature map at the corresponding operation.
The output feature map of the C1 layer is a down-sampled feature map obtained by a down-sampling operation, which halves the height and width of the image and increases its number of channels. The first convolution kernel used in the invention is a 5 × 5 kernel with the step size set to 2, finally giving a C1 output feature map of size 64 × 512 × 512.
The output feature map of the C2 layer is a feature map obtained by a down-sampling operation, which halves the height and width of the image. The second convolution kernel used is a 5 × 5 kernel with the step size set to 2, finally giving a C2 output feature map of size 64 × 256 × 256.
The output feature map of Layer1 is obtained by ordinary (stride-1) sampling through Layer11 and Layer12, which extract features and mix them using the 'by-pass' (shortcut) method. The third convolution kernel, a 3 × 3 kernel, is applied to the output feature map of C2 with the first step size set to 1 to obtain the output feature map of Layer11; the same 3 × 3 third convolution kernel is applied to the output feature map of Layer11 with the second step size set to 1 to obtain the output feature map of Layer12; finally, the output feature map of C2 is added to the output feature map of Layer12 to obtain the output feature map of Layer1. The output feature maps of Layer11, Layer12 and Layer1 all have size 64 × 256 × 256.
The output feature map of Layer2 is obtained by down-sampling through Layer21, Layer22 and Layer23, which halve the height and width of the image, increase its number of channels, and mix features using the 'by-pass' method. The fourth convolution kernel, a 3 × 3 kernel, is applied to the output feature map of Layer1 with the third step size set to 2 to obtain Layer21; the same kernel is applied to the output feature map of Layer21 with the fourth step size set to 1 to obtain the output feature map of Layer22; the same kernel is applied to the output feature map of Layer1 with the fifth step size set to 2 to obtain the output feature map of Layer23; finally, the output feature maps of Layer22 and Layer23 are added to obtain the output feature map of Layer2. The output feature maps of Layer21, Layer22, Layer23 and Layer2 all have size 128 × 128 × 128.
The output feature map of Layer3 is obtained by down-sampling through Layer31, Layer32 and Layer33, which halve the height and width of the image, increase its number of channels, and mix features using the 'by-pass' method. The fifth convolution kernel, a 3 × 3 kernel, is applied to the output feature map of Layer2 with the sixth step size set to 2 to obtain the output feature map of Layer31; the same kernel is applied to Layer31 with the seventh step size set to 1 to obtain the output feature map of Layer32; the same kernel is applied to Layer2 with the eighth step size set to 2 to obtain the output feature map of Layer33; finally, the output feature maps of Layer32 and Layer33 are added to obtain the output feature map of Layer3. The output feature maps of Layer31, Layer32, Layer33 and Layer3 all have size 256 × 64 × 64 (the maps added together must share this size).
The output feature map of Layer4 is obtained by down-sampling through Layer41, Layer42 and Layer43, which halve the height and width of the image, increase its number of channels, and mix features using the 'by-pass' method. The sixth convolution kernel, a 3 × 3 kernel, is applied to the output feature map of Layer3 with the ninth step size set to 2 to obtain the output feature map of Layer41; the same kernel is applied to the output feature map of Layer41 with the tenth step size set to 1 to obtain the output feature map of Layer42; the same kernel is applied to the output feature map of Layer3 with the eleventh step size set to 2 to obtain Layer43; finally, the output feature maps of Layer42 and Layer43 are added to obtain the output feature map of Layer4. The output feature maps of Layer41, Layer42, Layer43 and Layer4 all have size 512 × 32 × 32.
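To make the tensor sizes above concrete, the following sketch wires the described layers and checks the shapes. It follows the structure in the text under assumed padding choices; it is a shape walk-through, not the patented implementation:

```python
import torch
import torch.nn as nn

class ExtractionSketch(nn.Module):
    """Shape walk-through of the modified-ResNet18 extraction path: two
    5x5 stride-2 down-sampling convolutions (C1, C2) followed by four
    stages (Layer1..Layer4) that mix features in the 'by-pass' style,
    reproducing the feature-map sizes quoted in the text."""
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(3, 64, 5, stride=2, padding=2)   # -> 64 x 512 x 512
        self.c2 = nn.Conv2d(64, 64, 5, stride=2, padding=2)  # -> 64 x 256 x 256

        def stage(cin, cout, stride):
            return nn.ModuleDict({
                "a": nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                "b": nn.Conv2d(cout, cout, 3, stride=1, padding=1),
                # Layer1 adds the C2 output back directly; the later stages
                # down-sample the shortcut branch with a stride-2 convolution.
                "sc": nn.Identity() if stride == 1 and cin == cout
                      else nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
            })

        self.stages = nn.ModuleList([
            stage(64, 64, 1),    # Layer1 -> 64 x 256 x 256
            stage(64, 128, 2),   # Layer2 -> 128 x 128 x 128
            stage(128, 256, 2),  # Layer3 -> 256 x 64 x 64
            stage(256, 512, 2),  # Layer4 -> 512 x 32 x 32
        ])

    def forward(self, x):
        x = self.c2(self.c1(x))
        for s in self.stages:
            x = s["b"](s["a"](x)) + s["sc"](x)  # feature mixing by addition
        return x

print(ExtractionSketch()(torch.randn(1, 3, 1024, 1024)).shape)  # (1, 512, 32, 32)
```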
Referring to fig. 4, the specific structure of the feature reduction network is likewise described taking the modified ResNet18 as an example. The feature reduction network includes: an R1 layer, a B1 layer, an R2 layer, a B2 layer, an R3 layer, a T1 layer, an M1 layer, an M2 layer and a T2 layer. The up-sampling convolutional layer comprises the R1, B1, R2, B2 and R3 layers; the fused feature layer comprises the T1, M1 and M2 layers; and the output feature layer comprises the T2 layer.
The R1 layer is the first up-sampling layer, the B1 layer is the first interpolation layer, the R2 layer is the second up-sampling layer, the B2 layer is the second interpolation layer, the R3 layer is the third up-sampling layer, the T1 layer is the first fused feature layer, the M1 layer is the first down-sampling feature layer, the M2 layer is the second down-sampling feature layer, and the T2 layer is the second fused feature layer.
The output feature map of the R1 layer is obtained by an up-sampling operation that reduces the feature depth and raises the resolution. The seventh convolution kernel used in the invention is a 3 × 3 kernel; with the twelfth step size set to 2, the input feature extraction map is processed by deconvolution, finally giving an R1 output feature map of size 256 × 64 × 64.
The output feature map of the B1 layer is obtained by a 4x bilinear-interpolation up-sampling operation, which raises the resolution of the feature map while keeping its depth. The invention applies 4x bilinear interpolation to the output feature map of the R1 layer to obtain a B1 output feature map of size 256 × 256 × 256.
The output feature map of the R2 layer is obtained by an up-sampling operation that reduces the feature depth and raises the resolution. The eighth convolution kernel used is a 3 × 3 kernel; with the thirteenth step size set to 2, the output feature map of the R1 layer is processed by deconvolution, finally giving an R2 output feature map of size 128 × 128 × 128.
The output feature map of the B2 layer is obtained by a 2x bilinear-interpolation up-sampling operation, which raises the resolution of the feature map while keeping its depth. The invention applies 2x bilinear interpolation to the output feature map of the R2 layer to obtain a B2 output feature map of size 128 × 256 × 256.
The output feature map of the R3 layer is obtained by an up-sampling operation that reduces the feature depth and raises the resolution. The ninth convolution kernel used is a 3 × 3 kernel; with the fourteenth step size set to 2, the output feature map of the R2 layer is processed by deconvolution, finally giving an R3 output feature map of size 64 × 256 × 256.
The T1 layer is a fusion layer that combines the information of different feature maps while keeping the resolution unchanged. The T1 layer concatenates the output feature maps of the B1, B2 and R3 layers end to end along the channel dimension, giving a T1 output feature map of size 448 × 256 × 256.
The output feature map of the M1 layer is obtained by ordinary (stride-1) convolution, which extracts the more important information in the T1 layer and further removes interference. The tenth convolution kernel used is a 3 × 3 kernel; with the fifteenth step size set to 1, the output feature map of the T1 layer is convolved, finally giving an M1 output feature map of size 128 × 256 × 256.
The output feature map of the M2 layer is obtained by ordinary (stride-1) convolution and extracts the position information of the ship targets. The eleventh convolution kernel used is a 3 × 3 kernel; with the sixteenth step size set to 1, the output feature map of the M1 layer is convolved, finally giving an M2 output feature map of size 1 × 256 × 256.
The output feature map of the T2 layer is a position-information-reinforced feature map generated by broadcast summation, which strengthens sensitivity to the positions of ship targets. The invention adds the output feature map of M1 to the output feature map of M2 channel by channel (a broadcast addition), giving a T2 output feature map of size 128 × 256 × 256.
The R1 layer, the R2 layer, the R3 layer, the B1 layer, the B2 layer and the T1 layer form sub-pixel multi-scale feature fusion, and the T1 layer, the T2 layer, the M1 layer and the M2 layer form a ship target positioning strengthening mechanism.
Specifically, in the sub-pixel multi-scale feature fusion, bilinear interpolation is the extension of linear interpolation to a two-dimensional rectangular grid, used to interpolate a bivariate function (here, over image height and width). Its core idea is to interpolate linearly in the two directions so that the value of a target point is obtained from the values of the surrounding points. Because the B1 and B2 layers are obtained directly by bilinear interpolation, B1 and B2 recover, without extra computation, target features hidden at the sub-pixel level in the R1 and R2 layers; such features are mostly edge features of the target. The design of the invention therefore judges the edges of the targets to be detected better when the ship targets are dense, and edges are often the most important criterion in target detection by neural-network methods. Moreover, in a neural network a larger feature map tends to suit the detection of smaller targets and a smaller feature map tends to suit larger targets. The invention exploits these properties in its network structure: B1 is a small feature map with enhanced edge features, B2 is a medium feature map with enhanced edge features, and R3 is a large feature map with sufficient edge features. After the three feature maps are fused by lossless end-to-end concatenation, the proposed structure copes better with large variations in target size.
Specifically, in the ship target positioning strengthening mechanism, M1 is a convolutional layer that extracts the effective features of the fusion layer by convolution, further refining the features and removing interference. M2 contains the position information of the ship targets, i.e., it predicts the regions of the input sub-block corresponding to ship targets: in this feature map, positions containing a ship target approach 1 and positions without one approach 0. Adding M1 and M2 channel by channel emphasizes and highlights the parts of the convolutional layer related to the ship-target position features, so a good detection effect is retained even when the ship targets to be detected are sparsely or unevenly distributed. Compared with other existing algorithms, this structure, designed for the characteristics of ship targets (large size variation, extremely dense targets and varying sparsity), greatly improves the capability of detecting such targets. At the same time, the proposed structure adds no learnable parameters or learning burden, does not affect training or detection speed, and does not bloat the network.
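The channel-by-channel addition of M1 and M2 is an ordinary broadcast: the single-channel position map is added onto every channel of the refined features. A minimal demonstration (tensor values are random placeholders):

```python
import torch

m1 = torch.randn(1, 128, 256, 256)  # refined fusion features (M1)
m2 = torch.rand(1, 1, 256, 256)     # per-pixel ship-probability map (M2)
t2 = m1 + m2                        # broadcast over the 128 channels
print(t2.shape)                     # torch.Size([1, 128, 256, 256])
```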
The twelfth, thirteenth, fourteenth, fifteenth and sixteenth convolution kernels applied to the output feature map of the T2 layer are all 3 × 3 kernels, and different activation functions are used for the different parts to obtain five output parts: the Heatmap output feature map, the Angle1Angle2 output feature map, the ShortSide output feature map, the ShortLongRatio output feature map and the PointReg output feature map. The Heatmap and ShortLongRatio output feature maps use the Sigmoid activation function, while the Angle1Angle2, ShortSide and PointReg output feature maps use the Relu activation function, wherein:
the Heatmap output feature map predicts the positions of the centers of targets of the different categories; its specific form is 1 × 256 × 256, and the predicted value is the probability that a given point is the center point of a ship target;
the Angle1Angle2 output feature map predicts the angles formed by the two diagonals of the oblique rectangle at the labelled target position with the positive x-axis direction of the image; its specific form is 2 × 256 × 256, where the first depth layer always predicts the smaller angle and the second depth layer always predicts the larger angle;
the ShortSide output feature map predicts the short-side length of the oblique rectangle at the labelled target position; its specific form is 1 × 256 × 256, and the contained value is the short-side length of the ship target as mapped onto the 256 × 256 feature map;
the ShortLongRatio output feature map predicts the ratio of the short-side length of the oblique rectangular box at the labelled target position to its diagonal length; its specific form is 1 × 256 × 256;
the PointReg output feature map compensates for the shift of the predicted ship-target center caused by the Heatmap being unable to predict at the sub-pixel level; its predicted values correct the predicted center coordinates in the Heatmap. Its specific form is 2 × 256 × 256, where the first depth layer is the shift of the target center in the height direction and the second depth layer is the shift in the width direction.
Besides the modified ResNet18, the invention also provides a modified ResNet50, a modified ResNet101, a modified dla34 and a modified Hourglass network; their improvements are similar to those of the modified ResNet18 and are not repeated here. To meet the different detection-accuracy and detection-speed requirements of different tasks, the invention provides an interchangeable feature extraction/feature reduction sub-network. The invention thus provides five detection models: the modified ResNet18, the modified ResNet50, the modified ResNet101, the modified dla34 and the modified Hourglass. The correspondence between the five detection models and their detection accuracy and detection speed is shown in Table 1.
Table 1 shows the five detection models and their corresponding detection accuracy and detection speed. In the table, the baseline for evaluating detection accuracy is the modified ResNet18 and the baseline for evaluating detection speed is the modified Hourglass; the detection speed and detection accuracy of all five feature extraction/feature reduction variants are superior to typical two-stage detection algorithms such as Faster R-CNN, FPN and their improved versions.
Table 1 Five detection models and their corresponding detection accuracy and detection speed

| | ResNet18 modified | ResNet50 modified | ResNet101 modified | dla34 modified | Hourglass modified |
| --- | --- | --- | --- | --- | --- |
| Detection accuracy | -- (baseline) | higher | very high | very high | extremely high |
| Detection speed | extremely fast | fast | relatively fast | relatively fast | -- (baseline) |
In the following steps, for convenience of description, the invention is described using the modified ResNet18 as the feature extraction/feature reduction part; the other variants are used in the same way and are therefore not described separately.
In the embodiment of the present invention, step S103 includes the following specific steps:
when the center point of the detection frame is predicted, the first network prediction value is the Heatmap output feature map, the first label is a first probability map in which the ship-target center-point coordinates are 1 and all other positions are 0, and the loss function is an L2 loss function. A loss function is thus selected to suit the characteristics of the center point;
when the first diagonal angle of the detection frame is predicted, the second network prediction value is the Angle1Angle2 output feature map (its first depth layer), the second label is the value of the included angle formed by one diagonal of the oblique rectangle formed by the ship target coordinates with the positive x-axis direction of the image coordinate system, and the loss function is an L2 loss function. A loss function is thus selected to suit the characteristics of the first diagonal angle;
when the second diagonal angle of the detection frame is predicted, the third network prediction value is the Angle1Angle2 output feature map (its second depth layer), the third label is the value of the included angle formed by the other diagonal of that oblique rectangle with the positive x-axis direction of the image coordinate system, and the loss function is an L2 loss function. A loss function is thus selected to suit the characteristics of the second diagonal angle;
when the short-side length of the detection frame is predicted, the fourth network prediction value is the ShortSide output feature map, the fourth label is the short-side length of the oblique rectangle formed by the ship target coordinates, and the loss function is an L1 loss function; when the ratio of the short side to the diagonal length is predicted, the prediction value is the ShortLongRatio output feature map, the label is that ratio, and the loss function is an L2 loss function. Loss functions are thus selected to suit the characteristics of the short-side length and of the ratio;
when the bias point of the detection frame is predicted, the fifth network prediction value is the PointReg output feature map, the fifth label is the difference between the ship-target center-point coordinates and the mapped center-point coordinates on the training-set image, where the mapped center-point coordinates are obtained by mapping the center point detected in the T2 layer back onto the training-set image, and the loss function is an L2 loss function. A loss function is thus selected to suit the characteristics of the bias point.
Thus, for the different quantities, different loss functions are used to train the network, and the labels and network prediction values differ accordingly. The Heatmap output feature map predicts the center positions of targets of different categories; the Angle1Angle2 output feature map predicts the angles formed by the two diagonals of the oblique rectangle at the labelled target position with the positive x-axis direction of the image; the ShortSide output feature map predicts the short-side length of that oblique rectangle; the ShortLongRatio output feature map predicts the ratio of the short-side length of the oblique rectangular box to its diagonal length; and the PointReg output feature map compensates for the shift of the predicted ship-target center caused by the Heatmap output feature map being unable to predict at the sub-pixel level, its predicted values correcting the predicted center coordinates in the Heatmap output feature map.
In the embodiment of the invention, the predicted center point of the detection frame is a probability value in the range 0 to 1; the predicted bias point of the detection frame is at the sub-pixel level, with values also between 0 and 1; the predicted first and second diagonal angles of the detection frame are in radians, in the range 0 to π; the predicted short-side length of the detection frame is at the pixel level, with no fixed range; and the predicted ratio of the short side to the diagonal length is unitless, in the range 0 to 0.707 (i.e., up to 1/√2, the value attained by a square).
According to these value ranges, the invention uses the L1 loss function for the short-side length of the detection frame, and the L2 loss function for the detection-frame center point, the center-point sub-pixel bias, the dual diagonal angles, and the ratio of short-side to diagonal length. The two loss functions take the following forms:
L1 loss function: L1(y, ŷ) = Σ |y − ŷ|
L2 loss function: L2(y, ŷ) = Σ (y − ŷ)²
where y and ŷ in the L1 and L2 loss functions are the label and the network prediction value, respectively.
For the short-side length of the detection frame, the label is the short-side length of the oblique rectangle formed by the ship target coordinates in the training set; the network prediction value is ShortSide.
For the center point of the detection frame, the label is a probability map in which the positions of the ship-target center-point coordinates have probability 1 and all other positions 0; the network prediction value is Heatmap.
For the detection-frame center-point sub-pixel offset, the label is the difference between the ship-target center-point coordinates in the image sub-block and those coordinates after the T2 convolutional layer prediction is mapped back onto the image sub-block (i.e., the center-point coordinates multiplied by 4). Specifically, since there is no fractional number of pixels, a ship target whose center point in the image sub-block is C(x_c, y_c) has coordinates (x_c/4, y_c/4) in the T2 layer; when x_c and y_c are not multiples of 4, the corresponding coordinates in the T2 convolutional layer become (⌊x_c/4⌋, ⌊y_c/4⌋). Because the PointReg prediction has the same resolution as the T2 convolutional layer, the center-point coordinates mapped back onto the image sub-block become (4⌊x_c/4⌋, 4⌊y_c/4⌋): the precision of the fractional part is lost, and this lost part is the label for the detection-frame center-point sub-pixel offset.
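A small numeric illustration of this label, assuming the 4:1 resolution ratio stated above (the helper name is ours):

```python
def pointreg_label(xc: int, yc: int, stride: int = 4):
    """Sub-pixel precision lost when a center point C(xc, yc) in the image
    sub-block is mapped into the T2 grid by floor division and back again;
    this lost fraction is the PointReg (bias point) label."""
    gx, gy = xc // stride, yc // stride        # coordinates in the T2 layer
    return xc - gx * stride, yc - gy * stride  # what 4*floor(./4) cannot recover

print(pointreg_label(518, 237))  # (2, 1): 518 maps back to 516, 237 to 236
```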
For the dual diagonal angles of the detection frame, the labels are the values α and β of the included angles formed by the two diagonals of the oblique rectangle, formed by the ship target coordinates in the training set, with the positive x-axis direction of the image coordinate system (as shown in fig. 3); the predicted value is Angle1Angle2.
For the ratio of the short side of the detection frame to the diagonal length, the label is the ratio of the short-side length of the oblique rectangle formed by the ship target coordinates in the training set to the diagonal length of that oblique rectangle; the predicted value is ShortLongRatio.
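Putting these terms together, a hedged sketch of the total training loss follows; the dictionary keys match the head sketch earlier, and the equal per-term weighting is an assumption (any masking of background positions is omitted for brevity):

```python
import torch
import torch.nn.functional as F

def detection_loss(pred: dict, label: dict) -> torch.Tensor:
    """Illustrative combination of the losses described above: L1 on the
    short-side length; L2 (mean squared error) on the heatmap, diagonal
    angles, short-side/diagonal ratio and sub-pixel offsets."""
    loss = F.mse_loss(pred["Heatmap"], label["Heatmap"])
    loss = loss + F.mse_loss(pred["Angle1Angle2"], label["Angle1Angle2"])
    loss = loss + F.l1_loss(pred["ShortSide"], label["ShortSide"])
    loss = loss + F.mse_loss(pred["ShortLongRatio"], label["ShortLongRatio"])
    loss = loss + F.mse_loss(pred["PointReg"], label["PointReg"])
    return loss
```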
In the present embodiment, a learning rate of 1.25 × 10⁻⁴ is used, and 64 image sub-blocks are input and trained simultaneously in each training step; for the loss functions, the L1 and L2 loss functions given above are adopted as the evaluation criteria; for optimization, the Adam method is adopted.
The training iterates over 360 large loops (epochs), one epoch meaning that every image in the training set is trained on once. The learning rate is reduced to 1.25 × 10⁻⁵ at epoch 120, to 1.25 × 10⁻⁶ at epoch 200, and to 1.25 × 10⁻⁷ at epoch 300. During training, the generated network model is saved every five epochs, and the model performing best on the test set is finally selected as the final model. The evaluation of performance on the test set follows the same process as described in the embodiment shown below.
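In code, this schedule could look like the following sketch; model, train_loader and loss_fn are placeholders for the components described above, and the use of MultiStepLR with a decay factor of 0.1 is our reading of the tenfold reductions:

```python
import torch

def train(model, train_loader, loss_fn, epochs: int = 360):
    """Training schedule from the text: Adam, initial lr 1.25e-4, decayed
    tenfold at epochs 120, 200 and 300, with a checkpoint every 5 epochs."""
    optim = torch.optim.Adam(model.parameters(), lr=1.25e-4)
    sched = torch.optim.lr_scheduler.MultiStepLR(
        optim, milestones=[120, 200, 300], gamma=0.1)
    for epoch in range(epochs):
        for images, labels in train_loader:   # 64 image sub-blocks per batch
            optim.zero_grad()
            loss_fn(model(images), labels).backward()
            optim.step()
        sched.step()
        if (epoch + 1) % 5 == 0:              # keep a checkpoint every 5 epochs
            torch.save(model.state_dict(), f"ckpt_epoch_{epoch + 1}.pth")
```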
In the embodiment of the present invention, the following computer configuration is adopted for training the detection model based on double-angle regression: a Xeon Gold 6152 processor, four NVIDIA Tesla V100 16GB graphics cards, and 256GB of memory, so that training is completed effectively.
With the detection model training method based on double-angle regression provided by the invention, the detection model is trained to extract detection-frame data that reflects the key points of the target, solving the missed-detection and false-detection problems caused by the large size variation and varying density of ship targets, while also meeting real-time requirements.
Fig. 5 is a schematic structural diagram of a detection model training apparatus 800 based on bi-angle regression according to an embodiment of the present invention, including an obtaining unit 801, a processing unit 802, and a training unit 803.
An obtaining unit 801, configured to obtain a training set image including annotation information, where the annotation information includes actual coordinate data of a plurality of ship targets;
a processing unit 802, configured to input the training set image into the detection model, and determine predicted coordinate data of each detection frame, where the predicted coordinate data includes a predicted central point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short side length, a predicted short side-to-diagonal length ratio, and a predicted bias point of the detection frame; the system is also used for determining the value of a loss function according to the actual coordinate data and the predicted coordinate data;
and a training unit 803, configured to adjust parameters of the detection model according to the value of the loss function until a convergence condition is satisfied, and complete training of the detection model.
According to the detection model training device based on double-angle regression provided by the invention, detection-frame data is extracted by the detection model to reflect the key points of the target, solving the missed-detection and false-detection problems caused by the large size variation and varying density of ship targets while also meeting real-time requirements.
In another embodiment of the present invention, a detection method based on double-angle regression is provided. Referring to fig. 6, fig. 6 is a schematic flow chart of the detection method based on double-angle regression according to the embodiment of the present invention, which includes steps S201 to S204.
In step S201, an image to be detected is obtained, and the image to be detected is preprocessed to obtain a plurality of image sub-blocks. Therefore, the image to be tested is effectively obtained, and the image to be tested is preprocessed to obtain the image sub-block. The image sub-blocks obtained after segmentation are favorable for being input into the detection model.
In step S202, the image sub-blocks are input into the detection model and the predicted coordinate data of the detection frames is determined. A detection frame frames a predicted ship target, and its predicted coordinate data comprises the predicted center point, the predicted first diagonal angle, the predicted second diagonal angle, the predicted short-side length, the predicted ratio of short side to diagonal length, and the predicted bias point; the detection model is obtained by training with the detection model training method based on double-angle regression described above.
In step S203, the capture position of the detection frame on the image sub-block is determined according to the predicted coordinate data. Therefore, the position of the detection frame on the image sub-block is effectively acquired.
In step S204, the optimal detection frames are determined according to the acquisition positions. The method thus extracts the predicted coordinate data of the detection frames through the detection model, reflecting the key-point information of the target. Extracting multiple kinds of detection-frame data solves the inaccurate prediction of detection frames caused by long and narrow ship targets, large inclination angles and large size variation, while obtaining the predicted coordinate data directly from the detection model avoids the drawback of slow detection. In addition, the invention screens the acquisition positions, which further increases accuracy. In summary, by extracting multiple kinds of detection-frame data through the detection model, the detection method based on double-angle regression greatly improves the accuracy of target detection while its detection speed achieves real-time ship-target detection, meeting the requirements of high accuracy and strong real-time performance.
In the embodiment of the present invention, referring to fig. 7, fig. 7 is a schematic flowchart illustrating a process of determining a model according to the embodiment of the present invention, and step S202 includes steps S2021 to S2023.
In step S2021, the requirements of the actual detection task on the detection accuracy and the detection speed are acquired.
In step S2022, the correspondence relationship between the detection model and the detection accuracy and the correspondence relationship between the detection model and the detection speed are searched for based on the requirements of the actual detection task for the detection accuracy and the detection speed, and the detection model that most closely matches the requirements of the actual detection task is selected.
In step S2023, the image sub-blocks are input to the best matching detection model, and the predicted coordinate data of the detection frame is determined. Therefore, aiming at the requirements of tasks on different detection precision and detection speed, the invention provides a variable feature extraction and feature reduction partial network. Optionally, the expression form of the correspondence between the detection model and the detection accuracy and the correspondence between the detection model and the detection speed is table 1, that is, the relationship between the detection model and the detection accuracy and the detection speed is searched in table 1, and then the selection is performed.
In the embodiment of the present invention, referring to fig. 8, fig. 8 is a schematic flowchart illustrating a process of determining an acquisition position according to the embodiment of the present invention, and step S203 includes step S2031 to step S2032.
In step S2031, the detection position coordinates where the detection frame is located in the fusion feature layer of the detection model are determined based on the predicted coordinate data. Optionally, the detection position coordinates include coordinates of a first detection vertex, coordinates of a second detection vertex, coordinates of a third detection vertex, and coordinates of a fourth detection vertex.
In step S2032, the acquisition-position coordinates of the detection frame on the image sub-block are determined from the detection-position coordinates. The position of a ship target on the image sub-block is thus determined effectively from the predicted center point in the Heatmap output feature map, the angles formed by the target detection frame with the positive x-axis direction of the image in the Angle1Angle2 output feature map, the short-side length of the target detection frame in the ShortSide output feature map, the ratio of the short side to the diagonal length in the ShortLongRatio output feature map, and the target center-point offset in the PointReg output feature map.
In the embodiment of the present invention, step S2031 specifically includes the following steps:
judging whether the predicted first diagonal angle and the predicted second diagonal angle meet angle preset conditions or not;
and if so, determining the coordinates of the first detection vertex, the second detection vertex, the third detection vertex and the fourth detection vertex according to the predicted coordinate data. Therefore, after the central point of the ship target is determined, the coordinates of four corner points A, B, C and D of the prediction inclined rectangle can be obtained according to the angle formed by the target detection frame and the X-axis of the image in the positive direction, the short edge length of the target detection frame, the proportion of the short edge of the target detection frame to the short edge of the diagonal line to the length of the diagonal line and the target central point offset.
Alternatively, if the first diagonal angle is predicted to be greater than 90 degrees and the second diagonal angle is predicted to be less than 90 degrees, or the first diagonal angle is predicted to be less than 90 degrees and the second diagonal angle is predicted to be greater than 90 degrees, the coordinates of the first detected vertex, the coordinates of the second detected vertex, the coordinates of the third detected vertex, and the coordinates of the fourth detected vertex are calculated by the following formulas:
A_x=(O_x+b_x)-(l/k)/2*cos(α)
A_y=(O_y+b_y)+(l/k)/2*sin(α)
B_x=(O_x+b_x)+(l/k)/2*cos(π-β)
B_y=(O_y+b_y)+(l/k)/2*sin(π-β)
C_x=(O_x+b_x)+(l/k)/2*cos(α)
C_y=(O_y+b_y)-(l/k)/2*sin(α)
D_x=(O_x+b_x)-(l/k)/2*cos(π-β)
D_y=(O_y+b_y)-(l/k)/2*sin(π-β)
if the first diagonal angle is predicted to be greater than 90 degrees and the second diagonal angle is predicted to be greater than 90 degrees, calculating the coordinates of the first detection vertex, the coordinates of the second detection vertex, the coordinates of the third detection vertex and the coordinates of the fourth detection vertex by the following formulas:
A_x=(O_x+b_x)+(l/k)/2*cos(π-α)
A_y=(O_y+b_y)+(l/k)/2*sin(π-α)
B_x=(O_x+b_x)+(l/k)/2*cos(π-β)
B_y=(O_y+b_y)+(l/k)/2*sin(π-β)
C_x=(O_x+b_x)-(l/k)/2*cos(π-α)
C_y=(O_y+b_y)-(l/k)/2*sin(π-α)
D_x=(O_x+b_x)-(l/k)/2*cos(π-β)
D_y=(O_y+b_y)-(l/k)/2*sin(π-β)
if the first diagonal angle is predicted to be smaller than 90 degrees and the second diagonal angle is predicted to be smaller than 90 degrees, calculating the coordinates of the first detection vertex, the coordinates of the second detection vertex, the coordinates of the third detection vertex and the coordinates of the fourth detection vertex by the following formulas:
A_x=(O_x+b_x)-(l/k)/2*cos(β)
A_y=(O_y+b_y)+(l/k)/2*sin(β)
B_x=(O_x+b_x)+(l/k)/2*cos(α)
B_y=(O_y+b_y)-(l/k)/2*sin(α)
C_x=(O_x+b_x)+(l/k)/2*cos(β)
C_y=(O_y+b_y)-(l/k)/2*sin(β)
D_x=(O_x+b_x)-(l/k)/2*cos(α)
D_y=(O_y+b_y)+(l/k)/2*sin(α)
wherein A_x and A_y are the abscissa and ordinate of the first detection vertex, B_x and B_y those of the second detection vertex, C_x and C_y those of the third detection vertex, and D_x and D_y those of the fourth detection vertex; O_x and O_y are the abscissa and ordinate of the predicted center point, and b_x and b_y those of the predicted bias point; α is the predicted first diagonal angle, β is the predicted second diagonal angle, l is the predicted short-side length, and k is the predicted ratio of short-side length to diagonal length, so that l/k is the predicted diagonal length.
Therefore, when the larger angle in the dual diagonal angles is larger than 90 degrees and the smaller angle is smaller than 90 degrees, the coordinates of the four corner points A, B, C and D of the predicted diagonal rectangle can be obtained according to the diagonal angles and the relevant information; when two angles in the dual diagonal angles are larger than 90 degrees, the coordinates of four corner points A, B, C and D of the predicted diagonal rectangle can be obtained according to the diagonal angles and the relevant information; when two angles in the double diagonal angles are smaller than 90 degrees, the coordinates of the four corner points A, B, C and D of the predicted oblique rectangle can be obtained according to the diagonal angles and the relevant information at the moment.
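A hedged reference implementation of the three cases above follows; the function and variable names are illustrative, the angles are in radians, and exactly-90-degree angles fall into the third branch:

```python
import math

def decode_corners(ox, oy, bx, by, alpha, beta, l, k):
    """Recover the four oblique-rectangle vertices A, B, C, D from the
    predicted center (ox, oy), bias (bx, by), diagonal angles alpha/beta,
    short-side length l and short-side/diagonal ratio k, following the
    three angle cases given in the text."""
    cx, cy = ox + bx, oy + by
    r = (l / k) / 2.0                     # half the diagonal length
    a, b = alpha, beta
    deg90 = math.pi / 2
    if (a > deg90) != (b > deg90):        # one angle above, one below 90 deg
        A = (cx - r * math.cos(a), cy + r * math.sin(a))
        B = (cx + r * math.cos(math.pi - b), cy + r * math.sin(math.pi - b))
        C = (cx + r * math.cos(a), cy - r * math.sin(a))
        D = (cx - r * math.cos(math.pi - b), cy - r * math.sin(math.pi - b))
    elif a > deg90 and b > deg90:         # both angles above 90 deg
        A = (cx + r * math.cos(math.pi - a), cy + r * math.sin(math.pi - a))
        B = (cx + r * math.cos(math.pi - b), cy + r * math.sin(math.pi - b))
        C = (cx - r * math.cos(math.pi - a), cy - r * math.sin(math.pi - a))
        D = (cx - r * math.cos(math.pi - b), cy - r * math.sin(math.pi - b))
    else:                                 # both angles below 90 deg
        A = (cx - r * math.cos(b), cy + r * math.sin(b))
        B = (cx + r * math.cos(a), cy - r * math.sin(a))
        C = (cx + r * math.cos(b), cy - r * math.sin(b))
        D = (cx - r * math.cos(a), cy + r * math.sin(a))
    return A, B, C, D

print(decode_corners(100, 100, 0.3, 0.6, 1.9, 0.8, l=20, k=0.4))
```

Multiplying each returned coordinate by the first constant described below (4 for the configuration above) then maps the vertices from the feature restoration map back onto the image sub-block.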
In the embodiment of the present invention, referring to fig. 9, fig. 9 is a schematic flowchart illustrating a process of determining a collection vertex according to the embodiment of the present invention, and step S2032 specifically includes step S20321 to step S20322.
In step S20321, a first constant is determined based on a ratio of the resolution of the image sub-blocks to the resolution of the output feature map of the fused feature layer in the detection model. Therefore, the accurate coordinate proportion is determined by taking the output characteristic diagram of the fusion characteristic layer as a standard.
In step S20322, the coordinates of the first, second, third and fourth detection vertices are each multiplied by the first constant to determine the coordinates of the corresponding first, second, third and fourth acquisition vertices. Since the predicted ship targets are all expressed on the feature restoration map, multiplying all coordinates by the first constant yields the ship-target coordinates on the image sub-blocks into which the remote sensing image was divided.
In the embodiment of the present invention, referring to fig. 10, fig. 10 is a schematic flow chart of the screening detection frame in the embodiment of the present invention, and step S204 specifically includes step S2041 to step S2042.
In step S2041, determining coordinates of the detection frame on the original remote sensing satellite acquired image to which the detection frame belongs according to the names of the image sub-blocks, wherein the name of each image sub-block includes image sub-block coordinate information of each image sub-block on the original remote sensing satellite acquired image;
In step S2042, non-maximum suppression is used to screen out the optimal detection frames, where non-maximum suppression selects the optimal detection frames according to the confidence scores and the intersection-over-union of the predicted frames. To find the detection frames that best meet the requirements and give the best detection effect, after one original remote sensing image carrying all prediction results has been composed, all detection frames corresponding to that image are analysed and screened; in this step the invention uses non-maximum suppression as the analysis and screening method, effectively screening out the optimal detection frames.
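As a hedged sketch of this screening step, a standard score-sorted greedy non-maximum suppression follows. For simplicity the IoU here is computed on axis-aligned envelopes of the oblique rectangles; a rotated-IoU variant would follow the same pattern:

```python
def nms(boxes, scores, iou_thr: float = 0.5):
    """Greedy non-maximum suppression. boxes: list of (x1, y1, x2, y2)
    axis-aligned envelopes; scores: confidence of each detection frame.
    Returns the indices of the kept (optimal) detection frames."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                # highest remaining confidence
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
    return keep
```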
Optionally, the image sub-blocks are named by: the name of the original remote-sensing-satellite acquired image to which the sub-block belongs, the x-axis coordinate of the sub-block's upper-left corner point on that original image, and the y-axis coordinate of the sub-block's upper-left corner point on that original image. The acquisition position comprises a first abscissa and a first ordinate of the detection frame on the image sub-block, and the specific operation of step S2041 comprises: adding the first abscissa to the x-axis coordinate of the sub-block's upper-left corner point on the original remote-sensing-satellite acquired image to obtain the abscissa of the detection frame on the original image; and adding the first ordinate to the y-axis coordinate of the sub-block's upper-left corner point on the original image to obtain the ordinate of the detection frame on the original image. With this simple naming method, the image sub-blocks belonging to the same satellite remote sensing image are recomposed, and the coordinates of the detection frames on the original remote-sensing-satellite acquired image are obtained effectively.
Specifically, referring to fig. 11, which is a schematic diagram of the relationship between an image sub-block and the original remote sensing image according to the embodiment of the present invention: as shown in fig. 11, O(0, 0) is the upper-left point of the original remote sensing image and is taken as the relative origin; A(x_a, y_a) is the upper-left point of the image sub-block, whose coordinates on the original remote sensing image are (x_a, y_a); and C(x_c, y_c) is the center point of the ship target predicted within the image sub-block, whose coordinates relative to the image sub-block are (x_c, y_c). The coordinates of the predicted ship target on the original remote sensing image are then:

T(x_t, y_t) = (x_a, y_a) + (x_c, y_c)
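A minimal sketch of this translation follows, assuming an underscore-separated naming convention ("imageName_xa_ya") for the sub-blocks; the patent specifies only that the name carries the original image's name and the sub-block's upper-left corner coordinates, not the separator.

```python
def to_original_coords(subblock_name: str, center: tuple) -> tuple:
    """Translate a predicted center C(x_c, y_c) on a sub-block into original-image
    coordinates T = A + C, where A is read from the sub-block's name."""
    *_, x_a, y_a = subblock_name.rsplit("_", 2)  # A(x_a, y_a): upper-left corner of the sub-block
    x_c, y_c = center
    return (int(x_a) + x_c, int(y_a) + y_c)      # T(x_t, y_t) = (x_a, y_a) + (x_c, y_c)

# Example: a sub-block cut from "scene01" with its upper-left corner at (1024, 2048).
t = to_original_coords("scene01_1024_2048", (137.0, 85.5))  # -> (1161.0, 2133.5)
```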
According to the detection method based on double-angle regression provided by the invention, the detection frame data are extracted by the detection model and reflect the key points of the target. This resolves the missed detections and false detections caused by large variations in ship target size and differing ship densities, meets real-time requirements, and further improves detection accuracy through the selection of an optimal detection frame.
Fig. 12 is a schematic structural diagram of a detection apparatus 900 based on double-angle regression according to an embodiment of the present invention, which includes an obtaining unit 901, a processing unit 902, and a screening unit 903, wherein:
the obtaining unit 901 is configured to obtain an image to be detected;
the processing unit 902 is configured to preprocess the image to be detected to obtain image sub-blocks; to input the image sub-blocks into a detection model and determine the predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, the predicted coordinate data include a predicted center point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short-side length, a predicted short-side-to-diagonal length ratio, and a predicted bias point of the detection frame, and the detection model is trained with the detection model training method based on double-angle regression described above; and to determine the acquisition position of the detection frame on the image sub-blocks according to the predicted coordinate data;
the screening unit 903 is configured to obtain the optimal detection frame according to the acquisition position.
According to the detection apparatus based on double-angle regression provided by the invention, the detection frame data are extracted by the detection model and reflect the key points of the target. This resolves the missed detections and false detections caused by large variations in ship target size and differing ship densities, meets real-time requirements, and further improves detection accuracy through the selection of an optimal detection frame.
Yet another embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the detection model training method based on double-angle regression as described above, or implements the detection method based on double-angle regression as described above.
The computer-readable storage medium provided by the invention likewise extracts the detection frame data through the detection model, reflects the key points of the target, resolves the missed detections and false detections caused by large variations in ship target size and differing ship densities, meets real-time requirements, and further improves detection accuracy through the selection of an optimal detection frame.
Although the present disclosure has been described above, the scope of the present disclosure is not limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present disclosure, and such changes and modifications will fall within the scope of the present invention.
Claims (16)
1. A detection model training method based on double-angle regression is characterized by comprising the following steps:
acquiring a training set image containing labeling information, wherein the labeling information comprises actual coordinate data of a plurality of ship targets;
inputting the training set image into a detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, and the predicted coordinate data comprise a predicted center point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short-side length, a predicted short-side-to-diagonal length ratio and a predicted bias point of the detection frame;
determining a value of a loss function according to the actual coordinate data and the predicted coordinate data;
adjusting parameters of the detection model according to the value of the loss function until a convergence condition is met so as to finish training the detection model;
the detection model comprises a feature extraction network and a feature restoration network; the method for inputting the training set images into the detection model comprises the following steps:
inputting the training set images into the feature extraction network, and determining a feature extraction graph, wherein the feature extraction graph comprises initial feature data of the detection frame;
inputting the feature extraction graph into the feature restoration network, and determining a feature restoration graph, wherein the feature restoration graph comprises the predicted coordinate data of the detection frame;
wherein the feature extraction network sequentially comprises: a down-sampling convolutional layer and a fusion convolutional layer, the down-sampling convolutional layer being used for performing down-sampling processing on the training set image, and the fusion convolutional layer being used for performing down-sampling processing on the output feature map of the down-sampling convolutional layer and performing feature mixing.
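To make the backbone of claim 1 concrete, here is a minimal PyTorch sketch of the feature extraction network: one down-sampling convolution followed by a single fusion stage that down-samples and mixes features. All channel widths, kernel sizes, and strides are illustrative assumptions, and the four-stage fusion structure of claim 4 is collapsed to one stage here.

```python
import torch
from torch import nn

class FeatureExtraction(nn.Module):
    """Sketch of the feature extraction network of claim 1: a down-sampling
    convolutional layer, then a fusion stage that down-samples and mixes."""

    def __init__(self) -> None:
        super().__init__()
        # Down-sampling convolutional layer: halves the spatial resolution.
        self.downsample = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        # Fusion stage: further down-sampling plus feature mixing via a
        # residual-style addition of a main path and a shortcut path.
        self.mix = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
        )
        self.skip = nn.Conv2d(64, 128, kernel_size=1, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = self.downsample(x)
        return self.mix(d) + self.skip(d)  # down-sample and mix features

feature_map = FeatureExtraction()(torch.randn(1, 3, 512, 512))  # -> (1, 128, 128, 128)
```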
2. The detection model training method based on double-angle regression of claim 1, wherein the training set images comprise a plurality of image sub-blocks; and the acquiring a training set image containing labeling information comprises:
dividing a remote sensing satellite acquired image into a plurality of image sub-blocks with a fixed resolution, and mapping the labeling information of the remote sensing satellite acquired image onto the corresponding image sub-blocks.
3. The detection model training method based on double-angle regression of claim 2, wherein the determination of the fixed resolution comprises:
determining the fixed resolution according to the number of the segmentation parts of the image acquired by the remote sensing satellite;
or determining the fixed resolution according to the target integrity of the ship target in the remote sensing satellite acquisition image;
or determining the fixed resolution according to the sparsity of the ship target.
4. The detection model training method based on double-angle regression of claim 1, wherein the fusion convolutional layer sequentially comprises: a first fusion convolutional layer, a second fusion convolutional layer, a third fusion convolutional layer, and a fourth fusion convolutional layer, wherein:
the first fusion convolutional layer comprises a first fusion sublayer and a second fusion sublayer, the first fusion sublayer and the second fusion sublayer being used for performing down-sampling processing by sequential convolution operations; and the output feature map of the first fusion convolutional layer is determined from the output feature map of the down-sampling convolutional layer and the output feature map of the second fusion sublayer;
the second fusion convolutional layer comprises a third fusion sublayer, a fourth fusion sublayer and a fifth fusion sublayer, the third fusion sublayer and the fourth fusion sublayer being used for performing down-sampling processing by sequential convolution operations; the fifth fusion sublayer is used for performing down-sampling processing on the output feature map of the first fusion convolutional layer by a convolution operation; and the output feature map of the second fusion convolutional layer is determined from the output feature map of the fourth fusion sublayer and the output feature map of the fifth fusion sublayer;
the third fusion convolutional layer comprises a sixth fusion sublayer, a seventh fusion sublayer and an eighth fusion sublayer, the sixth fusion sublayer and the seventh fusion sublayer being used for performing down-sampling processing by sequential convolution operations; the eighth fusion sublayer is used for performing down-sampling processing on the output feature map of the second fusion convolutional layer by a convolution operation; and the output feature map of the third fusion convolutional layer is determined from the output feature map of the seventh fusion sublayer and the output feature map of the eighth fusion sublayer;
the fourth fusion convolutional layer comprises a ninth fusion sublayer, a tenth fusion sublayer and an eleventh fusion sublayer, the ninth fusion sublayer and the tenth fusion sublayer being used for performing down-sampling processing by sequential convolution operations; the eleventh fusion sublayer is used for performing down-sampling processing on the output feature map of the third fusion convolutional layer by a convolution operation; and the output feature map of the fourth fusion convolutional layer is determined from the output feature map of the tenth fusion sublayer and the output feature map of the eleventh fusion sublayer.
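The following sketch, offered only as an interpretation, implements the recurring pattern of the second through fourth fusion convolutional layers of claim 4: two sequential convolution sublayers on the main path, one convolution sublayer down-sampling the previous stage's output on the shortcut, and an element-wise sum as the fusion. Channel counts and the placement of normalization and activation are assumptions.

```python
import torch
from torch import nn

class FusionConvLayer(nn.Module):
    """One stage in the pattern of claim 4's second to fourth fusion
    convolutional layers (residual-style down-sampling and feature mixing)."""

    def __init__(self, in_ch: int, out_ch: int) -> None:
        super().__init__()
        self.main = nn.Sequential(  # e.g. the third and fourth fusion sublayers
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # e.g. the fifth fusion sublayer: down-samples the previous stage's map.
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.main(x) + self.shortcut(x))

stage = FusionConvLayer(128, 256)
out = stage(torch.randn(1, 128, 128, 128))  # -> (1, 256, 64, 64)
```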
5. The detection model training method based on double-angle regression of claim 1, wherein the feature restoration network sequentially comprises: an up-sampling convolutional layer, a fusion feature layer and an output feature layer, wherein the up-sampling convolutional layer is used for performing up-sampling processing, the fusion feature layer is used for performing down-sampling processing and feature mixing, and the output feature layer is used for performing different convolution operations on the output feature map of the fusion feature layer and outputting different feature restoration maps.
6. The detection model training method based on double-angle regression of claim 5, wherein the up-sampling convolutional layer comprises a first up-sampling layer, a first interpolation layer, a second up-sampling layer, a second interpolation layer, and a third up-sampling layer, wherein: the first up-sampling layer is used for performing up-sampling processing on the feature extraction graph by a deconvolution operation; the first interpolation layer is used for performing a four-times bilinear-interpolation up-sampling operation on the output feature map of the first up-sampling layer; the second up-sampling layer is used for performing up-sampling processing on the output feature map of the first up-sampling layer by a deconvolution operation; the second interpolation layer is used for performing a two-times bilinear-interpolation up-sampling operation on the output feature map of the second up-sampling layer; and the third up-sampling layer is used for performing up-sampling processing on the output feature map of the second up-sampling layer by a deconvolution operation.
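A sketch of how claim 6's up-sampling convolutional layer might be realized in PyTorch: with 2x transposed convolutions, the 4x and 2x bilinear branches bring all three outputs to one common resolution, which is what the fusion feature layer of claim 7 requires. The transposed-convolution hyperparameters and channel widths are assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F

class UpsampleConvLayer(nn.Module):
    """Sketch of the up-sampling convolutional layer of claim 6."""

    def __init__(self, in_ch: int = 512, ch: int = 128) -> None:
        super().__init__()
        self.up1 = nn.ConvTranspose2d(in_ch, ch, kernel_size=4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(ch, ch, kernel_size=4, stride=2, padding=1)
        self.up3 = nn.ConvTranspose2d(ch, ch, kernel_size=4, stride=2, padding=1)

    def forward(self, x: torch.Tensor):
        u1 = self.up1(x)   # first up-sampling layer (deconvolution)
        b1 = F.interpolate(u1, scale_factor=4, mode="bilinear",
                           align_corners=False)  # first interpolation layer (4x)
        u2 = self.up2(u1)  # second up-sampling layer
        b2 = F.interpolate(u2, scale_factor=2, mode="bilinear",
                           align_corners=False)  # second interpolation layer (2x)
        u3 = self.up3(u2)  # third up-sampling layer
        return b1, b2, u3  # all three maps now share the same spatial size

b1, b2, u3 = UpsampleConvLayer()(torch.randn(1, 512, 16, 16))  # each (1, 128, 128, 128)
```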
7. The detection model training method based on double-angle regression of claim 6, wherein the fusion feature layer comprises a first fusion feature layer, a first down-sampling feature layer, a second down-sampling feature layer and a second fusion feature layer; the first fusion feature layer is used for fusing the output feature map of the first interpolation layer, the output feature map of the second interpolation layer and the output feature map of the third up-sampling layer by end-to-end concatenation; the first down-sampling feature layer performs down-sampling processing on the output feature map of the first fusion feature layer by a convolution operation; the second down-sampling feature layer performs down-sampling processing on the output feature map of the first down-sampling feature layer by a convolution operation; and the second fusion feature layer adds the output feature map of the first down-sampling feature layer to the output feature map of the second down-sampling feature layer and outputs the result.
8. The detection model training method based on double-angle regression of claim 5, wherein the outputting of different feature restoration maps comprises: the output feature layer performs five different convolution operations on the output feature map of the fusion feature layer to obtain five feature restoration maps; the five feature restoration maps comprise a Heatmap output feature map, an Angle1Angle2 output feature map, a ShortSide output feature map, a ShortLongRatio output feature map and a PointReg output feature map, wherein the Heatmap output feature map comprises the data of the predicted center point; the Angle1Angle2 output feature map comprises the data of the predicted first diagonal angle and the data of the predicted second diagonal angle; the ShortSide output feature map comprises the data of the predicted short-side length; the ShortLongRatio output feature map comprises the data of the predicted short-side-to-diagonal length ratio; and the PointReg output feature map comprises the data of the predicted bias point.
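A sketch of the five-branch output feature layer of claim 8. The output channel counts follow what each head regresses (1 for the center heatmap, 2 for the two diagonal angles, 1 for the short side, 1 for the short-side/diagonal ratio, 2 for the bias point); the intermediate width and the sigmoid on the heatmap are assumptions.

```python
import torch
from torch import nn

class OutputFeatureLayer(nn.Module):
    """Sketch of claim 8's output feature layer: five parallel convolution
    branches over the fused feature map, one per regressed quantity."""

    def __init__(self, ch: int = 128) -> None:
        super().__init__()
        def head(out_ch: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, out_ch, 1),
            )
        self.heatmap = head(1)           # predicted center point
        self.angles = head(2)            # predicted first and second diagonal angles
        self.short_side = head(1)        # predicted short-side length
        self.short_long_ratio = head(1)  # predicted short-side-to-diagonal ratio
        self.point_reg = head(2)         # predicted bias point

    def forward(self, x: torch.Tensor) -> dict:
        return {
            "Heatmap": torch.sigmoid(self.heatmap(x)),
            "Angle1Angle2": self.angles(x),
            "ShortSide": self.short_side(x),
            "ShortLongRatio": self.short_long_ratio(x),
            "PointReg": self.point_reg(x),
        }
```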
9. A detection model training device based on double-angle regression, comprising:
the acquisition unit is used for acquiring a training set image containing labeling information, wherein the labeling information comprises actual coordinate data of a plurality of ship targets;
the processing unit is used for inputting the training set images into the detection model and determining the predicted coordinate data of each detection frame, wherein the predicted coordinate data comprise a predicted center point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short-side length, a predicted short-side-to-diagonal length ratio and a predicted bias point of the detection frame; and is also used for determining the value of a loss function according to the actual coordinate data and the predicted coordinate data;
the training unit is used for adjusting the parameters of the detection model according to the value of the loss function until a convergence condition is met, and finishing the training of the detection model;
the detection model comprises a feature extraction network and a feature restoration network; the method for inputting the training set images into the detection model comprises the following steps:
inputting the training set images into the feature extraction network, and determining a feature extraction graph, wherein the feature extraction graph comprises initial feature data of the detection frame;
inputting the feature extraction graph into the feature restoration network, and determining a feature restoration graph, wherein the feature restoration graph comprises the predicted coordinate data of the detection frame;
wherein the feature extraction network sequentially comprises: a down-sampling convolutional layer and a fusion convolutional layer, the down-sampling convolutional layer being used for performing down-sampling processing on the training set image, and the fusion convolutional layer being used for performing down-sampling processing on the output feature map of the down-sampling convolutional layer and performing feature mixing.
10. A detection method based on double-angle regression is characterized by comprising the following steps:
acquiring an image to be detected, and preprocessing the image to be detected to obtain a plurality of image sub-blocks;
inputting the plurality of image sub-blocks into a detection model, and determining predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, the predicted coordinate data comprise a predicted center point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short-side length, a predicted short-side-to-diagonal length ratio and a predicted bias point of the detection frame, and the detection model is obtained by training with the detection model training method based on double-angle regression according to any one of claims 1-8;
determining the acquisition position of the detection frame on each of the image sub-blocks according to the predicted coordinate data;
and determining an optimal detection frame according to the acquisition position.
11. The detection method based on double-angle regression of claim 10, wherein the determining the acquisition position of the detection frame on each of the image sub-blocks according to the predicted coordinate data comprises:
determining the detection position coordinates of the detection frame in an output feature map of a fusion feature layer of the detection model according to the predicted coordinate data;
and determining the acquisition position coordinate of the detection frame on each image sub-block according to the detection position coordinate.
12. The detection method based on double-angle regression of claim 11, wherein the detection position coordinates comprise coordinates of a first detection vertex, coordinates of a second detection vertex, coordinates of a third detection vertex, and coordinates of a fourth detection vertex; and the determining, according to the predicted coordinate data, the detection position coordinates of the detection frame in the fusion feature layer of the detection model comprises:
judging whether the predicted first diagonal angle and the predicted second diagonal angle meet preset angle conditions or not;
and if so, determining the coordinates of the first detection vertex, the second detection vertex, the third detection vertex and the fourth detection vertex according to the predicted coordinate data.
13. The detection method based on double-angle regression of claim 11, wherein the detection position coordinates comprise coordinates of a first detection vertex, coordinates of a second detection vertex, coordinates of a third detection vertex, and coordinates of a fourth detection vertex, and the acquisition position coordinates comprise coordinates of a first acquisition vertex, coordinates of a second acquisition vertex, coordinates of a third acquisition vertex, and coordinates of a fourth acquisition vertex; and the determining, according to the detection position coordinates, the acquisition position coordinates of the detection frame on each of the image sub-blocks comprises:
determining a first constant according to the ratio of the resolution of each image sub-block to the resolution of the output feature map of the fusion feature layer;
and respectively multiplying the coordinates of the first detection vertex, the second detection vertex, the third detection vertex and the fourth detection vertex by the first constant to determine the corresponding coordinates of the first acquisition vertex, the second acquisition vertex, the third acquisition vertex and the fourth acquisition vertex.
14. The detection method based on double-angle regression of claim 10, wherein the determining the optimal detection frame according to the acquisition position comprises:
determining coordinates of the detection frame on an original remote sensing satellite acquired image to which each image subblock belongs according to the name of each image subblock, wherein the name of each image subblock comprises image subblock coordinate information of each image subblock on the original remote sensing satellite acquired image;
and screening by a non-maximum suppression method to select the optimal detection frame, wherein the non-maximum suppression method selects the optimal detection frame according to the confidence scores and the intersection-over-union of the prediction frames.
15. A detection apparatus based on double-angle regression, comprising:
the acquisition unit is used for acquiring an image to be detected;
the processing unit is used for preprocessing the image to be detected to obtain image sub-blocks; for inputting the image sub-blocks into a detection model and determining the predicted coordinate data of a detection frame, wherein the detection frame is used for framing a predicted ship target, the predicted coordinate data comprise a predicted center point, a predicted first diagonal angle, a predicted second diagonal angle, a predicted short-side length, a predicted short-side-to-diagonal length ratio and a predicted bias point of the detection frame, and the detection model is obtained by training with the detection model training method based on double-angle regression according to any one of claims 1-8; and for determining the acquisition position of the detection frame on the image sub-blocks according to the predicted coordinate data;
the screening unit is used for determining the optimal detection frame according to the acquisition position.
16. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the detection model training method based on double-angle regression according to any one of claims 1-8, or carries out the detection method based on double-angle regression according to any one of claims 10-14.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010264623.5A CN111476159B (en) | 2020-04-07 | 2020-04-07 | Method and device for training and detecting detection model based on double-angle regression
Publications (2)

Publication Number | Publication Date
---|---
CN111476159A (en) | 2020-07-31
CN111476159B (en) | 2023-04-07
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |