CN113298169A - Convolutional neural network-based rotating target detection method and device - Google Patents

Convolutional neural network-based rotating target detection method and device

Info

Publication number
CN113298169A
CN113298169A (application CN202110612780.5A)
Authority
CN
China
Prior art keywords
feature map
feature
frame
rotation angle
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110612780.5A
Other languages
Chinese (zh)
Other versions
CN113298169B (en)
Inventor
产思贤
吴炳辉
郑竟成
白琮
周小龙
陶健
陈胜勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110612780.5A priority Critical patent/CN113298169B/en
Publication of CN113298169A publication Critical patent/CN113298169A/en
Application granted granted Critical
Publication of CN113298169B publication Critical patent/CN113298169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for detecting a rotating target based on a convolutional neural network. An image training data set annotated with rotating target detection frames is acquired and data enhancement is performed on it; each rotating target detection frame in the data-enhanced training set is labeled with the 5 parameters (x, y, w, h, θ); the labels are input into a backbone network for training, and a rotation angle loss is introduced during training to update the network parameters; the trained network is then used to detect the image to be detected. Because the method and the device detect the target with a rotating target detection frame, the rotated frame of a target in an image can be predicted effectively and the target can be enclosed accurately.

Description

Convolutional neural network-based rotating target detection method and device
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a method and a device for detecting a rotating target based on a convolutional neural network.
Background
Target detection is a basic problem in machine vision; it supports visual tasks such as instance segmentation, target tracking and action recognition, and is widely applied in fields such as autonomous driving, unmanned aerial vehicles and surveillance. In existing target detection algorithms, most prediction frames are horizontal rectangular frames, which cannot enclose targets tightly and often contain much background information. When an object does not lie horizontally or vertically in the image, a horizontal rectangular frame surrounding it inevitably contains a large amount of background, so accurately enclosing objects in an image remains a difficult problem.
The main technical schemes for target detection are one-stage algorithms and two-stage algorithms. Mainstream two-stage algorithms such as the Fast-RCNN series first screen a large number of candidate regions that may contain targets and then detect within those candidate regions; such algorithms are accurate but slow and perform poorly for real-time image detection. Mainstream one-stage algorithms such as the YOLO series complete end-to-end prediction directly; model detection is faster, but object detection precision drops to a certain extent. When targets are densely distributed in an image and the prediction frames intersect, screening the prediction frames is difficult for the model.
Disclosure of Invention
The application aims to provide a method and a device for detecting a rotating target based on a convolutional neural network, in which a rotation angle is introduced on top of the prior art and a rotating target detection frame is used to detect the target, thereby solving the problem that prediction frames are difficult to screen.
In order to achieve the purpose, the technical scheme of the application is as follows:
a method for detecting a rotating target based on a convolutional neural network comprises the following steps:
step 1, acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side of the rotating target detection frame, h is the short side of the rotating target detection frame, and θ is the rotation angle of the rotating target detection frame, the rotation angle being the included angle between the long side and the horizontal direction;
step 2, scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes;
step 3, performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32, which comprises:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12;
the feature maps F12, F22, F32 are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
step 4, performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters;
and step 5, performing step 2 and step 3 on the image to be detected to obtain a detection result.
Further, the prediction frame comprises the predicted center point coordinates (tx, ty), the predicted long side tw and the predicted short side th, and the frame and rotation angle regression operations are performed on the feature maps F12, F22, F32 according to the following formulas:
[Formula images not reproduced in this text: the regression equations giving bx, by, bw, bh and bθ from the network outputs, the anchor frame sides and the grid coordinates.]
where (bx, by) are the center point coordinates obtained after the regression operation, bw is the long side obtained after the regression operation, bh is the short side obtained after the regression operation, bθ is the rotation angle obtained after the regression operation, pw and ph are the long side and the short side of the anchor frame respectively, e is the natural constant, α and β are set parameters, and (Cx, Cy) are the real coordinates of the upper-left corner of the grid cell where the center point is located.
Further, the calculation formula of the rotation angle loss is as follows:
[Formula image not reproduced in this text: the rotation angle loss Langle.]
where Langle is the rotation angle loss.
Further, the method for detecting a rotating target based on a convolutional neural network further includes:
and converting the detection result into a 4-focus representation method, and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
The application also provides a rotating target detection device based on a convolutional neural network, which comprises:
the tag processing module is used for acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side of the rotating target detection frame, h is the short side of the rotating target detection frame, and θ is the rotation angle of the rotating target detection frame, the rotation angle being the included angle between the long side and the horizontal direction;
a feature extraction module, used for scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes;
a feature processing module, used for performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32 through the following operations:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12;
the feature maps F12, F22, F32 are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
a parameter updating module, used for performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters;
and the detection module is used for inputting the image to be detected into the feature extraction module and the feature processing module to obtain a detection result.
Further, the prediction frame comprises the predicted center point coordinates (tx, ty), the predicted long side tw and the predicted short side th, and the frame and rotation angle regression operations are performed on the feature maps F12, F22, F32 according to the following formulas:
[Formula images not reproduced in this text: the regression equations giving bx, by, bw, bh and bθ from the network outputs, the anchor frame sides and the grid coordinates.]
where (bx, by) are the center point coordinates obtained after the regression operation, bw is the long side obtained after the regression operation, bh is the short side obtained after the regression operation, bθ is the rotation angle obtained after the regression operation, pw and ph are the long side and the short side of the anchor frame respectively, e is the natural constant, α and β are set parameters, and (Cx, Cy) are the real coordinates of the upper-left corner of the grid cell where the center point is located.
Further, the calculation formula of the rotation angle loss is as follows:
[Formula image not reproduced in this text: the rotation angle loss Langle.]
where Langle is the rotation angle loss.
Further, the apparatus for detecting a rotating target based on a convolutional neural network further includes:
and the display module is used for converting the detection result into a 4-focus representation method and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
According to the method and the device for detecting a rotating target based on a convolutional neural network provided by the application, a rotation angle is introduced on top of the prior art and the target is detected with a rotating target detection frame, so the rotated frame of a target in an image can be predicted effectively and the target can be enclosed accurately.
Drawings
FIG. 1 is a flow chart of a neural network-based rotating target detection method according to the present application;
fig. 2 is a schematic view of a rotating target detection block.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Firstly, preprocessing an image, calculating to obtain a label required by training, and performing data enhancement on the image before training; then starting training in batches, obtaining a feature map from the images in each batch through a convolutional neural network, regressing the feature map to obtain a predicted value, comparing the predicted value with a real value of the image to calculate loss, performing back propagation to reduce the loss after each batch of training is finished, and updating network parameters; and finally, after all training batches are finished, loading the trained parameters to the network, detecting the unlabeled image to be detected, and obtaining a predicted value, namely a prediction result of the model on the image.
Referring to fig. 1, a method for detecting a rotating target based on a neural network includes the following steps:
Step S1, acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side, h is the short side, and θ is the rotation angle, defined as the included angle between the long side and the horizontal direction.
In the method, a network model is trained first: an image training data set is acquired, each image in which is annotated with rotating target detection frames, and data enhancement, including deformation, rotation, color enhancement and the like, is performed on the data set; this is not described in detail here.
The method defines the image labels uniformly. The coordinates (x1, y1), (x2, y2), (x3, y3), (x4, y4) of the four corner points of each rotating target detection frame are obtained; from this coordinate information the center point coordinates (x, y) are calculated, the long side is taken as the width w and the short side as the height h, Δy is calculated from the two y coordinates of the long side, and the included angle θ between the long side and the horizontal direction is calculated with an inverse trigonometric function:
[Formula image not reproduced in this text: θ computed from Δy with the inverse trigonometric function.]
the format of the data label input to the network training is xyz [ theta ], xy represents the central point of the object, w is the longer side of the object, h is the shorter side, and [ theta ] is the rotation angle of the object.
For an image training data set, a category C is set for each image, and is used for representing the category to which the image belongs. The coordinates and side lengths of each point in the present application are in units of pixels.
The conversion between the 5-parameter notation (x, y, w, h, θ) and the 4-corner-point representation (x1, y1), (x2, y2), (x3, y3), (x4, y4) is shown in equation 2; the calculation formula for the corner point coordinates (xi, yi) is as follows:
[Formula images not reproduced in this text: equation 2, giving the corner point coordinates (xi, yi) from the center point coordinates, the long side w, the short side h and the rotation angle θ.]
where (x, y) are the center point coordinates, w is the long side, h is the short side, θ is the rotation angle, and (Oxi, Oyi) are the corner point coordinates of the corresponding horizontal frame calculated from the center point coordinates, the long side w and the short side h. The horizontal frame refers to the detection frame with θ = 0, and i = 1, 2, 3, 4.
Substituting the specific coordinates of each corner point into the calculation yields the label in the 5-parameter notation; conversely, after a predicted label is obtained through network prediction, the four corner points are calculated in turn and the result is visualized once detection is finished.
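As a concrete illustration (not part of the patent text), the label conversion described above can be sketched in Python as follows; because the formula images are not reproduced here, the use of arcsin to recover θ from Δy and the long side, and the corner ordering, are assumptions.

```python
import math

def corners_to_xywhtheta(pts):
    """pts: four corner points [(x1, y1), ..., (x4, y4)] listed in order around the box (pixels)."""
    cx = sum(p[0] for p in pts) / 4.0            # center point = mean of the four corners
    cy = sum(p[1] for p in pts) / 4.0
    side_a = math.dist(pts[0], pts[1])           # lengths of two adjacent sides
    side_b = math.dist(pts[1], pts[2])
    w, h = max(side_a, side_b), min(side_a, side_b)   # long side -> w, short side -> h
    p, q = (pts[0], pts[1]) if side_a >= side_b else (pts[1], pts[2])
    dy = q[1] - p[1]                             # delta-y along the long side
    theta = math.asin(max(-1.0, min(1.0, dy / w)))    # assumed inverse-trig step
    return cx, cy, w, h, theta

def xywhtheta_to_corners(cx, cy, w, h, theta):
    """Rotate the theta = 0 (horizontal) corners (Oxi, Oyi) around the center by theta."""
    horiz = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(cx + ox * cos_t - oy * sin_t,
             cy + ox * sin_t + oy * cos_t) for ox, oy in horiz]
```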
Step S2, scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes.
The method adopts batch training with a batch size of 16 (i.e., each batch processes 16 pictures). The learning rate starts from 0.1; the first 3 batches are preheated with a warm-up method, during which the learning rate is updated in each batch by one-dimensional linear interpolation; after the warm-up stage the learning rate is set to 0.01 and is thereafter updated by a cosine annealing method.
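As a non-authoritative illustration of the schedule just described (a sketch only; the total number of training batches and the exact interpolation endpoints are assumptions):

```python
import math

def learning_rate(batch_idx, total_batches, warmup_batches=3, warmup_start=0.1, base_lr=0.01):
    """Warm-up by one-dimensional linear interpolation for the first batches, then cosine annealing from 0.01."""
    if batch_idx < warmup_batches:
        frac = batch_idx / float(warmup_batches)
        return warmup_start + (base_lr - warmup_start) * frac   # linear interpolation during warm-up
    t = (batch_idx - warmup_batches) / max(1, total_batches - warmup_batches)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))        # cosine annealing
```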
Because the original picture is large, it is scaled according to the long side: the long side is scaled to 640, the part where the short side falls short of 640 is padded with 0 to give a 640 × 640 input, and the scaled picture is input into the backbone network darknet-53. After a series of operations such as convolution, feature maps of three sizes are output: F1 (20 × 20), F2 (40 × 40) and F3 (80 × 80). The feature map sizes are determined by the backbone network darknet-53 and are not described further here.
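A minimal sketch of this resizing step, assuming OpenCV-style 3-channel image arrays; the padding position (anchored at the top-left here) is an assumption not fixed by the text:

```python
import numpy as np
import cv2

def letterbox_640(img, size=640):
    h, w = img.shape[:2]
    scale = size / max(h, w)                            # scale so the long side becomes 640
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(img, (new_w, new_h))
    canvas = np.zeros((size, size, img.shape[2]), dtype=img.dtype)   # pad the short side with 0
    canvas[:new_h, :new_w] = resized
    return canvas, scale
```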
Step S3, performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32, which comprises:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12.
Specifically, the 20 × 20 feature map F1 is input directly into the top-down feature pyramid network FPN and upsampled to obtain a 40 × 40 feature map, which is fused with feature map F2 to obtain F21 (40 × 40); F21 is then upsampled to 80 × 80 and fused with feature map F3 to obtain F32 (80 × 80). Next, feature map F32 is input into the bottom-up path aggregation network PAN, which can be regarded as the inverse operation of the FPN: the 80 × 80 feature map F32 is input directly into the PAN, downsampled to 40 × 40 and fused with the earlier feature map F21 to obtain F22 (40 × 40); feature map F22 is then input directly into the PAN, downsampled and fused with the earlier feature map F1 to obtain feature map F12 (20 × 20). At this point, output feature maps at all three scales have been obtained: F12 (20 × 20), F22 (40 × 40) and F32 (80 × 80).
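For illustration only, the fusion order described above can be sketched in PyTorch-style code; the fusion operator (channel concatenation), the downsampling operator (max pooling) and the conv blocks are assumptions, since the patent does not fix them here:

```python
import torch
import torch.nn.functional as F

def fpn_pan_fuse(f1, f2, f3, blocks):
    """f1: 20x20, f2: 40x40, f3: 80x80 backbone maps; blocks: dict of conv blocks (assumed)."""
    # top-down feature pyramid network (FPN)
    f21 = blocks["td1"](torch.cat([F.interpolate(f1, scale_factor=2), f2], dim=1))   # 40x40
    f32 = blocks["td2"](torch.cat([F.interpolate(f21, scale_factor=2), f3], dim=1))  # 80x80
    # bottom-up path aggregation network (PAN)
    f22 = blocks["bu1"](torch.cat([F.max_pool2d(f32, 2), f21], dim=1))               # 40x40
    f12 = blocks["bu2"](torch.cat([F.max_pool2d(f22, 2), f1], dim=1))                # 20x20
    return f12, f22, f32
```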
Step S4, performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters.
The feature maps F12, F22, F32 output in the previous step are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
In this embodiment, F ∈ {20, 40, 80}; C is the set of categories in the data set; the prediction frame comprises the predicted center point coordinates (tx, ty) and the predicted long side tw and short side th; N is the preset number of anchor frames, which is 3 in this embodiment.
Because the output feature values cannot be used directly for loss calculation, regression is performed first to obtain the actual predicted values. The frame and rotation angle regression operations are performed on the feature maps F12, F22, F32 according to the following formulas:
[Formula images not reproduced in this text: the regression equations giving bx, by, bw, bh and bθ from the network outputs, the anchor frame sides and the grid coordinates.]
where (bx, by) are the center point coordinates obtained after the regression operation, bw is the long side obtained after the regression operation, bh is the short side obtained after the regression operation, bθ is the rotation angle obtained after the regression operation, pw and ph are the long side and the short side of the anchor frame respectively, e is the natural constant, α and β are set parameters, and (Cx, Cy) are the real coordinates of the upper-left corner of the grid cell where the center point is located.
In one embodiment, α is 0.5 and β is 1.5, and the regression limits the angle θ to [-1.5, 1.5], which is the rotation range in radians.
It should be noted that the grid in the present application is defined on the finally obtained feature maps and is an abstract concept that facilitates the frame regression calculation; for the 20 × 20, 40 × 40 and 80 × 80 feature maps there are 20 × 20, 40 × 40 and 80 × 80 grid cells respectively. Dividing a feature map into grids is a relatively mature technique in the art and is not described further here.
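Because the regression formula images are not reproduced above, the following decode sketch is only an assumption modeled on the common YOLO-style parameterisation consistent with the surrounding definitions (sigmoid plus grid offset for the center, anchor side times e^t for the sides, and a sigmoid mapped through α and β so that the angle lands in roughly [-1.5, 1.5] radians); the patent's exact formulas may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_prediction(tx, ty, tw, th, ttheta, cx, cy, pw, ph, alpha=0.5, beta=1.5):
    bx = sigmoid(tx) + cx                 # center x, offset from the grid cell's upper-left corner Cx
    by = sigmoid(ty) + cy                 # center y, offset from Cy
    bw = pw * math.exp(tw)                # long side, scaled from the anchor long side pw
    bh = ph * math.exp(th)                # short side, scaled from the anchor short side ph
    btheta = 2.0 * beta * (sigmoid(ttheta) - alpha)   # assumed angle mapping into [-1.5, 1.5]
    return bx, by, bw, bh, btheta
```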
The loss is then calculated between the actual frame and angle predictions (bx, by, bw, bh, bθ) obtained above and the label (x, y, w, h, θ) obtained by preprocessing in the first step.
1. Calculate the frame loss. The frame information is predicted and the real frame information is calculated from the label; the IoU (Intersection over Union), i.e. the intersection-over-union ratio of the predicted frame and the real frame, is then calculated, and predicted frames with high IoU values are obtained through NMS post-processing:
IoU = |A ∩ B| / |A ∪ B|

where A is the real box (ground truth), B is the prediction box, |A ∩ B| is the area where the real box and the prediction box intersect, and |A ∪ B| is the area of the union of the real box and the prediction box. A higher IoU value indicates a more accurate prediction.
The CIoU loss function is an improvement on IoU, and training converges better with the improvement:
LCIoU = 1 - IoU + ρ²(b, bgt)/c² + λv

where b is the prediction frame center point (bx, by), bgt is the real frame center point, ρ(b, bgt) is the Euclidean distance between the center points of the real frame and the prediction frame, c is the diagonal length of the minimum enclosing region of the real frame and the prediction frame, and λ is a weight parameter:

λ = v / ((1 - IoU) + v)

v is used to measure the similarity of aspect ratios and is defined as follows:

v = (4/π²) · (arctan(wgt/hgt) - arctan(w/h))²

where wgt and hgt are respectively the width and height of the real box.
2. Calculating the angle loss:
[Formula image not reproduced in this text: the rotation angle loss Langle.]
where Langle is the rotation angle loss.
In one particular embodiment:
Langle = SmoothL1(θ - bθ)

where θ is the radian value of the true angle, bθ is the radian value of the predicted angle, and SmoothL1(x) is the smooth L1 loss function.
It should be noted that the L1 and L2 functions may also be used to calculate the rotation angle loss, and the L1 and L2 functions are both loss functions, which are not described herein again.
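A short illustrative sketch of the SmoothL1 reading of the angle loss given above (the β = 1 threshold is an assumption):

```python
def smooth_l1(x, beta=1.0):
    return 0.5 * x * x / beta if abs(x) < beta else abs(x) - 0.5 * beta

def angle_loss(theta_true, b_theta):
    """theta_true and b_theta in radians, per the embodiment above."""
    return smooth_l1(theta_true - b_theta)
```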
3. Calculate the classification loss using a binary cross entropy loss function (Binary Cross Entropy Loss):
Lcls = -Σn [ Ĉn · log(Cn) + (1 - Ĉn) · log(1 - Cn) ]

where Ĉn is the true category of the sample and Cn is the predicted category of the sample.
4. Calculate the target score loss using a binary cross entropy loss function:
Lobj = -Σ [ b̂obj · log(bobj) + (1 - b̂obj) · log(1 - bobj) ]

where b̂obj indicates whether the position is a target (its value is 1 or 0) and bobj is the predicted target score.
It should be noted that the calculation of the frame loss, the classification loss and the target score loss is already a relatively mature technology in the art and is not described further here.
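For completeness, an illustrative binary cross entropy term as used for both the classification loss and the target score loss (a sketch, not the patent's code):

```python
import math

def bce(y_true, p_pred, eps=1e-9):
    """y_true is the 0/1 label, p_pred the predicted probability."""
    return -(y_true * math.log(p_pred + eps) + (1.0 - y_true) * math.log(1.0 - p_pred + eps))
```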
The loss between the predicted values and the true values is thus obtained, and back propagation is performed before the end of each batch to reduce the loss. The network parameters are updated at the same time and training of the next batch begins, until all batches of the training data have been processed, finally yielding the trained backbone network. After all batches are finished, all updated parameters are stored in a .pt weight file; before detection starts, the updated weights simply replace the original weights, and steps S2 and S3 are repeated.
Step S5, for the image to be detected, step S2 and step S3 are executed to obtain the detection result.
The image to be detected is scaled to 640 × 640 and input into the network; feature maps of three sizes are output after a series of operations such as convolution, and the predicted values are obtained after regression of the feature values. The predicted values comprise the category C and the frame and angle predictions (bx, by, bw, bh, bθ), and they are output directly as the detection result without loss calculation.
In another embodiment, the convolutional neural network-based rotating object detection method further includes:
and converting the detection result into a 4-focus representation method, and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
In order to visualize the detection result on the image, the 5-parameter notation (x, y, w, h, θ) is converted into the 4-corner-point representation (x1, y1), (x2, y2), (x3, y3), (x4, y4) through the parameter conversion of step S1, and a rotated frame is drawn on the original image with the four corner points to indicate the position and category information of the target in the image.
In one embodiment, the present application further provides a convolutional neural network-based rotating target detection apparatus, including:
the tag processing module is used for acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side, h is the short side, and θ is the rotation angle, defined as the included angle between the long side and the horizontal direction;
a feature extraction module, used for scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes;
a feature processing module, used for performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32 through the following operations:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12;
the feature maps F12, F22, F32 are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
a parameter updating module, used for performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters;
and the detection module is used for inputting the image to be detected into the feature extraction module and the feature processing module to obtain a detection result.
In a specific example, the convolutional neural network-based rotating target detection apparatus further includes:
and converting the detection result into a 4-focus representation method, and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
For specific limitations of the convolutional neural network-based rotating object detection apparatus, reference may be made to the above limitations of the convolutional neural network-based rotating object detection method, and details are not repeated here. The modules in the convolutional neural network-based rotating object detecting device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
The memory and the processor are electrically connected, directly or indirectly, to enable transmission or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory stores a computer program that can be executed on the processor, and the processor executes the computer program stored in the memory, thereby implementing the network topology layout method in the embodiment of the present invention.
The Memory may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Read-Only Memory (EPROM), an electrically Erasable Read-Only Memory (EEPROM), and the like. The memory is used for storing programs, and the processor executes the programs after receiving the execution instructions.
The processor may be an integrated circuit chip having data processing capabilities. The Processor may be a general-purpose Processor including a Central Processing Unit (CPU), a Network Processor (NP), and the like. The various methods, steps and logic blocks disclosed in embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (8)

1. A convolutional neural network-based rotating target detection method is characterized in that the convolutional neural network-based rotating target detection method comprises the following steps:
step 1, acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side of the rotating target detection frame, h is the short side of the rotating target detection frame, and θ is the rotation angle of the rotating target detection frame, the rotation angle being the included angle between the long side and the horizontal direction;
step 2, scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes;
step 3, performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32, which comprises:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12;
the feature maps F12, F22, F32 are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
step 4, performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters;
and step 5, performing step 2 and step 3 on the image to be detected to obtain a detection result.
2. The convolutional neural network-based rotating object detection method of claim 1, wherein the prediction frame comprises the predicted center point coordinates (tx, ty), the predicted long side tw and the predicted short side th, and the frame and rotation angle regression operations are performed on the feature maps F12, F22, F32 according to the following formulas:
[Formula images not reproduced in this text: the regression equations giving bx, by, bw, bh and bθ from the network outputs, the anchor frame sides and the grid coordinates.]
where (bx, by) are the center point coordinates obtained after the regression operation, bw is the long side obtained after the regression operation, bh is the short side obtained after the regression operation, bθ is the rotation angle obtained after the regression operation, pw and ph are the long side and the short side of the anchor frame respectively, e is the natural constant, α and β are set parameters, and (Cx, Cy) are the real coordinates of the upper-left corner of the grid cell where the center point is located.
3. The convolutional neural network-based rotating object detecting method as claimed in claim 2, wherein the rotation angle loss is calculated as follows:
[Formula image not reproduced in this text: the rotation angle loss Langle.]
where Langle is the rotation angle loss.
4. The convolutional neural network-based rotating object detection method of claim 1, further comprising:
and converting the detection result into a 4-focus representation method, and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
5. A convolutional neural network-based rotating object detection apparatus, comprising:
the tag processing module is used for acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side of the rotating target detection frame, h is the short side of the rotating target detection frame, and θ is the rotation angle of the rotating target detection frame, the rotation angle being the included angle between the long side and the horizontal direction;
a feature extraction module, used for scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes;
a feature processing module, used for performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32 through the following operations:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12;
the feature maps F12, F22, F32 are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
a parameter updating module, used for performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters;
and the detection module is used for inputting the image to be detected into the feature extraction module and the feature processing module to obtain a detection result.
6. The convolutional neural network-based rotating object detection device of claim 5, wherein the prediction frame comprises the predicted center point coordinates (tx, ty), the predicted long side tw and the predicted short side th, and the frame and rotation angle regression operations are performed on the feature maps F12, F22, F32 according to the following formulas:
[Formula images not reproduced in this text: the regression equations giving bx, by, bw, bh and bθ from the network outputs, the anchor frame sides and the grid coordinates.]
where (bx, by) are the center point coordinates obtained after the regression operation, bw is the long side obtained after the regression operation, bh is the short side obtained after the regression operation, bθ is the rotation angle obtained after the regression operation, pw and ph are the long side and the short side of the anchor frame respectively, e is the natural constant, α and β are set parameters, and (Cx, Cy) are the real coordinates of the upper-left corner of the grid cell where the center point is located.
7. The convolutional neural network-based rotating object detecting device as claimed in claim 6, wherein the rotation angle loss is calculated as follows:
[Formula image not reproduced in this text: the rotation angle loss Langle.]
where Langle is the rotation angle loss.
8. The convolutional neural network based rotating object detecting device as claimed in claim 5, further comprising:
and the display module is used for converting the detection result into a 4-focus representation method and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
CN202110612780.5A 2021-06-02 2021-06-02 Rotating target detection method and device based on convolutional neural network Active CN113298169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110612780.5A CN113298169B (en) 2021-06-02 2021-06-02 Rotating target detection method and device based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110612780.5A CN113298169B (en) 2021-06-02 2021-06-02 Rotating target detection method and device based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN113298169A true CN113298169A (en) 2021-08-24
CN113298169B CN113298169B (en) 2024-03-01

Family

ID=77326817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110612780.5A Active CN113298169B (en) 2021-06-02 2021-06-02 Rotating target detection method and device based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN113298169B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591810A (en) * 2021-09-28 2021-11-02 湖南大学 Vehicle target pose detection method and device based on boundary tight constraint network and storage medium
CN113822882A (en) * 2021-11-22 2021-12-21 武汉飞恩微电子有限公司 Circuit board surface defect detection method and device based on deep learning
CN114119610A (en) * 2022-01-25 2022-03-01 合肥中科类脑智能技术有限公司 Defect detection method based on rotating target detection
CN114332638A (en) * 2021-11-03 2022-04-12 中科弘云科技(北京)有限公司 Remote sensing image target detection method and device, electronic equipment and medium
CN114596429A (en) * 2022-02-28 2022-06-07 安徽大学 Ear detection method based on self-defined rotating frame
CN114677568A (en) * 2022-05-30 2022-06-28 山东极视角科技有限公司 Linear target detection method, module and system based on neural network
CN115630660A (en) * 2022-12-23 2023-01-20 湖北凯乐仕通达科技有限公司 Barcode positioning method and device based on convolutional neural network
CN116681983A (en) * 2023-06-02 2023-09-01 中国矿业大学 Long and narrow target detection method based on deep learning
CN116735463A (en) * 2023-06-01 2023-09-12 中山大学 Directed target detection-based diatom size automatic measurement method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199993A (en) * 2020-09-01 2021-01-08 广西大学 Method for identifying transformer substation insulator infrared image detection model in any direction based on artificial intelligence
CN112766184A (en) * 2021-01-22 2021-05-07 东南大学 Remote sensing target detection method based on multi-level feature selection convolutional neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199993A (en) * 2020-09-01 2021-01-08 广西大学 Method for identifying transformer substation insulator infrared image detection model in any direction based on artificial intelligence
CN112766184A (en) * 2021-01-22 2021-05-07 东南大学 Remote sensing target detection method based on multi-level feature selection convolutional neural network

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591810B (en) * 2021-09-28 2021-12-07 湖南大学 Vehicle target pose detection method and device based on boundary tight constraint network and storage medium
CN113591810A (en) * 2021-09-28 2021-11-02 湖南大学 Vehicle target pose detection method and device based on boundary tight constraint network and storage medium
CN114332638B (en) * 2021-11-03 2023-04-25 中科弘云科技(北京)有限公司 Remote sensing image target detection method and device, electronic equipment and medium
CN114332638A (en) * 2021-11-03 2022-04-12 中科弘云科技(北京)有限公司 Remote sensing image target detection method and device, electronic equipment and medium
CN113822882A (en) * 2021-11-22 2021-12-21 武汉飞恩微电子有限公司 Circuit board surface defect detection method and device based on deep learning
CN114119610A (en) * 2022-01-25 2022-03-01 合肥中科类脑智能技术有限公司 Defect detection method based on rotating target detection
CN114119610B (en) * 2022-01-25 2022-06-28 合肥中科类脑智能技术有限公司 Defect detection method based on rotating target detection
CN114596429A (en) * 2022-02-28 2022-06-07 安徽大学 Ear detection method based on self-defined rotating frame
CN114596429B (en) * 2022-02-28 2024-04-19 安徽大学 Wheat head detection method based on custom rotating frame
CN114677568A (en) * 2022-05-30 2022-06-28 山东极视角科技有限公司 Linear target detection method, module and system based on neural network
CN115630660A (en) * 2022-12-23 2023-01-20 湖北凯乐仕通达科技有限公司 Barcode positioning method and device based on convolutional neural network
CN116735463A (en) * 2023-06-01 2023-09-12 中山大学 Directed target detection-based diatom size automatic measurement method
CN116681983A (en) * 2023-06-02 2023-09-01 中国矿业大学 Long and narrow target detection method based on deep learning

Also Published As

Publication number Publication date
CN113298169B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN113298169B (en) Rotating target detection method and device based on convolutional neural network
CN108520229B (en) Image detection method, image detection device, electronic equipment and computer readable medium
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN110309824B (en) Character detection method and device and terminal
CN113139543B (en) Training method of target object detection model, target object detection method and equipment
CN111553347B (en) Scene text detection method oriented to any angle
CN113239982A (en) Training method of detection model, target detection method, device and electronic system
CN110889437B (en) Image processing method and device, electronic equipment and storage medium
Zhou et al. Exploring faster RCNN for fabric defect detection
JP2010108135A (en) Image processing device, image processing program, and image processing method
CN111695609A (en) Target damage degree determination method, target damage degree determination device, electronic device, and storage medium
CN115909059A (en) Natural resource sample library establishing method and device
CN105095835A (en) Pedestrian detection method and system
CN113673519B (en) Character recognition method based on character detection model and related equipment thereof
Gu et al. A fast multi-object extraction algorithm based on cell-based connected components labeling
CN109598298B (en) Image object recognition method and system
CN110659637A (en) Electric energy meter number and label automatic identification method combining deep neural network and SIFT features
CN115761401A (en) Method and device for detecting small target on highway based on convolutional neural network
CN117037132A (en) Ship water gauge reading detection and identification method based on machine vision
CN112784494B (en) Training method of false positive recognition model, target recognition method and device
CN116778458B (en) Parking space detection model construction method, parking space detection method, equipment and storage medium
CN113255555A (en) Method, system, processing equipment and storage medium for identifying Chinese traffic sign board
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
CN116363583A (en) Human body identification method, device, equipment and medium for top view angle
CN115937537A (en) Intelligent identification method, device and equipment for target image and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant