CN113298169A - Convolutional neural network-based rotating target detection method and device - Google Patents

Convolutional neural network-based rotating target detection method and device

Info

Publication number
CN113298169A
CN113298169A (application CN202110612780.5A)
Authority
CN
China
Prior art keywords
feature map
feature
frame
rotation angle
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110612780.5A
Other languages
Chinese (zh)
Other versions
CN113298169B (en)
Inventor
产思贤
吴炳辉
郑竟成
白琮
周小龙
陶健
陈胜勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110612780.5A priority Critical patent/CN113298169B/en
Publication of CN113298169A publication Critical patent/CN113298169A/en
Application granted granted Critical
Publication of CN113298169B publication Critical patent/CN113298169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for detecting a rotating target based on a convolutional neural network. An image training data set annotated with rotating target detection frames is acquired and data enhancement is performed on it; each rotating target detection frame in the data-enhanced training set is labeled with the 5 parameters (x, y, w, h, θ); the labels are input into a backbone network for training, and a rotation angle loss is introduced during training to update the network parameters; the trained network is then used to detect the image to be detected. Because the method and the device detect the target with a rotating target detection frame, the rotated frame of a target in an image can be predicted effectively and the target can be enclosed accurately.

Description

Convolutional neural network-based rotating target detection method and device
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a method and a device for detecting a rotating target based on a convolutional neural network.
Background
Target detection is a basic problem in machine vision; it supports visual tasks such as instance segmentation, target tracking and action recognition, and is widely applied in fields such as autonomous driving, unmanned aerial vehicles and surveillance. In existing target detection algorithms, most prediction frames are horizontal rectangular frames, which cannot enclose targets tightly and often contain much background information. When an object does not lie horizontally or vertically in the image, a horizontal rectangular frame surrounding it inevitably contains a large amount of background, so accurately enclosing objects in an image remains a difficult problem.
The main technical schemes for target detection are one-stage algorithms and two-stage algorithms. Mainstream two-stage algorithms such as the Fast-RCNN series first screen a large number of candidate regions that may contain targets and then detect within those candidate regions; such algorithms are accurate but slow and perform poorly for real-time image detection. Mainstream one-stage algorithms such as the YOLO series complete end-to-end prediction directly; model detection is faster, but object detection precision drops to a certain extent. When targets are densely distributed in an image and the prediction frames intersect, screening the prediction frames is difficult for the model.
Disclosure of Invention
The application aims to provide a method and a device for detecting a rotating target based on a convolutional neural network, in which a rotation angle is introduced on top of the prior art and a rotating target detection frame is used to detect the target, thereby solving the problem that prediction frames are difficult to screen.
In order to achieve the purpose, the technical scheme of the application is as follows:
a method for detecting a rotating target based on a convolutional neural network comprises the following steps:
step 1, acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side of the rotating target detection frame, h is the short side of the rotating target detection frame, and θ is the rotation angle of the rotating target detection frame, the rotation angle being the included angle between the long side and the horizontal direction;
step 2, scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes;
step 3, performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32, which comprises:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12;
the feature maps F12, F22, F32 are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
step 4, performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters;
and step 5, performing step 2 and step 3 on the image to be detected to obtain a detection result.
Further, the prediction frame comprises the predicted center point coordinates (tx, ty), the predicted long side tw and the predicted short side th, and the frame and rotation angle regression operations are performed on the feature maps F12, F22, F32 according to the following formulas:
[Formula images not reproduced in this text: the regression equations giving bx, by, bw, bh and bθ from the network outputs, the anchor frame sides and the grid coordinates.]
where (bx, by) are the center point coordinates obtained after the regression operation, bw is the long side obtained after the regression operation, bh is the short side obtained after the regression operation, bθ is the rotation angle obtained after the regression operation, pw and ph are the long side and the short side of the anchor frame respectively, e is the natural constant, α and β are set parameters, and (Cx, Cy) are the real coordinates of the upper-left corner of the grid cell where the center point is located.
Further, the calculation formula of the rotation angle loss is as follows:
[Formula image not reproduced in this text: the rotation angle loss Langle.]
where Langle is the rotation angle loss.
Further, the method for detecting a rotating target based on a convolutional neural network further includes:
and converting the detection result into a 4-focus representation method, and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
The application also provides a rotating target detection device based on a convolutional neural network, which comprises:
the tag processing module is used for acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side of the rotating target detection frame, h is the short side of the rotating target detection frame, and θ is the rotation angle of the rotating target detection frame, the rotation angle being the included angle between the long side and the horizontal direction;
a feature extraction module, used for scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes;
a feature processing module, used for performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32 through the following operations:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12;
the feature maps F12, F22, F32 are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
a parameter updating module, used for performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters;
and the detection module is used for inputting the image to be detected into the feature extraction module and the feature processing module to obtain a detection result.
Further, the prediction frame comprises the predicted center point coordinates (tx, ty), the predicted long side tw and the predicted short side th, and the frame and rotation angle regression operations are performed on the feature maps F12, F22, F32 according to the following formulas:
[Formula images not reproduced in this text: the regression equations giving bx, by, bw, bh and bθ from the network outputs, the anchor frame sides and the grid coordinates.]
where (bx, by) are the center point coordinates obtained after the regression operation, bw is the long side obtained after the regression operation, bh is the short side obtained after the regression operation, bθ is the rotation angle obtained after the regression operation, pw and ph are the long side and the short side of the anchor frame respectively, e is the natural constant, α and β are set parameters, and (Cx, Cy) are the real coordinates of the upper-left corner of the grid cell where the center point is located.
Further, the calculation formula of the rotation angle loss is as follows:
[Formula image not reproduced in this text: the rotation angle loss Langle.]
where Langle is the rotation angle loss.
Further, the apparatus for detecting a rotating target based on a convolutional neural network further includes:
and the display module is used for converting the detection result into a 4-focus representation method and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
According to the method and the device for detecting a rotating target based on a convolutional neural network provided by the application, a rotation angle is introduced on top of the prior art and the target is detected with a rotating target detection frame, so the rotated frame of a target in an image can be predicted effectively and the target can be enclosed accurately.
Drawings
FIG. 1 is a flow chart of a neural network-based rotating target detection method according to the present application;
fig. 2 is a schematic view of a rotating target detection block.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Firstly, preprocessing an image, calculating to obtain a label required by training, and performing data enhancement on the image before training; then starting training in batches, obtaining a feature map from the images in each batch through a convolutional neural network, regressing the feature map to obtain a predicted value, comparing the predicted value with a real value of the image to calculate loss, performing back propagation to reduce the loss after each batch of training is finished, and updating network parameters; and finally, after all training batches are finished, loading the trained parameters to the network, detecting the unlabeled image to be detected, and obtaining a predicted value, namely a prediction result of the model on the image.
Referring to fig. 1, a method for detecting a rotating target based on a neural network includes the following steps:
Step S1, acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side, h is the short side, and θ is the rotation angle, defined as the included angle between the long side and the horizontal direction.
In the method, a network model is trained first: an image training data set is acquired, each image in which is annotated with rotating target detection frames, and data enhancement, including deformation, rotation, color enhancement and the like, is performed on the data set; this is not described in detail here.
The method defines the image labels uniformly. The coordinates (x1, y1), (x2, y2), (x3, y3), (x4, y4) of the four corner points of each rotating target detection frame are obtained; from this coordinate information the center point coordinates (x, y) are calculated, the long side is taken as the width w and the short side as the height h, Δy is calculated from the two y coordinates of the long side, and the included angle θ between the long side and the horizontal direction is calculated with an inverse trigonometric function:
[Formula image not reproduced in this text: θ computed from Δy with the inverse trigonometric function.]
the format of the data label input to the network training is xyz [ theta ], xy represents the central point of the object, w is the longer side of the object, h is the shorter side, and [ theta ] is the rotation angle of the object.
For an image training data set, a category C is set for each image, and is used for representing the category to which the image belongs. The coordinates and side lengths of each point in the present application are in units of pixels.
The conversion between the 5-parameter notation (x, y, w, h, θ) and the 4-corner-point representation (x1, y1), (x2, y2), (x3, y3), (x4, y4) is shown in equation 2; the calculation formula for the corner point coordinates (xi, yi) is as follows:
[Formula images not reproduced in this text: equation 2, giving the corner point coordinates (xi, yi) from the center point coordinates, the long side w, the short side h and the rotation angle θ.]
where (x, y) are the center point coordinates, w is the long side, h is the short side, θ is the rotation angle, and (Oxi, Oyi) are the corner point coordinates of the corresponding horizontal frame calculated from the center point coordinates, the long side w and the short side h. The horizontal frame refers to the detection frame with θ = 0, and i = 1, 2, 3, 4.
Substituting the specific coordinates of each corner point into the calculation yields the label in the 5-parameter notation; conversely, after a predicted label is obtained through network prediction, the four corner points are calculated in turn and the result is visualized once detection is finished.
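As a concrete illustration (not part of the patent text), the label conversion described above can be sketched in Python as follows; because the formula images are not reproduced here, the use of arcsin to recover θ from Δy and the long side, and the corner ordering, are assumptions.

```python
import math

def corners_to_xywhtheta(pts):
    """pts: four corner points [(x1, y1), ..., (x4, y4)] listed in order around the box (pixels)."""
    cx = sum(p[0] for p in pts) / 4.0            # center point = mean of the four corners
    cy = sum(p[1] for p in pts) / 4.0
    side_a = math.dist(pts[0], pts[1])           # lengths of two adjacent sides
    side_b = math.dist(pts[1], pts[2])
    w, h = max(side_a, side_b), min(side_a, side_b)   # long side -> w, short side -> h
    p, q = (pts[0], pts[1]) if side_a >= side_b else (pts[1], pts[2])
    dy = q[1] - p[1]                             # delta-y along the long side
    theta = math.asin(max(-1.0, min(1.0, dy / w)))    # assumed inverse-trig step
    return cx, cy, w, h, theta

def xywhtheta_to_corners(cx, cy, w, h, theta):
    """Rotate the theta = 0 (horizontal) corners (Oxi, Oyi) around the center by theta."""
    horiz = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(cx + ox * cos_t - oy * sin_t,
             cy + ox * sin_t + oy * cos_t) for ox, oy in horiz]
```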
Step S2, scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes.
The method adopts batch training with a batch size of 16 (i.e., each batch processes 16 pictures). The learning rate starts from 0.1; the first 3 batches are preheated with a warm-up method, during which the learning rate is updated in each batch by one-dimensional linear interpolation; after the warm-up stage the learning rate is set to 0.01 and is thereafter updated by a cosine annealing method.
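As a non-authoritative illustration of the schedule just described (a sketch only; the total number of training batches and the exact interpolation endpoints are assumptions):

```python
import math

def learning_rate(batch_idx, total_batches, warmup_batches=3, warmup_start=0.1, base_lr=0.01):
    """Warm-up by one-dimensional linear interpolation for the first batches, then cosine annealing from 0.01."""
    if batch_idx < warmup_batches:
        frac = batch_idx / float(warmup_batches)
        return warmup_start + (base_lr - warmup_start) * frac   # linear interpolation during warm-up
    t = (batch_idx - warmup_batches) / max(1, total_batches - warmup_batches)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))        # cosine annealing
```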
Because the original picture is large, it is scaled according to the long side: the long side is scaled to 640, the part where the short side falls short of 640 is padded with 0 to give a 640 × 640 input, and the scaled picture is input into the backbone network darknet-53. After a series of operations such as convolution, feature maps of three sizes are output: F1 (20 × 20), F2 (40 × 40) and F3 (80 × 80). The feature map sizes are determined by the backbone network darknet-53 and are not described further here.
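A minimal sketch of this resizing step, assuming OpenCV-style 3-channel image arrays; the padding position (anchored at the top-left here) is an assumption not fixed by the text:

```python
import numpy as np
import cv2

def letterbox_640(img, size=640):
    h, w = img.shape[:2]
    scale = size / max(h, w)                            # scale so the long side becomes 640
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(img, (new_w, new_h))
    canvas = np.zeros((size, size, img.shape[2]), dtype=img.dtype)   # pad the short side with 0
    canvas[:new_h, :new_w] = resized
    return canvas, scale
```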
Step S3, performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32, which comprises:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12.
Specifically, the 20 × 20 feature map F1 is input directly into the top-down feature pyramid network FPN and upsampled to obtain a 40 × 40 feature map, which is fused with feature map F2 to obtain F21 (40 × 40); F21 is then upsampled to 80 × 80 and fused with feature map F3 to obtain F32 (80 × 80). Next, feature map F32 is input into the bottom-up path aggregation network PAN, which can be regarded as the inverse operation of the FPN: the 80 × 80 feature map F32 is input directly into the PAN, downsampled to 40 × 40 and fused with the earlier feature map F21 to obtain F22 (40 × 40); feature map F22 is then input directly into the PAN, downsampled and fused with the earlier feature map F1 to obtain feature map F12 (20 × 20). At this point, output feature maps at all three scales have been obtained: F12 (20 × 20), F22 (40 × 40) and F32 (80 × 80).
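For illustration only, the fusion order described above can be sketched in PyTorch-style code; the fusion operator (channel concatenation), the downsampling operator (max pooling) and the conv blocks are assumptions, since the patent does not fix them here:

```python
import torch
import torch.nn.functional as F

def fpn_pan_fuse(f1, f2, f3, blocks):
    """f1: 20x20, f2: 40x40, f3: 80x80 backbone maps; blocks: dict of conv blocks (assumed)."""
    # top-down feature pyramid network (FPN)
    f21 = blocks["td1"](torch.cat([F.interpolate(f1, scale_factor=2), f2], dim=1))   # 40x40
    f32 = blocks["td2"](torch.cat([F.interpolate(f21, scale_factor=2), f3], dim=1))  # 80x80
    # bottom-up path aggregation network (PAN)
    f22 = blocks["bu1"](torch.cat([F.max_pool2d(f32, 2), f21], dim=1))               # 40x40
    f12 = blocks["bu2"](torch.cat([F.max_pool2d(f22, 2), f1], dim=1))                # 20x20
    return f12, f22, f32
```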
Step S4, performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters.
The feature maps F12, F22, F32 output in the previous step are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
In this embodiment, F ∈ {20, 40, 80}; C is the set of categories in the data set; the prediction frame comprises the predicted center point coordinates (tx, ty) and the predicted long side tw and short side th; N is the preset number of anchor frames, which is 3 in this embodiment.
Because the output feature values cannot be used directly for loss calculation, regression is performed first to obtain the actual predicted values. The frame and rotation angle regression operations are performed on the feature maps F12, F22, F32 according to the following formulas:
[Formula images not reproduced in this text: the regression equations giving bx, by, bw, bh and bθ from the network outputs, the anchor frame sides and the grid coordinates.]
where (bx, by) are the center point coordinates obtained after the regression operation, bw is the long side obtained after the regression operation, bh is the short side obtained after the regression operation, bθ is the rotation angle obtained after the regression operation, pw and ph are the long side and the short side of the anchor frame respectively, e is the natural constant, α and β are set parameters, and (Cx, Cy) are the real coordinates of the upper-left corner of the grid cell where the center point is located.
In one embodiment, α is 0.5 and β is 1.5, and the regression limits the angle θ to [-1.5, 1.5], which is the rotation range in radians.
It should be noted that the grid in the present application is defined on the finally obtained feature maps and is an abstract concept that facilitates the frame regression calculation; for the 20 × 20, 40 × 40 and 80 × 80 feature maps there are 20 × 20, 40 × 40 and 80 × 80 grid cells respectively. Dividing a feature map into grids is a relatively mature technique in the art and is not described further here.
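Because the regression formula images are not reproduced above, the following decode sketch is only an assumption modeled on the common YOLO-style parameterisation consistent with the surrounding definitions (sigmoid plus grid offset for the center, anchor side times e^t for the sides, and a sigmoid mapped through α and β so that the angle lands in roughly [-1.5, 1.5] radians); the patent's exact formulas may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_prediction(tx, ty, tw, th, ttheta, cx, cy, pw, ph, alpha=0.5, beta=1.5):
    bx = sigmoid(tx) + cx                 # center x, offset from the grid cell's upper-left corner Cx
    by = sigmoid(ty) + cy                 # center y, offset from Cy
    bw = pw * math.exp(tw)                # long side, scaled from the anchor long side pw
    bh = ph * math.exp(th)                # short side, scaled from the anchor short side ph
    btheta = 2.0 * beta * (sigmoid(ttheta) - alpha)   # assumed angle mapping into [-1.5, 1.5]
    return bx, by, bw, bh, btheta
```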
The loss is then calculated between the actual frame and angle predictions (bx, by, bw, bh, bθ) obtained above and the label (x, y, w, h, θ) obtained by preprocessing in the first step.
1. Calculate the frame loss. The frame information is predicted and the real frame information is calculated from the label; the IoU (Intersection over Union), i.e. the intersection-over-union ratio of the predicted frame and the real frame, is then calculated, and predicted frames with high IoU values are obtained through NMS post-processing:
IoU = |A ∩ B| / |A ∪ B|

where A is the real box (ground truth), B is the prediction box, |A ∩ B| is the area where the real box and the prediction box intersect, and |A ∪ B| is the area of the union of the real box and the prediction box. A higher IoU value indicates a more accurate prediction.
The CIoU loss function is an improvement on IoU, and training converges better with the improvement:
LCIoU = 1 - IoU + ρ²(b, bgt)/c² + λv

where b is the prediction frame center point (bx, by), bgt is the real frame center point, ρ(b, bgt) is the Euclidean distance between the center points of the real frame and the prediction frame, c is the diagonal length of the minimum enclosing region of the real frame and the prediction frame, and λ is a weight parameter:

λ = v / ((1 - IoU) + v)

v is used to measure the similarity of aspect ratios and is defined as follows:

v = (4/π²) · (arctan(wgt/hgt) - arctan(w/h))²

where wgt and hgt are respectively the width and height of the real box.
2. Calculating the angle loss:
[Formula image not reproduced in this text: the rotation angle loss Langle.]
where Langle is the rotation angle loss.
In one particular embodiment:
Langle = SmoothL1(θ - bθ)

where θ is the radian value of the true angle, bθ is the radian value of the predicted angle, and SmoothL1(x) is the smooth L1 loss function.
It should be noted that the L1 and L2 functions may also be used to calculate the rotation angle loss, and the L1 and L2 functions are both loss functions, which are not described herein again.
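A short illustrative sketch of the SmoothL1 reading of the angle loss given above (the β = 1 threshold is an assumption):

```python
def smooth_l1(x, beta=1.0):
    return 0.5 * x * x / beta if abs(x) < beta else abs(x) - 0.5 * beta

def angle_loss(theta_true, b_theta):
    """theta_true and b_theta in radians, per the embodiment above."""
    return smooth_l1(theta_true - b_theta)
```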
3. Calculate the classification loss using a binary cross entropy loss function (Binary Cross Entropy Loss):
Lcls = -Σn [ Ĉn · log(Cn) + (1 - Ĉn) · log(1 - Cn) ]

where Ĉn is the true category of the sample and Cn is the predicted category of the sample.
4. Calculate the target score loss using a binary cross entropy loss function:
Lobj = -Σ [ b̂obj · log(bobj) + (1 - b̂obj) · log(1 - bobj) ]

where b̂obj indicates whether the position is a target (its value is 1 or 0) and bobj is the predicted target score.
It should be noted that the calculation of the frame loss, the classification loss and the target score loss is already a relatively mature technology in the art and is not described further here.
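For completeness, an illustrative binary cross entropy term as used for both the classification loss and the target score loss (a sketch, not the patent's code):

```python
import math

def bce(y_true, p_pred, eps=1e-9):
    """y_true is the 0/1 label, p_pred the predicted probability."""
    return -(y_true * math.log(p_pred + eps) + (1.0 - y_true) * math.log(1.0 - p_pred + eps))
```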
The loss between the predicted values and the true values is thus obtained, and back propagation is performed before the end of each batch to reduce the loss. The network parameters are updated at the same time and training of the next batch begins, until all batches of the training data have been processed, finally yielding the trained backbone network. After all batches are finished, all updated parameters are stored in a .pt weight file; before detection starts, the updated weights simply replace the original weights, and steps S2 and S3 are repeated.
Step S5, for the image to be detected, step S2 and step S3 are executed to obtain the detection result.
The image to be detected is scaled to 640 × 640 and input into the network; feature maps of three sizes are output after a series of operations such as convolution, and the predicted values are obtained after regression of the feature values. The predicted values comprise the category C and the frame and angle predictions (bx, by, bw, bh, bθ), and they are output directly as the detection result without loss calculation.
In another embodiment, the convolutional neural network-based rotating object detection method further includes:
and converting the detection result into a 4-focus representation method, and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
In order to visualize the detection result on the image, the 5-parameter notation (x, y, w, h, θ) is converted into the 4-corner-point representation (x1, y1), (x2, y2), (x3, y3), (x4, y4) through the parameter conversion of step S1, and a rotated frame is drawn on the original image with the four corner points to indicate the position and category information of the target in the image.
In one embodiment, the present application further provides a convolutional neural network-based rotating target detection apparatus, including:
the tag processing module is used for acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side, h is the short side, and θ is the rotation angle, defined as the included angle between the long side and the horizontal direction;
a feature extraction module, used for scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes;
a feature processing module, used for performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32 through the following operations:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12;
the feature maps F12, F22, F32 are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
a parameter updating module, used for performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters;
and the detection module is used for inputting the image to be detected into the feature extraction module and the feature processing module to obtain a detection result.
In a specific example, the convolutional neural network-based rotating target detection apparatus further includes:
and converting the detection result into a 4-focus representation method, and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
For specific limitations of the convolutional neural network-based rotating object detection apparatus, reference may be made to the above limitations of the convolutional neural network-based rotating object detection method, and details are not repeated here. The modules in the convolutional neural network-based rotating object detecting device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
The memory and the processor are electrically connected, directly or indirectly, to enable transmission or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory stores a computer program that can be executed on the processor, and the processor executes the computer program stored in the memory, thereby implementing the network topology layout method in the embodiment of the present invention.
The Memory may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Read-Only Memory (EPROM), an electrically Erasable Read-Only Memory (EEPROM), and the like. The memory is used for storing programs, and the processor executes the programs after receiving the execution instructions.
The processor may be an integrated circuit chip having data processing capabilities. The Processor may be a general-purpose Processor including a Central Processing Unit (CPU), a Network Processor (NP), and the like. The various methods, steps and logic blocks disclosed in embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (8)

1. A convolutional neural network-based rotating target detection method is characterized in that the convolutional neural network-based rotating target detection method comprises the following steps:
step 1, acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side of the rotating target detection frame, h is the short side of the rotating target detection frame, and θ is the rotation angle of the rotating target detection frame, the rotation angle being the included angle between the long side and the horizontal direction;
step 2, scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes;
step 3, performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32, which comprises:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12;
the feature maps F12, F22, F32 are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
step 4, performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters;
and step 5, performing step 2 and step 3 on the image to be detected to obtain a detection result.
2. The convolutional neural network-based rotating object detection method of claim 1, wherein the prediction frame comprises the predicted center point coordinates (tx, ty), the predicted long side tw and the predicted short side th, and the frame and rotation angle regression operations are performed on the feature maps F12, F22, F32 according to the following formulas:
[Formula images not reproduced in this text: the regression equations giving bx, by, bw, bh and bθ from the network outputs, the anchor frame sides and the grid coordinates.]
where (bx, by) are the center point coordinates obtained after the regression operation, bw is the long side obtained after the regression operation, bh is the short side obtained after the regression operation, bθ is the rotation angle obtained after the regression operation, pw and ph are the long side and the short side of the anchor frame respectively, e is the natural constant, α and β are set parameters, and (Cx, Cy) are the real coordinates of the upper-left corner of the grid cell where the center point is located.
3. The convolutional neural network-based rotating object detecting method as claimed in claim 2, wherein the rotation angle loss is calculated as follows:
[Formula image not reproduced in this text: the rotation angle loss Langle.]
where Langle is the rotation angle loss.
4. The convolutional neural network-based rotating object detection method of claim 1, further comprising:
and converting the detection result into a 4-focus representation method, and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
5. A convolutional neural network-based rotating object detection apparatus, comprising:
the tag processing module is used for acquiring an image training data set annotated with rotating target detection frames, performing data enhancement on the image training data set, and labeling each rotating target detection frame in the data-enhanced image training data set with 5 parameters as (x, y, w, h, θ), where (x, y) is the center of the rotating target detection frame, w is the long side of the rotating target detection frame, h is the short side of the rotating target detection frame, and θ is the rotation angle of the rotating target detection frame, the rotation angle being the included angle between the long side and the horizontal direction;
a feature extraction module, used for scaling the images in the training data set to a preset size proportionally according to the long side, inputting them into the backbone network darknet-53, and outputting feature maps F1, F2, F3 of three sizes;
a feature processing module, used for performing feature processing on the feature maps F1, F2, F3 to obtain feature maps F12, F22, F32 through the following operations:
inputting feature map F1 directly into the feature pyramid network FPN, upsampling it and fusing it with feature map F2 to obtain feature map F21; continuing to upsample feature map F21 and fusing it with feature map F3 to obtain feature map F32; then inputting feature map F32 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F21 to obtain feature map F22; and inputting feature map F22 directly into the path aggregation network PAN, downsampling it and fusing it with the earlier feature map F1 to obtain feature map F12;
the feature maps F12, F22, F32 are three tensors each of size {F × F × [(C + t + bobj) × N + θ']}, where F × F is the feature map size, C is the detection category, t is the prediction frame, bobj is the target score prediction, N is the preset number of anchor frames, and θ' is the predicted rotation angle.
a parameter updating module, used for performing frame and rotation angle regression operations on the feature maps F12, F22, F32, calculating the frame loss, rotation angle loss, classification loss and target score loss, performing back propagation to reduce the loss, and updating the network parameters;
and the detection module is used for inputting the image to be detected into the feature extraction module and the feature processing module to obtain a detection result.
6. The convolutional neural network-based rotating object detection device of claim 5, wherein the prediction frame comprises the predicted center point coordinates (tx, ty), the predicted long side tw and the predicted short side th, and the frame and rotation angle regression operations are performed on the feature maps F12, F22, F32 according to the following formulas:
[Formula images not reproduced in this text: the regression equations giving bx, by, bw, bh and bθ from the network outputs, the anchor frame sides and the grid coordinates.]
where (bx, by) are the center point coordinates obtained after the regression operation, bw is the long side obtained after the regression operation, bh is the short side obtained after the regression operation, bθ is the rotation angle obtained after the regression operation, pw and ph are the long side and the short side of the anchor frame respectively, e is the natural constant, α and β are set parameters, and (Cx, Cy) are the real coordinates of the upper-left corner of the grid cell where the center point is located.
7. The convolutional neural network-based rotating object detecting device as claimed in claim 6, wherein the rotation angle loss is calculated as follows:
[Formula image not reproduced in this text: the rotation angle loss Langle.]
where Langle is the rotation angle loss.
8. The convolutional neural network based rotating object detecting device as claimed in claim 5, further comprising:
and the display module is used for converting the detection result into a 4-focus representation method and drawing a rotating target detection frame on the image to be detected to indicate the position and the category information of the target in the image.
CN202110612780.5A 2021-06-02 2021-06-02 Rotating target detection method and device based on convolutional neural network Active CN113298169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110612780.5A CN113298169B (en) 2021-06-02 2021-06-02 Rotating target detection method and device based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110612780.5A CN113298169B (en) 2021-06-02 2021-06-02 Rotating target detection method and device based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN113298169A true CN113298169A (en) 2021-08-24
CN113298169B CN113298169B (en) 2024-03-01

Family

ID=77326817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110612780.5A Active CN113298169B (en) 2021-06-02 2021-06-02 Rotating target detection method and device based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN113298169B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591810A (en) * 2021-09-28 2021-11-02 湖南大学 Vehicle target pose detection method and device based on boundary tight constraint network and storage medium
CN113822882A (en) * 2021-11-22 2021-12-21 武汉飞恩微电子有限公司 Circuit board surface defect detection method and device based on deep learning
CN114119610A (en) * 2022-01-25 2022-03-01 合肥中科类脑智能技术有限公司 Defect detection method based on rotating target detection
CN114332638A (en) * 2021-11-03 2022-04-12 中科弘云科技(北京)有限公司 Remote sensing image target detection method and device, electronic equipment and medium
CN114596429A (en) * 2022-02-28 2022-06-07 安徽大学 Ear detection method based on self-defined rotating frame
CN114677568A (en) * 2022-05-30 2022-06-28 山东极视角科技有限公司 Linear target detection method, module and system based on neural network
CN115630660A (en) * 2022-12-23 2023-01-20 湖北凯乐仕通达科技有限公司 Barcode positioning method and device based on convolutional neural network
CN116681983A (en) * 2023-06-02 2023-09-01 中国矿业大学 Long and narrow target detection method based on deep learning
CN116735463A (en) * 2023-06-01 2023-09-12 中山大学 Directed target detection-based diatom size automatic measurement method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199993A (en) * 2020-09-01 2021-01-08 广西大学 Method for identifying transformer substation insulator infrared image detection model in any direction based on artificial intelligence
CN112766184A (en) * 2021-01-22 2021-05-07 东南大学 Remote sensing target detection method based on multi-level feature selection convolutional neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199993A (en) * 2020-09-01 2021-01-08 广西大学 Method for identifying transformer substation insulator infrared image detection model in any direction based on artificial intelligence
CN112766184A (en) * 2021-01-22 2021-05-07 东南大学 Remote sensing target detection method based on multi-level feature selection convolutional neural network

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591810B (en) * 2021-09-28 2021-12-07 湖南大学 Vehicle target pose detection method and device based on boundary tight constraint network and storage medium
CN113591810A (en) * 2021-09-28 2021-11-02 湖南大学 Vehicle target pose detection method and device based on boundary tight constraint network and storage medium
CN114332638B (en) * 2021-11-03 2023-04-25 中科弘云科技(北京)有限公司 Remote sensing image target detection method and device, electronic equipment and medium
CN114332638A (en) * 2021-11-03 2022-04-12 中科弘云科技(北京)有限公司 Remote sensing image target detection method and device, electronic equipment and medium
CN113822882A (en) * 2021-11-22 2021-12-21 武汉飞恩微电子有限公司 Circuit board surface defect detection method and device based on deep learning
CN114119610A (en) * 2022-01-25 2022-03-01 合肥中科类脑智能技术有限公司 Defect detection method based on rotating target detection
CN114119610B (en) * 2022-01-25 2022-06-28 合肥中科类脑智能技术有限公司 Defect detection method based on rotating target detection
CN114596429A (en) * 2022-02-28 2022-06-07 安徽大学 Ear detection method based on self-defined rotating frame
CN114596429B (en) * 2022-02-28 2024-04-19 安徽大学 Wheat head detection method based on custom rotating frame
CN114677568A (en) * 2022-05-30 2022-06-28 山东极视角科技有限公司 Linear target detection method, module and system based on neural network
CN115630660A (en) * 2022-12-23 2023-01-20 湖北凯乐仕通达科技有限公司 Barcode positioning method and device based on convolutional neural network
CN116735463A (en) * 2023-06-01 2023-09-12 中山大学 Directed target detection-based diatom size automatic measurement method
CN116681983A (en) * 2023-06-02 2023-09-01 中国矿业大学 Long and narrow target detection method based on deep learning

Also Published As

Publication number Publication date
CN113298169B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN113298169B (en) Rotating target detection method and device based on convolutional neural network
CN108520229B (en) Image detection method, image detection device, electronic equipment and computer readable medium
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN110309824B (en) Character detection method and device and terminal
CN113139543B (en) Training method of target object detection model, target object detection method and equipment
CN111553347B (en) Scene text detection method oriented to any angle
CN113239982A (en) Training method of detection model, target detection method, device and electronic system
CN110889437B (en) Image processing method and device, electronic equipment and storage medium
Zhou et al. Exploring faster RCNN for fabric defect detection
JP2010108135A (en) Image processing device, image processing program, and image processing method
CN111695609A (en) Target damage degree determination method, target damage degree determination device, electronic device, and storage medium
CN115909059A (en) Natural resource sample library establishing method and device
CN105095835A (en) Pedestrian detection method and system
CN113673519B (en) Character recognition method based on character detection model and related equipment thereof
Gu et al. A fast multi-object extraction algorithm based on cell-based connected components labeling
CN109598298B (en) Image object recognition method and system
CN110659637A (en) Electric energy meter number and label automatic identification method combining deep neural network and SIFT features
CN115761401A (en) Method and device for detecting small target on highway based on convolutional neural network
CN117037132A (en) Ship water gauge reading detection and identification method based on machine vision
CN112784494B (en) Training method of false positive recognition model, target recognition method and device
CN116778458B (en) Parking space detection model construction method, parking space detection method, equipment and storage medium
CN113255555A (en) Method, system, processing equipment and storage medium for identifying Chinese traffic sign board
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
CN116363583A (en) Human body identification method, device, equipment and medium for top view angle
CN115937537A (en) Intelligent identification method, device and equipment for target image and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant