CN114493975A - Seedling rotating frame target detection method and system - Google Patents


Info

Publication number
CN114493975A
Authority
CN
China
Prior art keywords
seedling
angle
historical
function
target detection
Prior art date
Legal status
Pending
Application number
CN202210143675.6A
Other languages
Chinese (zh)
Inventor
徐大伟
胡东阳
王家豪
翟永杰
Current Assignee
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date
Filing date
Publication date
Application filed by North China Electric Power University
Priority to CN202210143675.6A
Publication of CN114493975A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0014 Image feed-back for automatic industrial control, e.g. robot with camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Robotics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a seedling rotating-frame target detection method and system, and provides an improved YOLOv5 model with a decoupled structure for seedling rotating-frame target detection on the basis of YOLOv5. The method introduces deformable convolution into the backbone network to handle the irregular edges of the detected targets (seedlings) and improve detection accuracy; a decoupled head network is then set in YOLOv5 to decouple the original detection task into a plurality of subtasks, each designed independently to learn the optimal features of its corresponding task; finally, all detection results are integrated to obtain the final rotated detection frame. The method realizes accurate identification of seedlings based on the improved YOLOv5 model.

Description

Seedling rotating frame target detection method and system
Technical Field
The invention relates to the technical field of seedling cultivation, in particular to a seedling rotating frame target detection method and system.
Background
With the rapid development of science and technology, the seedling tissue culture technique has gradually matured. As an asexual propagation technique, it has the advantages of a short culture period, high survival rate, high yield and reduced production cost, and has a good development prospect. By imitating the traditional manual seedling-cutting workflow, a seedling-cutting robot can be divided into a seedling identification system, a seedling pose and coordinate analysis system, and a cutting execution system. Target recognition of seedlings is a necessary link in developing a seedling-cutting robot.
Disclosure of Invention
In view of this, the present invention provides a seedling rotating frame target detection method and system, so as to realize accurate identification of seedlings.
In order to achieve the purpose, the invention provides the following scheme:
a seedling rotating frame target detection method comprises the following steps:
labeling the plurality of historical seedling sample images to obtain a label of each historical seedling sample image, and constructing a sample set comprising the historical seedling images and the labels of the historical seedling images;
training an improved YOLOv5 model by using the sample set to obtain a trained improved YOLOv5 model as a target detection model; the improved YOLOv5 model is obtained by replacing the convolution in the CBL structure in the YOLOv5 model with a deformable convolution and replacing the head network in the YOLOv5 model with a decoupling head network;
and inputting the seedling image to be detected into the target detection model to obtain a seedling detection result.
Optionally, the label of the historical seedling sample image is (c, x, y, l, s, CSL(θ));
wherein c is the category of the seedlings in the historical seedling sample image; x and y represent the coordinates of the central point of the labeling box of the seedling in the historical seedling sample image; l and s respectively represent the long side and the short side of the labeling box; θ represents the clockwise included angle between the long side of the labeling box and the horizontal axis, and CSL(θ) represents the angle category when the clockwise included angle between the long side of the labeling box and the horizontal axis is θ;

CSL(θ') = g(θ'),  θ − r < θ' < θ + r;  CSL(θ') = 0,  otherwise

wherein g(·) is a window function, r is the radius of the window function, and θ' is the angle class parameter.
Optionally, the improved YOLOv5 model includes a backbone network, a neck network, and a decoupling head network connected in sequence;
the backbone network comprises a Focus structure, a CSP1_1 structure, a first CSP1_3 structure, a second CSP1_3 structure and an SPP structure which are connected in sequence; the CSP1_1 structure comprises a first convolution layer, a first normalization layer and a first Leaky_relu activation function; the first CSP1_3 structure and the second CSP1_3 structure each comprise an improved CBL module, a residual module, a second convolution layer and a Concat layer, and the improved CBL module comprises a deformable convolution layer, a second normalization layer and a second Leaky_relu activation function;
the neck network comprises three down-sampling layers and three up-sampling layers;
the decoupling head network comprises a third convolution layer, and a regression branch and a classification branch connected in parallel to the third convolution layer; the regression branch comprises a fourth convolution layer, a regression-box loss function and a bounding-box loss function, and the classification branch comprises a fully connected layer, a category loss function and an angle loss function.
Optionally, the function expression of the deformable convolution layer is:

y(p0) = Σ_{pn∈R} w(pn) · x(p0 + pn + Δpn)

R = {(−1, −1), (−1, 0), ..., (0, 1), (1, 1)}

wherein y(p0) represents the feature value at position p0 output by the deformable convolution layer; p0 is an arbitrary position in the feature map input to the deformable convolution layer; pn is the n-th sampling position in the grid R of the 3 × 3 convolution kernel centered on p0; Δpn is the offset of pn; and x(p0 + pn + Δpn) represents the feature value at position p0 + pn + Δpn in the feature map input to the deformable convolution layer.
Optionally, the total loss function adopted in the process of training the improved YOLOv5 model by using the sample set is:

L = (1/N) Σ_{i=1}^{N} obj_i · [λ1·Lreg + λ2·LCSL + λ3·Lcls + λ4·Liou]

wherein L represents the total loss function, N represents the number of anchor points, i represents the i-th anchor point, and obj_i represents a binary value used to distinguish the foreground from the background; Lreg, LCSL, Lcls and Liou respectively represent the regression loss function, the angle loss function, the category loss function and the bounding-box loss function; (x, y) are the coordinates of the central point of the labeling box of the seedling in the historical seedling sample image, l and s respectively represent the long side and the short side of the labeling box, θ represents the clockwise included angle between the long side of the labeling box and the horizontal axis, and CSL(θ) represents the corresponding angle category; λ1, λ2, λ3 and λ4 represent the weights of the regression loss function, the angle loss function, the category loss function and the bounding-box loss function, respectively.
Optionally, the regression loss function is:

Lreg = smooth_L1(a)

wherein a represents the difference between the parameters of the prediction box and the labeling box, and smooth_L1(·) represents the smooth L1 function:

smooth_L1(a) = 0.5·a²,  if |a| < 1
smooth_L1(a) = |a| − 0.5,  otherwise
the angle loss function is:
Figure BDA0003507808070000034
where M represents the total number of angle categories, θ n is the angle category, yiθnFor the angle sign function of the ith anchor point, if the angle type is true, 1 is taken, otherwise, 0 and P are takeniθnThe probability that the angle class of the ith anchor point belongs to the angle class thetan;
the category loss function is:

Lcls = −Σ_{n=1}^{C} y_icn · log(P_icn)

wherein C represents the number of seedling categories, P_icn represents the probability that the seedling category of the i-th anchor point is cn, and y_icn represents the category indicator function of the i-th anchor point;
the bounding-box loss function is:

Liou = 1 − GIoU,  GIoU = IoU − |C \ (B ∪ Bgt)| / |C|

wherein IoU denotes the intersection-over-union of the prediction box and the labeling box,

IoU = |B ∩ Bgt| / |B ∪ Bgt|

and C denotes the minimum box covering both the prediction box and the labeling box, B is the prediction box, and Bgt is the labeling box.
A seedling rotating frame target detection system, the system comprising:
the labeling module is used for labeling the plurality of historical seedling sample images, obtaining a label of each historical seedling sample image, and constructing a sample set containing the historical seedling images and the labels of the historical seedling images;
the model training module is used for training the improved YOLOv5 model by using the sample set to obtain a trained improved YOLOv5 model as a target detection model; the improved YOLOv5 model is obtained by replacing the convolution in the CBL structure in the YOLOv5 model with a deformable convolution and replacing the head network in the YOLOv5 model with a decoupling head network;
and the detection module is used for inputting the seedling image to be detected into the target detection model to obtain a seedling detection result.
Optionally, the label of the historical seedling sample image is (c, x, y, l, s, CSL(θ));
wherein c is the category of the seedlings in the historical seedling sample image; x and y represent the coordinates of the central point of the labeling box of the seedling in the historical seedling sample image; l and s respectively represent the long side and the short side of the labeling box; θ represents the clockwise included angle between the long side of the labeling box and the horizontal axis, and CSL(θ) represents the angle category when the clockwise included angle between the long side of the labeling box and the horizontal axis is θ;

CSL(θ') = g(θ'),  θ − r < θ' < θ + r;  CSL(θ') = 0,  otherwise

wherein g(·) is a window function, r is the radius of the window function, and θ' is the angle class parameter.
Optionally, the improved YOLOv5 model includes a backbone network, a neck network, and a decoupling head network connected in sequence;
the backbone network comprises a Focus structure, a CSP1_1 structure, a first CSP1_3 structure, a second CSP1_3 structure and an SPP structure which are connected in sequence; the CSP1_1 structure comprises a first convolution layer, a first normalization layer and a first Leaky_relu activation function; the first CSP1_3 structure and the second CSP1_3 structure each comprise an improved CBL module, a residual module, a second convolution layer and a Concat layer, and the improved CBL module comprises a deformable convolution layer, a second normalization layer and a second Leaky_relu activation function;
the neck network comprises three down-sampling layers and three up-sampling layers;
the decoupling head network comprises a third convolution layer, and a regression branch and a classification branch connected in parallel to the third convolution layer; the regression branch comprises a fourth convolution layer, a regression-box loss function and a bounding-box loss function, and the classification branch comprises a fully connected layer, a category loss function and an angle loss function.
Optionally, the function expression of the deformable convolution layer is:

y(p0) = Σ_{pn∈R} w(pn) · x(p0 + pn + Δpn)

R = {(−1, −1), (−1, 0), ..., (0, 1), (1, 1)}

wherein y(p0) represents the feature value at position p0 output by the deformable convolution layer; p0 is an arbitrary position in the feature map input to the deformable convolution layer; pn is the n-th sampling position in the grid R of the 3 × 3 convolution kernel centered on p0; Δpn is the offset of pn; and x(p0 + pn + Δpn) represents the feature value at position p0 + pn + Δpn in the feature map input to the deformable convolution layer.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a seedling rotating frame target detection method and system, and provides an improved YOLOv5 model for seedling rotating frame target detection based on a decoupling structure based on YOLOv 5. The method comprises the steps of introducing deformable convolution into a backbone network to solve the problem of irregular edges of detected targets (seedlings), improving detection precision, then setting a decoupling head network in YOLOv5, decoupling an original detection task into a plurality of subtasks, independently designing each subtask to learn the optimal characteristics of the corresponding task, and finally integrating all detection results to obtain a final rotation detection frame. The method realizes accurate identification of the seedlings based on the improved YOLOv5 model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a flowchart of a seedling rotating frame target detection method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating rotating-frame definition methods according to an embodiment of the present invention; FIG. 2(a) is a schematic of the five-parameter method with a 90-degree angle range, FIG. 2(b) is a schematic of the five-parameter method with a 180-degree angle range, and FIG. 2(c) is a schematic of the ordered quadrilateral representation;
fig. 3 is a schematic structural diagram of an improved YOLOv5 model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the deformable convolution layer according to an embodiment of the present invention;
FIG. 5 is a comparison of receptive fields according to an embodiment of the present invention; FIG. 5(a) is a schematic view of the receptive field of a conventional two-dimensional convolution layer, and FIG. 5(b) is a schematic view of the receptive field of the deformable convolution layer;
fig. 6 is a schematic structural diagram of the decoupling head network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a seedling rotating frame target detection method and a seedling rotating frame target detection system so as to realize accurate identification of seedlings.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Taking butterfly orchid (Phalaenopsis) seedling identification as an example, the task is to identify different parts of the seedlings. Rotating-frame detection is usually used for target recognition in space remote-sensing images: for the problem of many small targets against complex backgrounds, Zhu Yu et al. proposed the R2-FRCNN rotating-frame detection network. Xue Yang et al. proposed an end-to-end multi-class rotating-frame detection algorithm for complex aerial images, which fuses the features of two different layers and defines an arbitrarily oriented rectangle with five parameters (x, y, w, h, θ) in the second stage of the network. For rotating-frame target detection in aerial images, Ran Qin et al. proposed an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals from horizontal anchors, and a multi-head network (MRDet) that decouples the detection task into multiple subtasks, each task branch specially designed to learn its optimal features, achieving better detection on the DOTA and HRSC2016 data sets. There is no description in the prior art of using the YOLOv5 model for seedling identification.
As shown in fig. 1, the present invention provides a seedling rotating frame target detection method, which comprises the following steps:
step 101, labeling a plurality of historical seedling sample images, obtaining a label of each historical seedling sample image, and constructing a sample set including the historical seedling images and the labels of the historical seedling images.
Photographs were taken with an industrial camera to obtain 1505 seedling images, which were divided 8:2 into a training set and a test set.
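The 8:2 split described above can be sketched as follows. This is an illustrative snippet, not code from the patent; the file names and random seed are hypothetical:

```python
import random

def split_dataset(image_paths, train_ratio=0.8, seed=0):
    """Shuffle and split image paths into training and test sets (8:2 by default)."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # deterministic shuffle for reproducibility
    n_train = int(len(paths) * train_ratio)
    return paths[:n_train], paths[n_train:]

# With the 1505 images mentioned above, this yields 1204 training and 301 test images.
train, test = split_dataset([f"img_{i:04d}.jpg" for i in range(1505)])
```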
The existing rotating-frame definition methods generally fall into two types: five-parameter direction definition methods and the eight-parameter quadrilateral definition method. The five-parameter methods realize detection of a rotating frame in any direction by adding a direction parameter θ; different choices of angle start and end points lead to different angle ranges, as shown in fig. 2. Fig. 2(a) is a schematic diagram of the five-parameter method with a 90-degree range: the angle θ is the acute angle formed between the rotating frame and the x-axis, one side is denoted w and the other h, and θ lies in the range [−90, 0). Fig. 2(b) is a schematic diagram of the five-parameter method with a 180-degree range, i.e. the long-edge definition method: here θ is the included angle between the long edge h of the rotating frame and the x-axis, and the angle range is fixed at [−90, 90). Fig. 2(c) is a schematic diagram of the ordered quadrilateral definition method, which expresses the frame in the form (x1, y1, x2, y2, x3, y3, x4, y4), sorted with the leftmost point as the starting point.
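To make the relation between the long-edge five-parameter representation and the quadrilateral corner representation concrete, here is a small conversion sketch. The sign conventions for the clockwise angle are an assumption for illustration, not taken from the patent:

```python
import math

def longedge_to_corners(cx, cy, l, s, theta_deg):
    """Convert a long-edge rotated box (center (cx, cy), long side l, short side s,
    angle theta between long side and x-axis) to its four corner points.
    The angle sign convention here is illustrative only."""
    t = math.radians(theta_deg)
    ux, uy = math.cos(t), math.sin(t)        # unit vector along the long side
    vx, vy = -math.sin(t), math.cos(t)       # unit vector along the short side
    half_l, half_s = l / 2.0, s / 2.0
    return [
        (cx + sx * half_l * ux + sy * half_s * vx,
         cy + sx * half_l * uy + sy * half_s * vy)
        for sx, sy in [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    ]
```

For θ = 0 the long side lies along the x-axis; for θ = 90 the box is simply rotated so the long side is vertical.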
In the 90° angle-regression method, because of the periodicity of angle (PoA) and the exchangeability of edges (EoE), the loss becomes very large when the angle is in a boundary state, which increases the regression difficulty; in the 180° long-edge definition method the sudden increase in loss is caused only by PoA. If the detection frame is inconsistent with the real frame at such a boundary, the boundary-overrun problem occurs and produces a large loss value.
Thus only θ in the parameters of the 180°-regression long-edge definition method has a boundary problem, and the Circular Smooth Label (CSL) method can deal with the problem caused by boundary overrun. The long-edge definition method and CSL are therefore combined to define the angle of the rotating frame, converting the angle-regression problem into a classification problem that limits the range of the detection result, thereby avoiding the large losses caused by regression-based angle detection. The present invention uses formula (1) to classify the entire defined angle range, with each 1° treated as one class. g(x) in formula (1) is a window function whose radius is controlled by r. The window function satisfies four properties: periodicity, symmetry, extremum and monotonicity. By setting the window function, the model can measure the angular distance between the detected label and the real label: within a certain range, the closer the detected value is to the real value, the smaller the loss.
Specifically, the data set adopts the PASCAL VOC labeling format; the roLabelImg labeling tool is used to manually label targets in the seedling pictures, and the label file contains the rectangle coordinate parameters of the real targets. The labeled data are stored in xml-format label files, which are converted into the txt files required by YOLOv5 in the format (c, x, y, l, s, θ), where c is the category; x and y represent the coordinates of the center point of the box; l and s respectively represent the long side and the short side of the labeling box; θ is the clockwise angle from the x-axis to the long side, θ ∈ [1, 180]. The Circular Smooth Label (CSL) method deals with the problem caused by boundary overrun; the rotating-frame angle is defined by combining the long-edge definition method with CSL, and the angle-regression problem is converted into a classification problem to limit the range of the prediction result, avoiding the large losses of regression-based angle prediction. We classify the entire defined angle range, with 1° as one class. The expression of CSL is:
CSL(θ') = g(θ'),  θ − r < θ' < θ + r;  CSL(θ') = 0,  otherwise    (1)
g (x) in the formula is a window function, and the window radius is controlled by r. The window function satisfies four properties of periodicity, symmetry, extremum and monotonicity, and is a typical optional pulse function, Gaussian function and the like. θ' is an angle class parameter. The angle theta in the label is input into formula 1, and the window function g (x) converts the original label into an angle class, classifying the label (c, x, y, l, s, CSL (theta)).
Step 102, training an improved YOLOv5 model by using the sample set to obtain a trained improved YOLOv5 model as a target detection model; the improved YOLOv5 model is obtained by replacing the convolution in the CBL structure in the YOLOv5 model with a deformable convolution and replacing the head network in the YOLOv5 model with a decoupling head network.
As shown in fig. 3, the improved YOLOv5 model includes a backbone network, a neck network, and a decoupled header network.
1. Backbone network
As shown in fig. 3, the backbone network in the improved YOLOv5 model is used to extract feature representations from an image and mainly includes a Focus structure, a CSP1_1 structure, CSP1_3 structures and an SPP structure. The Focus structure performs a slicing operation on the input seedling image, turning the image from step 101 into a feature map, which is then sent to the CSP1_1 structure. The CSP1_1 structure includes a convolution layer (Conv), a normalization layer (BN) and a Leaky_relu activation function. The CSP1_3 structure includes improved CBL modules, residual modules, convolution layers and a Concat layer. The SPP structure processes the feature map with maximal pooling of sizes 1×1, 5×5, 9×9 and 13×13 respectively, then performs multi-scale feature fusion through a concat operation. Owing to the inherent characteristics of plants, the edges of the detected butterfly orchid seedling targets are mostly irregular, and the detection effect of traditional convolution on the seedling rotating frame is not ideal. To address this, deformable convolution, which is better suited to geometric deformation, is introduced so that the receptive field of the convolution kernel pays more attention to target edge features, optimizing detection performance.
Conventional two-dimensional convolution slides a grid R of fixed structure over the feature map to sample it, and the sampled values are weighted and summed with the weights w; the grid R defines the size of the sampling region and the weights w. For each position p0 in the input feature map x, the output y(p0) is:

y(p0) = Σ_{pn∈R} w(pn) · x(p0 + pn)
where pn ranges over all sampling positions in the grid R centered on p0. As shown in fig. 4, in the deformable convolution each sampling position pn is shifted by an offset Δpn, and the output y(p0) correspondingly becomes:

y(p0) = Σ_{pn∈R} w(pn) · x(p0 + pn + Δpn)

Here x is the seedling feature map, p0 is a position in the feature map, pn ranges over all sampling positions in the grid R centered on p0, and R is the grid of a 3 × 3 convolution kernel. The convolutions in all CBL structures in the backbone of the present invention are replaced with deformable convolutions.
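The deformable sampling formula above can be sketched for a single output position as follows. In the real network the offsets Δpn are predicted by an additional convolution layer; here they are passed in directly, and bilinear interpolation handles fractional positions. This is a minimal illustration, not the patent's implementation:

```python
import math

def bilinear(x, py, px):
    """Bilinearly sample 2-D list x at fractional (py, px); zero padding outside."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy][xx]
    return val

def deformable_sample(x, w, offsets, p0):
    """y(p0) = sum_n w(pn) * x(p0 + pn + dpn) over the 3x3 grid R;
    `offsets` holds one (dy, dx) offset per grid position."""
    R = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sum(
        w[n] * bilinear(x, p0[0] + dy + offsets[n][0], p0[1] + dx + offsets[n][1])
        for n, (dy, dx) in enumerate(R)
    )
```

With all offsets zero this reduces exactly to the conventional 3 × 3 convolution at p0; non-zero offsets deform the sampling grid toward, e.g., an irregular leaf edge.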
Fig. 5 compares the receptive field of the conventional convolution layer with that of the deformable convolution layer proposed in the embodiment of the present invention; as can be seen from fig. 5, deformable convolution handles the irregular edge geometry of butterfly orchid seedlings more effectively for the target detection problem.
The deformable convolution changes the shape of the traditional sampling region. During model computation, sampling points may extend beyond the region of interest; weights are therefore introduced on top of this structure so that, besides the offset of each position, the weight of each sampling point is also learned. This reduces the influence of extreme sampling points on network feature extraction and further ensures the effectiveness of seedling feature extraction. After this step, the preliminarily extracted seedling feature information is output.
2. Neck network
Fig. 3 shows the general structure of the neck network in the embodiment of the present invention, which adopts the SPP-plus-PANet combination common in target detection. PANet increases the number of up-sampling and down-sampling steps: three up-samplings and three down-samplings are used, so that the feature information in the low layers is transmitted intact to the high layers, reducing loss during information transfer, improving the utilization of low-layer information and increasing the precision of seedling target detection. The strengthened seedling feature information is then output.
3. Decoupled head network
For the coupling problem, the decoupling head network provides a decoupled detection structure for the rotating-frame target (see the structural schematic diagram of the decoupling head network in fig. 6): in the embodiment of the invention, the original YOLO head (HEAD) is replaced by a decoupled head structure to improve detection performance and optimize the positioning effect of the rotating frame.
The decoupled head network is shown in fig. 6: the feature information output from the neck network enters the decoupling head network and is first processed by a 3 × 3 convolution layer that separates the different types of information; two parallel branches are then added. The regression branch uses 3 × 3 convolution layers to execute the regression task, while the classification branch adopts a fully connected structure for the classification task. Since the direction of the detection frame is converted from a regression problem into a classification problem through the long-edge definition method and the CSL method, the directional loss is assigned to the classification branch of the decoupled detection structure. The various losses jointly drive the updating of the network weight parameters, and the trained model is obtained after the loss-calculation iterations are finished.
As can be seen in fig. 6, for each feature layer, Cls (class loss) and Angle (angle loss) are obtained through the FC (fully connected) layer in the classification branch, while Reg (regression box information) and IoU (bounding box loss) are obtained through the Conv layer in the regression branch.
The multitask loss function is:

$$L = \frac{\lambda_1}{N}\sum_{i=1}^{N}obj_i\,L_{reg} + \frac{\lambda_2}{N}\sum_{i=1}^{N}obj_i\,L_{CSL} + \frac{\lambda_3}{N}\sum_{i=1}^{N}obj_i\,L_{cls} + \frac{\lambda_4}{N}\sum_{i=1}^{N}obj_i\,L_{iou}$$

where N denotes the number of anchor points and obj_i is a binary value used to distinguish the foreground from the background. L_reg, L_CSL, L_cls and L_iou respectively denote the regression loss function, the angle loss function, the class loss function and the bounding box loss function, and λ_1, λ_2, λ_3 and λ_4 are their weights.
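As a concrete illustration of this weighted sum, the following sketch computes the multitask loss from per-anchor loss values. The default λ weights of 1.0 are an illustrative assumption; the patent treats them as adjustable coefficients.

```python
import numpy as np

def multitask_loss(l_reg, l_csl, l_cls, l_iou, obj,
                   lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four per-anchor losses described above.

    l_reg, l_csl, l_cls, l_iou : per-anchor loss values (length N)
    obj     : binary array, 1 for foreground anchors, 0 for background
    lambdas : the four loss weights (illustrative defaults, not from the patent)
    """
    obj = np.asarray(obj, dtype=float)
    n = len(obj)
    terms = [l_reg, l_csl, l_cls, l_iou]
    # Each term: lambda_k / N * sum_i obj_i * L_k(i)
    return sum(lam * np.sum(obj * np.asarray(t, dtype=float)) / n
               for lam, t in zip(lambdas, terms))
```

With two anchors, one foreground and one background, only the foreground anchor's losses contribute to the total.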
$$L_{reg} = \sum_{j\in\{x,y,w,h\}} smooth_{L1}\left(v_j - v'_j\right)$$

where v = (v_x, v_y, v_w, v_h) denotes the coordinates of the box of the GT (the real, labeled box), v' = (v'_x, v'_y, v'_w, v'_h) denotes the predicted box coordinates, and the argument of smooth_{L1} is the numerical difference between the prediction box and the real box:

$$smooth_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
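The regression loss here is the standard smooth L1 penalty applied to the differences between predicted and ground-truth box parameters. A minimal sketch, assuming the four parameters x, y, w, h:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

def reg_loss(pred, gt):
    """Sum of smooth L1 over the (x, y, w, h) box-parameter differences."""
    return float(np.sum(smooth_l1(np.asarray(pred, dtype=float)
                                  - np.asarray(gt, dtype=float))))
```

The quadratic region near zero keeps gradients small for nearly correct boxes, while the linear region limits the influence of outliers.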
The angle loss formula is as follows:

$$L_{CSL} = -\sum_{n=1}^{M} y_{i\theta_n}\log\left(P_{i\theta_n}\right)$$

where M represents the total number of angle categories, θ_n is the angle category, y_{iθ_n} is the angle indicator function (1 if the category is true, 0 otherwise), and P_{iθ_n} is the probability that the observed sample belongs to angle category θ_n.
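This angle loss is a cross entropy over the M angle categories. A minimal sketch, where the label vector y may be a plain one-hot vector or a CSL-smoothed one:

```python
import numpy as np

def angle_loss(p, y):
    """Cross entropy over M angle categories: L = -sum_n y_n * log(p_n).

    p : predicted probabilities per angle category
    y : label vector (one-hot, or CSL-smoothed as described in the text)
    """
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)  # avoid log(0)
    return float(-np.sum(np.asarray(y, dtype=float) * np.log(p)))
```

With a one-hot label, the loss reduces to the negative log probability assigned to the true angle bin.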
The classification loss function is as follows:

$$L_{cls} = -\sum_{n=1}^{C} y_{ic_n}\log\left(P_{ic_n}\right)$$

where i denotes the ith sample, C the number of classes, y_{ic_n} the category indicator function, and P_{ic_n} the probability that the ith sample is predicted as class c_n.
The frame loss formula is as follows:

$$L_{iou} = 1 - GIoU$$

wherein

$$GIoU = IoU - \frac{\left|C\setminus\left(B\cup B^{gt}\right)\right|}{\left|C\right|}$$

C represents the minimum box covering both the prediction bounding box and the real bounding box, B is the prediction bounding box, and B^{gt} is the real bounding box.
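For ordinary axis-aligned boxes — the form in which YOLOv5's original GIoU loss is applied once the angle has been moved to the classification branch — the frame loss can be sketched as follows (the (x1, y1, x2, y2) box format is an assumption for illustration):

```python
def giou_loss(b, bgt):
    """GIoU loss for axis-aligned boxes given as (x1, y1, x2, y2).

    L = 1 - GIoU, with GIoU = IoU - |C \\ (B u Bgt)| / |C|,
    where C is the smallest box enclosing both B and Bgt.
    """
    x1, y1, x2, y2 = b
    g1, h1, g2, h2 = bgt
    # Intersection area
    inter_w = max(0.0, min(x2, g2) - max(x1, g1))
    inter_h = max(0.0, min(y2, h2) - max(y1, h1))
    inter = inter_w * inter_h
    # Union area
    area_b = (x2 - x1) * (y2 - y1)
    area_g = (g2 - g1) * (h2 - h1)
    union = area_b + area_g - inter
    iou = inter / union
    # Smallest enclosing box C
    c_area = (max(x2, g2) - min(x1, g1)) * (max(y2, h2) - min(y1, h1))
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou
```

Unlike plain IoU, the enclosing-box penalty keeps a useful gradient even when the two boxes do not overlap at all.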
Cls and Angle in fig. 6 correspond to Cls and Angle of the FC layer in fig. 3. In the embodiment of the invention, Cls (class loss), Angle (angle loss), Reg (regression frame loss) and IoU (frame loss) are accumulated with weights, so that the model can adaptively adjust the coefficients of the four losses, and the weight parameters of the model are saved. In the loss-calculation part, a directed (angle) loss is added on the basis of the original confidence loss, classification loss and regression loss. When the directed loss is calculated, the regression problem is converted into a classification problem through the long-edge definition method and the CSL method, so that the angle category loss is calculated after the label of the labeling box has been processed by CSL; experiments verify that this directed-loss calculation method improves the detection effect. The regression loss of YOLOv5 used in horizontal-box detection is usually calculated with IoU variants such as GIoU and CIoU, but the IoU formula between two rotating boxes is usually not differentiable. Back propagation through a rotating-IoU loss function would therefore require approximating the non-differentiable rotating IoU to let the network train normally, so that approach is not suitable for the rotating-frame target detection task. In the embodiment of the invention, the regression problem is converted into a classification problem by combining the long-edge definition method with CSL, so the original IoU loss function of YOLOv5 can still be applied, and GIoU is still adopted for the calculation.
The confidence loss part selects the rotating-frame IoU as the weight coefficient of the confidence branch. This avoids the excessive influence that a small angle deviation exerts on the IoU when a horizontal-frame IoU is used for the calculation, which would otherwise produce excessively high scores when the frame parameters are predicted correctly but the angle parameter is wrong.
For each part position of the seedling, the minimum enclosing area q1 of the finally predicted frame and the labeling frame, and their intersection area q2, are calculated. From q1 and q2, the proportion q = |q1 − q2| / q1 of the enclosing area that does not belong to the two frames is obtained, and the GIoU loss between the prediction-frame coordinates and the labeling-frame coordinates is calculated as GIoU = p − q (p being the IoU). The GIoU loss is back-propagated in the training network so that it becomes smaller and smaller until the model converges; the calculation is repeated and the model is saved once per network iteration, the test precision of each model is compared after the training iterations are completed, and the model with the highest precision is taken as the optimal model.
The classification loss is the binary cross-entropy loss (BCE loss). The most common index in the bounding-box regression loss is the intersection over union (IoU), which measures the overlap between the predicted box and the real box and thereby reflects the detection effect. With A the area of the predicted frame and B the area of the real frame, it is calculated as:

$$IoU = \frac{\left|A\cap B\right|}{\left|A\cup B\right|}$$
The various losses jointly drive the updating of the network weight parameters; in other words, the update of the weight parameters is determined by the back propagation of the loss. The trained model is obtained after the loss calculation iterations are finished.
The parameters of the training process in the embodiment of the invention are exemplarily set as follows: 300 iterations, a batch size of 8, a polynomial-decay learning-rate scheduling strategy, and an initial learning rate of 0.1.
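The polynomial-decay learning-rate policy mentioned here can be sketched as below; the decay power of 0.9 is an illustrative assumption, as the patent only specifies the policy type and the initial rate of 0.1:

```python
def poly_lr(base_lr, step, total_steps, power=0.9):
    """Polynomial-decay learning-rate schedule:
    lr = base_lr * (1 - step / total_steps) ** power.

    The power value 0.9 is an illustrative assumption; the patent only
    states that a polynomial decay policy with base LR 0.1 is used.
    """
    return base_lr * (1.0 - step / float(total_steps)) ** power
```

The rate starts at base_lr and decays smoothly to zero at the final step; power = 1.0 gives a plain linear decay.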
Step 103: input the seedling image to be detected into the target detection model to obtain a seedling detection result.
A seedling image to be predicted is input; the predicted Cls (class information), Angle (angle information) and Reg (regression box information) are obtained from the target detection model and mapped onto the input seedling image to obtain the prediction result.
The embodiment of the invention also provides a seedling rotating frame target detection system, which comprises:
the labeling module is used for labeling the plurality of historical seedling sample images, obtaining a label of each historical seedling sample image, and constructing a sample set containing the historical seedling images and the labels of the historical seedling images;
the label of the historical seedling sample image is (c, x, y, l, s, CSL (theta));
wherein c is the category of the seedlings in the historical seedling sample image; x and y represent the coordinates of the central point of the labeling box of the seedling in the historical seedling sample image; l and s respectively represent the long side and the short side of the labeling box; θ represents the clockwise included angle between the long side of the labeling box and the horizontal axis, and CSL(θ) represents the angle category when that angle is θ;

$$CSL(\theta') = \begin{cases} g(\theta'), & \theta - r < \theta' < \theta + r \\ 0, & \text{otherwise} \end{cases}$$

wherein g(·) is a window function, r is the radius of the window function, and θ' is the angle category parameter.
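A minimal sketch of CSL label generation with a Gaussian window (the number of angle bins, the window radius r and the Gaussian form of g are illustrative assumptions; the patent leaves g and r unspecified):

```python
import numpy as np

def csl_label(theta, num_bins=180, r=6):
    """Circular Smooth Label: a window function of radius r centred on the
    true angle bin, wrapping around the circular angle axis.

    theta    : true angle bin (integer in [0, num_bins))
    num_bins : number of angle categories (illustrative assumption)
    r        : window radius (illustrative assumption)
    """
    bins = np.arange(num_bins)
    # Circular distance from each bin to the true angle bin
    d = np.minimum(np.abs(bins - theta), num_bins - np.abs(bins - theta))
    # Gaussian window g(.) as one possible choice of window function
    label = np.exp(-d ** 2 / (2.0 * (r / 3.0) ** 2))
    label[d > r] = 0.0  # zero outside the window radius
    return label
```

The wrap-around distance is what makes the label "circular": bins 179 and 1 are equally close to bin 0, so a box at 0° is not penalised as if 179° were maximally wrong.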
The model training module is used for training the improved YOLOv5 model by using the sample set to obtain a trained improved YOLOv5 model as the target detection model; the improved YOLOv5 model is obtained by replacing the convolution in the CBL structure of the YOLOv5 model with a deformable convolution and replacing the head network of the YOLOv5 model with a decoupled head network.
The improved YOLOv5 model includes a backbone network, a neck network, and a decoupled head network connected in series.
The backbone network comprises a Focus structure, a CSP1_1 structure, a first CSP1_3 structure, a second CSP1_3 structure and an SPP structure which are connected in sequence; the CSP structure comprises a first convolution layer, a first normalization layer and a first Leaky_relu activation function; the first CSP1_3 structure and the second CSP1_3 structure each comprise an improved CBL module, a residual module, a second convolution layer and a Concat layer; and the improved CBL module comprises a deformable convolution layer, a second normalization layer and a second Leaky_relu activation function.
The neck network includes three downsampling layers and three upsampling layers.
The decoupled head network comprises a third convolution layer, and a decoupling (regression) branch and a classification branch connected in parallel after the third convolution layer; the decoupling branch comprises a fourth convolution layer, a regression loss function and a frame loss function, and the classification branch comprises a fully connected layer, a category loss function and an angle loss function.
The functional expression of the deformable convolution layer is:

$$y(p_0) = \sum_{p_n\in R} w(p_n)\cdot x(p_0 + p_n + \Delta p_n)$$

wherein y(p_0) represents the feature value at p_0 output by the deformable convolution layer; p_0 is an arbitrary position in the feature map input to the deformable convolution layer; p_n is the nth sampling position in the 3 × 3 convolution-kernel grid R centered on p_0; Δp_n is the offset of p_n; w(p_n) is the convolution weight at p_n; and x(p_0 + p_n + Δp_n) represents the feature value at position p_0 + p_n + Δp_n in the input feature map.
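The sampling formula above can be illustrated directly. The sketch below computes one output value y(p0) over a 3 × 3 grid R; offsets are rounded to integer positions for simplicity, whereas a real deformable convolution uses bilinear interpolation at fractional positions (a detail outside the patent's formula):

```python
import numpy as np

def deform_sample(x, p0, offsets, weights):
    """One output of the deformable-convolution formula:
    y(p0) = sum_n w(p_n) * x(p0 + p_n + dp_n) over a 3x3 grid R.

    x       : 2-D input feature map
    p0      : (row, col) centre position
    offsets : list of 9 (d_row, d_col) learned offsets, one per grid point
    weights : list of 9 kernel weights w(p_n)
    """
    grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]  # the grid R
    h, w = x.shape
    y = 0.0
    for n, (di, dj) in enumerate(grid):
        oi, oj = offsets[n]
        # Integer rounding stands in for bilinear interpolation here
        i = int(round(p0[0] + di + oi))
        j = int(round(p0[1] + dj + oj))
        if 0 <= i < h and 0 <= j < w:  # zero padding outside the map
            y += weights[n] * x[i, j]
    return y
```

With all offsets zero this reduces to an ordinary 3 × 3 convolution at p0; the learned offsets let the sampling grid deform to follow irregular seedling edges.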
And the detection module is used for inputting the seedling image to be detected into the target detection model to obtain a seedling detection result.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a seedling rotating-frame target detection method and system, and provides an improved YOLOv5 model for seedling rotating-frame target detection based on a decoupled structure built on YOLOv5. In the improved YOLOv5 model, a deformable convolution is introduced into the backbone network to solve the problem of irregular edges of detected targets and improve detection precision; the original detection task is decoupled into a plurality of subtasks, each specially designed to learn the optimal features for its corresponding task, so that the class, position, size and direction of detected objects are obtained. Compared with a conventional horizontal box, the specific angle, position and size of seedlings can be detected accurately, and the detection accuracy and efficiency can be greatly improved.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A seedling rotating frame target detection method is characterized by comprising the following steps:
labeling the plurality of historical seedling sample images to obtain a label of each historical seedling sample image, and constructing a sample set comprising the historical seedling images and the labels of the historical seedling images;
training an improved YOLOv5 model by using the sample set to obtain a trained improved YOLOv5 model as a target detection model; the improved YOLOv5 model is obtained by replacing the convolution in the CBL structure of the YOLOv5 model with a deformable convolution and replacing the head network of the YOLOv5 model with a decoupled head network;
and inputting the seedling image to be detected into the target detection model to obtain a seedling detection result.
2. The seedling rotating frame target detection method of claim 1, wherein the label of the historical seedling sample image is (c, x, y, l, s, CSL (θ));
wherein c is the category of the seedlings in the historical seedling sample image; (x, y) represents the coordinates of the central point of the labeling box of the seedling in the historical seedling sample image; l and s respectively represent the long side and the short side of the labeling box; θ represents the clockwise included angle between the long side of the labeling box and the horizontal axis, and CSL(θ) represents the angle category when that angle is θ;

$$CSL(\theta') = \begin{cases} g(\theta'), & \theta - r < \theta' < \theta + r \\ 0, & \text{otherwise} \end{cases}$$

wherein g(·) is a window function, r is the radius of the window function, and θ' is the angle category parameter.
3. The seedling rotating frame target detection method of claim 1, wherein the improved YOLOv5 model comprises a backbone network, a neck network and a decoupling head network connected in sequence;
the backbone network comprises a Focus structure, a CSP1_1 structure, a first CSP1_3 structure, a second CSP1_3 structure and an SPP structure which are connected in sequence; the CSP structure comprises a first convolution layer, a first normalization layer and a first Leaky_relu activation function; the first CSP1_3 structure and the second CSP1_3 structure each comprise an improved CBL module, a residual module, a second convolution layer and a Concat layer; and the improved CBL module comprises a deformable convolution layer, a second normalization layer and a second Leaky_relu activation function;
the neck network comprises three down-sampling layers and three up-sampling layers;
the decoupled head network comprises a third convolution layer, and a decoupling (regression) branch and a classification branch connected in parallel after the third convolution layer; the decoupling branch comprises a fourth convolution layer, a regression loss function and a frame loss function, and the classification branch comprises a fully connected layer, a category loss function and an angle loss function.
4. A seedling rotating frame target detection method according to claim 3, wherein the functional expression of the deformable convolution layer is:

$$y(p_0) = \sum_{p_n\in R} w(p_n)\cdot x(p_0 + p_n + \Delta p_n)$$

wherein y(p_0) represents the feature value at p_0 output by the deformable convolution layer; p_0 is an arbitrary position in the feature map input to the deformable convolution layer; p_n is the nth sampling position in the 3 × 3 convolution-kernel grid R centered on p_0; Δp_n is the offset of p_n; x(p_0 + p_n + Δp_n) represents the feature value at position p_0 + p_n + Δp_n in the input feature map; and w(p_n) represents the weight applied to that feature value.
5. The seedling rotating frame target detection method of claim 1, wherein a total loss function adopted in the process of training the improved YOLOv5 model by using the sample set is as follows:
$$L = \frac{\lambda_1}{N}\sum_{i=1}^{N}obj_i\,L_{reg} + \frac{\lambda_2}{N}\sum_{i=1}^{N}obj_i\,L_{CSL} + \frac{\lambda_3}{N}\sum_{i=1}^{N}obj_i\,L_{cls} + \frac{\lambda_4}{N}\sum_{i=1}^{N}obj_i\,L_{iou}$$

wherein L represents the total loss function; N represents the number of anchor points and i the ith anchor point; obj_i represents a binary value used to distinguish the foreground from the background; L_reg, L_CSL, L_cls and L_iou respectively represent the regression loss function, the angle loss function, the category loss function and the frame loss function; (x, y) represents the coordinates of the central point of the labeling box of seedlings in a historical seedling sample image; l and s respectively represent the long side and the short side of the labeling box; θ represents the clockwise included angle between the long side of the labeling box and the horizontal axis, and CSL(θ) represents the angle category when that angle is θ; and λ_1, λ_2, λ_3 and λ_4 represent the weights of the regression loss function, the angle loss function, the category loss function and the frame loss function, respectively.
6. A seedling rotating frame target detection method according to claim 5, wherein the regression loss function is:
$$L_{reg} = smooth_{L1}(a)$$

wherein a represents the difference value between the prediction box and the labeling box, and smooth_{L1} represents the smoothing function:

$$smooth_{L1}(a) = \begin{cases} 0.5a^2, & |a| < 1 \\ |a| - 0.5, & \text{otherwise} \end{cases}$$
the angle loss function is:
$$L_{CSL} = -\sum_{n=1}^{M} y_{i\theta_n}\log\left(P_{i\theta_n}\right)$$

wherein M represents the total number of angle categories, θ_n is the angle category, y_{iθ_n} is the angle indicator function of the ith anchor point (1 if the angle category is true, 0 otherwise), and P_{iθ_n} is the probability that the angle category of the ith anchor point belongs to the angle category θ_n;
the class loss function is:
$$L_{cls} = -\sum_{n=1}^{C} y_{ic_n}\log\left(P_{ic_n}\right)$$

wherein C represents the number of seedling categories, P_{ic_n} represents the probability that the seedling category of the ith anchor point belongs to the seedling category c_n, and y_{ic_n} represents the category indicator function of the ith anchor point;
the frame loss function is:
$$L_{iou} = 1 - GIoU$$

wherein

$$GIoU = IoU - \frac{\left|C\setminus\left(B\cup B^{gt}\right)\right|}{\left|C\right|}$$

IoU denotes the intersection over union of the prediction box and the labeling box, C denotes the minimum box covering both the prediction box and the labeling box, B is the prediction box, and B^{gt} is the labeling box.
7. A seedling rotating frame target detection system, the system comprising:
the labeling module is used for labeling the plurality of historical seedling sample images, obtaining a label of each historical seedling sample image, and constructing a sample set containing the historical seedling images and the labels of the historical seedling images;
the model training module is used for training the improved YOLOv5 model by using the sample set to obtain a trained improved YOLOv5 model as a target detection model; the improved YOLOv5 model is obtained by replacing the convolution in the CBL structure of the YOLOv5 model with a deformable convolution and replacing the head network of the YOLOv5 model with a decoupled head network;
and the detection module is used for inputting the seedling image to be detected into the target detection model to obtain a seedling detection result.
8. The seedling rotating frame target detecting system of claim 7, wherein the label of the historical seedling sample image is (c, x, y, l, s, CSL (θ));
wherein c is the category of the seedlings in the historical seedling sample image; x and y represent the coordinates of the central point of the labeling box of the seedling in the historical seedling sample image; l and s respectively represent the long side and the short side of the labeling box; θ represents the clockwise included angle between the long side of the labeling box and the horizontal axis, and CSL(θ) represents the angle category when that angle is θ;

$$CSL(\theta') = \begin{cases} g(\theta'), & \theta - r < \theta' < \theta + r \\ 0, & \text{otherwise} \end{cases}$$

wherein g(·) is a window function, r is the radius of the window function, and θ' is the angle category parameter.
9. The seedling rotation frame target detection system of claim 7, wherein the improved YOLOv5 model comprises a backbone network, a neck network, and a decoupling head network connected in sequence;
the backbone network comprises a Focus structure, a CSP1_1 structure, a first CSP1_3 structure, a second CSP1_3 structure and an SPP structure which are connected in sequence; the CSP structure comprises a first convolution layer, a first normalization layer and a first Leaky_relu activation function; the first CSP1_3 structure and the second CSP1_3 structure each comprise an improved CBL module, a residual module, a second convolution layer and a Concat layer; and the improved CBL module comprises a deformable convolution layer, a second normalization layer and a second Leaky_relu activation function;
the neck network comprises three down-sampling layers and three up-sampling layers;
the decoupled head network comprises a third convolution layer, and a decoupling (regression) branch and a classification branch connected in parallel after the third convolution layer; the decoupling branch comprises a fourth convolution layer, a regression loss function and a frame loss function, and the classification branch comprises a fully connected layer, a category loss function and an angle loss function.
10. A seedling rotation frame target detection system as claimed in claim 9, wherein the functional expression of the deformable convolution layer is:

$$y(p_0) = \sum_{p_n\in R} w(p_n)\cdot x(p_0 + p_n + \Delta p_n)$$

wherein y(p_0) represents the feature value at p_0 output by the deformable convolution layer; p_0 is an arbitrary position in the feature map input to the deformable convolution layer; p_n is the nth sampling position in the 3 × 3 convolution-kernel grid R centered on p_0; Δp_n is the offset of p_n; w(p_n) is the convolution weight at p_n; and x(p_0 + p_n + Δp_n) represents the feature value at position p_0 + p_n + Δp_n in the input feature map.
CN202210143675.6A 2022-02-17 2022-02-17 Seedling rotating frame target detection method and system Pending CN114493975A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210143675.6A CN114493975A (en) 2022-02-17 2022-02-17 Seedling rotating frame target detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210143675.6A CN114493975A (en) 2022-02-17 2022-02-17 Seedling rotating frame target detection method and system

Publications (1)

Publication Number Publication Date
CN114493975A true CN114493975A (en) 2022-05-13

Family

ID=81481712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210143675.6A Pending CN114493975A (en) 2022-02-17 2022-02-17 Seedling rotating frame target detection method and system

Country Status (1)

Country Link
CN (1) CN114493975A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881763A (en) * 2022-05-18 2022-08-09 中国工商银行股份有限公司 Method, device, equipment and medium for post-loan supervision of aquaculture
CN117611791A (en) * 2023-10-20 2024-02-27 哈尔滨工业大学 Method for detecting flying target based on feature separation deformable convolution


Similar Documents

Publication Publication Date Title
CN111652216B (en) Multi-scale target detection model method based on metric learning
CN112150425B (en) Unsupervised intravascular ultrasound image registration method based on neural network
Wang et al. Vision-assisted BIM reconstruction from 3D LiDAR point clouds for MEP scenes
CN114493975A (en) Seedling rotating frame target detection method and system
CN111695562B (en) Autonomous robot grabbing method based on convolutional neural network
CN110766041B (en) Deep learning-based pest detection method
CN109447078A (en) A kind of detection recognition method of natural scene image sensitivity text
CN111368769B (en) Ship multi-target detection method based on improved anchor point frame generation model
CN112766229B (en) Human face point cloud image intelligent identification system and method based on attention mechanism
CN110069993A (en) A kind of target vehicle detection method based on deep learning
CN112364931A (en) Low-sample target detection method based on meta-feature and weight adjustment and network model
CN110059765B (en) Intelligent mineral identification and classification system and method
CN115049619B (en) Efficient flaw detection method for complex scene
CN114743007A (en) Three-dimensional semantic segmentation method based on channel attention and multi-scale fusion
Hou et al. A pointer meter reading recognition method based on YOLOX and semantic segmentation technology
CN116883393A (en) Metal surface defect detection method based on anchor frame-free target detection algorithm
CN111476167A (en) student-T distribution assistance-based one-stage direction remote sensing image target detection method
CN118096785A (en) Image segmentation method and system based on cascade attention and multi-scale feature fusion
CN115049833A (en) Point cloud component segmentation method based on local feature enhancement and similarity measurement
CN117593514B (en) Image target detection method and system based on deep principal component analysis assistance
He et al. Visual recognition and location algorithm based on optimized YOLOv3 detector and RGB depth camera
Miao et al. A two-step phenotypic parameter measurement strategy for overlapped grapes under different light conditions
CN117593243A (en) Compressor appearance self-adaptive detection method guided by reliable pseudo tag
CN117253041A (en) Depth instance segmentation method based on contour intersection ratio loss
CN114372502B (en) Angle-adaptive elliptical template target detector

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination