CN114493975A - Method and system for seedling rotating frame target detection

Method and system for seedling rotating frame target detection

Info

Publication number
CN114493975A
CN114493975A (application CN202210143675.6A)
Authority
CN
China
Prior art keywords
seedling
angle
historical
function
convolution layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210143675.6A
Other languages
Chinese (zh)
Inventor
徐大伟
胡东阳
王家豪
翟永杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Electric Power University
Priority to CN202210143675.6A
Publication of CN114493975A
Legal status: Pending (Current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/0014 Image feed-back for automatic industrial control, e.g. robot with camera
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification based on distances to training or reference patterns
    • G06F 18/24133 Distances to prototypes
    • G06F 18/24137 Distances to cluster centroïds
    • G06F 18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention relates to a seedling rotating frame target detection method and system, and proposes an improved YOLOv5 model with a decoupled structure for seedling rotating frame target detection. Deformable convolution is introduced into the backbone network to handle the irregular edges of the detected targets (seedlings) and improve detection accuracy; a decoupled head network is then set up in YOLOv5, decoupling the original detection task into several subtasks, each designed separately to learn the optimal features of the corresponding task; finally, all detection results are integrated to obtain the final rotated detection box. The method achieves accurate identification of seedlings based on the improved YOLOv5 model.

Description

A method and system for seedling rotating frame target detection

Technical Field

The invention relates to the technical field of seedling cultivation, and in particular to a method and system for seedling rotating frame target detection.

Background Art

With the rapid development of science and technology, seedling tissue culture has gradually matured. As an asexual propagation technique, it offers many advantages, including a short cultivation cycle, a high survival rate, high yield, and reduced production costs, and it therefore has good development prospects. Modeled on the workflow of traditional manual seedling cutting, a seedling-cutting robot can be divided into a seedling recognition system, a seedling pose and coordinate analysis system, and a cutting execution system. Target recognition of seedlings is a necessary link in developing a seedling-cutting robot.

发明内容SUMMARY OF THE INVENTION

有鉴于此,本发明提供了一种种苗旋转框目标检测方法及系统,以实现种苗的精确识别。In view of this, the present invention provides a seedling rotation frame target detection method and system to achieve accurate identification of seedlings.

为实现上述目的,本发明提供了如下方案:For achieving the above object, the present invention provides the following scheme:

一种种苗旋转框目标检测方法,所述方法包括如下步骤:A seedling rotation frame target detection method, the method comprises the following steps:

对多个历史种苗样本图像进行标注,获得每个历史种苗样本图像的标签,构建包含历史种苗图像和历史种苗图像的标签的样本集;Annotate multiple historical seedling sample images, obtain the label of each historical seedling sample image, and construct a sample set containing the historical seedling image and the label of the historical seedling image;

利用所述样本集对改进YOLOv5模型进行训练,获得训练后的改进YOLOv5模型,作为目标检测模型;所述改进YOLOv5模型为将YOLOv5模型中的CBL结构中的卷积替换为可变性卷积,且将YOLOv5模型中头部网络替换为解耦头部网络得到的;The improved YOLOv5 model is trained by using the sample set, and the improved YOLOv5 model after training is obtained as a target detection model; the improved YOLOv5 model is to replace the convolution in the CBL structure in the YOLOv5 model with a variable convolution, and Replacing the head network in the YOLOv5 model with the decoupling head network;

将待测种苗图像输入目标检测模型,获得种苗检测结果。Input the image of the seedling to be tested into the target detection model to obtain the seedling detection result.

Optionally, the label of a historical seedling sample image is (c, x, y, l, s, CSL(θ));

where c is the category of the seedlings in the historical seedling sample image; (x, y) are the coordinates of the center point of the annotation box of the seedlings; l and s are the long and short sides of the annotation box, respectively; θ is the clockwise angle between the long side of the annotation box and the horizontal axis; and CSL(θ) is the angle category when that clockwise angle is θ;

$$\mathrm{CSL}(\theta') = \begin{cases} g(\theta'), & \theta - r < \theta' < \theta + r \\ 0, & \text{otherwise} \end{cases}$$

where g(·) is the window function, r is the radius of the window function, and θ′ is the angle category parameter.

Optionally, the improved YOLOv5 model comprises a backbone network, a neck network, and a decoupled head network connected in sequence;

the backbone network comprises a Focus structure, a CSP1_1 structure, a first CSP1_3 structure, a second CSP1_3 structure, and an SPP structure connected in sequence; the CSP1_1 structure comprises a first convolution layer, a first normalization layer, and a first Leaky_relu activation function; the first and second CSP1_3 structures each comprise an improved CBL module, a residual module, a second convolution layer, and a Concat layer; and the improved CBL module comprises a deformable convolution layer, a second normalization layer, and a second Leaky_relu activation function;

the neck network comprises three down-sampling layers and three up-sampling layers;

the decoupled head network comprises a third convolution layer and, connected in parallel to it, a decoupling branch and a classification branch; the decoupling branch comprises a fourth convolution layer, a regression box loss function, and a bounding box loss, while the classification branch comprises a fully connected layer, a category loss function, and an angle loss.

Optionally, the functional expression of the deformable convolution layer is:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where y(p_0) is the feature value at p_0 output by the deformable convolution layer; p_0 is any position in the feature map input to the deformable convolution layer; p_n is the n-th sampling position in the grid R centered on p_0; the grid R is a 3×3 convolution kernel; Δp_n is the offset of p_n; w(p_n) is the kernel weight at p_n; and x(p_0 + p_n + Δp_n) is the feature value at p_0 + p_n + Δp_n in the input feature map.

Optionally, the total loss function used when training the improved YOLOv5 model with the sample set is:

$$L = \frac{1}{N}\sum_{i=1}^{N} obj_i\Big(\lambda_1 L_{reg}(x, y, l, s) + \lambda_2 L_{CSL}(\mathrm{CSL}(\theta)) + \lambda_3 L_{cls}(c) + \lambda_4 L_{iou}\Big)$$

where L is the total loss function; N is the number of anchor points; i indexes the i-th anchor point; obj_i is a binary value distinguishing foreground from background; L_reg, L_CSL, L_cls, and L_iou are the regression, angle, category, and bounding box loss functions, respectively; (x, y) are the center-point coordinates of the annotation box of the seedlings in a historical seedling sample image; l and s are the long and short sides of the annotation box; θ is the clockwise angle between the long side of the annotation box and the horizontal axis; CSL(θ) is the angle category when that angle is θ; and λ1, λ2, λ3, and λ4 are the weights of the regression, angle, category, and bounding box loss functions, respectively.

Optionally, the regression loss function is:

$$L_{reg} = \mathrm{smooth}_{L1}(a), \qquad \mathrm{smooth}_{L1}(a) = \begin{cases} 0.5a^2, & |a| < 1 \\ |a| - 0.5, & \text{otherwise} \end{cases}$$

where a is the difference between the predicted box and the annotation box, and smooth_{L1}(·) is the smoothing function.

The angle loss function is:

$$L_{CSL} = -\sum_{n=1}^{M} y_{i\theta_n} \log\left(P_{i\theta_n}\right)$$

where M is the total number of angle categories, θ_n is an angle category, y_{iθ_n} is the angle indicator function of the i-th anchor point (1 if the angle category is true, 0 otherwise), and P_{iθ_n} is the probability that the angle category of the i-th anchor point belongs to angle category θ_n.

The category loss function is:

$$L_{cls} = -\sum_{n=1}^{C} y_{ic_n} \log\left(P_{ic_n}\right)$$

where C is the number of seedling categories, P_{ic_n} is the probability that the seedling category of the i-th anchor point belongs to seedling category c_n, and y_{ic_n} is the category indicator function of the i-th anchor point.

The bounding box loss function is:

$$L_{iou} = 1 - GIoU, \qquad GIoU = IoU - \frac{\left|C \setminus (B \cup B^{gt})\right|}{|C|}$$

where IoU is the intersection-over-union of the predicted box and the annotation box, C is the smallest box covering both the predicted box and the annotation box, B is the predicted box, and B^{gt} is the annotation box.

A seedling rotating frame target detection system, the system comprising:

an annotation module for annotating a plurality of historical seedling sample images, obtaining a label for each historical seedling sample image, and constructing a sample set containing the historical seedling images and the labels of the historical seedling images;

a model training module for training an improved YOLOv5 model with the sample set to obtain a trained improved YOLOv5 model as the target detection model, the improved YOLOv5 model being obtained by replacing the convolution in the CBL structure of the YOLOv5 model with deformable convolution and replacing the head network of the YOLOv5 model with a decoupled head network;

a detection module for inputting an image of the seedlings to be detected into the target detection model to obtain a seedling detection result.

Optionally, the label of a historical seedling sample image is (c, x, y, l, s, CSL(θ));

where c is the category of the seedlings in the historical seedling sample image; (x, y) are the coordinates of the center point of the annotation box of the seedlings; l and s are the long and short sides of the annotation box, respectively; θ is the clockwise angle between the long side of the annotation box and the horizontal axis; and CSL(θ) is the angle category when that clockwise angle is θ;

$$\mathrm{CSL}(\theta') = \begin{cases} g(\theta'), & \theta - r < \theta' < \theta + r \\ 0, & \text{otherwise} \end{cases}$$

where g(·) is the window function, r is the radius of the window function, and θ′ is the angle category parameter.

Optionally, the improved YOLOv5 model comprises a backbone network, a neck network, and a decoupled head network connected in sequence;

the backbone network comprises a Focus structure, a CSP1_1 structure, a first CSP1_3 structure, a second CSP1_3 structure, and an SPP structure connected in sequence; the CSP1_1 structure comprises a first convolution layer, a first normalization layer, and a first Leaky_relu activation function; the first and second CSP1_3 structures each comprise an improved CBL module, a residual module, a second convolution layer, and a Concat layer; and the improved CBL module comprises a deformable convolution layer, a second normalization layer, and a second Leaky_relu activation function;

the neck network comprises three down-sampling layers and three up-sampling layers;

the decoupled head network comprises a third convolution layer and, connected in parallel to it, a decoupling branch and a classification branch; the decoupling branch comprises a fourth convolution layer, a regression box loss function, and a bounding box loss, while the classification branch comprises a fully connected layer, a category loss function, and an angle loss.

Optionally, the functional expression of the deformable convolution layer is:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where y(p_0) is the feature value at p_0 output by the deformable convolution layer; p_0 is any position in the feature map input to the deformable convolution layer; p_n is the n-th sampling position in the grid R centered on p_0; the grid R is a 3×3 convolution kernel; Δp_n is the offset of p_n; w(p_n) is the kernel weight at p_n; and x(p_0 + p_n + Δp_n) is the feature value at p_0 + p_n + Δp_n in the input feature map.

According to the specific embodiments provided by the present invention, the present invention discloses the following technical effects:

The invention discloses a seedling rotating frame target detection method and system. Based on YOLOv5, the method proposes an improved YOLOv5 model with a decoupled structure for seedling rotating frame target detection. Deformable convolution is introduced into the backbone network to address the irregular edges of the detected targets (seedlings) and improve detection accuracy; a decoupled head network is then set up in YOLOv5 to decouple the original detection task into several subtasks, each designed separately to learn the optimal features of the corresponding task; finally, all detection results are integrated to obtain the final rotated detection box. The method achieves accurate identification of seedlings based on the improved YOLOv5 model.

Brief Description of the Drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a seedling rotating frame target detection method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of rotated box definition methods provided by an embodiment of the present invention; FIG. 2(a) shows the five-parameter method with a 90° angle range, FIG. 2(b) shows the five-parameter method with a 180° angle range, and FIG. 2(c) shows the ordered quadrilateral representation;

FIG. 3 is a schematic structural diagram of the improved YOLOv5 model provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of the deformable convolution layer provided by an embodiment of the present invention;

FIG. 5 is a comparison of receptive fields provided by an embodiment of the present invention; FIG. 5(a) is a schematic diagram of the receptive field of an ordinary two-dimensional convolution layer, and FIG. 5(b) is a schematic diagram of the receptive field of the deformable convolution layer;

FIG. 6 is a schematic structural diagram of the decoupled head network provided by an embodiment of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

The purpose of the present invention is to provide a seedling rotating frame target detection method and system to achieve accurate identification of seedlings.

To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Taking the identification of Phalaenopsis seedlings as an example, the task is to recognize the different parts of each seedling. Rotated box detection is commonly used for target recognition in aerospace remote sensing images. For the problem of many small targets against complex backgrounds, Zhu Yu et al. proposed the R2-FRCNN rotated box detection network, which detects in two stages, coarse adjustment and fine adjustment: the coarse stage converts horizontal boxes into rotated boxes and the fine stage further optimizes rotated box localization; it was validated on the DOTA remote sensing dataset. Xue Yang et al. proposed an end-to-end multi-class rotated box detection algorithm for complex aerial images that fuses features from two different layers and, in the second stage of the network, uses five parameters (x, y, w, h, θ) to define a rectangle of arbitrary orientation. Ran Qin et al., working on rotated box target detection in aerial images, proposed an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals from horizontal anchors, together with a multi-head network (MRDet) that decouples the detection task into multiple subtasks, each branch specially designed to learn the optimal features of its task, achieving good detection results on the DOTA and HRSC2016 datasets. The prior art contains no record of using a YOLOv5 model for seedling identification.

As shown in FIG. 1, the present invention provides a seedling rotating frame target detection method comprising the following steps:

Step 101: annotate a plurality of historical seedling sample images, obtain a label for each historical seedling sample image, and construct a sample set containing the historical seedling images and their labels.

An industrial camera was used to take photographs, yielding 1505 seedling images, which were divided into a training set and a test set at a ratio of 8:2.

Existing rotated box definitions fall broadly into two types: the five-parameter direction definition and the eight-parameter quadrilateral definition. The five-parameter method achieves rotated box detection in any direction by adding a direction parameter θ, and different conventions for the angle's start and end points lead to different angle ranges, as shown in FIG. 2. FIG. 2(a) illustrates the five-parameter method within a 90° range: the angle θ is the acute angle between the rotated box and the x axis, the side forming that angle is denoted w and the other side h, so θ defined this way lies in [-90, 0). FIG. 2(b) illustrates the five-parameter method within a 180° range, i.e., the long-edge definition, in which θ is the angle between the long side h of the rotated box and the x axis, giving the angle range [-90, 90). FIG. 2(c) illustrates the ordered quadrilateral definition, expressed as (x1, y1, x2, y2, x3, y3, x4, y4) with the points ordered starting from the leftmost point.
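
Since these conventions matter when preparing labels, the following is a minimal sketch of converting a 90°-range five-parameter box into the long-edge definition used below; the function name and the wrap-around handling are illustrative assumptions, not code from the patent.

```python
def to_long_edge(x, y, w, h, theta):
    """Convert a 90-degree-range box (theta in [-90, 0), paired with side w)
    into the long-edge definition (x, y, l, s, theta_le), theta_le in [-90, 90)."""
    if w >= h:
        l, s, theta_le = w, h, theta
    else:
        # the long side is h, so the angle rotates by 90 degrees
        l, s, theta_le = h, w, theta + 90
    if theta_le >= 90:  # wrap back into [-90, 90)
        theta_le -= 180
    return x, y, l, s, theta_le
```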

In regression algorithms based on the 90° definition, the periodicity of angular (PoA) and the exchangeability of edges (EoE) make the directional loss very large when the angle is near a boundary, increasing the difficulty of regression; in the 180° long-edge definition, the sudden jump in loss comes only from PoA; and since the quadrilateral definition pre-sorts the corner points, a detected box inconsistent with the ground-truth box causes a boundary overrun and produces a large loss value.

Thus, in the 180°-based long-edge definition only the parameter θ has a boundary problem, and the Circular Smooth Label (CSL) method can handle the problems caused by boundary overruns. This work therefore combines the long-edge definition with CSL to define the rotated box angle: converting the regression of the rotated box angle into a classification problem limits the range of detection results and avoids the large losses of regression-based angle detection. The present invention uses formula (1) to classify the entire defined angle range, with one category per degree. In formula (1), g(x) is a window function whose radius is controlled by r. The window function satisfies four properties: periodicity, symmetry, extremeness, and monotonicity. The window function gives the model a measure of the angular distance between the detected label and the true label: within a certain range, the closer the detected value is to the true value, the smaller the loss.

Specifically, the dataset uses the PASCAL VOC annotation format, and the rolabelImg annotation tool is used to manually annotate the targets in the seedling images; the annotation files contain the coordinate parameters of the real targets. The annotated data is stored in xml label files, which are first converted into the txt files required by YOLOv5, with the converted format (c, x, y, l, s, θ), where c is the category; x, y are the coordinates of the center point of the box; l and s are the long and short sides of the annotation box, respectively; and θ is the clockwise angle between the x axis and the long side, with θ ∈ [1, 180). Combining the long-edge definition with CSL converts the regression of the rotated box angle into a classification problem, limiting the range of prediction results and avoiding the large losses of regression-based angle prediction. The entire defined angle range is classified with one category per degree. The expression of CSL is:

$$\mathrm{CSL}(\theta') = \begin{cases} g(\theta'), & \theta - r < \theta' < \theta + r \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

In the formula, g(x) is the window function, and the window radius is controlled by r. The window function satisfies periodicity, symmetry, extremeness, and monotonicity; typical choices include the pulse function and the Gaussian function. θ′ is the angle category parameter. The angle θ of a label is fed into formula (1); the window function g(x) converts the original label into an angle category, giving the classification label (c, x, y, l, s, CSL(θ)).
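
As an illustration of this label conversion, the sketch below encodes an integer angle label as a CSL vector with a Gaussian window; the Gaussian shape, its standard deviation, and the 180 one-degree classes are assumptions consistent with the description, not the patent's own code.

```python
import numpy as np

def csl_encode(theta, r=6, num_classes=180):
    """Encode an angle label theta (integer degrees in [0, 180)) as a
    Circular Smooth Label vector using a Gaussian window of radius r."""
    x = np.arange(num_classes)
    # circular distance from every angle class to theta
    d = np.minimum(np.abs(x - theta), num_classes - np.abs(x - theta))
    label = np.exp(-(d ** 2) / (2 * (r / 3.0) ** 2))  # window function g
    label[d > r] = 0.0  # zero outside the window radius
    return label
```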

Step 102: train the improved YOLOv5 model with the sample set to obtain a trained improved YOLOv5 model as the target detection model. The improved YOLOv5 model is obtained by replacing the convolution in the CBL structure of the YOLOv5 model with deformable convolution and replacing the head network of the YOLOv5 model with a decoupled head network.

As shown in FIG. 3, the improved YOLOv5 model includes a backbone network, a neck network, and a decoupled head network.

1. Backbone network

As shown in FIG. 3, the backbone network of the improved YOLOv5 model extracts feature representations from the image and mainly comprises a Focus structure, a CSP1_1 structure, CSP1_3 structures, and an SPP structure. The Focus structure slices the input seedling image; specifically, the image from step 101 is turned into a feature map and then fed into the CSP1_1 structure. The CSP1_1 structure comprises a convolution layer (Conv), a normalization layer (BN), and a Leaky_relu activation function. The CSP1_3 structure comprises the improved CBL module, a residual module, a convolution layer, and a Concat layer. The SPP structure processes the feature map with 1*1, 5*5, 9*9, and 13*13 max pooling and then performs multi-scale feature fusion through a concat operation. Because of the inherent characteristics of the plant, the edges of detected Phalaenopsis seedling targets are mostly irregular and geometrically diverse, so conventional convolution is not ideal for Phalaenopsis seedling rotated box detection. To address this, deformable convolution, which adapts better to geometric deformation, is introduced so that the receptive field of the convolution kernel focuses more on target edge features and detection performance is optimized. On this basis, the embodiment of the present invention provides an improved CBL (Conv Layer + BN Layer + LeakyReLU Layer) module, replacing the convolution layer in the existing CBL module with a deformable convolution layer.
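
For concreteness, the following is a sketch of the SPP block as described, with the 1*1 branch treated as the identity; the layer arrangement is an assumption consistent with the text.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Parallel max pooling at kernel sizes 5, 9, 13 plus the identity,
    concatenated along the channel dimension for multi-scale fusion."""
    def __init__(self):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13)
        )

    def forward(self, x):
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)
```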

Conventional two-dimensional convolution slides a grid R of fixed structure over the feature map and takes a weighted sum of the sampled values with weights w. The grid R defines the size of the sampling area and the weights w. The output y(p_0) at each position p_0 of the input feature map x is:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n)$$

where p_n ranges over all sampling positions in the grid R centered on p_0. As shown in FIG. 4, in deformable convolution an offset Δp_n is added to each sampling position p_n, and the corresponding output y(p_0) becomes:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

Here x is the seedling feature map, p_0 is each position in the feature map, p_n ranges over all sampling positions in the grid R centered on p_0, and the grid R is a 3×3 convolution kernel. All convolutions in the CBL structures of the backbone of the present invention are replaced with deformable convolutions.

FIG. 5 compares the receptive field of an ordinary convolution layer with that of the deformable convolution layer proposed in the embodiment of the present invention; it shows that, for the irregular edge geometry of Phalaenopsis seedlings, deformable convolution solves the target detection problem more effectively.

Deformable convolution changes the shape of the traditional sampling area: during model computation, the sampling points may extend beyond the region of interest. Weights are introduced on top of the above structure, so the network learns not only the offset at each position but also the weight of each sampling point, reducing the influence of extreme sampling points on feature extraction and thereby ensuring effective extraction of seedling features. After this step, the preliminarily extracted seedling feature information is output.
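
A minimal sketch of such an improved CBL module, built on torchvision's deformable convolution, is given below; the offset-predicting layer, channel sizes, and LeakyReLU slope are assumptions, and the per-point modulation weights mentioned above are omitted for brevity.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformCBL(nn.Module):
    """Deformable Conv + BN + LeakyReLU, replacing the plain conv in CBL."""
    def __init__(self, c_in, c_out, k=3, s=1, p=1):
        super().__init__()
        # two offset channels (dx, dy) per kernel sampling position
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, s, p)
        self.conv = DeformConv2d(c_in, c_out, k, s, p)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x, self.offset(x))))
```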

2. Neck network

As shown in FIG. 3, the neck network in the embodiment of the present invention exemplarily adopts the general structure for target detection: SPP combined with PANet. PANet increases the number of up-sampling and down-sampling operations; with three up-samplings and three down-samplings, the feature information of the low levels is passed completely to the high levels, which reduces loss during information transfer, improves the utilization of low-level information, and increases the accuracy of seedling target detection. This step outputs the enhanced seedling feature information.

3. Decoupled head network

To address the coupling problem, a decoupled detection structure for rotated box targets is proposed (see FIG. 6 for a schematic of the decoupled head network): the embodiment of the present invention replaces the original YOLO head (HEAD) with a decoupled head structure to improve detection performance and optimize the localization of the rotated box.

The decoupled head network is shown in FIG. 6: the feature information output by the neck network enters the decoupled head network, where it is processed by a 3*3 convolution layer and the different kinds of information are separated; two parallel branches, each with a 3×3 convolution layer, are then added to perform the classification task and the regression task respectively, with the classification branch adopting a fully connected layer structure. The long-edge definition and the CSL method have already converted the box orientation problem from regression into classification, so in the decoupled detection structure the directional loss is assigned to the classification branch. The various losses jointly drive the update of the network weight parameters; when the loss computation iterations finish, the trained model is obtained.

As can be seen in FIG. 6, for each feature level the classification side obtains Cls (category loss) and Angle (angle loss) through FC (fully connected) layers, while Reg (regression box information) and IoU (bounding box loss) are obtained through Conv layers.
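
The sketch below illustrates one way to arrange such a decoupled head; the channel widths, the 180 one-degree angle classes, and the exact layer layout are assumptions consistent with FIG. 6, not the patent's exact architecture.

```python
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Shared 3x3 conv, then FC branches for class and angle and conv
    branches for box regression and IoU/objectness."""
    def __init__(self, c_in, num_classes, num_angles=180):
        super().__init__()
        self.stem = nn.Conv2d(c_in, c_in, 3, padding=1)
        self.cls_fc = nn.Linear(c_in, num_classes)        # Cls
        self.ang_fc = nn.Linear(c_in, num_angles)         # Angle (CSL)
        self.reg_conv = nn.Conv2d(c_in, 4, 3, padding=1)  # x, y, l, s
        self.obj_conv = nn.Conv2d(c_in, 1, 3, padding=1)  # IoU branch

    def forward(self, x):
        f = self.stem(x)
        v = f.permute(0, 2, 3, 1)  # apply the FC layers per spatial location
        return self.cls_fc(v), self.ang_fc(v), self.reg_conv(f), self.obj_conv(f)
```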

The multi-task loss function is:

$$L = \frac{1}{N}\sum_{i=1}^{N} obj_i\Big(\lambda_1 L_{reg} + \lambda_2 L_{CSL} + \lambda_3 L_{cls} + \lambda_4 L_{iou}\Big)$$

where N is the number of anchor points and obj_i is a binary value used to distinguish foreground from background. L_reg, L_CSL, L_cls, and L_iou denote the regression loss function, the angle loss function, the category loss function, and the bounding box loss function, respectively.
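
A sketch of this weighted combination follows; the per-anchor loss tensors and the fixed lambda values are placeholders (the text further below lets the model adjust the coefficients adaptively).

```python
def total_loss(l_reg, l_csl, l_cls, l_iou, obj, lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Weighted multi-task loss over N anchors; obj is the binary
    foreground mask described above, l_* are per-anchor loss tensors."""
    l1, l2, l3, l4 = lambdas
    n = obj.numel()
    return (obj * (l1 * l_reg + l2 * l_csl + l3 * l_cls + l4 * l_iou)).sum() / n
```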

The regression loss is computed with the smooth L1 function:

$$L_{reg} = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L1}\left(v_j^{*} - v_j\right)$$

where v = (v_x, v_y, v_w, v_h) denotes the box coordinates of the GT and v* = (v*_x, v*_y, v*_w, v*_h) denotes the predicted box coordinates. GT is the ground-truth box, and x = v* - v is the numerical difference between the predicted box and the ground-truth box:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
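
The smooth L1 function above translates directly into code:

```python
import torch

def smooth_l1(x):
    """Quadratic for |x| < 1, linear elsewhere, as defined above."""
    ax = x.abs()
    return torch.where(ax < 1, 0.5 * x ** 2, ax - 0.5)
```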

The angle loss formula is as follows:

$$L_{CSL} = -\sum_{n=1}^{M} y_{i\theta_n} \log\left(P_{i\theta_n}\right)$$

where M is the total number of angle categories, θ_n is an angle category, and y_{iθ_n} is the angle indicator function, taking 1 when the angle category is true and 0 otherwise; P_{iθ_n} is the probability that the observed sample belongs to that angle category.

The classification loss function is as follows:

$$L_{cls} = -\sum_{n=1}^{C} y_{in} \log\left(P_{in}\right)$$

where i denotes the i-th sample and P_{in} is the probability that the i-th sample is predicted as the n-th category.
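
Both the angle loss and the classification loss above are cross-entropies over predicted probabilities; a shared sketch follows, with an epsilon guard that is an implementation assumption.

```python
import torch

def cross_entropy(p, y):
    """p: predicted probabilities P_in; y: target indicator (one-hot,
    or CSL-smoothed in the angle case). Sums over classes, averages
    over samples."""
    eps = 1e-7
    return -(y * torch.log(p.clamp(min=eps))).sum(dim=-1).mean()
```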

The bounding box loss formula is as follows:

$$L_{iou} = 1 - GIoU$$

where

$$GIoU = IoU - \frac{\left|C \setminus (B \cup B^{gt})\right|}{|C|}$$

C denotes the smallest box covering both the predicted bounding box and the ground-truth bounding box, B is the predicted bounding box, and B^{gt} is the ground-truth bounding box.

Cls and Angle in FIG. 6 correspond to the Cls and Angle of the FC layer in FIG. 3.

In the embodiment of the present invention, Cls (category loss), Angle (angle loss), Reg (regression box loss), and IoU (bounding box loss) are weighted and accumulated together so that the model can adaptively adjust the coefficients of the four losses, and the weight parameters of the model are saved. The loss computation adds a directional loss on top of the original confidence loss, classification loss, and regression loss. When computing the directional loss, the long-edge definition and the CSL method described above have already converted the regression problem into a classification problem, so the embodiment first processes the annotation box labels with CSL and then computes the angle category loss; experiments verify that this directional loss computation improves detection. For horizontal box detection, the regression loss of YOLOv5 is usually computed with IoU variants such as GIOU or CIOU, but the IoU formula between two rotated boxes is generally non-differentiable, which would require back-propagating through a rotated IoU loss to adjust the parameters; rotation detectors usually approximate the non-differentiable rotated IoU so the network can train, so that approach is not well suited to the rotated box detection task. The embodiment of the present invention instead converts the regression problem into a classification problem by combining the long-edge definition with CSL, so the original IoU loss function of YOLOv5 remains applicable, and this part is still computed with GIOU. For the confidence loss, the rotated box IoU is chosen as the weight coefficient of the confidence branch; this avoids the situation where, with a horizontal box IoU, a small angular deviation affects the rotated box IoU too strongly, producing an excessively large score when the box parameters are predicted correctly but the angle parameter is wrong.

For the finally predicted box of each seedling part, compute the area q1 of the minimum enclosing region of the predicted box and the annotation box and the area q2 of their intersection; from q1 and q2, the proportion of the enclosing region not belonging to the two boxes is q = |q1 - q2| / q1. Then compute the GIoU loss between the coordinates of the predicted box and those of the annotation box, that is, GIoU = p - q; back-propagate the GIoU loss through the training network so that it becomes smaller and smaller until the model converges. Each network iteration repeats the above computation and saves the model once; after the training iterations finish, the test accuracies of the saved models are compared, and the model with the highest accuracy is taken as the optimal model.

The classification loss uses binary cross-entropy (BCE loss). The most common metric in the bounding box regression loss is the intersection-over-union (IoU), which measures the overlap between the predicted box and the ground-truth box and thus reflects the detection quality. With A the predicted box area and B the ground-truth box area, it is computed as follows:

$$IoU = \frac{|A \cap B|}{|A \cup B|}$$
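
A sketch of the GIoU loss for axis-aligned boxes (x1, y1, x2, y2) follows; the small epsilon is an assumption added to avoid division by zero.

```python
def giou_loss(box_a, box_b, eps=1e-9):
    """1 - GIoU for two boxes given as (x1, y1, x2, y2) tuples."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / (union + eps)
    # smallest enclosing box C
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (c_area - union) / (c_area + eps)
    return 1.0 - giou
```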

The various losses jointly drive the update of the network weight parameters; in other words, the weight parameter update is determined by loss back-propagation. When the loss computation iterations finish, the trained model is obtained.

The parameters of the training process in the embodiment of the present invention are exemplarily set as follows: 300 iterations, batch size 8, a polynomial-decay learning rate scheduling strategy, and an initial learning rate of 0.1.
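
A sketch of the polynomial-decay schedule; the decay power 0.9 is an assumed value, since the text states only the strategy and the initial rate.

```python
def poly_lr(base_lr, step, total_steps, power=0.9):
    """Polynomial decay from base_lr down to zero over total_steps."""
    return base_lr * (1.0 - step / total_steps) ** power

# e.g. with 300 iterations and base LR 0.1:
# lr_at_epoch_150 = poly_lr(0.1, 150, 300)
```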

Step 103: input the image of the seedlings to be detected into the target detection model to obtain the seedling detection result.

The seedling image to be predicted is input, the predicted Cls (category information), Angle (angle information), and Reg (regression box information) are obtained from the target detection model, and this information is mapped onto the input seedling image to give the prediction result.
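
An inference sketch matching this description is shown below; decode_box is a hypothetical helper for turning raw regression outputs into absolute (x, y, l, s) coordinates, and the head output layout follows the earlier sketches.

```python
import torch

def detect(model, image):
    """Run the trained model and assemble a rotated box per location."""
    with torch.no_grad():
        cls, angle, reg, obj = model(image)
    c = cls.argmax(-1)            # predicted seedling category (Cls)
    theta = angle.argmax(-1)      # angle class, one degree per class
    x, y, l, s = decode_box(reg)  # hypothetical decoding of Reg outputs
    return c, (x, y, l, s, theta), obj.sigmoid()
```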

An embodiment of the present invention also provides a seedling rotating frame target detection system, the system comprising:

an annotation module for annotating a plurality of historical seedling sample images, obtaining a label for each historical seedling sample image, and constructing a sample set containing the historical seedling images and the labels of the historical seedling images.

The label of a historical seedling sample image is (c, x, y, l, s, CSL(θ));

where c is the category of the seedlings in the historical seedling sample image; (x, y) are the coordinates of the center point of the annotation box of the seedlings; l and s are the long and short sides of the annotation box, respectively; θ is the clockwise angle between the long side of the annotation box and the horizontal axis; and CSL(θ) is the angle category when that clockwise angle is θ;

$$\mathrm{CSL}(\theta') = \begin{cases} g(\theta'), & \theta - r < \theta' < \theta + r \\ 0, & \text{otherwise} \end{cases}$$

where g(·) is the window function, r is the radius of the window function, and θ′ is the angle category parameter.

a model training module for training an improved YOLOv5 model with the sample set to obtain a trained improved YOLOv5 model as the target detection model, the improved YOLOv5 model being obtained by replacing the convolution in the CBL structure of the YOLOv5 model with deformable convolution and replacing the head network of the YOLOv5 model with a decoupled head network.

The improved YOLOv5 model comprises a backbone network, a neck network, and a decoupled head network connected in sequence.

The backbone network comprises a Focus structure, a CSP1_1 structure, a first CSP1_3 structure, a second CSP1_3 structure, and an SPP structure connected in sequence; the CSP1_1 structure comprises a first convolution layer, a first normalization layer, and a first Leaky_relu activation function; the first and second CSP1_3 structures each comprise an improved CBL module, a residual module, a second convolution layer, and a Concat layer; and the improved CBL module comprises a deformable convolution layer, a second normalization layer, and a second Leaky_relu activation function.

The neck network comprises three down-sampling layers and three up-sampling layers.

The decoupled head network comprises a third convolution layer and, connected in parallel to it, a decoupling branch and a classification branch; the decoupling branch comprises a fourth convolution layer, a regression box loss function, and a bounding box loss, while the classification branch comprises a fully connected layer, a category loss function, and an angle loss.

The functional expression of the deformable convolution layer is:

$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where y(p_0) is the feature value at p_0 output by the deformable convolution layer; p_0 is any position in the feature map input to the deformable convolution layer; p_n is the n-th sampling position in the grid R centered on p_0; the grid R is a 3×3 convolution kernel; Δp_n is the offset of p_n; w(p_n) is the kernel weight at p_n; and x(p_0 + p_n + Δp_n) is the feature value at p_0 + p_n + Δp_n in the input feature map.

a detection module for inputting an image of the seedlings to be detected into the target detection model to obtain a seedling detection result.

According to the specific embodiments provided by the present invention, the present invention discloses the following technical effects:

The invention discloses a seedling rotating frame target detection method and system. Based on YOLOv5, the method proposes an improved YOLOv5 model with a decoupled structure for seedling rotating frame target detection. The improved YOLOv5 model introduces deformable convolution into the backbone network to address the irregular edges of detected targets and improve detection accuracy, and it decouples the original detection task into several subtasks, each specially designed to learn the optimal features of the corresponding task, so as to detect the category, position, size, and orientation of the inspected object. Compared with the conventional axis-aligned box, it can accurately detect the specific angle, position, and size of a seedling; the method can also greatly improve registration accuracy and registration efficiency.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for the parts that the embodiments have in common, reference may be made between them. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, see the description of the method.

Specific examples are used herein to explain the principles and implementations of the present invention. The description of the above embodiments is only intended to help understand the method of the present invention and its core idea; meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A seedling rotating frame target detection method is characterized by comprising the following steps:
labeling a plurality of historical seedling sample images to obtain a label of each historical seedling sample image, and constructing a sample set comprising the historical seedling images and the labels of the historical seedling images;
training an improved YOLOv5 model by using the sample set to obtain a trained improved YOLOv5 model as a target detection model; the improved YOLOv5 model is obtained by replacing the convolution in the CBL structure in the YOLOv5 model with deformable convolution and replacing the head network in the YOLOv5 model with a decoupling head network;
and inputting the seedling image to be detected into the target detection model to obtain a seedling detection result.
2. The seedling rotating frame target detection method of claim 1, wherein the label of the historical seedling sample image is (c, x, y, l, s, CSL (θ));
wherein c is the category of the seedlings in the historical seedling sample image; (x, y) represents the coordinates of the central point of the annotation box of the seedling in the historical seedling sample image; l and s respectively represent the long side and the short side of the annotation box; theta represents a clockwise included angle between the long side of the annotation box and the horizontal axis, and CSL(theta) represents an angle category when the clockwise included angle between the long side of the annotation box and the horizontal axis is theta;
$$\mathrm{CSL}(\theta') = \begin{cases} g(\theta'), & \theta - r < \theta' < \theta + r \\ 0, & \text{otherwise} \end{cases}$$
wherein g(·) is a window function, r is a radius of the window function, and θ' is an angle class parameter.
3. The seedling rotating frame target detection method of claim 1, wherein the improved YOLOv5 model comprises a backbone network, a neck network and a decoupling head network connected in sequence;
the backbone network comprises a Focus structure, a CSP1_1 structure, a first CSP1_3 structure, a second CSP1_3 structure and an SPP structure which are connected in sequence; the CSP structure comprises a first convolution layer, a first normalization layer and a first Leaky_relu activation function; the first CSP1_3 structure and the second CSP1_3 structure each comprise an improved CBL module, a residual module, a second convolution layer and a Concat layer, and the improved CBL module comprises a deformable convolution layer, a second normalization layer and a second Leaky_relu activation function;
the neck network comprises three down-sampling layers and three up-sampling layers;
the decoupled head network comprises a third convolution layer, and a decoupling branch and a classification branch connected in parallel to the third convolution layer; the decoupling branch comprises a fourth convolution layer, a regression loss function and a bounding-box loss function, and the classification branch comprises a fully connected layer, a category loss function and an angle loss function.
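As a rough illustration of this parallel structure (a sketch only: the channel widths, the number of seedling categories, the number of angle bins, and the global pooling before the fully connected layer are all assumptions not fixed by the claim):

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Shared convolution feeding two parallel branches: a convolutional
    regression branch for the box (x, y, l, s) and a fully connected
    classification branch for the seedling category and angle category."""

    def __init__(self, in_ch: int = 256, num_classes: int = 4, angle_bins: int = 180):
        super().__init__()
        self.shared = nn.Conv2d(in_ch, in_ch, 3, padding=1)    # "third convolution layer"
        self.reg = nn.Conv2d(in_ch, 4, 1)                      # "fourth convolution layer": x, y, l, s
        self.cls = nn.Linear(in_ch, num_classes + angle_bins)  # fully connected layer

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.shared(x))
        boxes = self.reg(f)              # per-location box regression map
        pooled = f.mean(dim=(2, 3))      # assumed global pooling before the FC layer
        scores = self.cls(pooled)        # category logits + angle-category logits
        return boxes, scores

head = DecoupledHead()
boxes, scores = head(torch.randn(1, 256, 20, 20))
print(boxes.shape, scores.shape)  # torch.Size([1, 4, 20, 20]) torch.Size([1, 184])
```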
4. A seedling rotating frame target detection method according to claim 3, wherein the function expression of the deformable convolution layer is:
$$y(p_0)=\sum_{p_n\in R} w(p_n)\cdot x(p_0+p_n+\Delta p_n)$$
wherein y(p_0) denotes the feature value at position p_0 of the output of the deformable convolution layer; p_0 is an arbitrary position in the feature map input to the deformable convolution layer; p_n is the nth sampling position in the grid R of a 3 × 3 convolution kernel centered on p_0; Δp_n is the offset of p_n; x(p_0 + p_n + Δp_n) denotes the feature value at p_0 + p_n + Δp_n in the feature map input to the deformable convolution layer; and w(p_n) denotes the weight applied to that feature value.
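As a minimal sketch of the sampling operation above (not the patented module itself), torchvision's deform_conv2d can reproduce y(p_0) = Σ w(p_n)·x(p_0 + p_n + Δp_n), with an ordinary convolution predicting the offsets Δp_n:

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConv2d(nn.Module):
    """3x3 deformable convolution: a plain conv predicts a 2D offset
    (dy, dx) for each of the 9 kernel sampling positions, then
    deform_conv2d gathers x(p0 + pn + dpn) and applies the weights w(pn)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 2 offset values per sampling position: 2 * 3 * 3 = 18 channels
        self.offset_conv = nn.Conv2d(in_ch, 18, kernel_size=3, padding=1)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offset = self.offset_conv(x)  # learned per-position offsets dpn
        return deform_conv2d(x, offset, self.weight, padding=1)

# Example: a 64-channel backbone feature map.
feat = torch.randn(1, 64, 40, 40)
out = DeformableConv2d(64, 128)(feat)
print(out.shape)  # torch.Size([1, 128, 40, 40])
```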
5. The seedling rotating frame target detection method of claim 1, wherein a total loss function adopted in the process of training the improved YOLOv5 model by using the sample set is as follows:
$$L=\frac{1}{N}\sum_{i=1}^{N} obj_i\Big[\lambda_1 L_{reg}(x,y,l,s)+\lambda_2 L_{CSL}\big(\mathrm{CSL}(\theta)\big)+\lambda_3 L_{cls}+\lambda_4 L_{iou}\Big]$$
wherein L denotes the total loss function; N denotes the number of anchor points and i indexes the ith anchor point; obj_i is a binary value used to distinguish foreground from background; L_reg, L_CSL, L_cls and L_iou denote the regression loss function, the angle loss function, the category loss function and the bounding-box loss function, respectively; (x, y) are the coordinates of the center point of the seedling's annotation box in the historical seedling sample image, l and s respectively denote the long side and the short side of the annotation box, θ denotes the clockwise angle between the long side of the annotation box and the horizontal axis, and CSL(θ) denotes the angle category when that angle is θ; λ_1, λ_2, λ_3 and λ_4 denote the weights of the regression loss function, the angle loss function, the category loss function and the bounding-box loss function, respectively.
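A minimal sketch of how these four weighted terms could be combined per anchor and averaged (the λ values and the per-anchor loss vectors are placeholders; the claim does not fix their values):

```python
import torch

def total_loss(obj: torch.Tensor, l_reg: torch.Tensor, l_csl: torch.Tensor,
               l_cls: torch.Tensor, l_iou: torch.Tensor,
               lambdas=(1.0, 1.0, 1.0, 1.0)) -> torch.Tensor:
    """obj: (N,) binary foreground mask; each l_*: (N,) per-anchor loss
    already computed by the corresponding branch. Returns the mean of the
    weighted sum over the N anchor points."""
    l1, l2, l3, l4 = lambdas
    per_anchor = obj * (l1 * l_reg + l2 * l_csl + l3 * l_cls + l4 * l_iou)
    return per_anchor.mean()
```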
6. A seedling rotating frame target detection method according to claim 5, wherein the regression loss function is:
$$L_{reg}=\mathrm{smooth}_{L1}(a)$$
wherein a denotes the difference between the prediction box and the annotation box, and smooth_{L1}(·) denotes the smooth L1 function:
$$\mathrm{smooth}_{L1}(a)=\begin{cases}0.5a^2, & |a|<1\\ |a|-0.5, & \text{otherwise}\end{cases}$$
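A one-function sketch of the piecewise smooth L1 term defined above:

```python
import torch

def smooth_l1(a: torch.Tensor) -> torch.Tensor:
    """0.5*a^2 where |a| < 1 (quadratic near zero), |a| - 0.5 elsewhere
    (linear, so large regression errors do not produce exploding gradients)."""
    return torch.where(a.abs() < 1, 0.5 * a ** 2, a.abs() - 0.5)

# a = prediction-minus-annotation differences, e.g. for (x, y, l, s)
a = torch.tensor([0.3, -0.8, 2.0, -4.5])
print(smooth_l1(a))  # tensor([0.0450, 0.3200, 1.5000, 4.0000])
```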
the angle loss function is:
$$L_{CSL}=-\sum_{n=1}^{M} y_{i\theta_n}\log\big(P_{i\theta_n}\big)$$
wherein M denotes the total number of angle categories, θ_n is the nth angle category, y_{iθ_n} is the angle indicator function of the ith anchor point, which takes the value 1 if the angle category is true and 0 otherwise, and P_{iθ_n} is the probability that the angle category of the ith anchor point belongs to θ_n;
the category loss function is:
$$L_{cls}=-\sum_{n=1}^{C} y_{ic_n}\log\big(P_{ic_n}\big)$$
wherein C denotes the number of seedling categories, P_{ic_n} denotes the probability that the seedling category of the ith anchor point belongs to category c_n, and y_{ic_n} denotes the category indicator function of the ith anchor point;
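Both the angle loss and the category loss above are cross-entropies of the same shape; a minimal common sketch, assuming the branch outputs normalized probabilities:

```python
import torch

def cross_entropy(y: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """-sum_n y_n * log(p_n) for one anchor point. y is the indicator (or
    CSL-smoothed) target over the M angle categories or C seedling
    categories; p holds the predicted probabilities. Epsilon guards log(0)."""
    return -(y * torch.log(p + 1e-9)).sum()

# Category loss example with C = 3 seedling classes, true class index 1.
y_cls = torch.tensor([0.0, 1.0, 0.0])
p_cls = torch.tensor([0.2, 0.7, 0.1])
print(cross_entropy(y_cls, p_cls))  # ~0.3567 = -ln(0.7)
```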
the bounding-box loss function is:
$$L_{iou}=1-IoU+\frac{\big|C\setminus(B\cup B^{gt})\big|}{|C|}$$
wherein IoU denotes the intersection-over-union ratio of the prediction box and the annotation box,
$$IoU=\frac{\big|B\cap B^{gt}\big|}{\big|B\cup B^{gt}\big|}$$
C denotes the minimum enclosing box covering the prediction box and the annotation box, B is the prediction box, and B^{gt} is the annotation box.
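A minimal sketch of this bounding-box term for axis-aligned boxes (an assumption for brevity; the rotated-box case requires polygon intersection, which is omitted here):

```python
import torch

def box_loss(b: torch.Tensor, b_gt: torch.Tensor) -> torch.Tensor:
    """Boxes given as (x1, y1, x2, y2). Returns 1 - IoU + |C \\ (B u Bgt)| / |C|,
    where C is the minimum enclosing box of prediction B and annotation Bgt."""
    ix1, iy1 = torch.max(b[0], b_gt[0]), torch.max(b[1], b_gt[1])
    ix2, iy2 = torch.min(b[2], b_gt[2]), torch.min(b[3], b_gt[3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    area_gt = (b_gt[2] - b_gt[0]) * (b_gt[3] - b_gt[1])
    union = area_b + area_gt - inter
    iou = inter / union
    cx1, cy1 = torch.min(b[0], b_gt[0]), torch.min(b[1], b_gt[1])
    cx2, cy2 = torch.max(b[2], b_gt[2]), torch.max(b[3], b_gt[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)  # minimum enclosing box C
    return 1 - iou + (area_c - union) / area_c

print(box_loss(torch.tensor([0., 0., 2., 2.]),
               torch.tensor([1., 1., 3., 3.])))  # ~1.0794
```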
7. A seedling rotating frame target detection system, the system comprising:
the labeling module is used for labeling a plurality of historical seedling sample images to obtain a label for each historical seedling sample image, and constructing a sample set comprising the historical seedling sample images and their labels;
the model training module is used for training an improved YOLOv5 model by using the sample set to obtain a trained improved YOLOv5 model as a target detection model; the improved YOLOv5 model is obtained by replacing the convolution in the CBL structure of the YOLOv5 model with a deformable convolution and replacing the head network of the YOLOv5 model with a decoupled head network;
and the detection module is used for inputting the seedling image to be detected into the target detection model to obtain a seedling detection result.
8. The seedling rotating frame target detection system of claim 7, wherein the label of the historical seedling sample image is (c, x, y, l, s, CSL(θ));
wherein c is the category of the seedling in the historical seedling sample image; (x, y) are the coordinates of the center point of the seedling's annotation box in the historical seedling sample image; l and s respectively denote the long side and the short side of the annotation box; θ denotes the clockwise angle between the long side of the annotation box and the horizontal axis, and CSL(θ) denotes the angle category when the clockwise angle between the long side of the annotation box and the horizontal axis is θ;
$$\mathrm{CSL}(\theta)=\begin{cases}g(\theta'), & \theta-r<\theta'<\theta+r\\ 0, & \text{otherwise}\end{cases}$$
wherein g(·) is the window function, r is the radius of the window function, and θ' is the angle category parameter.
9. The seedling rotating frame target detection system of claim 7, wherein the improved YOLOv5 model comprises a backbone network, a neck network and a decoupled head network connected in sequence;
the backbone network comprises a Focus structure, a CSP1_1 structure, a first CSP1_3 structure, a second CSP1_3 structure and an SPP structure which are connected in sequence; the CSP structure comprises a first convolution layer, a first normalization layer and a first Leaky_relu activation function; the first CSP1_3 structure and the second CSP1_3 structure each comprise an improved CBL module, a residual module, a second convolution layer and a Concat layer, and the improved CBL module comprises a deformable convolution layer, a second normalization layer and a second Leaky_relu activation function;
the neck network comprises three down-sampling layers and three up-sampling layers;
the decoupled head network comprises a third convolution layer, and a decoupling branch and a classification branch connected in parallel to the third convolution layer; the decoupling branch comprises a fourth convolution layer, a regression loss function and a bounding-box loss function, and the classification branch comprises a fully connected layer, a category loss function and an angle loss function.
10. A seedling rotating frame target detection system as claimed in claim 9, wherein the function expression of the deformable convolution layer is:
$$y(p_0)=\sum_{p_n\in R} w(p_n)\cdot x(p_0+p_n+\Delta p_n)$$
wherein y(p_0) denotes the feature value at position p_0 of the output of the deformable convolution layer; p_0 is an arbitrary position in the feature map input to the deformable convolution layer; p_n is the nth sampling position in the grid R of a 3 × 3 convolution kernel centered on p_0; Δp_n is the offset of p_n; and x(p_0 + p_n + Δp_n) denotes the feature value at p_0 + p_n + Δp_n in the feature map input to the deformable convolution layer.
CN202210143675.6A 2022-02-17 2022-02-17 Method and system for target detection of seedling rotating frame Pending CN114493975A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210143675.6A CN114493975A (en) 2022-02-17 2022-02-17 Method and system for target detection of seedling rotating frame

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210143675.6A CN114493975A (en) 2022-02-17 2022-02-17 Method and system for target detection of seedling rotating frame

Publications (1)

Publication Number Publication Date
CN114493975A 2022-05-13

Family

ID=81481712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210143675.6A Pending CN114493975A (en) 2022-02-17 2022-02-17 Method and system for target detection of seedling rotating frame

Country Status (1)

Country Link
CN (1) CN114493975A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881763A (en) * 2022-05-18 2022-08-09 中国工商银行股份有限公司 Method, device, equipment and medium for post-loan supervision of aquaculture
CN117611791A (en) * 2023-10-20 2024-02-27 哈尔滨工业大学 Method for detecting flying target based on feature separation deformable convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination