CN111814827A - Keypoint target detection method based on YOLO - Google Patents

Keypoint target detection method based on YOLO

Info

Publication number: CN111814827A (application CN202010514432.XA)
Authority: CN (China)
Prior art keywords: yolo, frame, upper left corner, offset
Legal status: Granted; currently Active
Other languages: Chinese (zh)
Other versions: CN111814827B (en)
Inventors: 徐光柱, 屈金山, 万秋波, 雷帮军, 石勇涛, 夏平, 陈鹏, 吴正平
Current Assignee: Hunan Feifei Animation Co., Ltd.
Original Assignee: China Three Gorges University (CTGU)
Application filed by China Three Gorges University (CTGU); priority to CN202010514432.XA; granted and published as CN111814827B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

A YOLO-based keypoint target detection method, comprising dataset production and processing: on an annotated dataset whose original annotation boxes are horizontal rectangles, the offset (Δx, Δy) of each keypoint from the top-left vertex of its annotation box is added. The top-left vertex coordinates (LUx, LUy) satisfy LUx smaller than the x-coordinate of every keypoint and LUy smaller than the y-coordinate of every keypoint, so that every keypoint lies in the fourth quadrant of a coordinate system whose origin is the annotation box's top-left vertex. It further comprises point target detection based on offsets from the prediction box's top-left vertex: the prediction box is obtained through YOLO together with each keypoint's offset from the box's top-left vertex, and adding the per-keypoint offsets (Δx, Δy) output by the network to the top-left vertex coordinates (LUx, LUy) yields the keypoint coordinates.

Description

Keypoint target detection method based on YOLO

Technical Field

The invention relates to the technical field of target detection, and in particular to a YOLO-based keypoint target detection method.

Background Art

Visual object detection based on deep learning has made great progress in recent years, but many challenging problems remain. First, current visual object detection models output only the target's bounding box and lack detection of the target's keypoints, such as facial feature points in face detection or limb joints in human body detection. Second, detecting rotated targets has long been a difficulty for current detection algorithms, whose prediction boxes are almost always horizontal rectangles. There are two main reasons: 1) for most targets a horizontal rectangular box suffices, which is closely tied to the viewing angle, since most targets observed from a standing person's viewpoint are horizontal rectangles; 2) training deep learning models depends heavily on dataset annotation, and the annotation boxes of most datasets are still horizontal rectangles.

With the continuous development of object detection technology, locating targets via keypoints has been recognized as a feasible approach. Reference [1] (Law H, Deng J. CornerNet: Detecting Objects as Paired Keypoints [J]. International Journal of Computer Vision, 2020, 128(3): 642-656) predicts a target's top-left and bottom-right corners and localizes the target by the rectangle these two key corner points form; this is simpler than center-point prediction, but in essence it still yields a horizontal rectangular box and does not output point targets. Reference [2] (Zhou X, Wang D, Krähenbühl P. Objects as Points. arXiv e-prints, 2019: arXiv:1904.07850) adds a central keypoint to [1] and detects targets with three keypoints, improving precision and recall, but in essence it still uses three keypoints to determine the prediction box and in the end does not output the keypoints themselves.

Chinese patent CN201810363952.8 proposes a deep-learning-based palm detection and keypoint localization method: a Faster R-CNN network is trained to produce palm contour candidate boxes and locate palm keypoints at detection time, and the candidate-box threshold is then adjusted to select the best palm image with localized keypoints from the candidates.

In addition, the detection of rotated and tilted targets has also drawn attention. Reference [3] (Ma J, Shao W, Ye H, et al. Arbitrary-Oriented Scene Text Detection via Rotation Proposals [J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122) proposes an arbitrary-orientation text detection scheme: rotated anchors with angles (Rotation Anchors) are set, the candidate boxes are mapped onto the feature map through an RRoI (rotated region of interest) pooling layer, and the result is obtained from the classifier. However, RRPN suffers from being too slow.

Reference [4] (Yang X, Liu Q, Yan J, et al. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. arXiv e-prints, 2019: arXiv:1908.05612) addresses RRPN's problems by building a single-stage detection framework on RetinaNet and refining the first-stage results following the RefineDet idea, thereby improving speed. Chinese patent 201910381699.3 proposes a ship multi-target detection method based on rotated-region extraction: rotated targets are annotated, and the final detection result is obtained by computing the rotated intersection-over-union between the highest-confidence preselected box and the other preselected boxes, but detection accuracy is hard to guarantee.

Reference [5] (Redmon J, Divvala S K, Girshick R, et al. You Only Look Once: Unified, Real-Time Object Detection [C]. Computer Vision and Pattern Recognition, 2016: 779-788) describes YOLO, an object detection system based on a single neural network proposed by Joseph Redmon, Ali Farhadi, et al. in 2015. To guarantee detection efficiency, YOLO adopts a one-stage design. Unlike two-stage algorithms such as R-CNN, which must generate region proposals at considerable computational cost and are therefore slow, YOLO generates no region proposals: a single convolutional neural network divides the input image into n*n grid cells, makes predictions for each cell, and classifies and regresses targets directly, achieving end-to-end detection and a large speed gain. YOLO reaches 45 fps on a GPU, while its simplified version reaches 155 fps. To improve accuracy, YOLO9000 and YOLOv3 were proposed successively: Reference [6] (Redmon J, Farhadi A. YOLO9000: Better, Faster, Stronger [C]. IEEE Conference on Computer Vision & Pattern Recognition, 2017: 7263-7271) and Reference [7] (Redmon J, Farhadi A. YOLOv3: An Incremental Improvement. arXiv e-prints, 2018: arXiv:1804.02767).

As a general-purpose object detection system with excellent performance, YOLO owes the feasibility of its engineering applications to its speed advantage, so people have tried to use YOLO to solve related problems; the original YOLO, however, outputs only horizontal rectangles as target boxes. Reference [8] (Lei J, Gao C, Hu J, et al. Orientation Adaptive YOLOv3 for Object Detection in Remote Sensing Images [C], 2019: 586-597) therefore extends YOLO to localize rotated rectangular targets by adding a theta output, the rotation angle of the prediction box; but this method only handles in-plane rotation of a rectangle, and irregular quadrilaterals, such as a trapezoid-like shape produced by out-of-plane rotation, still cannot be localized accurately by rotation alone. Likewise, Chinese patent CN201910707178.2 proposes a YOLOv3-based rotated-rectangle detection method that represents the target as a 5-element vector (x, y, w, h, θ), adds an angle θ, and detects rotated targets with angled anchors; it too handles only simple in-plane rotation and still cannot localize out-of-plane-rotated scenes precisely. Chinese patent CN201910879419.1 proposes an underwater target detection algorithm based on an improved YOLO, designing a new loss function that incorporates the object's aspect ratio and thereby improving detection of rotated or rolled-over underwater objects, but the scenes involved are limited. Chinese patent CN201910856434.4 proposes a YOLO-based license plate localization and recognition method which, to improve localization accuracy, trains an improved YOLO convolutional neural network and a convolution-enhanced SRCNN (Super Resolution) network; during YOLO training the original model's activation function is replaced with maxout, enhancing the fitting ability.

Although the above improvements raise YOLO's ability to handle detection in complex scenes to some extent, YOLO still has the following problems: 1) for visual targets that have keypoints, detecting the keypoints is just as important, e.g., facial feature points in face detection and limb joints in human body detection, yet YOLO cannot detect them; 2) reality contains many non-axis-aligned rectangles, i.e., rectangular objects with large aspect ratios seen at rotation angles caused by different viewpoints, such as license plates at various angles, or vehicles and ships photographed from the air. YOLO's prediction boxes for such rotated, tilted rectangular targets contain a large amount of redundant information irrelevant to the target.

Summary of the Invention

In view of the above technical problems, the present invention provides a YOLO-based keypoint target detection method. By adding a point-target detection algorithm on top of the original YOLO, YOLO gains the ability to detect point targets, can output the target detection box and the keypoints simultaneously, and achieves precise localization of rotated rectangular objects in concrete applications.

The technical scheme adopted by the present invention is as follows:

The YOLO-based keypoint target detection method comprises the following steps:

Step 1. Dataset production and processing:

On an annotated dataset whose original annotation boxes are horizontal rectangles, add each keypoint's offset (Δx, Δy) from the top-left vertex of its annotation box. The top-left vertex coordinates (LUx, LUy) satisfy LUx smaller than the x-coordinate of every keypoint and LUy smaller than the y-coordinate of every keypoint; every keypoint therefore lies in the fourth quadrant of a coordinate system whose origin is the annotation box's top-left vertex.

Step 2. Point target detection based on offsets from the prediction box's top-left vertex:

First, obtain the prediction box through YOLO together with each keypoint's offset from the box's top-left vertex; adding the per-keypoint offsets (Δx, Δy) output by the network to the top-left vertex coordinates (LUx, LUy) yields the keypoint coordinates.
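A minimal sketch of this coordinate recovery in Python (function and variable names are illustrative assumptions, not from the patent):

```python
import numpy as np

def recover_keypoints(top_left, offsets):
    """Step 2 of the method: add each keypoint's predicted offset
    (dx, dy) to the prediction box's top-left vertex (LUx, LUy)."""
    lu = np.asarray(top_left, dtype=np.float32)        # (LUx, LUy)
    return np.asarray(offsets, dtype=np.float32) + lu  # shape (m, 2)

# Example: box top-left at (100, 40) and four keypoint offsets.
keypoints = recover_keypoints((100, 40),
                              [(5, 8), (60, 4), (62, 30), (3, 33)])
```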

The YOLO-based keypoint target detection method of the present invention has the following advantages:

1. While YOLO localizes with the prediction box, the distance of each keypoint from the prediction box's top-left vertex is predicted in parallel, and each keypoint's position is then obtained by combining that distance with the position of the top-left vertex. This provides a feasible scheme for point-target detection.

2. The point-target detection scheme of the present invention copes with the inaccurate localization and the large amount of redundant information that arise when detecting rotated, tilted rectangular targets, providing a precise-localization scheme for tilted license plates, traffic signs, remote-sensing detection of ships from high altitude, and the like.

3. The point-target detection based on offsets from the prediction box's top-left vertex proposed by the present invention is general and can be applied to all detection tasks that need to output feature points as well as target boxes.

4. The present invention provides a point-target detection scheme: the position of a point target is obtained from its offset relative to the top-left vertex of the initial prediction box, thereby yielding the keypoint's position. On top of the original detection, adding point outputs extends the detection capability of the one-stage algorithm so that it suits more scenarios, such as point-target detection and precise localization of rotated targets.

5. For the concrete implementation of the YOLO point-target detection scheme, the present invention designs within YOLO a loss calculation function for the offset of a point target relative to the prediction box's top-left vertex, thereby improving YOLO's loss function. The output of YOLO is also extended so that it emits the keypoints' position information along with the target box, making YOLO more powerful and able to meet more application requirements.

Description of the Drawings

Figure 1(1) shows that current face detection obtains only the face bounding box (keypoints are not detected).

Figure 1(2) is a schematic of the prediction box containing redundant information in tilted street-sign detection.

Figure 1(3) is a schematic of the present invention outputting the target box and keypoints simultaneously in face detection.

Figure 1(4) is a schematic of the present invention achieving precise detection of tilted street signs through keypoint detection.

Figure 2 is a schematic of the YOLO target detection pipeline.

Figure 3 shows the relationship between YOLOv3 position prediction and the anchor.

Figure 4 shows the unsatisfactory result of YOLO on rotated rectangular boxes.

Figure 5(1) is a flowchart of dataset production and algorithm design of the present invention.

Figure 5(2) is a flowchart of target detection with the model of the present invention.

Figure 6 shows how keypoint positions are obtained from offsets.

Figure 7 shows the annotation formats of YOLOv3 and of the keypoint scheme.

Figure 8 is a flowchart of the concrete application of YOLO to rotated-rectangle target detection.

Detailed Description

In the YOLO-based keypoint target detection method, a point-target detection algorithm is added on top of the original YOLO, giving YOLO the ability to detect point targets so that it can output the target detection box and the keypoints simultaneously, while achieving precise localization of rotated rectangular objects in concrete applications, as shown in Figures 1(1) to 1(4).

Figure 1(1) is a schematic of current face detection obtaining only the face bounding box, lacking keypoint detection.

Figure 1(2) is a schematic of the prediction box containing redundant information in tilted street-sign detection, an unsatisfactory result.

Figure 1(3) is a schematic of the present scheme outputting the target box and keypoints simultaneously in face detection.

Figure 1(4) is a schematic of the present scheme achieving precise detection of tilted street signs through keypoint detection.

(1) The core idea of YOLO. As shown in Figure 2, YOLO divides the input image into S x S grid cells; if the annotated center point of an object falls inside a cell, that cell detects the object. Taking YOLOv3's bounding-box prediction as an example of the detection principle: before training, YOLOv3 generates 9 anchors by k-means clustering, corresponding to output feature maps at 3 scales with 3 anchors per scale. At a network input size of 416, the feature maps output at YOLOv3's three scales are 13*13, 26*26 and 52*52, used to detect large, medium and small targets respectively. For each pixel (grid cell) of the feature map at each scale, the 3 anchors of that scale make predictions; the most suitable anchor is found and the corresponding offset is given, which yields the prediction box. For each prediction box YOLOv3 outputs four values, t_x, t_y, t_w and t_h, and the mapping between these four values and the final predicted bbox is shown in formulas (1.1) to (1.4).

b_x = δ(t_x) + c_x    (1.1)

b_y = δ(t_y) + c_y    (1.2)

b_w = p_w · e^{t_w}    (1.3)

b_h = p_h · e^{t_h}    (1.4)

Formulas (1.1)-(1.4) are the mapping formulas between YOLOv3's output values and the prediction box.

Here t_x and t_y are the coordinate-offset values, and t_w and t_h are the scale factors of the predicted size, with p_w and p_h the anchor's width and height. δ(t_x) and δ(t_y) express the offset of a target's center point relative to the grid cell responsible for detecting that target, as marked in Figure 3, where c_x and c_y are the coordinates of the top-left corner of the grid cell containing the center point. The finally obtained b_x, b_y, b_w, b_h are the center-point coordinates and the width and height relative to the feature map.
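A sketch of the mapping (1.1)-(1.4) in Python, assuming δ is the logistic sigmoid and all quantities are in feature-map units (names are illustrative):

```python
import numpy as np

def sigmoid(t):
    """The delta function in formulas (1.1)-(1.4)."""
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Map raw YOLOv3 outputs (tx, ty, tw, th) to a box, with (cx, cy)
    the responsible grid cell's top-left corner and (pw, ph) the anchor
    width and height."""
    bx = sigmoid(tx) + cx   # (1.1) center x
    by = sigmoid(ty) + cy   # (1.2) center y
    bw = pw * np.exp(tw)    # (1.3) width
    bh = ph * np.exp(th)    # (1.4) height
    return bx, by, bw, bh
```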

The YOLOv3 loss function is shown in formula (1.5):

Loss_yolov3 = Loss_center + Loss_wh + Loss_score + Loss_class    (1.5)

Loss_center = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [(x_i − x̂_i)² + (y_i − ŷ_i)²]    (1.5.1)

Loss_wh = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]    (1.5.2)

Loss_score = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} (C_i − Ĉ_i)²    (1.5.3)
           + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} (C_i − Ĉ_i)²    (1.5.4)

Loss_class = Σ_{i=0}^{S²} 1_i^{obj} Σ_{c∈classes} (p_i(c) − p̂_i(c))²    (1.5.5)

The YOLOv3 loss function comprises four parts: the center-coordinate loss Loss_center of formula (1.5.1), the width-height loss Loss_wh of formula (1.5.2), the confidence loss Loss_score of formulas (1.5.3)-(1.5.4), and the class loss Loss_class of formula (1.5.5). The variables in formula (1.5) have the following meanings: S x S is the number of grid cells into which the network divides the image, B is the number of bounding boxes predicted per grid cell, and 1_{ij}^{obj} indicates that the j-th bounding box of grid cell i is responsible for the prediction. Within the individual terms: in formula (1.5.1), λ_coord is a dynamic weighting parameter, (x̂_i, ŷ_i) is the ground-truth center coordinate, and (x_i, y_i) is the predicted value; in formula (1.5.2), ŵ_i and ĥ_i are the ground-truth width and height of the target, while w_i and h_i are the width and height the network predicts for it; the confidence loss is composed of two parts, the loss when an object is present (1.5.3) and the loss when no object is present (1.5.4), where λ_noobj is the coefficient of the network's error when no object is present, and Ĉ_i and C_i are the ground-truth confidence of the detected target and the network's predicted confidence, respectively; in formula (1.5.5), p̂_i(c) is the ground-truth probability of the detected target's class.

(2) Point target detection based on offsets from the prediction box's top-left vertex:

When detecting non-horizontal rectangular targets, the original YOLO obtains its final horizontal prediction box by predicting a center point plus width and height, which leads to the problem shown in Figure 4. Figure 4 shows two aircraft carriers berthed in a port: even in the ideal case, the original YOLO's prediction boxes are the yellow boxes, which contain a large amount of information irrelevant to the targets; at the same time the two carriers' prediction boxes overlap heavily, so during non-maximum suppression (NMS, the technique described in Reference [9]: Neubeck A, Gool L J V. Efficient Non-Maximum Suppression [C]. International Conference on Pattern Recognition, 2006: 850-855) one of the predictions is easily discarded, causing a missed detection. The ideal prediction boxes would be the red rotated rectangles, but this is clearly beyond YOLO's capability.
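For illustration, a minimal greedy NMS sketch in Python showing the failure mode described above; this is the baseline procedure, not the efficient variant of Reference [9], and all names are illustrative:

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, discard every box that
    overlaps it by more than `thresh`, and repeat. Two tilted ships with
    heavily overlapping horizontal boxes can thus lose one detection."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) < thresh]
    return keep
```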

(3) The YOLO-based keypoint target detection method of the present invention:

The method not only gives YOLO the ability to detect point targets but, in concrete applications, also solves the rotated-target detection problem shown in Figure 4: once the keypoints are detected, connecting them yields the more accurate red prediction box. The flow of the scheme is shown in Figures 5(1) and 5(2): first produce the dataset, then design the detection algorithm for point targets based on offsets from the prediction box's top-left vertex, design the loss function, and train the model; at detection time the point targets are obtained from the prediction box and the point offsets.

The principle of point-target detection in the present invention is shown in Figure 6, where the number of keypoints is 4. The dashed box in Figure 6 is the anchor, (p_w, p_h) are the anchor's width and height, the blue box is the target prediction box, the four red arrows are the offsets of the four keypoints of the target relative to the prediction box's top-left vertex, and the green box is the rotated target box finally obtained in rotated-rectangle detection. The formulas for the offsets of the keypoints relative to the prediction box's top-left vertex are given in formulas (1.8) to (2.1).

For each prediction box the model of the present invention outputs t_x, t_y, t_w, t_h and 4 offset pairs. t_x, t_y, t_w, t_h give the prediction box, i.e., the blue bounding box; formulas (1.6)-(1.7) therefore first compute the prediction box's width and height (b_w, b_h), and the 4 offset pairs are then obtained through (b_w, b_h), as shown in formulas (1.8) to (2.1). Here D1_x, D1_y are the offsets of point D1 relative to the prediction box's top-left vertex along the x and y axes; likewise D2_x, D2_y, D3_x, D3_y and D4_x, D4_y are the offsets of the target keypoints D2, D3 and D4 from the prediction box's top-left vertex.

b_w = p_w · e^{t_w}    (1.6)

b_h = p_h · e^{t_h}    (1.7)

D1_x = δ(t_x1) · b_w,  D1_y = δ(t_y1) · b_h    (1.8)

D2_x = δ(t_x2) · b_w,  D2_y = δ(t_y2) · b_h    (1.9)

D3_x = δ(t_x3) · b_w,  D3_y = δ(t_y3) · b_h    (2.0)

D4_x = δ(t_x4) · b_w,  D4_y = δ(t_y4) · b_h    (2.1)

Formulas (1.6)-(2.1) are the calculation formulas of the algorithm based on offsets from the prediction box's top-left vertex.
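A Python sketch of formulas (1.6)-(2.1), plus the final shift by the box's top-left vertex described in Step 2 (names are illustrative assumptions):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_keypoints(t_off, tw, th, pw, ph, top_left):
    """Formulas (1.6)-(2.1): squash the raw per-keypoint outputs with the
    sigmoid, scale them by the predicted box size, then shift by the
    box's top-left vertex to obtain absolute keypoint positions.

    t_off: raw (t_xk, t_yk) pairs, shape (4, 2); top_left = (LUx, LUy)."""
    bw = pw * np.exp(tw)                        # (1.6)
    bh = ph * np.exp(th)                        # (1.7)
    d = sigmoid(np.asarray(t_off, dtype=np.float32))
    d *= np.array([bw, bh], dtype=np.float32)   # (1.8)-(2.1): Dk_x, Dk_y
    return d + np.asarray(top_left, dtype=np.float32)
```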

The loss function of the keypoints in YOLO point-target detection is shown in formula (2.2), in which the number of keypoints is 4:

Loss_KeyPoint = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} Σ_{k=1}^{4} [(Dk_x − D̂k_x)² + (Dk_y − D̂k_y)²]    (2.2)

If the number of keypoints increases, the keypoint loss function becomes formula (2.3), where m is the number of keypoints:

Loss_KeyPoint = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} Σ_{k=1}^{m} [(Dk_x − D̂k_x)² + (Dk_y − D̂k_y)²]    (2.3)

Formulas (2.2)-(2.3) are the offset loss calculation formulas of the algorithm based on offsets from the prediction box's top-left vertex. The present invention adds this keypoint loss to the detection loss of the original YOLOv3, so the final loss function is:

Loss_KeyPoint_offset = Loss_yolov3 + Loss_KeyPoint    (2.4)

Formula (2.4) is the total loss calculation formula of the algorithm based on offsets from the prediction box's top-left vertex.
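A Python sketch of the keypoint loss (2.3) and the total loss (2.4), assuming squared-error offsets accumulated only over predictors responsible for an object (the exact weighting is not reproduced here; names are illustrative):

```python
import numpy as np

def keypoint_loss(pred_off, gt_off, obj):
    """Sketch of formula (2.3): squared error over the m (Dk_x, Dk_y)
    offsets of every predictor responsible for an object.

    pred_off/gt_off: shape (N, m, 2); obj: shape (N,), 1 or 0."""
    per_pred = np.sum((pred_off - gt_off) ** 2, axis=(1, 2))
    return np.sum(obj * per_pred)

def total_loss(loss_yolov3, loss_keypoint):
    """Formula (2.4): the keypoint term is added to the YOLOv3 loss."""
    return loss_yolov3 + loss_keypoint
```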

Dataset production and processing: the key to data handling in YOLO point-target detection is adding the keypoints' position information to the training set, i.e., adding it to an annotated dataset whose original annotation boxes are horizontal rectangles. Concretely, each keypoint's offset (Δx, Δy) from the annotation box's top-left vertex is added to the annotation. Note that the top-left vertex coordinates (LUx, LUy) must satisfy LUx smaller than the x-coordinate of every keypoint and LUy smaller than the y-coordinate of every keypoint; every keypoint then lies in the fourth quadrant of a coordinate system whose origin is the annotation box's top-left vertex. Figure 7 shows the annotation format of the training data in the original YOLO and the annotation format of the keypoint detection scheme, respectively. The format shown is for 4 keypoints; when the number of keypoints grows, the annotations of the corresponding training set must likewise add the offsets of the corresponding keypoints.
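Figure 7 itself is not reproduced here; a plausible extended annotation for one object with 4 keypoints might look as follows (the field order is an assumption, not taken from the figure):

```python
# One annotation line per object: the original YOLO fields (class, cx,
# cy, w, h) followed by each keypoint's offset (dxk, dyk) from the
# annotation box's top-left vertex.
label = [0,                         # class id
         0.48, 0.51, 0.30, 0.22,    # horizontal box: cx, cy, w, h
         0.02, 0.03, 0.27, 0.01,    # (dx1, dy1), (dx2, dy2)
         0.29, 0.20, 0.01, 0.21]    # (dx3, dy3), (dx4, dy4)
```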

Detection with the model: the point-target detection scheme based on offsets from the prediction box's top-left vertex solves YOLO's point-target detection as follows. First, the prediction box is obtained through YOLO together with each keypoint's offset from the box's top-left vertex. Adding the per-keypoint offsets (Δx, Δy) output by the network to the top-left vertex coordinates (LUx, LUy) yields the keypoint coordinates. In applications such as detecting the facial feature points of a face, obtaining the keypoint positions is enough and no post-processing is needed. When the scheme is applied to detecting rotated rectangular targets, however, a precise positioning frame must be drawn from the four given keypoints. The flow is therefore as shown in Figure 8: the image is input into the YOLO network, which yields the prediction box bbox together with the keypoints' offsets from its top-left corner; the keypoint positions are computed from the box's top-left vertex and the four offsets, and the keypoints are then connected to obtain the precise red positioning frame.
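A sketch of the final drawing step of Figure 8, assuming OpenCV and keypoints ordered around the target (names are illustrative):

```python
import cv2
import numpy as np

def draw_rotated_box(image, top_left, offsets, color=(0, 0, 255)):
    """Final step of Figure 8: recover the four keypoints from the box's
    top-left vertex and their offsets, then connect them into the
    precise (red) positioning frame. Assumes the keypoints are ordered
    around the target so consecutive points share an edge."""
    pts = np.asarray(offsets, np.float32) + np.asarray(top_left, np.float32)
    pts = pts.astype(np.int32).reshape(-1, 1, 2)
    cv2.polylines(image, [pts], isClosed=True, color=color, thickness=2)
    return image
```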

As a general-purpose detection system with excellent performance, YOLO performs well in both detection accuracy and detection speed, but its horizontal prediction box offers no good solution for point targets or tilted, rotated rectangular targets. The present invention therefore gives YOLO the ability to detect point targets through the point-target detection algorithm based on offsets from the YOLO prediction box's top-left vertex, and applies it to the detection of rotated, tilted targets, thereby achieving the goal of extending YOLO.

The present invention proposes a point-target detection scheme based on offsets from the top-left corner of the prediction box. The scheme outputs the target box and the feature points simultaneously, solving keypoint detection such as facial feature points and human limb joints, while also solving the precise localization of rotated, tilted rectangular targets such as traffic signs and billboards seen at oblique angles and targets in high-altitude remote sensing.

Claims (4)

1. A YOLO-based keypoint target detection method, characterized by comprising the following steps:

Step 1. Dataset production and processing: on an annotated dataset whose original annotation boxes are horizontal rectangles, add each keypoint's offset (Δx, Δy) from the top-left vertex of its annotation box, the coordinates of that top-left vertex being (LUx, LUy), with LUx smaller than the x-coordinate of every keypoint and LUy smaller than the y-coordinate of every keypoint, so that every keypoint lies in the fourth quadrant of a coordinate system whose origin is the annotation box's top-left vertex;

Step 2. Point target detection based on offsets from the prediction box's top-left vertex: first obtain the prediction box through YOLO together with each keypoint's offset from the box's top-left vertex; adding the per-keypoint offsets (Δx, Δy) output by the network to the top-left vertex coordinates (LUx, LUy) yields the keypoint coordinates.

2. The YOLO-based keypoint target detection method according to claim 1, characterized in that: in Step 1, the number of keypoints is 4, and the offsets of the keypoints relative to the prediction box's top-left vertex are given by formulas (1.8)-(2.1); for each prediction box the model outputs t_x, t_y, t_w, t_h and 4 offset pairs, where t_x, t_y, t_w, t_h predict the original target box, i.e., the bounding box bbox; formulas (1.6)-(1.7) therefore first compute the prediction box's width and height (b_w, b_h), and the 4 offset pairs are then obtained through (b_w, b_h), as shown in formulas (1.8)-(2.1):

b_w = p_w · e^{t_w}    (1.6)

b_h = p_h · e^{t_h}    (1.7)

D1_x = δ(t_x1) · b_w,  D1_y = δ(t_y1) · b_h    (1.8)

D2_x = δ(t_x2) · b_w,  D2_y = δ(t_y2) · b_h    (1.9)

D3_x = δ(t_x3) · b_w,  D3_y = δ(t_y3) · b_h    (2.0)

D4_x = δ(t_x4) · b_w,  D4_y = δ(t_y4) · b_h    (2.1)

where D1_x, D1_y are the offsets of point D1 from the prediction box's top-left vertex along the x and y axes, and likewise D2_x, D2_y, D3_x, D3_y and D4_x, D4_y are the offsets of the target keypoints D2, D3 and D4 from the top-left vertex.

3. The YOLO-based keypoint target detection method according to claim 2, characterized in that: the loss function of the keypoints in YOLO point-target detection is formula (2.2), in which the number of keypoints is 4:

Loss_KeyPoint = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} Σ_{k=1}^{4} [(Dk_x − D̂k_x)² + (Dk_y − D̂k_y)²]    (2.2)

if the number of keypoints increases, the keypoint loss function becomes formula (2.3), where m is the number of keypoints:

Loss_KeyPoint = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} Σ_{k=1}^{m} [(Dk_x − D̂k_x)² + (Dk_y − D̂k_y)²]    (2.3)

YOLO point-target detection adds the keypoints' computed loss to the detection loss of the original YOLOv3, so the final loss function is:

Loss_KeyPoint_offset = Loss_yolov3 + Loss_KeyPoint    (2.4).

4. The YOLO-based keypoint target detection method according to claim 2, characterized in that: in Step 2, the image is input into the YOLO network, which yields the prediction box together with the keypoints' offsets from its top-left corner; the keypoint positions are computed from the prediction box's top-left vertex and the four keypoint offsets, and the keypoints are then connected to obtain a precise positioning frame.
CN202010514432.XA 2020-06-08 2020-06-08 YOLO-based key point target detection method Active CN111814827B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010514432.XA CN111814827B (en) 2020-06-08 2020-06-08 YOLO-based key point target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010514432.XA CN111814827B (en) 2020-06-08 2020-06-08 YOLO-based key point target detection method

Publications (2)

Publication Number Publication Date
CN111814827A (en) 2020-10-23
CN111814827B CN111814827B (en) 2024-06-11

Family

ID=72844777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010514432.XA Active CN111814827B (en) 2020-06-08 2020-06-08 YOLO-based key point target detection method

Country Status (1)

Country Link
CN (1) CN111814827B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229442A (en) * 2018-02-07 2018-06-29 西南科技大学 Face fast and stable detection method in image sequence based on MS-KCF
CN108960340A (en) * 2018-07-23 2018-12-07 电子科技大学 Convolutional neural networks compression method and method for detecting human face
CN109117831A (en) * 2018-09-30 2019-01-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN110490256A (en) * 2019-08-20 2019-11-22 中国计量大学 A vehicle detection method based on key point heat map
CN110580445A (en) * 2019-07-12 2019-12-17 西北工业大学 An Improved Face Keypoint Detection Method Based on GIoU and Weighted NMS
CN110930454A (en) * 2019-11-01 2020-03-27 北京航空航天大学 Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420774A (en) * 2021-03-24 2021-09-21 成都理工大学 Target detection technology for irregular graph
CN113255671A (en) * 2021-07-05 2021-08-13 浙江智慧视频安防创新中心有限公司 Target detection method, system, device and medium for object with large length-width ratio
CN113256724A (en) * 2021-07-07 2021-08-13 上海影创信息科技有限公司 Handle inside-out vision 6-degree-of-freedom positioning method and system
CN113537342A (en) * 2021-07-14 2021-10-22 浙江智慧视频安防创新中心有限公司 Object detection method, device, storage medium and terminal in an image
CN113537158A (en) * 2021-09-09 2021-10-22 科大讯飞(苏州)科技有限公司 Image target detection method, device, equipment and storage medium
CN113888741A (en) * 2021-12-06 2022-01-04 智洋创新科技股份有限公司 Method for correcting rotating image of instrument in power distribution room
CN114219991A (en) * 2021-12-06 2022-03-22 安徽省配天机器人集团有限公司 Target detection method, device and computer readable storage medium
WO2023184123A1 (en) * 2022-03-28 2023-10-05 京东方科技集团股份有限公司 Detection method and device for violation of rules and regulations

Also Published As

Publication number Publication date
CN111814827B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
CN111814827A (en) Keypoint target detection method based on YOLO
CN109598241B (en) Recognition method of ships at sea based on satellite imagery based on Faster R-CNN
CN111553347B (en) Scene text detection method oriented to any angle
CN110728200A (en) Real-time pedestrian detection method and system based on deep learning
Wang et al. Led2-net: Monocular 360deg layout estimation via differentiable depth rendering
CN111968177A (en) Mobile robot positioning method based on fixed camera vision
CN110991444B (en) License plate recognition method and device for complex scene
CN112560675B (en) Bird Vision Object Detection Method Combining YOLO and Rotation-Fusion Strategy
CN103761747B (en) Target tracking method based on weighted distribution field
CN111476089A (en) Pedestrian detection method, system and terminal based on multi-mode information fusion in image
Zhu et al. Arbitrary-oriented ship detection based on retinanet for remote sensing images
CN113255555A (en) Method, system, processing equipment and storage medium for identifying Chinese traffic sign board
CN108022243A (en) Method for detecting paper in a kind of image based on deep learning
CN107948586A (en) Trans-regional moving target detecting method and device based on video-splicing
CN111767854B (en) SLAM loop detection method combined with scene text semantic information
Wang et al. Led 2-net: Monocular 360 layout estimation via differentiable depth rendering
CN115661255B (en) A laser SLAM loop detection and correction method
CN104200213A (en) Vehicle detection method based on multiple parts
CN104050674B (en) Salient region detection method and device
CN116363168A (en) A remote sensing video target tracking method and system based on super-resolution network
Li et al. Improved YOLOv5s algorithm for small target detection in UAV aerial photography
CN112598055B (en) Helmet wearing detection method, computer-readable storage medium and electronic device
CN112417958B (en) Remote sensing image rotating target detection method
Osuna-Coutiño et al. Structure extraction in urbanized aerial images from a single view using a CNN-based approach
CN116524026A (en) Dynamic vision SLAM method based on frequency domain and semantics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240131

Address after: 1003, Building A, Zhiyun Industrial Park, No. 13 Huaxing Road, Tongsheng Community, Dalang Street, Longhua District, Shenzhen City, Guangdong Province, 518000

Applicant after: Shenzhen Wanzhida Enterprise Management Co.,Ltd.

Country or region after: China

Address before: 443002 No. 8, University Road, Xiling District, Yichang, Hubei

Applicant before: CHINA THREE GORGES University

Country or region before: China

TA01 Transfer of patent application right

Effective date of registration: 20240501

Address after: No. 102, Commercial Building, Yuehu Park, No. 140 Hongshan Road, Hongshan Street, Kaifu District, Changsha City, Hunan Province, 410000

Applicant after: Hunan Feifei Animation Co.,Ltd.

Country or region after: China

Address before: 1003, Building A, Zhiyun Industrial Park, No. 13 Huaxing Road, Tongsheng Community, Dalang Street, Longhua District, Shenzhen City, Guangdong Province, 518000

Applicant before: Shenzhen Wanzhida Enterprise Management Co.,Ltd.

Country or region before: China

GR01 Patent grant