CN111814827A - Key point target detection method based on YOLO

Key point target detection method based on YOLO

Info

Publication number
CN111814827A
CN111814827A (application CN202010514432.XA); granted as CN111814827B
Authority
CN
China
Prior art keywords
frame
yolo
key point
left corner
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010514432.XA
Other languages
Chinese (zh)
Other versions
CN111814827B (en)
Inventor
徐光柱
屈金山
万秋波
雷帮军
石勇涛
夏平
陈鹏
吴正平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Feifei Animation Co ltd
Original Assignee
China Three Gorges University CTGU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Three Gorges University (CTGU)
Priority to CN202010514432.XA
Publication of CN111814827A
Application granted
Publication of CN111814827B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The key point target detection method based on YOLO comprises the following steps. Data set production and processing: on a labeled data set whose original labeling frame is a horizontal rectangular frame, the offset distance (Δx, Δy) from each key point to the top-left vertex of the labeling frame is added; the top-left vertex position coordinates (LUx, LUy) of the labeling frame satisfy that LUx is smaller than the x-direction value of every key point and LUy is smaller than the y-direction value of every key point, so that each key point lies in the fourth quadrant of a coordinate frame whose origin is the top-left vertex of the labeling frame. Point target detection based on the top-left vertex offset of the prediction frame: a prediction frame is obtained through YOLO, together with the offset of each key point from the prediction frame's top-left vertex; adding the offset (Δx, Δy) output by the network for each key point to the coordinates (LUx, LUy) of the prediction frame's top-left vertex gives the coordinate position of each key point.

Description

Key point target detection method based on YOLO
Technical Field
The invention relates to the technical field of target detection, in particular to a key point target detection method based on YOLO.
Background
Visual target detection based on deep learning has developed greatly in recent years, but many challenging problems remain. First, existing visual target detection models output only the bounding boxes of targets and lack detection of the targets' key points, such as the facial feature points in face detection or the limb joint points in human body detection. Second, detecting rotated targets has always been a difficult point for current target detection algorithms, whose prediction frames are all horizontal rectangular bounding boxes. There are two main reasons: 1) most targets in target detection are adequately covered by a horizontal rectangular frame, which has much to do with the observation angle, since most targets observed from a standing person's viewpoint are horizontal rectangles; 2) the training of deep learning models depends highly on data set labeling, and the labeling frames of most current data sets are still horizontal rectangles.
With the continuous development of target detection technology, it has been recognized that locating targets through key points is a feasible solution. Document [1] (Law H, Deng J. CornerNet: Detecting Objects as Paired Keypoints [J]. International Journal of Computer Vision, 2020, 128(3): 642-656) proposes predicting the top-left and bottom-right corners of a target separately, locating the target by the rectangular frame formed by these two key corner points; this is simpler than center-point prediction, but it still yields a horizontal rectangular frame and does not output point targets. Document [2] (Zhou X, Wang D, Krähenbühl P. Objects as Points. arXiv e-prints, 2019: arXiv:1904.07850) adds a central key point on the basis of document [1], detecting the target with three key points and improving precision and recall; in essence, however, the 3 key points are still only used to determine a prediction frame, and no key points are output in the end.
Chinese patent CN201810363952.8 proposes a palm detection and key point positioning method based on deep learning, which trains a Faster R-CNN network to obtain palm contour candidate frames and position the palm key points during detection, then adjusts the candidate frame threshold and screens from the candidate frames the best palm image with key point positioning.
The detection of rotated and tilted targets has also received attention. Document [3] (Ma J, Shao W, Ye H, et al. Arbitrary-Oriented Scene Text Detection via Rotation Proposals [J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122) proposes RRPN, which generates rotated region proposals for arbitrary-oriented scene text detection. However, RRPN suffers from being too slow.
Document [4] (Yang X, Liu Q, Yan J, et al. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. arXiv e-prints, 2019: arXiv:1908.05612) addresses the problems of RRPN by constructing a single-stage detection framework on RetinaNet and refining the one-stage detection results with the RefineDet idea, which improves speed. Chinese patent 201910381699.3 proposes a ship multi-target detection method based on rotation region extraction: the rotated targets are labeled, and the final detection result is obtained by calculating the rotation and translation ratios between the preselected frame with the highest confidence and the other preselected frames, but the detection precision is difficult to guarantee.
Document [5] (Redmon J, Divvala S K, Girshick R, et al. You Only Look Once: Unified, Real-Time Object Detection [C]. Computer Vision and Pattern Recognition, 2016: 779-788) introduces YOLO (You Only Look Once), a target detection system based on a single neural network proposed in 2015 by Joseph Redmon, Ali Farhadi et al. To guarantee detection efficiency, YOLO takes a one-stage approach, unlike two-stage algorithms such as R-CNN, which must generate region proposals and are slowed by the computing power this consumes. YOLO reaches 45 fps on a GPU, while its simplified version reaches 155 fps. YOLO9000 and YOLOv3 were proposed successively to improve accuracy: document [6] (Redmon J, Farhadi A. YOLO9000: Better, Faster, Stronger [C]. IEEE Conference on Computer Vision & Pattern Recognition, 2017: 7263-7271) and document [7] (Redmon J, Farhadi A. YOLOv3: An Incremental Improvement. arXiv e-prints, 2018: arXiv:1804.02767).
As a general target detection system with excellent performance, YOLO's advantage in speed ensures its feasibility for engineering applications, so people have tried to use YOLO to solve related problems; the original YOLO, however, outputs only a horizontal rectangular frame as the target frame. Thus document [8] (Lei J, Gao C, Hu J, et al. Orientation Adaptive YOLOv3 for Object Detection in Remote Sensing Images [C]. 2019: 586-597) proposes extending YOLO to solve the positioning of rotated rectangular objects by adding a θ output, the rotation angle of the prediction box, to YOLO's output. This method only handles in-plane rotation of a rectangle; irregular quadrilaterals, such as a rectangular object that appears trapezoid-like after rotation in depth, cannot be accurately positioned by rotation alone. Likewise, Chinese patent CN201910707178.2 provides a rotated rectangular target detection method based on YOLOv3: the detection target is set as a 5-element vector (x, y, w, h, θ), the angle θ is added, and anchor points with rotation angles are used to detect rotated targets. Chinese patent CN201910879419.1 proposes an underwater target detection algorithm based on an improved YOLO algorithm, designing a new loss function that adds the object's aspect-ratio information, which improves detection of rotated and overturned underwater objects, but the scenes involved are limited. Chinese patent CN201910856434.4 proposes a license plate positioning and recognition method based on a YOLO model: to improve the positioning accuracy of the license plate, an improved YOLO convolutional neural network and an SRCNN super-resolution convolutional neural network are trained, and during the training of the YOLO network a maxout activation function replaces the activation function of the original model, strengthening the fitting capability.
The above improvements to the YOLO model raise, to a certain extent, its ability to detect targets in complex scenes. However, YOLO still has the following problems: 1) for visual targets with key points, detecting the key points is just as important, such as the facial feature points in face detection and the limb joint points in human body detection, and YOLO lacks the detection of these key points; 2) in reality there exist many irregular quadrilaterals, as well as rectangular objects with large aspect ratios seen at rotation angles caused by differing viewing angles, such as license plates at various angles, vehicles photographed from the air, ships and other targets. YOLO's prediction frames for these rotated and tilted rectangular objects contain a large amount of redundant information unrelated to the target.
Disclosure of Invention
In view of the above technical problems, the present invention provides a key point target detection method based on YOLO, which adds a point target detection algorithm on the basis of the original YOLO, so that YOLO gains the capability of detecting point targets and can output the target detection frame and the key points simultaneously, while also achieving accurate positioning of rotated rectangular objects in specific applications.
The technical scheme adopted by the invention is as follows:
the key point target detection method based on the YOLO comprises the following steps:
step one, manufacturing and processing a data set:
on a labeled data set whose original labeling frame is a horizontal rectangular frame, adding the offset distance (Δx, Δy) from each key point to the top-left vertex of the labeling frame, wherein the top-left vertex position coordinates of the labeling frame are (LUx, LUy), satisfying that LUx is smaller than the x-direction value of every key point and LUy is smaller than the y-direction value of every key point; each key point then lies in the fourth quadrant of a coordinate frame whose origin is the top-left vertex of the labeling frame.
Step two, point target detection based on the top left corner vertex offset of the prediction frame:
firstly, a prediction frame is obtained through YOLO, and at the same time the offset of each key point from the prediction frame's top-left vertex is obtained; adding the offset (Δx, Δy) output by the network for each key point to the coordinates (LUx, LUy) of the prediction frame's top-left vertex gives the coordinate position of each key point.
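The decoding in step two reduces to a single vector addition per key point. The following minimal sketch illustrates it; NumPy is used for convenience, and the function name and array layout are illustrative assumptions, not the patent's reference code:

```python
import numpy as np

def decode_keypoints(pred_box_top_left, offsets):
    """Recover absolute key point coordinates from one YOLO prediction.

    pred_box_top_left: (LUx, LUy), top-left vertex of the prediction frame.
    offsets: (m, 2) array of network-output offsets (dx, dy), one row per
             key point, measured from that vertex.
    """
    lu = np.asarray(pred_box_top_left, dtype=np.float32)
    return lu + np.asarray(offsets, dtype=np.float32)   # shape (m, 2)

# Example: box top-left vertex at (100, 50), four key point offsets.
points = decode_keypoints((100.0, 50.0), [(5, 8), (40, 3), (42, 30), (4, 28)])
```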
The key point target detection method based on YOLO of the invention has the following advantages:
1: When YOLO positions a target with the prediction frame, the distance of each key point from the prediction frame's top-left vertex is predicted in parallel, and the position of each key point is obtained by combining that distance with the position of the prediction frame's top-left vertex. This provides a feasible scheme for point target detection.
2: The point target detection scheme can solve the inaccurate positioning and the large amount of redundant information that arise when detecting rotated, tilted rectangular targets, providing an accurate positioning scheme for tilted license plates, traffic signs, high-altitude ship remote sensing detection and the like.
3: The point target detection based on the prediction frame's top-left vertex offset proposed by the invention is universal and can be applied to all target detection tasks that need to output feature points together with target frames.
4: In the point target detection scheme of the invention, the position of a point target is obtained through its offset from the top-left vertex of the initial prediction frame. By adding point outputs on top of the original target detection, the detection capability of the one-stage algorithm is expanded, making it suitable for more scenes, such as point target detection and accurate positioning of rotated targets.
5: For the concrete implementation of the YOLO point target detection scheme, the invention designs a loss calculation function for the point targets' offsets relative to the prediction frame's top-left vertex, thereby improving the loss function of YOLO. The output of YOLO is also expanded so that it outputs the position information of the key points while outputting the target frame, making YOLO's performance stronger and meeting more application requirements.
Drawings
Fig. 1(1) shows that only face bounding boxes (lacking detection of key points) are obtained in the current face detection.
Fig. 1(2) is a schematic diagram of a prediction frame including redundant information in the inclined guideboard detection.
Fig. 1(3) is a schematic diagram of the present invention capable of simultaneously outputting a target frame and a key point in face detection.
Fig. 1(4) is a schematic diagram of the present invention for accurately detecting an inclined guideboard through key point detection.
Fig. 2 is a schematic view of a YOLO target detection process.
FIG. 3 illustrates the position prediction of YOLOv3 relative to the anchor.
FIG. 4 illustrates the effect of YOLO detection on a rotated rectangular target.
FIG. 5(1) is a flow chart of data set generation and algorithm design according to the present invention;
FIG. 5(2) is a flow chart of target detection by the model of the present invention.
FIG. 6 illustrates locating key points by offsets.
FIG. 7 compares the YOLOv3 labeling format with the labeling format of the key point scheme.
FIG. 8 is a flow chart of the specific application of the present YOLO scheme in rotated rectangular target detection.
Detailed Description
The key point target detection method based on the YOLO adds a detection algorithm of a point target on the basis of the original YOLO, so that the YOLO has the capability of detecting the point target, the YOLO can simultaneously output a target detection frame and key points, and meanwhile, the accurate positioning of a rotating rectangular object is realized in specific application, as shown in fig. 1(1) to fig. 1 (4).
Fig. 1(1) is a schematic diagram of only a face bounding box obtained in the current face detection, and lacks detection of key points.
Fig. 1(2) is a schematic diagram of the prediction frame including redundant information in the inclined guideboard detection, which is not ideal.
Fig. 1(3) is a schematic diagram of outputting a target frame and a key point simultaneously in face detection according to the scheme of the present invention.
Fig. 1(4) is a schematic diagram of the scheme of the invention for realizing accurate detection of the inclined guideboard through key point detection.
(I): the core idea of YOLO, as shown in fig. 2, is that YOLO divides an input image into SxS grids, and if the center point of an object is actually labeled in a certain grid, the grid detects the object. The principle of YOLO target detection, here, taking the principle of YOLO 3 target border prediction as an example, when YOLO 3 predicts, 9 anchors are generated by k-means clustering before training, and 3 anchors are provided for each scale corresponding to the output feature map of 3 scales. At the network input size of 416, the feature map sizes of the outputs of YOLOv3 in 3 scales are 13 × 13, 26 × 26, and 52 × 52, respectively, for detecting targets in three scales, large, medium, and small, respectively. For the feature map under each scale, YOLOv3 gives 3 anchors, and for each pixel grid on the feature map, there are 3 anchors to predict, find the most suitable anchor, give the corresponding offset, that is, the prediction frame. YOLOv3 gives 4 values, t, for each prediction boxx、ty、tw、thAnd the 4 values and the final predicted bbox mapping relationship are as shown in equations (1.1) to (1.4).
$b_x = \sigma(t_x) + c_x$  (1.1)

$b_y = \sigma(t_y) + c_y$  (1.2)

$b_w = p_w e^{t_w}$  (1.3)

$b_h = p_h e^{t_h}$  (1.4)

Equations (1.1) to (1.4) are the mapping equations between the output values and the prediction boxes in YOLOv3.
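As a concrete illustration, a small sketch of this mapping under the standard YOLOv3 convention that σ is the logistic sigmoid and that c_x, c_y, p_w, p_h are given in feature-map units (the function name is an assumption for illustration):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_yolov3_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Map raw outputs (tx, ty, tw, th) to a box per equations (1.1)-(1.4).

    cx, cy: top-left corner of the responsible grid cell (feature-map units).
    pw, ph: width and height of the matched anchor.
    """
    bx = sigmoid(tx) + cx      # (1.1) center x, confined to the grid cell
    by = sigmoid(ty) + cy      # (1.2) center y
    bw = pw * np.exp(tw)       # (1.3) width as an exponential anchor scaling
    bh = ph * np.exp(th)       # (1.4) height
    return bx, by, bw, bh
```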
wherein t_x, t_y represent the coordinate offset values and t_w, t_h represent the scaling of the predicted size; p_w, p_h represent the width and height of the anchor, respectively. σ(t_x), σ(t_y) express the offset of an object's center point relative to the grid responsible for detecting that object, as indicated in FIG. 3, where C_x, C_y are the top-left corner coordinates of the grid cell containing the center point. The finally obtained b_x, b_y, b_w, b_h are the center point coordinates and the width and height relative to the feature map. The loss function of YOLOv3 is shown in equation (1.5).
$\mathrm{Loss}_{yolov3} = \mathrm{Loss}_{center} + \mathrm{Loss}_{wh} + \mathrm{Loss}_{score} + \mathrm{Loss}_{class}$  (1.5)

$\mathrm{Loss}_{center} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]$  (1.5.1)

$\mathrm{Loss}_{wh} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]$  (1.5.2)

$\mathrm{Loss}_{score} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2$  (1.5.3, 1.5.4)

$\mathrm{Loss}_{class} = \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \mathrm{classes}} (p_i(c) - \hat{p}_i(c))^2$  (1.5.5)
The YOLOv3 loss function contains 4 parts: the loss of center coordinates Loss_center in equation (1.5.1), the width-height loss Loss_wh in equation (1.5.2), the confidence loss Loss_score in equations (1.5.3)-(1.5.4), and the class loss Loss_class in equation (1.5.5). The variables have the following meanings: S×S is the number of grids into which the network partitions the picture, B is the number of bounding boxes predicted by each grid, and $\mathbb{1}_{ij}^{obj}$ marks the prediction of the j-th bounding box in grid i. In equation (1.5.1), λ_coord is a dynamic parameter, $\hat{x}_i, \hat{y}_i$ are the true values of the center coordinates and $x_i, y_i$ are the predicted values. In equation (1.5.2), $\hat{w}_i$ and $\hat{h}_i$ represent the true width and height of the target, while $w_i$ and $h_i$ represent the width and height of the target predicted by the network. Equations (1.5.3) and (1.5.4) consist of the confidence loss with a target (1.5.3) and the confidence loss without a target (1.5.4), where λ_noobj is the coefficient of the network's error when no target is contained, and $\hat{C}_i$ and $C_i$ respectively represent the confidence truth value and the network-predicted confidence of a detected target. In equation (1.5.5), $\hat{p}_i(c)$ is the true value of the probability of the detected target.
(II): point target detection based on the top left corner vertex offset of the prediction box:
when detecting a non-horizontal rectangular target, the original YOLO obtains a final horizontal prediction frame by predicting a central point and a width and a height, which causes the problem shown in fig. 4, two aircraft carriers parked at a port are shown, the original YOLO in an ideal state is a yellow frame, which contains a large amount of other information irrelevant to the target, and meanwhile, the prediction frames of the two aircraft carriers are highly overlapped, so that when the non-maximum value inhibits NMS, another prediction is easily removed, which causes a missing detection. Such as the document [9] Neubeck A, Gool L JV. efficient Non-Maximum Suppression [ C ]. International conference on Pattern Recognition, 2006: 850-855. The ideal prediction frame should be a red rotating rectangular prediction frame, but this is certainly beyond the capability of YOLO.
(III): the invention relates to a key point target detection method based on YOLO, which comprises the following steps:
the method is not only used for enabling the YOLO to obtain the capability of detecting a point target, but also can solve the problem of detecting a rotating target shown in FIG. 4 in specific application, and after the key points are detected, the more accurate red prediction frame can be obtained by connecting the key points. The scheme flow chart is shown in fig. 5(1) and fig. 5 (2): firstly, a data set is made, then a point target detection algorithm based on the top left corner vertex deviation of a prediction frame is designed, a loss function is designed, a model is trained, and a point target is obtained through the prediction frame and the point deviation amount during detection.
The principle of point target detection in the present invention is shown in FIG. 6, where the number of key points is 4. The dotted-line frame in FIG. 6 is the anchor, (p_w, p_h) are the width and height of the anchor, the blue frame is the target prediction frame, the 4 red arrows indicate the offset distances of the 4 key points of the target relative to the prediction frame's top-left corner, and the green frame is the rotated target frame finally obtained in rotated rectangle detection. The offset distance of each key point from the prediction frame's top-left corner is given by equations (1.8) to (2.1).
In the invention, the model outputs, for each prediction frame, t_x, t_y, t_w, t_h together with 4 sets of offsets. t_x, t_y, t_w, t_h yield the prediction frame, i.e. the blue bounding frame: equations (1.6) to (1.7) first determine the width and height (b_w, b_h) of the prediction frame, and the 4 sets of offsets are then obtained through (b_w, b_h), as shown in equations (1.8) to (2.1), where D1_x, D1_y are the offset distances of point D1 in the x-axis and y-axis directions from the prediction frame's top-left corner; similarly, D2_x, D2_y, D3_x, D3_y and D4_x, D4_y respectively represent the offset distances of the target key points D2, D3 and D4 from the prediction frame's top-left vertex.
$b_w = p_w e^{t_w}$  (1.6)

$b_h = p_h e^{t_h}$  (1.7)

$D1_x = \sigma(t_{x1}) \cdot b_w, \quad D1_y = \sigma(t_{y1}) \cdot b_h$  (1.8)

$D2_x = \sigma(t_{x2}) \cdot b_w, \quad D2_y = \sigma(t_{y2}) \cdot b_h$  (1.9)

$D3_x = \sigma(t_{x3}) \cdot b_w, \quad D3_y = \sigma(t_{y3}) \cdot b_h$  (2.0)

$D4_x = \sigma(t_{x4}) \cdot b_w, \quad D4_y = \sigma(t_{y4}) \cdot b_h$  (2.1)

Equations (1.6) to (2.1) are the calculation equations of the algorithm based on the prediction frame's top-left vertex offset.
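A sketch of equations (1.6) to (2.1) follows; the sigmoid on each raw offset mirrors equations (1.1)-(1.2), and the function and argument names are assumptions for illustration:

```python
import numpy as np

def decode_offsets(t_offsets, tw, th, pw, ph):
    """Decode pixel offsets D1..D4 from the top-left vertex, eqs (1.6)-(2.1).

    t_offsets: (4, 2) array of raw (t_xk, t_yk) values for key points D1..D4.
    tw, th:    raw box size outputs; pw, ph: anchor width and height.
    """
    bw = pw * np.exp(tw)                     # (1.6) prediction frame width
    bh = ph * np.exp(th)                     # (1.7) prediction frame height
    t = np.asarray(t_offsets, dtype=np.float32)
    s = 1.0 / (1.0 + np.exp(-t))             # sigmoid on each raw offset
    return np.stack([s[:, 0] * bw,           # (1.8)-(2.1), x components
                     s[:, 1] * bh], axis=1)  # (1.8)-(2.1), y components
```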
The loss function of the key points in the YOLO point target detection is shown in equation (2.2), where the number of key points is 4:

$\mathrm{Loss}_{KeyPoint} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{k=1}^{4} \left[ (Dk_x - \widehat{Dk}_x)^2 + (Dk_y - \widehat{Dk}_y)^2 \right]$  (2.2)

If the number of key points is increased, the key point loss function is shown in equation (2.3), where m is the number of key points:

$\mathrm{Loss}_{KeyPoint} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{k=1}^{m} \left[ (Dk_x - \widehat{Dk}_x)^2 + (Dk_y - \widehat{Dk}_y)^2 \right]$  (2.3)
Equations (2.2) and (2.3) are the offset loss calculation equations of the algorithm based on the prediction frame's top-left vertex offset; the calculation loss of the key points is added to the detection of the original YOLOv3, so the final loss function of the invention is:

$\mathrm{Loss}_{KeyPoint\_offset} = \mathrm{Loss}_{yolov3} + \mathrm{Loss}_{KeyPoint}$  (2.4)

Equation (2.4) is the overall loss calculation equation of the algorithm based on the prediction frame's top-left vertex offset.
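A sketch of the combined objective: the key point term follows the squared-error form of equation (2.3), with tensor shapes and names assumed for illustration, and the YOLOv3 loss taken as given from an existing implementation:

```python
import numpy as np

def keypoint_offset_loss(pred, truth, obj_mask, lam_coord=5.0):
    """Squared-error key point term in the spirit of equation (2.3).

    pred, truth: (N, m, 2) arrays of per-box key point offsets (dx, dy).
    obj_mask:    (N,) array, 1 where a grid cell/anchor owns a target.
    """
    pred = np.asarray(pred, dtype=np.float64)
    truth = np.asarray(truth, dtype=np.float64)
    per_box = ((pred - truth) ** 2).sum(axis=(1, 2))   # sum over m key points
    return lam_coord * float((np.asarray(obj_mask) * per_box).sum())

def total_loss(loss_yolov3, loss_keypoint):
    """Equation (2.4): the original YOLOv3 loss plus the key point term."""
    return loss_yolov3 + loss_keypoint
```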
Making and processing the data set: the key to processing the data set in YOLO point target detection is to add the position information of the key points to the training data, on top of the labeled data set whose original labeling frame is a horizontal rectangular frame. That is, the offset distance (Δx, Δy) from each key point to the top-left vertex of the labeling frame is added to the labeling data. It should be noted that the top-left vertex coordinates (LUx, LUy) of the labeling frame must satisfy that LUx is smaller than the x-direction value of every key point and LUy is smaller than the y-direction value of every key point. Each key point then lies in the fourth quadrant of a coordinate frame whose origin is the top-left vertex of the labeling frame. FIG. 7 shows the labeling format of the training data in the original YOLO and the labeling format of the key point detection scheme, respectively, for the case of 4 key points; if the number of key points is increased, the corresponding training data set must be labeled with the offset distances of the additional key points.
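The annotation step reduces to a subtraction plus a validity check; a minimal sketch (the label layout is an assumption for illustration):

```python
def make_keypoint_label(box_top_left, keypoints):
    """Augment one horizontal-box label with per-key-point offsets.

    box_top_left: (LUx, LUy) of the labeling frame.
    keypoints:    list of (x, y) key point coordinates in the same image.
    Returns the list of offsets (dx, dy); every offset is non-negative,
    i.e. each key point lies in the fourth quadrant of a frame whose
    origin is the labeling frame's top-left vertex.
    """
    lux, luy = box_top_left
    for x, y in keypoints:
        if x < lux or y < luy:
            raise ValueError("top-left vertex must bound all key points")
    return [(x - lux, y - luy) for x, y in keypoints]

# Example: a box at (20, 10) with two labeled key points.
offsets = make_keypoint_label((20, 10), [(25, 18), (60, 13)])  # [(5, 8), (40, 3)]
```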
Detection by the model: the point target detection scheme based on the prediction frame's top-left vertex offset solves YOLO's point target detection. Adding the offset (Δx, Δy) output by the network for each key point to the coordinates (LUx, LUy) of the prediction frame's top-left vertex gives the coordinate position of that key point. In some applications this is all that is needed: for example, detecting facial feature points only requires the positions of the key points, with no subsequent processing. When the scheme is applied to rotated rectangular target detection, however, the accurate positioning frame must be drawn from the 4 given key points. The flow is therefore as shown in FIG. 8: the picture is input into the YOLO network; the offsets from the key points to the prediction frame's top-left corner are obtained together with the prediction frame bbox; the key point positions are computed from the prediction frame's top-left vertex and the 4 key point offsets; and the key points are then connected to obtain the accurate red positioning frame.
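Putting the detection side together, a sketch of the flow of FIG. 8, assuming the network's box is given as center plus size and the per-key-point offsets have already been decoded as above (names are hypothetical):

```python
import numpy as np

def detect_rotated_box(pred_box_xywh, keypoint_offsets):
    """Recover 4 key points and the rotated localization frame (FIG. 8).

    pred_box_xywh:    (bx, by, bw, bh), center and size of the prediction
                      frame output by YOLO.
    keypoint_offsets: (4, 2) array of decoded offsets (dx, dy) from the
                      frame's top-left vertex, for D1..D4.
    """
    bx, by, bw, bh = pred_box_xywh
    lux, luy = bx - bw / 2.0, by - bh / 2.0   # top-left vertex (LUx, LUy)
    points = np.array([lux, luy]) + np.asarray(keypoint_offsets, dtype=float)
    quad = points.tolist()   # connecting D1->D2->D3->D4->D1 draws the frame
    return points, quad
```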
As a general target detection system with excellent performance, YOLO performs well in both detection precision and detection speed, but its horizontal prediction frame cannot provide a good solution for the detection of point targets and of tilted, rotated rectangular targets.
The invention provides a point target detection scheme based on the offset from the prediction frame's top-left corner, which outputs the target frame and the feature points simultaneously. It solves the detection of key points, such as facial feature points and human limb joint points, and at the same time solves the accurate positioning of rotated, tilted rectangular targets, such as traffic signs, billboards and high-altitude remote sensing targets viewed at tilted angles.

Claims (4)

1. A key point target detection method based on YOLO, characterized by comprising the following steps:
step one, manufacturing and processing a data set:
adding, on a labeled data set whose original labeling frame is a horizontal rectangular frame, the offset distance (Δx, Δy) from each key point to the top-left vertex of the labeling frame, wherein the top-left vertex position coordinates of the labeling frame are (LUx, LUy), LUx is smaller than the x-direction value of every key point, and LUy is smaller than the y-direction value of every key point, so that each key point lies in the fourth quadrant of a coordinate frame whose origin is the top-left vertex of the labeling frame;
step two, point target detection based on the top left corner vertex offset of the prediction frame:
firstly, obtaining a prediction frame through YOLO together with the offset of each key point from the top-left vertex of the prediction frame, and adding the offset (Δx, Δy) output by the network for each key point to the coordinates (LUx, LUy) of the top-left vertex of the prediction frame to obtain the coordinate position of the key point.
2. The key point target detection method based on YOLO according to claim 1, wherein:
in the first step, the number of the key points is 4, and the offset distance of each key point relative to the top-left corner of the prediction frame is given by equations (1.8) to (2.1); the model outputs, for each prediction frame, t_x, t_y, t_w, t_h and 4 sets of offsets; t_x, t_y, t_w, t_h predict the original target frame, i.e. the bounding box bbox, so equations (1.6) to (1.7) first determine the width and height (b_w, b_h) of the prediction frame, and the 4 sets of offsets are then obtained through (b_w, b_h), as shown in equations (1.8) to (2.1);

$b_w = p_w e^{t_w}$  (1.6)

$b_h = p_h e^{t_h}$  (1.7)

$D1_x = \sigma(t_{x1}) \cdot b_w, \quad D1_y = \sigma(t_{y1}) \cdot b_h$  (1.8)

$D2_x = \sigma(t_{x2}) \cdot b_w, \quad D2_y = \sigma(t_{y2}) \cdot b_h$  (1.9)

$D3_x = \sigma(t_{x3}) \cdot b_w, \quad D3_y = \sigma(t_{y3}) \cdot b_h$  (2.0)

$D4_x = \sigma(t_{x4}) \cdot b_w, \quad D4_y = \sigma(t_{y4}) \cdot b_h$  (2.1)

wherein D1_x, D1_y are the offset distances of point D1 in the x-axis and y-axis directions relative to the top-left corner of the prediction frame; similarly, D2_x, D2_y, D3_x, D3_y and D4_x, D4_y respectively represent the offset distances of the target key points D2, D3 and D4 from the top-left vertex of the prediction frame.
3. The key point target detection method based on YOLO according to claim 2, wherein: the loss function of the key points in the YOLO point target detection is shown in equation (2.2), where the number of key points is 4:

$\mathrm{Loss}_{KeyPoint} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{k=1}^{4} \left[ (Dk_x - \widehat{Dk}_x)^2 + (Dk_y - \widehat{Dk}_y)^2 \right]$  (2.2)

if the number of the key points is increased, the key point loss function is shown in equation (2.3), where m is the number of key points:

$\mathrm{Loss}_{KeyPoint} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{k=1}^{m} \left[ (Dk_x - \widehat{Dk}_x)^2 + (Dk_y - \widehat{Dk}_y)^2 \right]$  (2.3)
the YOLO point target detection is the detection of original YOLO v3 with the addition of the computational loss of key points, so the final loss function is:
LossKeyPoint_offset=Lossyolov3+LossKeyPoint(2.4)。
4. The key point target detection method based on YOLO according to claim 2, wherein:
in the second step, the picture is input into the YOLO network; the offsets from the key points to the top-left corner of the prediction frame are obtained together with the prediction frame; the positions of the key points are calculated from the top-left vertex of the prediction frame and the offsets of the 4 key points; and the key points are then connected to obtain the accurate positioning frame.
CN202010514432.XA 2020-06-08 2020-06-08 YOLO-based key point target detection method Active CN111814827B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010514432.XA CN111814827B (en) 2020-06-08 2020-06-08 YOLO-based key point target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010514432.XA CN111814827B (en) 2020-06-08 2020-06-08 YOLO-based key point target detection method

Publications (2)

Publication Number Publication Date
CN111814827A true CN111814827A (en) 2020-10-23
CN111814827B CN111814827B (en) 2024-06-11

Family

ID=72844777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010514432.XA Active CN111814827B (en) 2020-06-08 2020-06-08 YOLO-based key point target detection method

Country Status (1)

Country Link
CN (1) CN111814827B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256724A (en) * 2021-07-07 2021-08-13 上海影创信息科技有限公司 Handle inside-out vision 6-degree-of-freedom positioning method and system
CN113255671A (en) * 2021-07-05 2021-08-13 浙江智慧视频安防创新中心有限公司 Target detection method, system, device and medium for object with large length-width ratio
CN113420774A (en) * 2021-03-24 2021-09-21 成都理工大学 Target detection technology for irregular graph
CN113537342A (en) * 2021-07-14 2021-10-22 浙江智慧视频安防创新中心有限公司 Method and device for detecting object in image, storage medium and terminal
CN113537158A (en) * 2021-09-09 2021-10-22 科大讯飞(苏州)科技有限公司 Image target detection method, device, equipment and storage medium
CN113888741A (en) * 2021-12-06 2022-01-04 智洋创新科技股份有限公司 Method for correcting rotating image of instrument in power distribution room
CN114219991A (en) * 2021-12-06 2022-03-22 安徽省配天机器人集团有限公司 Target detection method, device and computer readable storage medium
WO2023184123A1 (en) * 2022-03-28 2023-10-05 京东方科技集团股份有限公司 Detection method and device for violation of rules and regulations

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229442A (en) * 2018-02-07 2018-06-29 西南科技大学 Face fast and stable detection method in image sequence based on MS-KCF
CN108960340A (en) * 2018-07-23 2018-12-07 电子科技大学 Convolutional neural networks compression method and method for detecting human face
CN109117831A (en) * 2018-09-30 2019-01-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN110580445A (en) * 2019-07-12 2019-12-17 西北工业大学 Face key point detection method based on GIoU and weighted NMS improvement
CN110490256A (en) * 2019-08-20 2019-11-22 中国计量大学 A kind of vehicle checking method based on key point thermal map
CN110930454A (en) * 2019-11-01 2020-03-27 北京航空航天大学 Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning

Also Published As

Publication number Publication date
CN111814827B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
CN111814827B (en) YOLO-based key point target detection method
Zakharov et al. Dpod: 6d pose object detector and refiner
CN108564616B (en) Fast robust RGB-D indoor three-dimensional scene reconstruction method
CN110288657B (en) Augmented reality three-dimensional registration method based on Kinect
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN112633277A (en) Channel ship board detection, positioning and identification method based on deep learning
CN108427924A (en) A kind of text recurrence detection method based on rotational sensitive feature
CN109871823B (en) Satellite image ship detection method combining rotating frame and context information
Lu et al. A cnn-transformer hybrid model based on cswin transformer for uav image object detection
CN111968177A (en) Mobile robot positioning method based on fixed camera vision
CN108122256A (en) It is a kind of to approach under state the method for rotating object pose measurement
CN112001926A (en) RGBD multi-camera calibration method and system based on multi-dimensional semantic mapping and application
CN104794737A (en) Depth-information-aided particle filter tracking method
CN112560852A (en) Single-stage target detection method with rotation adaptive capacity based on YOLOv3 network
CN111767854B (en) SLAM loop detection method combined with scene text semantic information
CN114972423A (en) Aerial video moving target detection method and system
CN114387346A (en) Image recognition and prediction model processing method, three-dimensional modeling method and device
CN113284185B (en) Rotating target detection method for remote sensing target detection
Zhang et al. An improved YOLO algorithm for rotated object detection in remote sensing images
Cheng et al. An augmented reality image registration method based on improved ORB
Sun et al. A fast multi-target detection method based on improved YOLO
Lee et al. Camera pose estimation using voxel-based features for autonomous vehicle localization tracking
Li et al. Efficient and accurate object detection for 3D point clouds in intelligent visual internet of things
CN113139965A (en) Indoor real-time three-dimensional semantic segmentation method based on depth map
Wei et al. An efficient point cloud-based 3d single stage object detector

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240131

Address after: 1003, Building A, Zhiyun Industrial Park, No. 13 Huaxing Road, Tongsheng Community, Dalang Street, Longhua District, Shenzhen City, Guangdong Province, 518000

Applicant after: Shenzhen Wanzhida Enterprise Management Co.,Ltd.

Country or region after: China

Address before: 443002 No. 8, University Road, Xiling District, Yichang, Hubei

Applicant before: CHINA THREE GORGES University

Country or region before: China

TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20240501

Address after: No. 102, Commercial Building, Yuehu Park, No. 140 Hongshan Road, Hongshan Street, Kaifu District, Changsha City, Hunan Province, 410000

Applicant after: Hunan Feifei Animation Co.,Ltd.

Country or region after: China

Address before: 1003, Building A, Zhiyun Industrial Park, No. 13 Huaxing Road, Tongsheng Community, Dalang Street, Longhua District, Shenzhen City, Guangdong Province, 518000

Applicant before: Shenzhen Wanzhida Enterprise Management Co.,Ltd.

Country or region before: China

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant