CN115909405A - Character interaction detection method based on YOLOv5 - Google Patents


Info

Publication number
CN115909405A
Authority
CN
China
Prior art keywords
interaction
interactive
detection
frame
people
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211512924.0A
Other languages
Chinese (zh)
Inventor
叶海波
张诗凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202211512924.0A
Publication of CN115909405A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a character interaction detection method, i.e. a human-object interaction (HOI) detection method, based on YOLOv5. The HOI detection task aims to detect the people and objects in an image that are in an interaction relationship, together with their interaction actions. Previous research has paid little attention to detection speed, yet faster HOI detection is desirable for scenes with strict real-time requirements. Inspired by the fast object detection algorithm YOLO, a fast object detector is applied to HOI detection and YOLOHOI is proposed. A method for generating interaction boxes is designed and a dual target detection structure is provided, in which the interaction region is treated as a special object; the model thus gains the ability to detect this special object, and the interaction relationship is detected while object detection is performed, achieving fast detection.

Description

Character interaction detection method based on YOLOv5
Technical Field
The invention relates to the technical field of computer vision and human-object interaction detection, and in particular to a character interaction detection method based on YOLOv5.
Background
The human-object interaction detection task aims to detect the people and objects in an image that have an interaction relationship, together with the corresponding interaction actions, and finally outputs <human, object, action> triplets. Many scenarios require real-time HOI detection; in autonomous driving, for example, identifying HOI relationships helps a model recognize and analyze the relationships between people and objects in the scene, and real-time performance is critical. We therefore study how to perform fast HOI detection.
Most previous HOI detection research adopts two-stage algorithms, which execute object detection and interaction classification serially: the people and objects obtained from object detection are combined pairwise and fed into a separate interaction classification network for action classification. Because of this two-stage structure, these methods are very time consuming. To overcome this structural deficiency, some work has investigated parallel HOI detection methods, called one-stage algorithms, to increase detection speed.
One-stage methods are more efficient because they perform object detection and interaction detection in parallel. They require matching rules that define interaction detection; for example, PPDM and IPNet introduce the concept of interaction points to match interactions, and UnionDet uses the union box of a person and an object for detection. Our aim is to detect HOI in scenes with strict real-time requirements. Although many one-stage methods have achieved significant improvements in detection accuracy, they do not pay much attention to detection speed. We therefore design YOLOHOI to perform HOI detection as fast as possible.
Disclosure of Invention
The red boxes in FIG. 3 are interaction regions between a person and an object. A human observer can recognize the interaction from the content of the red box alone; for example, from the red box in the right-hand image, one can easily judge the interaction "person drinking water" just from the cup, face and hands. If such red boxes could be obtained before the object detection result is available, a dual target detection model could carry out object detection and interaction classification simultaneously, achieving real-time detection.
The invention discloses a character interaction detection method based on YOLOv5, which comprises the following steps:
step 1: inputting an original picture and extracting picture characteristics;
step 2: the double target detection branches respectively detect an object example and an interaction frame;
and step 3: the target detection branch is responsible for detecting people and objects;
and 4, step 4: the interaction detection branch is responsible for detecting the interaction frame;
and 5: carrying out HOI relation pairing;
in the step 2, the dual target detection branch respectively detects the object instance and the interaction frame, and the method comprises the following steps:
the method is different from a general solution idea of the character interaction relation detection problem, and treats the core problem as a double target detection task, wherein the double detection comprises basic target detection of people and objects and special target detection of an interaction area. The interactive region between each person-object pair is concerned, the complex many-to-many relation detection is solved, the interactive region is regarded as a special target, the target detection result is obtained, meanwhile, the interactive detection result is also obtained, and the effect of rapidly detecting the person interactive relation is achieved. On the basis of the original YOLOv5 model structure, an interactive frame detection head is added for detecting an interactive frame. The basic target detection and the interactive area detection share a main feature extraction network. To summarize, YOLOv5 has two detection branches, one branch being responsible for detecting the base target and the other branch being responsible for detecting the interaction region.
In step 3, the object detection branch is responsible for detecting people and objects, as follows:
The branch outputs a confidence for whether an object exists, the four parameters required to determine the object bounding box, and a score for each object category.
During training, the interaction region between a person and an object, i.e. the interaction box, is calculated and fitted by a formula from the positions of the person and the object. The interaction box is treated as a special target and fed into the object detection network for training, giving the model the ability to predict interaction regions.
An interaction box generation formula was obtained through a series of tests. The formula places the center of the generated interaction box on the line connecting the centers of the person and the object, and covers as much as possible of the region that is useful for classifying the interaction.
The main considerations when designing this formula are: 1) the center of the interaction box lies on the line connecting the centers of the person and the object; 2) when the object is much smaller than the person, the interaction box fully covers the object and partially covers the person; 3) when the object is much larger than the person, the opposite holds.
In step 4, the interaction detection branch is responsible for detecting interaction boxes, as follows:
Given the known bounding boxes of people and objects, a set of interaction boxes is obtained from the formula designed in step 3; these interaction boxes, together with their corresponding interaction categories, are fed into the YOLOv5 model structure as special targets so that it detects interaction boxes. The branch outputs a confidence for whether an interaction relationship exists, the four parameters required to determine the interaction box, and a score for each interaction action.
In step 5, HOI relationship pairing is performed, as follows:
The people and objects in the basic object detection result are combined pairwise and passed through the interaction box generation formula to obtain candidate interaction boxes. The intersection over union (IoU) between every candidate interaction box and each predicted interaction box is computed; for each predicted interaction box, the candidate with the largest IoU is kept, the person and object that generated that candidate are considered to be in an interaction relationship, and the action category of the predicted interaction box is assigned to them, yielding the interaction relationship prediction.
Drawings
Fig. 1 is a schematic diagram.
Fig. 2 is a diagram illustrating a conventional two-stage human interaction detection method.
FIG. 3 shows person and object bounding boxes together with the interaction boxes.
FIG. 4 is a diagram of a model framework of the present invention.
Detailed Description
The character interaction detection method based on YOLOv5 provided by the invention comprises five steps overall:
step 1: inputting an original picture and extracting characteristics;
step 2: the double target detection branches respectively detect an object example and an interaction frame;
and step 3: the target detection branch is responsible for detecting people and objects;
and 4, step 4: the interaction detection branch is responsible for detecting the interaction frame;
and 5: carrying out HOI relation pairing;
in step 3, the target detection branch is responsible for detecting people and objects, and the method comprises the following steps:
setting the channel dimension of the extracted feature vector to be (nc + 5) × 3, wherein nc represents the number of the types of the objects, namely 80;5=4+1:4 represents four parameters required for determining the frame of the object, namely a center point coordinate parameter and a width and height parameter, and 1 represents a confidence level for detecting whether the object exists or not. The detection process follows the method of YOLOv 5.
During training, the interaction region between a person and an object, i.e. the interaction box, is calculated and fitted by a formula from the positions of the person and the object. The interaction box is treated as a special target and fed into the object detection network for training, giving the model the ability to predict interaction regions.
An interaction box generation formula was obtained through a series of tests. The formula places the center of the generated interaction box on the line connecting the centers of the person and the object, and covers as much as possible of the region that is useful for classifying the interaction.
The main considerations when designing this formula are: 1) the center of the interaction box lies on the line connecting the centers of the person and the object; 2) when the object is much smaller than the person, the interaction box fully covers the object and partially covers the person; 3) when the object is much larger than the person, the opposite holds. Based on these considerations, the formula is designed as follows:
Two of the formulas appear only as equation images in the original filing (Figure BSA0000290086450000021 and Figure BSA0000290086450000031); the equations given in the text are:

x_a = Ratio_L * x_o + (1 - Ratio_L) * x_h
y_a = Ratio_L * y_o + (1 - Ratio_L) * y_h
w_a = Ratio_S * min(w_h, w_o) + (1 - Ratio_S) * max(w_h, w_o)
h_a = Ratio_S * min(h_h, h_o) + (1 - Ratio_S) * max(h_h, h_o)

where (w_h, h_h) and (w_o, h_o) are the width and height of the person box and the object box, (x_h, y_h) and (x_o, y_o) are the center points of the person and the object, and (x_a, y_a, w_a, h_a) describes the interaction box. Ratio_L is the scale factor for the center-point coordinates and Ratio_S is the area scale factor.
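A direct transcription of these equations into Python is sketched below; since the defining formulas of Ratio_L and Ratio_S appear only as equation images in the filing, the sketch takes them as plain parameters in [0, 1].

```python
def interaction_box(human_box, object_box, ratio_l, ratio_s):
    """Generate an interaction box from a person box and an object box.

    Boxes are (x_center, y_center, width, height). ratio_l and ratio_s stand for
    Ratio_L and Ratio_S; their defining formulas are given only as figures in the
    filing, so here they are passed in as plain numbers in [0, 1].
    """
    x_h, y_h, w_h, h_h = human_box
    x_o, y_o, w_o, h_o = object_box

    # Center of the interaction box lies on the line joining the two centers.
    x_a = ratio_l * x_o + (1 - ratio_l) * x_h
    y_a = ratio_l * y_o + (1 - ratio_l) * y_h

    # Size interpolates between the smaller and the larger of the two boxes.
    w_a = ratio_s * min(w_h, w_o) + (1 - ratio_s) * max(w_h, w_o)
    h_a = ratio_s * min(h_h, h_o) + (1 - ratio_s) * max(h_h, h_o)
    return (x_a, y_a, w_a, h_a)
```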
In step 4, the interaction detection branch is responsible for detecting interaction boxes, as follows:
From the positions of the people and objects, the position of the interaction box is calculated by the formula in step 3, the corresponding interaction category is assigned to it, and it is input into the interaction detection branch as a special target.
An interaction box detection head is added in the yaml configuration file, and the channel dimension of the extracted feature vector is set to (nh + 5) × 3, where nh is the number of interaction action types, namely 117, and 5 = 4 + 1: the 4 are the parameters required to determine the interaction box, i.e. the center-point coordinates and the width and height, and the 1 is the confidence for whether an interaction relationship exists. The detection process follows the YOLOv5 method.
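For illustration, the following sketch writes one such special target as a YOLO-style label row; the "class x y w h" label format and the normalization by image size are assumptions borrowed from the usual YOLOv5 label convention (the filing only states that interaction boxes and their action categories are fed to the model as special targets), and the action id is arbitrary. It reuses the interaction_box sketch above.

```python
def interaction_label_row(human_box, object_box, action_id, img_w, img_h,
                          ratio_l, ratio_s):
    """Build one YOLO-style label row "action_id x y w h" for an interaction box.

    Coordinates are normalized by the image size, as in YOLOv5 label files.
    This label-file convention is an assumption made for illustration only.
    """
    x_a, y_a, w_a, h_a = interaction_box(human_box, object_box, ratio_l, ratio_s)
    return f"{action_id} {x_a / img_w:.6f} {y_a / img_h:.6f} {w_a / img_w:.6f} {h_a / img_h:.6f}"

# Example: a person box, a cup box, and a hypothetical action id for "drink".
row = interaction_label_row((300, 260, 120, 300), (360, 200, 40, 50),
                            action_id=37, img_w=512, img_h=512,
                            ratio_l=0.5, ratio_s=0.5)
print(row)
```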
In step 5, HOI relationship pairing is performed, as follows:
pairwise people and objects in the basic target detection result are processed according to the interaction frame generation formula to obtain the interaction frame judge _ bbox to be judged ai (x ai ,y ai ,w ai ,h ai ) Wherein i is 1-M, and M is the number of the pairwise combination and pairing of the human and the object. Predicting the interaction Box as Presct _ bbox aj (x aj ,y aj ,w aj ,h aj ) Where j belongs to 1-N, N being the number of predicted interaction boxes. Calculating all interaction frames to be judged and predicting interactionIntersection ratio between frames IoU, i.e.:
Figure BSA0000290086450000032
where area (predict _ bbox) represents a region of the prediction box, and area (judge _ bbox) represents a region of the prediction box. Interactive frame prediction _ bbox for each prediction aj Selecting the largest interactive frame to be judged, namely judge _ bbox, which is reserved for IoU ai And considering that the interaction relationship exists between the people and the object of the interaction frame to be judged generated by the formula in the step 2, and endowing the predicted interaction frame with the action type of the interaction frame to obtain an interaction relationship predicted value.
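A minimal sketch of this pairing step is given below, assuming boxes in (center x, center y, width, height) format and reusing the interaction_box sketch above; a real implementation would additionally threshold confidences before matching.

```python
def iou(box_a, box_b):
    """IoU of two (x_center, y_center, w, h) boxes."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def pair_hoi(person_boxes, object_boxes, predicted_interactions, ratio_l, ratio_s):
    """Match predicted interaction boxes back to (person, object) pairs.

    predicted_interactions is a list of (box, action_id). Every person-object pair
    is turned into a candidate interaction box with interaction_box() (earlier
    sketch); each predicted interaction box keeps the candidate with the largest
    IoU, and its action is assigned to that pair.
    """
    candidates = [(p, o, interaction_box(p, o, ratio_l, ratio_s))
                  for p in person_boxes for o in object_boxes]
    triplets = []
    for pred_box, action_id in predicted_interactions:
        person, obj, _ = max(candidates, key=lambda c: iou(c[2], pred_box))
        triplets.append((person, obj, action_id))  # <human, object, action>
    return triplets
```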
In summary, we want to detect HOI relationships faster to accommodate time-critical scenarios. Given the fast-detection advantage of the YOLO object detection algorithm, we apply it to HOI detection and propose our YOLOHOI model. This solution differs from the general HOI detection approach in that it treats the core problem of HOI detection as a dual target detection task and focuses on detecting the interaction region of each person-object pair. We treat the interaction region as a special object, so the model gains the ability to detect this special object. In addition to the model, we also design a formula for generating interaction boxes. To demonstrate that YOLOHOI is effective, we performed experiments at three network scales, training on the HOI-A dataset with a cosine annealing learning rate. Detection accuracy improves as the number of convolution layers increases, but the improvement becomes less pronounced at larger network scales. The highest mAP is 57.58% with an average inference time of only 21.67 ms; the speed exceeds existing models and the accuracy exceeds some traditional two-stage models, but a gap to the most accurate models remains, since the YOLO algorithm is not the strongest in detection accuracy.
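The filing specifies only that training uses a cosine annealing learning rate; as an illustration, such a schedule can be set up in PyTorch as follows, where the optimizer settings, epoch count and minimum learning rate are placeholders rather than values from the filing.

```python
import torch

# Placeholder parameters; in practice these would be the YOLOHOI model weights.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.937)

# Cosine annealing: the learning rate decays from lr to eta_min over T_max epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300, eta_min=1e-4)

for epoch in range(300):
    # ... one training epoch over HOI-A would go here ...
    optimizer.step()
    scheduler.step()
```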
TABLE 1: Training results for the three different network sizes (table provided as an image in the original publication).
TABLE 2: Performance comparison with other models (table provided as an image in the original publication).

Claims (5)

1. A character interaction detection method based on YOLOv5, characterized by comprising the following operation steps:
step 1: inputting an original picture and extracting picture characteristics;
step 2: the double target detection branches respectively detect an object example and an interaction frame;
and step 3: the target detection branch outputs object related information;
and 4, step 4: the interaction detection branch outputs the relevant information of the interaction frame;
and 5: HOI relationship pairing is carried out through two branch results;
in step 1, the input picture has three channels, the input picture size is 512, and the dual-branch target detection shares the features extracted by the backbone network.
2. The character interaction detection method based on YOLOv5 as claimed in claim 1, wherein in step 2 the dual target detection branches detect object instances and interaction boxes respectively, specifically as follows:
unlike the usual approach to the human-object interaction detection problem, the core problem is treated as a dual target detection task, the dual detection comprising basic object detection of people and objects and special target detection of the interaction region; by focusing on the interaction region between each person-object pair, the complex many-to-many relationship detection is resolved, the interaction region is treated as a special target, the interaction detection result is obtained at the same time as the object detection result, and fast detection of the interaction relationship is achieved.
3. The character interaction detection method based on YOLOv5 as claimed in claim 1, wherein in step 3 the object detection branch is responsible for detecting people and objects, specifically as follows:
the YOLOv5 model structure is used to detect and output a confidence for whether an object exists, the four parameters required to determine the object bounding box, and a score for each object category;
during training, the interaction region between a person and an object, i.e. the interaction box, is calculated and fitted from the positions of the person and the object by the formula designed in the invention; the interaction box is treated as a special target and input into the object detection network for training, so that the model has the ability to predict interaction regions;
an interaction box generation formula is obtained through a series of tests; the formula places the center of the generated interaction box on the line connecting the centers of the person and the object and covers as much as possible of the region useful for classifying the interaction action; the main considerations in designing the formula are: 1) the center of the interaction box lies on the line connecting the centers of the person and the object; 2) when the object is much smaller than the person, the interaction box fully covers the object and partially covers the person; 3) when the object is much larger than the person, the opposite holds; the formula is designed as follows:
two of the formulas appear only as equation images in the original filing (Figure FSA0000290086440000011 and Figure FSA0000290086440000012); the equations given in the text are:

x_a = Ratio_L * x_o + (1 - Ratio_L) * x_h
y_a = Ratio_L * y_o + (1 - Ratio_L) * y_h
w_a = Ratio_S * min(w_h, w_o) + (1 - Ratio_S) * max(w_h, w_o)
h_a = Ratio_S * min(h_h, h_o) + (1 - Ratio_S) * max(h_h, h_o)

where (w_h, h_h) and (w_o, h_o) are the width and height of the person box and the object box, (x_h, y_h) and (x_o, y_o) are the center points of the person and the object, and (x_a, y_a, w_a, h_a) describes the interaction box; Ratio_L is the scale factor for the center-point coordinates and Ratio_S is the area scale factor.
4. The character interaction detection method based on YOLOv5 as claimed in claim 1, wherein in step 4 the interaction detection branch detects interaction boxes, specifically as follows:
given the known bounding boxes of people and objects, a set of interaction boxes is obtained from the formula designed in step 3; these interaction boxes and their corresponding interaction categories are input into the YOLOv5 model structure as special targets, the interaction boxes are detected, and a confidence for whether an interaction relationship exists, the four parameters required to determine the interaction box, and a score for each interaction action are output.
5. The character interaction detection method based on YOLOv5 as claimed in claim 1, wherein in step 5 HOI relationship pairing is performed, specifically as follows:
the people and objects in the basic object detection result are combined pairwise and passed through the interaction box generation formula to obtain candidate interaction boxes; the intersection over union (IoU) between every candidate interaction box and each predicted interaction box is computed; for each predicted interaction box, the candidate with the largest IoU is kept, the person and object that generated that candidate are considered to be in an interaction relationship, and the action category of the predicted interaction box is assigned to them, yielding the interaction relationship prediction.
CN202211512924.0A 2022-11-29 2022-11-29 Character interaction detection method based on YOLOv5 Pending CN115909405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211512924.0A CN115909405A (en) 2022-11-29 2022-11-29 Character interaction detection method based on YOLOv5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211512924.0A CN115909405A (en) 2022-11-29 2022-11-29 Character interaction detection method based on YOLOv5

Publications (1)

Publication Number Publication Date
CN115909405A true CN115909405A (en) 2023-04-04

Family

ID=86474308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211512924.0A Pending CN115909405A (en) 2022-11-29 2022-11-29 Character interaction detection method based on YOLOv5

Country Status (1)

Country Link
CN (1) CN115909405A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883916A (en) * 2023-09-08 2023-10-13 深圳市国硕宏电子有限公司 Conference abnormal behavior detection method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination