CN114170686A - Elbow bending behavior detection method based on human body key points - Google Patents

Elbow bending behavior detection method based on human body key points

Info

Publication number
CN114170686A
CN114170686A (application CN202111482813.5A)
Authority
CN
China
Prior art keywords
human body
elbow
key points
wrist
bending behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111482813.5A
Other languages
Chinese (zh)
Inventor
宫法明
李云静
高亚婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China filed Critical China University of Petroleum East China
Priority to CN202111482813.5A
Publication of CN114170686A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an elbow bending behavior detection method based on human body key points, comprising the following steps: inputting the video or picture to be detected and locating person targets; obtaining high-quality person candidate boxes with the regional multi-person pose estimation framework (RMPE) to extract the coordinate information and confidence scores of 17 human body key points; thresholding the confidence scores to remove key point coordinates with low detection precision; combining the human body key point coordinate information in a proposed three-point positioning method, which derives an elbow bending behavior probability value from the angle range and distance ratios between key points and locates a small target detection area; detecting small target objects with a transfer learning and target detection algorithm to obtain target category confidence scores; and combining the elbow bending behavior probability value with the target category confidence score to detect human elbow bending behavior. The method can detect human elbow bending behavior in complex scenes, finding behaviors that pose potential safety hazards in time and giving early warning.

Description

Elbow bending behavior detection method based on human body key points
Technical Field
The invention belongs to the field of computer graphics and image processing, and relates to an elbow bending behavior detection method based on human body key points.
Background
With the continuous progress of artificial intelligence technology, computer vision is being applied ever more widely, most notably in intelligent recognition and analysis for video surveillance. Traditional manual monitoring of camera feeds consumes large amounts of manpower and material resources, is inefficient, and is prone to misjudgment and omission, whereas intelligent video surveillance lets a computer monitor violations automatically and raise alarms, which is of great significance. Human body key point detection and behavior analysis have made considerable progress, and existing methods achieve good recognition for single scenes or specific behaviors. Two-dimensional human key point detection algorithms fall into two main classes: top-down methods, which first detect person targets and then detect key points on each detected person; and bottom-up methods, which first detect all key points in the image and then group and associate them into individuals. Human behavior analysis builds on key point detection, further inferring and classifying behavior from the key point information.
Because key point detection, human behavior analysis and object detection share many connections and are mutually related, many methods combine two or all three of them, building their research on a basic object detection framework. Commonly used object detection frameworks fall into two categories: one-stage detectors such as the SSD and YOLO series, and two-stage detectors such as the R-CNN and Faster R-CNN series. One-stage detectors are fast but slightly less accurate; two-stage detectors are more accurate but slower. For industrial operation sites, a fast one-stage framework is generally used to ensure detection efficiency and real-time performance. Existing object detection and behavior analysis methods are mainly developed for simple scenes: most training data sets are collected in favorable environments, so the methods work well in a single scene but cannot meet application requirements in real, complex scenes. In complex places such as offshore oil platforms, the cluttered background, numerous interfering objects and blurred video frames make it difficult to achieve high recognition accuracy, increasing the difficulty of object detection and video analysis and making small objects in the frame even harder to detect. Accurately and efficiently detecting human elbow bending behavior at real operation sites with complex backgrounds, heavy interference and blurred footage, so that potential safety hazards can be found and stopped in time, is therefore an urgent and difficult problem.
Disclosure of Invention
To overcome the above defects, the invention provides an elbow bending behavior detection method based on human body key points, comprising the following steps:
S1, inputting the video or picture to be detected and locating person targets;
S2, extracting human skeleton key points with the RMPE framework to obtain 2D key point coordinates and confidence scores;
S3, thresholding the confidence scores and removing key point coordinates with low detection precision;
S4, based on the human body key points, obtaining the elbow bending behavior probability value with a three-point positioning method (left/right wrist, elbow and shoulder), combining the joint angle range and distance ratios;
S5, locating small target detection areas from the wrist-nose and wrist-ear key point pairs;
S6, detecting the small target areas obtained in S5 with a transfer learning and target detection algorithm to obtain target category confidence scores;
S7, combining the elbow bending behavior probability value with the target category confidence score to recognize human elbow bending behavior;
S8, outputting the type result of the human elbow bending behavior detection.
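The steps above can be sketched as a single control flow. Everything below is a hypothetical, minimal stand-in for the models named in the text (Faster R-CNN person detection, RMPE pose estimation, a one-stage object detector are passed in as plain callables); only the 0.40 keypoint threshold and the 60%/40% fusion weights come from the document itself.

```python
# Minimal runnable sketch of the S1-S8 pipeline; all model callables are stubs.
def run_pipeline(frame, detect_people, estimate_pose, score_behavior,
                 detect_objects, kp_thresh=0.40, w_behavior=0.6):
    results = []
    for box in detect_people(frame):                         # S1: locate person targets
        kps = estimate_pose(frame, box)                      # S2: {name: (x, y, conf)}
        kps = {k: v for k, v in kps.items()
               if v[2] >= kp_thresh}                         # S3: drop low-confidence points
        p_bend = score_behavior(kps)                         # S4: three-point method
        if p_bend == 0.0:                                    # no bending pose found
            continue
        obj_conf = detect_objects(frame, kps)                # S5 + S6: small-target regions
        score = w_behavior * p_bend + (1 - w_behavior) * obj_conf  # S7: fuse scores
        results.append({"box": box, "score": score})         # S8: output per person
    return results
```

With stub callables substituted for the real networks, the function returns one fused score per person whose pose passed the behavior check.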
The technical scheme of the invention is characterized by comprising the following steps:
For step S2, the invention adopts a top-down detection method based on the regional multi-person pose estimation framework (RMPE): persons are detected first, and their key points are then extracted. To handle localization errors and pose redundancy in single-person pose estimation, RMPE introduces a symmetric spatial transformer network (SSTN), parametric pose non-maximum suppression (NMS) and a pose-guided proposals generator (PGPG). The SSTN consists of a spatial transformer network (STN), a single-person pose estimation network (SPPE) and a spatial de-transformer network (SDTN): an inaccurate person box is fed through these three parts for pose estimation, the estimation result is mapped back onto the original input image to adjust the original person detection box, and the pose is aligned to the center of the image. The STN performs a 2D affine transformation, defined as follows:
    [x_t, y_t]^T = [θ1 θ2 θ3] [x_s, y_s, 1]^T    (1)
In formula (1), (x_t, y_t) are the key point coordinates after transformation, (x_s, y_s) the coordinates before transformation, and θ1, θ2, θ3 the transformation parameters (each a 2D column vector). The SDTN is defined as follows:
    [x_s, y_s]^T = [γ1 γ2 γ3] [x_t, y_t, 1]^T    (2)
In formula (2), γ1, γ2, γ3 are the transformation parameters, related to θ1, θ2, θ3 as follows:
    [γ1 γ2] = [θ1 θ2]^(-1)    (3)
    γ3 = -1 × [γ1 γ2] θ3    (4)
Parametric pose non-maximum suppression (NMS) eliminates poses that are too close or too similar to a target pose: the pose with the highest score is taken as the reference, and other poses close to the reference pose are repeatedly eliminated until a single pose remains. The elimination criterion is as follows:
    f(P_i, P_j | Λ, η) = 1[ d(P_i, P_j | Λ, λ) ≤ η ]    (5)
In formula (5), P_i is a redundant pose, P_j is the reference pose, the distance function d(·) combines a pose distance and a spatial distance, and η is the threshold for eliminating other poses. The pose-guided proposals generator (PGPG) augments the number of training samples so the network can be trained well. Finally, the 2D coordinates and confidence scores of 17 human body key points are output.
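The greedy elimination loop described above can be sketched as follows. The real distance d(·) mixes a pose (confidence) distance and a spatial distance with learned weights Λ; here d is simplified to the mean keypoint distance, and the threshold η is an illustrative value, not one given in the text.

```python
import math

def mean_dist(kps_a, kps_b):
    """Simplified stand-in for d(P_i, P_j): mean Euclidean keypoint distance."""
    return sum(math.dist(a, b) for a, b in zip(kps_a, kps_b)) / len(kps_a)

def pose_nms(poses, eta=20.0):
    """poses: list of (score, [(x, y), ...]); keep one pose per cluster."""
    remaining = sorted(poses, key=lambda p: p[0], reverse=True)
    kept = []
    while remaining:
        ref = remaining.pop(0)                    # highest-score pose as reference
        kept.append(ref)
        remaining = [p for p in remaining
                     if mean_dist(ref[1], p[1]) > eta]  # eliminate if d <= η
    return kept
```

Poses within η of the current reference are dropped, and the loop repeats on what remains, so each spatial cluster contributes exactly one pose.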
For step S3, each of the 17 human body key points obtained corresponds to one 2D coordinate and one confidence score. Body parts that cannot be detected, or are detected inaccurately, in the input picture introduce errors into elbow bending detection; analysis shows that the confidence scores of such parts are usually below 0.40, so 0.40 is used as the threshold for human elbow bending behavior detection. The confidence score of each key point is compared with the threshold, key point coordinates with low detection precision are removed, and elbow bending analysis proceeds only on key points whose confidence score exceeds the threshold.
For step S4, the invention obtains the elbow bending behavior probability value from the shoulder, elbow and wrist key point coordinates using joint angles and distance ratios. The left and right elbow angles are computed from the shoulder, elbow and wrist key points on each side; then the ratio of the left-wrist-to-nose distance over the left-wrist-to-left-eye (or left-ear) distance, and the corresponding ratio on the right side, are computed, yielding the elbow bending probability value. Let the coordinates of the left shoulder, left elbow and left wrist be L1(x1, y1), L2(x2, y2), L3(x3, y3), and of the right shoulder, right elbow and right wrist be R1(x1, y1), R2(x2, y2), R3(x3, y3); the elbow angle can then be expressed as:
    θ = arccos( [ (x1-x2)(x3-x2) + (y1-y2)(y3-y2) ] / [ sqrt((x1-x2)^2 + (y1-y2)^2) · sqrt((x3-x2)^2 + (y3-y2)^2) ] )    (6)
In formula (6), θ is the elbow angle in degrees. Assuming the coordinates of the nose, left eye, left ear, right eye and right ear are L0(x0, y0), L4(x4, y4), L5(x5, y5), R4(x4, y4), R5(x5, y5), the ratio of the wrist-to-nose and wrist-to-eye distances can be expressed as
    sqrt((x3-x0)^2 + (y3-y0)^2) / sqrt((x3-x4)^2 + (y3-y4)^2)
and the ratio of the wrist-to-nose and wrist-to-ear distances as
    sqrt((x3-x0)^2 + (y3-y0)^2) / sqrt((x3-x5)^2 + (y3-y5)^2)
The elbow bending behavior is then classified and its probability value obtained by combining the elbow angle range with these key point distance ratios.
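The three-point angle and the distance ratios of step S4 reduce to a few lines of geometry. The arccos form below is the standard reconstruction from the shoulder/elbow/wrist description; the text does not give the angle-range thresholds used for the final classification, so none are assumed here.

```python
import math

def elbow_angle(shoulder, elbow, wrist):
    """Angle at the elbow, in degrees, between elbow->shoulder and elbow->wrist."""
    v1 = (shoulder[0] - elbow[0], shoulder[1] - elbow[1])
    v2 = (wrist[0] - elbow[0], wrist[1] - elbow[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # clamp guards against tiny floating-point overshoot outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def dist_ratio(wrist, nose, ref):
    """Ratio d(wrist, nose) / d(wrist, ref), with ref the eye or ear keypoint."""
    return math.dist(wrist, nose) / math.dist(wrist, ref)
```

A right-angle example: with the shoulder at (0, 0), elbow at (1, 0) and wrist at (1, 1), `elbow_angle` returns 90 degrees.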
For step S5, the invention locates the small target detection areas from the coordinate information of the wrist, nose and ear key points. The wrist and nose key points are connected, a circle C1 is drawn with the straight-line distance between the two points as diameter, and the circumscribed square S1 of circle C1 is taken as small target detection area 1; likewise, the wrist and ear key points are connected, a circle C2 is drawn with the distance between the two points as diameter, and its circumscribed square S2 is taken as small target detection area 2. Small target detection is finally carried out within these two areas.
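The region construction of step S5 (circle on the wrist-nose or wrist-ear segment as diameter, then its circumscribed square) can be written directly; the axis-aligned (x_min, y_min, x_max, y_max) return convention is an assumption for illustration.

```python
import math

def detection_square(p1, p2):
    """Circumscribed square of the circle whose diameter is the segment p1-p2."""
    cx, cy = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2  # circle centre (a, b)
    r = math.dist(p1, p2) / 2                          # diameter = |p1 p2|
    # side length of the circumscribed square is 2r, as stated in the text
    return (cx - r, cy - r, cx + r, cy + r)
```

For a wrist at (0, 0) and nose at (4, 0), the circle has centre (2, 0) and radius 2, so the square is (0, -2, 4, 2).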
For step S6, the invention combines transfer learning with a target detection algorithm to detect small targets. A pre-trained model is first obtained by pre-training the target detection algorithm on the COCO data set, and transfer learning is then performed on this model with a data set of the specific scene and the specific human elbow bending behavior. A one-stage target detection algorithm is adopted; pictures are preprocessed at the model training input end with methods including Mosaic data augmentation and adaptive anchor box computation; DarkNet53 with a Focus structure is adopted as the backbone network; and CIOU_Loss is used as the bounding box loss function, whose formula is as follows:
    Loss = 1 - IoU + |C \ (A ∪ B)| / |C|    (7)
In formula (7), IoU is the intersection-over-union of any two anchor boxes A and B, C is the smallest anchor box containing both A and B, and |C| is its total area. The confidence score of the small target category is finally obtained.
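The quantities actually described for the loss (the smallest enclosing box C and its total area |C|) match the GIoU formulation, so the sketch below computes a GIoU-style bounding-box loss for axis-aligned (x1, y1, x2, y2) boxes; the full CIoU loss named in the text adds centre-distance and aspect-ratio terms that the description does not spell out.

```python
def giou_loss(a, b):
    """GIoU-style box loss; a, b are (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    area = lambda box: max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])
    # intersection of A and B
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    iou = inter / union
    # smallest enclosing box C containing both A and B
    c = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    giou = iou - (area(c) - union) / area(c)
    return 1.0 - giou
```

Identical boxes give a loss of 0; boxes that merely touch give a loss of 1, and disjoint boxes are penalised further the larger the gap between them.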
The elbow bending behavior detection method based on human body key points overcomes the low accuracy of existing target detection technology for detecting human elbow bending behavior in the complex scene of an offshore oil platform, and has the following advantages:
(1) It effectively solves person target detection and human key point acquisition under complex backgrounds and occlusion, obtains complete 2D key point coordinates and body posture information, and offers high detection precision and good real-time performance.
(2) It solves human elbow bending behavior detection in complex scenes: by acquiring human key point information, performing topological computation on the key point coordinates and confidence scores, and locating small target detection areas, it can be applied in many scenes such as oil fields and gas stations.
(3) It can be deployed in the complex scene of an offshore oil platform and integrated into an existing offshore monitoring system, realizing plug-and-play use of the model. Detection results are sent to the user as logs, so potential safety hazard behaviors can be detected and stopped in time.
Drawings
Fig. 1 is a flowchart of an elbow bending behavior detection method based on human body key points in the present invention.
Fig. 2 is a flowchart of acquiring key point information of a human body in the present invention.
Fig. 3 is a flowchart of the small target detection based on the target detection algorithm in the present invention.
Detailed Description
The invention is described in further detail below with reference to the following figures and detailed description:
An elbow bending behavior detection method based on human body key points; fig. 1 is a flowchart of the method, which includes the following steps:
S1, locating person targets. Detecting directly on raw video or images from an offshore oil platform operation scene, where surveillance footage is blurred, interfering objects are numerous and persons are heavily occluded, yields low detection accuracy. The invention therefore preprocesses the input video or picture, improving input data quality through data augmentation, image deblurring and similar operations; Faster R-CNN is used as the person detection network to extract human body candidate boxes from the picture, locate person targets, and output the coordinate information and confidence score of each target bounding box.
S2, acquiring human body key points. After person targets are located, human key point information is extracted with the regional multi-person pose estimation framework (RMPE). The obtained person candidate boxes are transformed to the image center by the spatial transformer network (STN) to obtain high-quality, accurate human body candidate regions; the single-person pose estimation network (SPPE) predicts the key points and pose of the human body and outputs the coordinate information and confidence score of every key point; the spatial de-transformer network (SDTN) maps the estimated pose back to the original image coordinates; finally, pose non-maximum suppression (P-NMS) keeps the human pose with the highest confidence score and eliminates the other, redundant poses: the pose with the highest score is chosen as the reference, and other poses close to it are repeatedly eliminated until a single pose remains. The elimination criterion is as follows:
    f(P_i, P_j | Λ, η) = 1[ d(P_i, P_j | Λ, λ) ≤ η ]    (8)
In formula (8), P_i is a redundant pose, P_j is the reference pose, the distance function d(·) combines a pose distance and a spatial distance, and η is the threshold for eliminating other poses. During training, the weights of all layers of the parallel SPPE are frozen to avoid incorrect human pose transformations; this is regarded as a regularization step in the training stage and effectively avoids falling into local optima.
S3, thresholding. Each of the 17 obtained human body key points corresponds to one 2D coordinate and one confidence score. Body parts that cannot be detected, or are detected inaccurately, in the input picture introduce errors into elbow bending detection; analysis shows that the confidence scores of such parts are usually below 0.40, so 0.40 is used as the detection threshold. The confidence score of each key point is compared with the threshold, key point coordinates with low detection precision are removed, and elbow bending analysis proceeds only on key points whose confidence score exceeds the threshold.
S4, computing the elbow bending probability. The left and right elbow angles are computed from the shoulder, elbow and wrist key point coordinates on each side; then the ratio of the left-wrist-to-nose distance over the left-wrist-to-left-eye (or left-ear) distance, and the corresponding ratio on the right side, are computed, yielding the elbow bending probability value. Let the coordinates of the left shoulder, left elbow and left wrist be L1(x1, y1), L2(x2, y2), L3(x3, y3), and of the right shoulder, right elbow and right wrist be R1(x1, y1), R2(x2, y2), R3(x3, y3); the elbow angle can then be expressed as:
    θ = arccos( [ (x1-x2)(x3-x2) + (y1-y2)(y3-y2) ] / [ sqrt((x1-x2)^2 + (y1-y2)^2) · sqrt((x3-x2)^2 + (y3-y2)^2) ] )    (9)
In formula (9), θ is the elbow angle in degrees. Assuming the coordinates of the nose, left eye, left ear, right eye and right ear are L0(x0, y0), L4(x4, y4), L5(x5, y5), R4(x4, y4), R5(x5, y5), the ratio of the wrist-to-nose and wrist-to-eye distances can be expressed as
    sqrt((x3-x0)^2 + (y3-y0)^2) / sqrt((x3-x4)^2 + (y3-y4)^2)
and the ratio of the wrist-to-nose and wrist-to-ear distances as
    sqrt((x3-x0)^2 + (y3-y0)^2) / sqrt((x3-x5)^2 + (y3-y5)^2)
The elbow bending behavior is then classified and its probability value obtained by combining the elbow angle range with these key point distance ratios.
S5, locating the small target detection areas. The wrist and nose key points are connected, a circle C1 is drawn with the straight-line distance between the two points as diameter, and the circumscribed square S1 of circle C1 is taken as small target detection area 1; likewise, the wrist and ear key points are connected, a circle C2 is drawn with the distance between the two points as diameter, and its circumscribed square S2 is taken as small target detection area 2. Small target detection is finally carried out within these two areas. Assuming the two points have coordinates (x1, y1) and (x2, y2), the circle with center (a, b) = ((x1+x2)/2, (y1+y2)/2) and the segment between the two points as diameter satisfies:
    (x - a)^2 + (y - b)^2 = r^2    (10)
In formula (10), r is the radius of the circle, and the side length of its circumscribed square is 2r.
S6, recognizing small targets. A pre-trained model is obtained by pre-training the target detection algorithm on the COCO data set, and transfer learning is then performed on this model with a data set of the specific scene and the specific human elbow bending behavior. A one-stage target detection algorithm is adopted, trained jointly on a public data set and a data set from the offshore oil platform operation scene; pictures are preprocessed at the model training input end with methods including Mosaic data augmentation and adaptive anchor box computation; DarkNet53 with a Focus structure is adopted as the backbone network; and CIOU_Loss is used as the bounding box loss function, whose formula is as follows:
    Loss = 1 - IoU + |C \ (A ∪ B)| / |C|    (11)
In formula (11), IoU is the intersection-over-union of any two anchor boxes A and B, C is the smallest anchor box containing both A and B, and |C| is its total area; the confidence score of the small target category is finally obtained.
S7, recognizing human elbow bending behavior. The elbow bending behavior probability value and the target category confidence score are combined with a 60%:40% weighting to infer the elbow bending behavior category. For an input picture to be detected, elbow bending behavior is checked first; on the basis of a detected elbow bending behavior, target category detection is carried out, the target is classified and recognized through the final fully connected layer and softmax classification, and the confidence score of the target category is output at the same time.
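The softmax classification and the 60%:40% fusion of step S7 can be sketched in a few lines; only the weighting comes from the text, while the function shapes and the decision of fusing against the top softmax class are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fuse(p_bend, class_logits, w=0.6):
    """Fuse the bending probability with the top class confidence, 60%:40%."""
    probs = softmax(class_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, w * p_bend + (1 - w) * probs[best]
```

With a behavior probability of 1.0 and logits [2.0, 0.0], class 0 wins with softmax confidence about 0.88, giving a fused score of roughly 0.95.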
S8, outputting the result of the human elbow bending behavior detection, including the behavior classification result and confidence score, completing elbow bending behavior detection based on human body key points.
In conclusion, the elbow bending behavior detection method based on human body key points solves human elbow bending behavior detection in complex scenes and suits scenes with complex backgrounds, heavy occlusion, blurred targets and the like; the proposed key-point-based topological computation is a general model for detecting human elbow bending behavior and can be applied in many scenes, such as oil fields and gas stations, to detect potential safety hazard behaviors in time and give early warning.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (6)

1. An elbow bending behavior detection method based on human body key points, characterized by comprising the following steps:
S1, inputting the video or picture to be detected and locating person targets;
S2, extracting human skeleton key points with the RMPE framework to obtain 2D key point coordinates and confidence scores;
S3, thresholding the confidence scores and removing key point coordinates with low detection precision;
S4, based on the human body key points, obtaining the elbow bending behavior probability value with a three-point positioning method (left/right wrist, elbow and shoulder), combining the joint angle range and distance ratios;
S5, locating small target detection areas from the wrist-nose and wrist-ear key point pairs;
S6, detecting the small target areas obtained in S5 with a transfer learning and target detection algorithm to obtain target category confidence scores;
S7, combining the elbow bending behavior probability value with the target category confidence score to recognize human elbow bending behavior;
S8, outputting the type result of the human elbow bending behavior detection.
2. The elbow bending behavior detection method based on human body key points according to claim 1, characterized in that step S2 adopts a top-down detection method based on the regional multi-person pose estimation framework (RMPE): the human body is detected first, and its key points are then obtained. To handle localization errors and pose redundancy in single-person pose estimation, RMPE introduces a symmetric spatial transformer network (SSTN), parametric pose non-maximum suppression (NMS) and a pose-guided proposals generator (PGPG). The SSTN consists of a spatial transformer network (STN), a single-person pose estimation network (SPPE) and a spatial de-transformer network (SDTN): an inaccurate picture is fed through these three parts for pose estimation, the estimation result is mapped onto the original input image to adjust the original person detection box, and the pose is aligned to the center of the image. The STN performs a 2D affine transformation, defined as follows:
    [x_t, y_t]^T = [θ1 θ2 θ3] [x_s, y_s, 1]^T    (1)
In formula (1), (x_t, y_t) are the coordinates after transformation and (x_s, y_s) the coordinates before transformation, with θ1, θ2, θ3 the transformation parameters (each a 2D column vector); the SDTN is defined as follows:
    [x_s, y_s]^T = [γ1 γ2 γ3] [x_t, y_t, 1]^T    (2)
In formula (2), θ1, θ2, θ3, γ1, γ2, γ3 are the transformation parameters, related as follows:
    [γ1 γ2] = [θ1 θ2]^(-1)    (3)
    γ3 = -1 × [γ1 γ2] θ3    (4)
Parametric pose non-maximum suppression (NMS) eliminates poses that are too close or too similar to the target pose: the pose with the highest score is taken as the reference, and other poses close to the reference pose are repeatedly eliminated until a single pose remains. The elimination criterion is as follows:
    f(P_i, P_j | Λ, η) = 1[ d(P_i, P_j | Λ, λ) ≤ η ]    (5)
In formula (5), P_i is a redundant pose, P_j is the reference pose, the distance function d(·) combines a pose distance and a spatial distance, and η is the threshold for judging whether to eliminate other poses; the pose-guided proposals generator (PGPG) augments the number of samples so the network can be trained well, and finally the 2D coordinates and confidence scores of 17 human body key points are output.
3. The elbow bending behavior detection method based on human body key points according to claim 1, characterized in that in step S3 each of the 17 obtained human body key points corresponds to one 2D coordinate and one confidence score. Body parts that cannot be detected, or are detected inaccurately, in the input picture introduce errors into elbow bending detection; analysis shows that the confidence scores of such parts are usually below 0.40, so 0.40 is used as the threshold for human elbow bending behavior detection. The confidence score of each key point is compared with the threshold, key point coordinates with low detection precision are removed, and elbow bending analysis proceeds only on key points whose confidence score exceeds the threshold.
4. The method for detecting elbow bending behavior based on human body key points as claimed in claim 1, wherein for step S4, the invention obtains the probability value of elbow bending behavior from the shoulder, elbow and wrist key point coordinates together with the angle and distance ratios. The included angles of the left and right elbows are calculated from the coordinates of the three key points of the shoulder, elbow and wrist; the ratio of the left-wrist-to-nose distance to the left-wrist-to-left-eye distance (or to the left-wrist-to-left-ear distance), and the ratio of the right-wrist-to-nose distance to the right-wrist-to-right-eye distance (or to the right-wrist-to-right-ear distance), are then calculated, finally yielding the probability value of elbow bending behavior. Let the coordinates of the left shoulder, left elbow, left wrist, right shoulder, right elbow and right wrist be L1(x1, y1), L2(x2, y2), L3(x3, y3), R1(x1, y1), R2(x2, y2), R3(x3, y3), respectively; the elbow angle can then be expressed as:
θ = arccos( ((x1 - x2)(x3 - x2) + (y1 - y2)(y3 - y2)) / (√((x1 - x2)² + (y1 - y2)²) × √((x3 - x2)² + (y3 - y2)²)) ) (6)
assuming that the coordinates of the nose, left eye, left ear, right eye and right ear are L0(x0, y0), L4(x4, y4), L5(x5, y5), R4(x4, y4), R5(x5, y5), respectively, the ratio of the wrist-to-nose distance to the wrist-to-eye distance can be expressed (taking the left side as an example) as
k_eye = |L3 - L0| / |L3 - L4| = √((x3 - x0)² + (y3 - y0)²) / √((x3 - x4)² + (y3 - y4)²)
The ratio of the wrist-to-nose distance to the wrist-to-ear distance can likewise be expressed as
k_ear = |L3 - L0| / |L3 - L5| = √((x3 - x0)² + (y3 - y0)²) / √((x3 - x5)² + (y3 - y5)²)
The elbow bending behavior of the human body is then classified, and its probability value obtained, by combining the elbow included-angle range with the distance ratios between the key points.
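The geometric quantities of formula (6) and the distance ratios can be sketched directly from the coordinate definitions above. A minimal Python version, using only the standard library (the angle is the included angle at the elbow between the elbow-to-shoulder and elbow-to-wrist vectors):

```python
import math

def elbow_angle(shoulder, elbow, wrist):
    """Included elbow angle in degrees, per formula (6): the angle
    between vectors elbow->shoulder and elbow->wrist."""
    ax, ay = shoulder[0] - elbow[0], shoulder[1] - elbow[1]
    bx, by = wrist[0] - elbow[0], wrist[1] - elbow[1]
    cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dist_ratio(wrist, nose, ref):
    """Ratio of wrist-to-nose distance to wrist-to-ref distance,
    where ref is the eye or ear key point."""
    return math.dist(wrist, nose) / math.dist(wrist, ref)

# A right angle at the elbow: shoulder above the elbow, wrist to the side.
ang = elbow_angle((0.0, 0.0), (0.0, 10.0), (10.0, 10.0))
# ang == 90.0 degrees
```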
5. The method for detecting elbow bending behavior based on human body key points as claimed in claim 1, wherein for step S5, the invention locates the small target monitoring areas through the coordinate information of the wrist, nose and ear key points. The key points of the wrist and nose are connected, a circle C1 is drawn with the straight-line distance between the two points as its diameter, and the circumscribed square S1 of circle C1 is taken as small target detection area 1; likewise, the key points of the wrist and ear are connected, a circle C2 is drawn with the straight-line distance between the two points as its diameter, and the circumscribed square S2 of circle C2 is taken as small target detection area 2. Small targets are finally detected within these two areas.
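The circumscribed square of a circle whose diameter is the segment between two key points is an axis-aligned square centered at the segment midpoint with side length equal to the segment length. A minimal sketch of that construction (the axis-aligned orientation is an assumption; the claim does not state the square's orientation):

```python
import math

def circumscribed_square(p1, p2):
    """Axis-aligned square circumscribing the circle whose diameter is
    the segment p1-p2: centered at the midpoint, side = |p1 p2|.
    Returns (x_min, y_min, x_max, y_max)."""
    cx, cy = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2
    half = math.dist(p1, p2) / 2
    return (cx - half, cy - half, cx + half, cy + half)

# Wrist at (0, 0), nose at (6, 8): circle diameter 10, square side 10.
box = circumscribed_square((0.0, 0.0), (6.0, 8.0))
# box == (-2.0, -1.0, 8.0, 9.0)
```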
6. The method for detecting elbow bending behavior based on human body key points as claimed in claim 1, wherein for step S6, the invention combines transfer learning with a target detection algorithm to detect small targets. Based on the target detection algorithm, pre-training is carried out on the COCO data set to obtain a pre-trained model, and transfer learning is then performed on this model with a data set of the specific scene and the specific human elbow bending behavior. A one-stage target detection algorithm is adopted, and the picture is preprocessed at the input end of model training, including Mosaic data enhancement, adaptive anchor frame calculation and other methods; DarkNet53 with the Focus structure is adopted as the backbone network; and CIOU_Loss is taken as the loss function of the bounding box, with the following formula:
CIOU_Loss = 1 - (IoU - |C \ (A ∪ B)| / |C|), IoU = |A ∩ B| / |A ∪ B| (7)
in formula (7), IoU is the intersection-over-union of any two anchor frames A and B; C is the smallest anchor frame containing both A and B, and |C| represents its total area. The confidence score of the small target category is finally obtained.
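The IoU and enclosing-box terms described for formula (7) can be computed directly for axis-aligned boxes. This is a simplified sketch of only the terms the claim text describes; the full CIOU_Loss used in YOLO-family detectors additionally includes center-distance and aspect-ratio penalty terms not shown here:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def enclosing_loss(a, b):
    """Bounding-box loss in the spirit of formula (7): 1 minus IoU
    reduced by the share of the smallest enclosing box C that is not
    covered by A union B."""
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = area(a) + area(b) - inter
    c = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    return 1.0 - (inter / union - (area(c) - union) / area(c))

# Identical boxes: IoU = 1 and the loss vanishes.
assert iou((0, 0, 2, 2), (0, 0, 2, 2)) == 1.0
assert enclosing_loss((0, 0, 2, 2), (0, 0, 2, 2)) == 0.0
```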
CN202111482813.5A 2021-12-07 2021-12-07 Elbow bending behavior detection method based on human body key points Pending CN114170686A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111482813.5A CN114170686A (en) 2021-12-07 2021-12-07 Elbow bending behavior detection method based on human body key points

Publications (1)

Publication Number Publication Date
CN114170686A true CN114170686A (en) 2022-03-11

Family

ID=80483675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111482813.5A Pending CN114170686A (en) 2021-12-07 2021-12-07 Elbow bending behavior detection method based on human body key points

Country Status (1)

Country Link
CN (1) CN114170686A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114724078A (en) * 2022-03-28 2022-07-08 西南交通大学 Personnel behavior intention identification method based on target detection network and knowledge inference
CN115830635A (en) * 2022-12-09 2023-03-21 南通大学 PVC glove identification method based on key point detection and target identification


Legal Events

Date Code Title Description
PB01 Publication