CN114202722A - Fall detection method based on convolutional neural network and multi-discriminant features - Google Patents

Fall detection method based on convolutional neural network and multi-discriminant features

Info

Publication number
CN114202722A
Authority
CN
China
Prior art keywords
human body
joint
theta
point
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111398864.XA
Other languages
Chinese (zh)
Inventor
王鑫
郑晓岩
刘凤宁
张吟龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Jianzhu University
Original Assignee
Shenyang Jianzhu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Jianzhu University filed Critical Shenyang Jianzhu University
Priority to CN202111398864.XA priority Critical patent/CN114202722A/en
Publication of CN114202722A publication Critical patent/CN114202722A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention provides a fall detection method based on a Convolutional Neural Network (CNN) and multiple discriminant features. First, the algorithm uses a CNN to extract the coordinates and skeleton information of human body joint points in a video sequence. Second, the joint point coordinates are further processed by means of the transformation between points and vectors, and the angles between the human spine, the left lower leg and the right lower leg and the ground are calculated, forming a multi-feature extraction structure that collects rich fall information. Finally, the comparison of these angles with thresholds and the change in the aspect ratio of the human body calibration frame are analysed together to realize fall detection. In addition, the invention designs an IoT system framework in which the video sequence is processed in the cloud, alleviating the problem of insufficient computing capability at the user terminal. Compared with traditional fall detection algorithms, the method has higher accuracy and better universality.

Description

Fall detection method based on convolutional neural network and multi-discriminant features
Technical Field
The invention relates to an image processing technology, in particular to a fall detection method based on a convolutional neural network and multiple discriminant features.
Background
Studies by the World Health Organization show that falls have become a major safety hazard for elderly people over 65 years of age, with 30% of the elderly population falling at least once a year. If a fallen person is not rescued in time, secondary injury is likely and life may even be threatened. Fall detection has therefore become a research hotspot for scholars at home and abroad. Existing fall detection algorithms can be divided into non-vision-based and vision-based methods. Non-vision-based methods embed sensors such as accelerometers, gyroscopes, vibration sensors and pressure sensors into wearable devices or install them in the indoor environment, taking advantage of their low price, portability and ease of deployment, to acquire motion information of the elderly and monitor their state; these methods have achieved a certain effect. However, they require the device to be worn for long periods, offer low detection precision, have a limited detection range, and are subject to external vibration interference, making large-scale adoption difficult.
Vision-based methods can collect rich human body posture information and can extract and track features in the video stream to detect falls, effectively making up for the shortcomings of non-vision methods. Building on a vision-based approach, the present method combines a convolutional neural network with a multi-discriminant feature structure to process the video sequence frame by frame, extracting and tracking the coordinates and skeleton information of human body joint points and avoiding loss of feature information about the detection target. Unlike traditional methods that extract only a single feature as the basis for judging a fall, the method performs secondary processing on the joint point coordinates to generate three vectors representing the positions and directions of the human spine, the left lower leg and the right lower leg. A multi-discriminant feature structure is constructed by deeply fusing the comparison of the vector-ground angles against thresholds with the change in the aspect ratio of the human body calibration frame, thereby realizing accurate fall detection. Meanwhile, a cloud server is fully utilized to improve the real-time performance of detection, effectively overcoming the shortcomings of traditional vision methods such as insufficient feature extraction, poor generalization of the discrimination basis, and insufficient terminal computing capacity.
Disclosure of Invention
The invention provides a fall detection method based on a convolutional neural network and multiple discriminant features, which can accurately detect falls of persons.
The technical scheme adopted by the invention for realizing the purpose is as follows:
a fall detection method based on a convolutional neural network and multi-discriminant features comprises the following steps:
step 1) feature extraction: processing the original video sequence frame by frame using a VGG-19 network, inputting the resulting feature maps into an OpenPose model, and obtaining the coordinates of human body joint points and the posture information of the human skeleton;
step 2) fall discrimination: based on the posture information of the human skeleton, connecting the obtained human joint point coordinates to obtain 3 key feature vectors, taking the aspect ratio of a constructed human body calibration frame and the angles between the 3 key feature vectors and the horizontal direction as discriminant features, and comparing these discriminant features with thresholds to detect and discriminate a human fall.
The feature extraction comprises the following steps:
step 1.1: processing an original video sequence frame by using the first 10 layers of the VGG-19 network to generate a feature map which is used as the input of an OpenPose model;
step 1.2: the OpenPose model performs 2 stages of processing on the feature map, wherein the first stage is used for labeling a confidence map of a joint point: predicting the coordinates of the joint points according to the feature map; the second stage is used to generate a partial affinity domain vector field: connecting the predicted joint point coordinates pairwise to form a limb vector, namely predicting a vector diagram between the joint points as human skeleton posture information;
step 1.3: and (5) iterating the step 1.2 until the set iteration times are reached, and obtaining the coordinates of the human body joint points and the posture information of the human body skeleton.
A loss function is added after each stage, and the loss function is spatially weighted to handle data sets that cannot be completely labeled. The loss function of the partial affinity field prediction branch at stage $t_i$, $f_{\mathbf{L}}^{t_i}$, and the loss function of the joint confidence map branch, $f_{\mathbf{S}}^{t_i}$, are respectively expressed as:

$$f_{\mathbf{L}}^{t_i}=\sum_{c=1}^{C}\sum_{p}W(p)\cdot\left\|\mathbf{L}_{c}^{t_i}(p)-\mathbf{L}_{c}^{*}(p)\right\|_{2}^{2}$$

$$f_{\mathbf{S}}^{t_i}=\sum_{j=1}^{J}\sum_{p}W(p)\cdot\left\|\mathbf{S}_{j}^{t_i}(p)-\mathbf{S}_{j}^{*}(p)\right\|_{2}^{2}$$

wherein $\mathbf{L}_{c}^{*}(p)$ is the ground-truth partial affinity field at any point $p$ in the image, $C$ is the number of limb types, $\mathbf{L}_{c}^{t_i}(p)$ is the partial affinity field generated at point $p$ in stage $t_i$, $\mathbf{S}_{j}^{*}(p)$ is the ground-truth confidence map at point $p$, $J$ is the number of joint point types, $\mathbf{S}_{j}^{t_i}(p)$ is the confidence map generated at point $p$ in stage $t_i$, and $W(p)$ is a binary mask with $W(p)=0$ when the label at pixel point $p$ is missing and $W(p)=1$ otherwise.
The first stage is used to label the confidence maps of the joint points, specifically: if only one person exists in the feature map, a part confidence map is generated for each joint point of that person; if multiple persons exist in the feature map, a peak is formed at each visible joint point $j$ of each person $k$ to represent the most accurately predicted true joint position. The confidence map of person $k$ for joint $j$ at any point $p$ in the feature map, $\mathbf{S}_{j,k}^{*}(p)$, is expressed as:

$$\mathbf{S}_{j,k}^{*}(p)=\exp\left(-\frac{\left\|p-x_{j,k}\right\|_{2}^{2}}{\sigma^{2}}\right)$$

wherein $x_{j,k}\in\mathbb{R}^{2}$ is the true position of joint $j$ of person $k$ in the feature map and $\sigma$ controls the spread of the peak, i.e. the probability distribution of the corresponding joint point. The score of each predicted joint point is then obtained, and a set of discrete part candidate positions is obtained by performing non-maximum suppression, namely:

$$D_{j}=\left\{d_{j}^{m}:j\in\{1,\ldots,J\},\;m\in\{1,\ldots,N_{j}\}\right\}$$

wherein $D_{j}$ represents the set of type-$j$ joint point candidates and $d_{j}^{m}$ is the $m$-th candidate.
The second stage is used to generate the partial affinity field vector field, specifically: candidate joint points are combined pairwise to form candidate limb vectors, the line integral of the partial affinity field is calculated, and the connection reliability, i.e. the integral value $E$, is detected:

$$E=\int_{u=0}^{u=1}\mathbf{L}_{c}\bigl(p(u)\bigr)\cdot\frac{d_{j_{2}}-d_{j_{1}}}{\left\|d_{j_{2}}-d_{j_{1}}\right\|_{2}}\,du$$

wherein $p(u)=(1-u)\,d_{j_{1}}+u\,d_{j_{2}}$ is the sampling point interpolated between the two candidate joint positions $d_{j_{1}}$ and $d_{j_{2}}$, and $\frac{d_{j_{2}}-d_{j_{1}}}{\|d_{j_{2}}-d_{j_{1}}\|_{2}}$ is the unit vector pointing from joint $j_{1}$ to joint $j_{2}$;

the set of all candidate joint points is:

$$D_{j}=\left\{d_{j}^{m}:j\in\{1,\ldots,J\},\;m\in\{1,\ldots,N_{j}\}\right\}$$

wherein $d_{j}^{m}$ represents the $m$-th candidate position of joint type $j$. The possible limb connections between every two parts are determined simultaneously and defined by $z_{j_{1}j_{2}}^{mn}\in\{0,1\}$, which indicates whether the joint points $d_{j_{1}}^{m}$ and $d_{j_{2}}^{n}$ are connected by a limb, taking the value 1 if they are connected and 0 otherwise. The candidate limbs are connected by bipartite graph matching in which no two edges may share a node, and the edges with the maximum integral value are finally found as the selected limbs representing the limb positions:

$$\max_{W_{c}}E_{c}=\max_{W_{c}}\sum_{m\in Q_{j_{1}}}\sum_{n\in Q_{j_{2}}}E_{mn}\cdot z_{j_{1}j_{2}}^{mn}$$

$$\forall m\in Q_{j_{1}},\quad\sum_{n\in Q_{j_{2}}}z_{j_{1}j_{2}}^{mn}\le 1$$

$$\forall n\in Q_{j_{2}},\quad\sum_{m\in Q_{j_{1}}}z_{j_{1}j_{2}}^{mn}\le 1$$

wherein $E_{mn}$ is the integral value of the partial affinity field between the $m$-th node of type $j_{1}$ and the $n$-th node of type $j_{2}$, $E_{c}$ is the total connection weight of the class-$C$ limbs, $W_{c}$ is the set of all possible connections of the class-$C$ limbs, $Q_{j_{1}}$ is the set of type-$j_{1}$ joint points, and $Q_{j_{2}}$ is the set of type-$j_{2}$ joint points.
In the second stage, 25 joint points are detected and labeled with the numbers 0 to 24: 0 is the nose, 1 the neck, 5 and 2 the left and right shoulders, 6 and 3 the left and right elbows, 7 and 4 the left and right wrists, 8 the center-of-gravity point, 12 and 9 the left and right hips, 13 and 10 the left and right knees, 14 and 11 the left and right ankles, 16 and 15 the left and right eyes, 18 and 17 the left and right ears, 19 and 21 the left and right big toes, 20 and 22 the left and right little toes, and 24 and 23 the left and right heels. The joint point coordinates are defined as: nose $(x_{0},y_{0})$, neck $(x_{1},y_{1})$, ..., right heel $(x_{23},y_{23})$, left heel $(x_{24},y_{24})$.
The fall discrimination includes the steps of:
step 2.1: generating corresponding characteristic vectors by connecting adjacent joint points, and respectively representing the positions and the directions of the spine, the left lower leg and the right lower leg of the human body;
step 2.2: constructing a human body calibration frame;
step 2.3: and respectively comparing the characteristic vector and the human body calibration frame with a threshold value to judge whether the detected target falls down.
The step 2.1 is specifically as follows: let the feature vectors of the spine, the left lower leg and the right lower leg of the human body be $\vec{V}_{1}$, $\vec{V}_{2}$ and $\vec{V}_{3}$, respectively. Using the coordinates of the neck $(x_{1},y_{1})$, the center-of-gravity point $(x_{8},y_{8})$, the right knee $(x_{10},y_{10})$, the right ankle $(x_{11},y_{11})$, the left knee $(x_{13},y_{13})$ and the left ankle $(x_{14},y_{14})$, they are expressed as:

$$\vec{V}_{1}=(x_{8}-x_{1},\;y_{8}-y_{1})$$

$$\vec{V}_{2}=(x_{14}-x_{13},\;y_{14}-y_{13})$$

$$\vec{V}_{3}=(x_{11}-x_{10},\;y_{11}-y_{10})$$

A unit vector of the x-axis, parallel to the ground and taking the right direction of the feature map as the positive direction, is denoted $\vec{e}_{x}$. The angles between $\vec{V}_{1}$, $\vec{V}_{2}$, $\vec{V}_{3}$ and $\vec{e}_{x}$ are then formulated as:

$$\theta_{i}=\arccos\frac{\vec{V}_{i}\cdot\vec{e}_{x}}{\left\|\vec{V}_{i}\right\|\left\|\vec{e}_{x}\right\|},\qquad i=1,2,3$$

wherein $\theta_{1}$ represents the angle between the human spine and the ground, $\theta_{2}$ represents the angle between the left lower leg and the ground, and $\theta_{3}$ represents the angle between the right lower leg and the ground.

The step 2.2 is specifically as follows: a human body calibration frame is constructed using $X_{max}$, $Y_{max}$, $X_{min}$, $Y_{min}$, wherein $(X_{min},Y_{min})$ and $(X_{max},Y_{max})$ are the points with the minimum and maximum joint-point coordinate values of the person in each frame of the image, $X_{max}-X_{min}$ represents the width of the human body calibration frame and $Y_{max}-Y_{min}$ represents the height of the human body calibration frame, so the aspect ratio of the human body calibration frame is:

$$R=\frac{Y_{max}-Y_{min}}{X_{max}-X_{min}}$$
the step 2.3 is specifically as follows:
setting the threshold range of the inclination angle to 0 DEG < alpha < thetaiBeta is less than 180 degrees and alpha is less than beta, and the length-width ratio threshold of the calibration frame is set as J;
the discrimination result is divided into a falling state and a normal state, wherein:
(1) falling state
When theta is1Satisfies the conditions of more than 0 DEG and less than alpha, or more than beta and less than 180 DEG and theta2Or theta3When at least one of the conditions is more than 0 degrees and less than alpha degrees or more than beta and less than 180 degrees, and R is less than J, falling occurs;
(2) normal state
When theta is1、θ2、θ3When alpha is larger than or smaller than beta, the person is in a standing state;
when theta is1Satisfies the conditions of more than 0 DEG and less than alpha, or more than beta and less than 180 DEG and theta2And theta3When alpha is larger than or smaller than beta, the figure is in a stooping state;
when theta is1Satisfies the conditions that alpha is larger than beta and theta is smaller than beta2Or theta3When at least one of the degrees is more than 0 degrees and less than alpha degrees or more than beta and less than 180 degrees, the person is in a squatting or sitting state;
r is more than J when the person is in a normal state.
The invention has the following beneficial effects and advantages:
1. The method processes the video sequence frame by frame to extract and track the coordinates and skeleton information of the human body joint points, avoiding loss of feature information about the detection target and ensuring detection effectiveness.
2. The method designs a multi-feature extraction structure that takes the angle between the human spine and the ground, the angles between the left and right lower legs and the ground, and the aspect ratio of the human body calibration frame as fall-discrimination features, solving the problem of poor generalization of the discrimination basis in traditional methods.
3. The method designs an IoT system framework for fall detection that uses the strong computing power of a cloud server to overcome the insufficient computing power of terminal devices and completes real-time fall detection in multiple scenes.
Drawings
Fig. 1 is a flow chart of a fall detection method;
fig. 2 is a flow chart of an IoT system-based fall detection method;
fig. 3 is a fall detection implementation schematic diagram.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, but rather should be construed as modified in the spirit and scope of the present invention as set forth in the appended claims.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
As shown in fig. 1 and 3, the method of the present invention mainly comprises two parts:
(1) extracting characteristics;
(2) judging falling;
the specific implementation process is as follows.
Step 1: and (5) feature extraction.
The features include human body joint point coordinates and skeleton posture information. The feature maps are processed in multiple stages on two parallel branches, part confidence prediction and partial affinity field association, to complete the prediction of person joint point coordinates and the extraction of human skeleton information.
Step 1.1: a loss function is calculated.
The method is bottom-up: all joint points are detected first and then associated with individuals. An original picture is input, initialized by the first 10 layers of VGG-19 and fine-tuned to generate a set of feature maps as the input of the first stage. In the first stage of the iteration there are two branches: branch 1 predicts the heat map of joint point positions, i.e. the approximate position of each joint point; branch 2 predicts the vector map of the relationship between pairs of joint points, i.e. the correct connections between neighboring joint points. Meanwhile, a loss function is applied after each iteration stage and is spatially weighted to solve the practical problem that some data sets cannot be completely labeled. For stage $t_i$, the loss function of the partial affinity field prediction branch, $f_{\mathbf{L}}^{t_i}$, and the loss function of the joint confidence branch, $f_{\mathbf{S}}^{t_i}$, are expressed as:

$$f_{\mathbf{L}}^{t_i}=\sum_{c=1}^{C}\sum_{p}W(p)\cdot\left\|\mathbf{L}_{c}^{t_i}(p)-\mathbf{L}_{c}^{*}(p)\right\|_{2}^{2}$$

$$f_{\mathbf{S}}^{t_i}=\sum_{j=1}^{J}\sum_{p}W(p)\cdot\left\|\mathbf{S}_{j}^{t_i}(p)-\mathbf{S}_{j}^{*}(p)\right\|_{2}^{2}$$

wherein $\mathbf{L}_{c}^{*}(p)$ is the ground-truth partial affinity field at any point $p$ in the image, $C$ is the number of limb types, and $\mathbf{L}_{c}^{t_i}(p)$ is the partial affinity field generated at point $p$ in stage $t_i$. $\mathbf{S}_{j}^{*}(p)$ is the ground-truth confidence map at point $p$, $J$ is the number of joint point types, and $\mathbf{S}_{j}^{t_i}(p)$ is the confidence map generated at point $p$ in stage $t_i$; each type of joint point corresponds to one confidence map. $W(p)$ is a binary mask with $W(p)=0$ when the label at pixel point $p$ is missing and $W(p)=1$ otherwise.
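For illustration only, the following is a minimal NumPy sketch of the spatially weighted L2 losses above; the array shapes, the stacking of the PAF channels, and the function name are assumptions rather than the patent's implementation.

```python
import numpy as np

def weighted_stage_losses(L_pred, L_true, S_pred, S_true, W):
    """Spatially weighted L2 losses for one stage t_i (illustrative sketch).

    L_pred, L_true: (2*C, H, W) predicted / ground-truth partial affinity fields
                    (each limb type c contributes an x- and a y-channel).
    S_pred, S_true: (J, H, W) predicted / ground-truth joint confidence maps.
    W:              (H, W) binary mask, 0 where annotations are missing.
    """
    f_L = np.sum(W[None, :, :] * (L_pred - L_true) ** 2)  # PAF branch loss
    f_S = np.sum(W[None, :, :] * (S_pred - S_true) ** 2)  # confidence branch loss
    return f_L, f_S
```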
Step 1.2: the confidence map of the two-dimensional joint point is labeled, i.e., each specific part of the body can be represented by pixel coordinates in the image. Ideally, if a person appears in the image, a location confidence map is generated for each joint of the person. If there are multiple persons in the figure, there is a peak corresponding to each visible joint point j of the person k to represent the most accurate predicted real position. Wherein the confidence map at an arbitrary point p in the image
Figure BDA0003370982530000082
Can be expressed as:
Figure BDA0003370982530000083
xj,k∈R2the true position of the joint point j of the person k in the image, R is the size of the image, and σ is used for controlling the diffusion degree of the peak value, namely the probability distribution condition of the corresponding joint point. And further obtaining the score of each predicted joint point, and ensuring the accuracy of selecting the body joint point by executing non-maximum suppression, namely:
Figure BDA0003370982530000084
wherein
Figure BDA0003370982530000085
Representing a set of j-type joint points.
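As a hedged illustration of the Gaussian confidence map and the non-maximum suppression step, the sketch below renders the peak for one joint of one person and extracts discrete candidates; the shapes, threshold and helper names are assumptions, not the patent's code.

```python
import numpy as np

def confidence_map(shape, x_jk, sigma):
    """S*_{j,k}(p) = exp(-||p - x_{j,k}||^2 / sigma^2) over an (h, w) image grid."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - x_jk[0]) ** 2 + (ys - x_jk[1]) ** 2) / sigma ** 2)

def nms_candidates(conf, threshold=0.1):
    """Keep pixels that are 3x3 local maxima above a threshold."""
    h, w = conf.shape
    padded = np.pad(conf, 1, mode="constant", constant_values=-np.inf)
    neigh = np.stack([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    is_peak = (conf >= neigh.max(axis=0)) & (conf > threshold)
    ys, xs = np.nonzero(is_peak)
    return [(int(x), int(y), float(conf[y, x])) for x, y in zip(xs, ys)]
```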
Step 1.3: partial affinity field generation. By suppressing the confidence map by non-maxima in step 1.2, a set of discrete site candidate locations is obtained. The various candidate joint points can be combined arbitrarily to generate a large number of candidate limbs. However, there are often a large number of false connections in these limbs, so the optimal matching result, i.e. the exact limb position, is found by the partial affinity domain vector field. The reliability of the connection is detected by calculating the magnitude of the line integral of the respective partial affinity fields along the line segment connecting the candidate site positions, i.e. the integrated value E:
Figure BDA0003370982530000086
wherein u is at a joint point of the human body
Figure BDA0003370982530000087
And p (u) are corresponding sampled values. L isc(p (u)) is the slave joint j1Pointing to a joint point j2The unit vector of (2).
All the candidate related node sets are:
Figure BDA0003370982530000091
wherein the content of the first and second substances,
Figure BDA0003370982530000092
the mth candidate position of the part j class is shown. Determining the possible limb connection between every two parts at the same time, defined as
Figure BDA0003370982530000093
Representing joint points
Figure BDA0003370982530000094
And
Figure BDA0003370982530000095
and if the connection is not carried out, the value is 1, otherwise, the value is 0. And connecting the candidate limbs according to a bipartite graph matching mode that two edges cannot share one node. The final goal is to find the most weighted edge as the selected limb:
Figure BDA0003370982530000096
Figure BDA0003370982530000097
Figure BDA0003370982530000098
wherein E iscAs a weight, EmnIs present at j1The mth node of a class and exists at j2Integral value of partial affinity field of nth node of class, EcIs a collection of class C limbs, WcFor all possible connected class C limbs, Qj1Is j1Set of joint-like points, Qj1Is j2A set of joint-like nodes. Formula (1.8) and formula (1.9) limit that no two edges can share a joint point, i.e. no two limbs of the same type share the same joint point. Therefore, the correct connection of each type of limb can be obtained by selecting the maximum value of E, and then connecting limbs sharing the same joint point into a human skeleton. A total of 25 joints are detected simultaneously, and each joint is labeled with numbers 0 to 24 for convenience of description. Wherein number 0 represents a nose, number 1 represents a neck, numbers 5 and 2 represent left and right shoulders, numbers 6 and 3 represent left and right elbows, numbers 7 and 4 represent left and right wrists, number 8 represents a center of gravity point, numbers 12 and 9 represent left and right thighs, numbers 13 and 10 represent left and right knees, numbers 14 and 11 represent left and right ankles, numbers 16 and 15 represent left and right eyes, numbers 18 and 17 represent left and right ears, numbers 19 and 21 represent left and right big toes, numbers 20 and 22 represent left and right small toes, numbers 24 and 23 represent left and right heels, and the respective joint point coordinates are defined as: nose (x)0,y0) Neck (x)1,y1) …, right heel (x)23,y23) Left heel (x)24,y24)。
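A minimal sketch of the connection scoring and matching described above is given below; the sampling count, the greedy assignment (used here in place of a full bipartite optimization), and all names are assumptions made for illustration only.

```python
import numpy as np

def paf_line_integral(paf_x, paf_y, d1, d2, num_samples=10):
    """Approximate E: sample the PAF along the segment d1 -> d2 (pixel coordinates)
    and project it onto the unit vector from d1 to d2."""
    d1, d2 = np.asarray(d1, float), np.asarray(d2, float)
    v = d2 - d1
    norm = np.linalg.norm(v)
    if norm < 1e-8:
        return 0.0
    u_hat = v / norm
    score = 0.0
    for u in np.linspace(0.0, 1.0, num_samples):
        x, y = (1 - u) * d1 + u * d2
        score += paf_x[int(y), int(x)] * u_hat[0] + paf_y[int(y), int(x)] * u_hat[1]
    return score / num_samples

def greedy_match(cands1, cands2, paf_x, paf_y):
    """Connect candidates of two joint types so that no joint is used twice,
    preferring the highest integral values (greedy stand-in for bipartite matching)."""
    scores = [(paf_line_integral(paf_x, paf_y, c1, c2), m, n)
              for m, c1 in enumerate(cands1) for n, c2 in enumerate(cands2)]
    used1, used2, limbs = set(), set(), []
    for e, m, n in sorted(scores, reverse=True):
        if m not in used1 and n not in used2 and e > 0:
            limbs.append((m, n, e))
            used1.add(m)
            used2.add(n)
    return limbs
```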
Step 2: and (5) judging falling.
Using the fact that, during a fall, the trunk of the human body carries richer information than any single joint point, the invention connects the extracted human joint point coordinates and constructs a multi-discriminant feature extraction structure consisting of the angle between the human spine and the horizontal direction, the angles between the left and right lower legs and the horizontal direction, and the aspect ratio of a human body calibration frame, so as to complete the human fall detection task.
Step 2.1: and calculating the inclination angles of the spine, the left leg and the right leg of the human body.
According to the characteristic that information contained in the trunk of the human body is richer than a single joint point in the falling process, the corresponding feature vectors are generated by connecting adjacent joint points and respectively represent the positions and the directions of the spine, the left lower leg and the right lower leg of the human body. The joint points of the invention are respectively neck, gravity center point, right knee, right ankle, left knee and left ankle, and the coordinates (x) are respectively used1,y1)、(x8,y8)、(x10,y10)、(x11,y11)、(x13,y13)、(x14,y14) And (4) showing.
Therefore, the feature vector can be obtained by the following formula
Figure BDA0003370982530000101
Figure BDA0003370982530000102
Figure BDA0003370982530000103
Figure BDA0003370982530000104
Wherein the content of the first and second substances,
Figure BDA0003370982530000105
representing a human spine vector;
Figure BDA0003370982530000106
representing a left calf vector;
Figure BDA0003370982530000107
representing the right calf vector. In combination with
Figure BDA0003370982530000108
Unit vector representing the x-axis and is noted
Figure BDA0003370982530000109
Because it is always parallel to the ground, it can be used to represent the direction vector of the real ground
Figure BDA00033709825300001010
And
Figure BDA00033709825300001011
the corresponding angle can be formulated as:
Figure BDA00033709825300001012
wherein, theta1Representing the included angle between the spine of the human body and the ground; theta2Representing the included angle between the left shank and the ground; theta3Representing the angle between the right calf and the ground.
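The inclination angles can be computed directly from the joint coordinates; the sketch below is one possible way to do it, with the 25-point numbering above and the function names assumed for illustration.

```python
import numpy as np

def inclination_angles(kp):
    """Angles (degrees) between the spine / left calf / right calf vectors and the
    ground, where kp[i] = (x_i, y_i) follows the 25-point numbering above."""
    def angle_with_ground(a, b):
        v = np.asarray(kp[b], float) - np.asarray(kp[a], float)  # joint a -> joint b
        cos_t = v[0] / (np.linalg.norm(v) + 1e-8)                # dot with x-axis unit vector
        return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
    theta1 = angle_with_ground(1, 8)    # neck -> center of gravity (spine)
    theta2 = angle_with_ground(13, 14)  # left knee -> left ankle (left calf)
    theta3 = angle_with_ground(10, 11)  # right knee -> right ankle (right calf)
    return theta1, theta2, theta3
```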
Step 2.2: human body calibration frame aspect ratio calculation
On the basis of step 2.1, considering that the aspect ratio of the human body changes during a fall, the method uses $X_{max}$, $Y_{max}$, $X_{min}$, $Y_{min}$ to construct a human body calibration frame, wherein $(X_{min},Y_{min})$ and $(X_{max},Y_{max})$ are the points with the minimum and maximum joint-point coordinate values of the person in each processed frame; this serves as another feature for discriminating whether a fall has occurred. $X_{max}-X_{min}$ represents the width of the calibration frame and $Y_{max}-Y_{min}$ its height, so the aspect ratio of the human body calibration frame is:

$$R=\frac{Y_{max}-Y_{min}}{X_{max}-X_{min}}$$
During a fall, $\theta_{1}$, $\theta_{2}$, $\theta_{3}$ and $R$ change continuously, so the threshold ranges indicating a fall must be determined. To make the method general, the inclination-angle thresholds $\alpha$ and $\beta$ are first set with $0°<\alpha<\beta<180°$, and the aspect-ratio threshold of the calibration frame is set as $J$.
Step 2.3: the fall-discrimination features extracted in step 2.1 and step 2.2 are fused, and the comparison of the parameters $\theta_{1}$, $\theta_{2}$, $\theta_{3}$ and $R$ with the corresponding thresholds determines whether the detected target has fallen.
The discrimination result can be divided into a fall state and a normal state, wherein:
(1) fall state
a fall occurs when $\theta_{1}$ satisfies $0°<\theta_{1}<\alpha$ or $\beta<\theta_{1}<180°$, at least one of $\theta_{2}$ and $\theta_{3}$ satisfies $0°<\theta<\alpha$ or $\beta<\theta<180°$, and $R<J$;
(2) normal state
when $\theta_{1}$, $\theta_{2}$ and $\theta_{3}$ all satisfy $\alpha<\theta<\beta$, the person is in a standing state;
when $\theta_{1}$ satisfies $0°<\theta_{1}<\alpha$ or $\beta<\theta_{1}<180°$ while $\theta_{2}$ and $\theta_{3}$ both satisfy $\alpha<\theta<\beta$, the person is in a stooping state;
when $\theta_{1}$ satisfies $\alpha<\theta_{1}<\beta$ while at least one of $\theta_{2}$ and $\theta_{3}$ satisfies $0°<\theta<\alpha$ or $\beta<\theta<180°$, the person is in a squatting or sitting state;
in all normal states, $R>J$.
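Putting the thresholds together, a minimal decision sketch is shown below; the values of α, β and J are placeholders to be tuned, and the helper names are assumptions rather than the patent's implementation.

```python
def aspect_ratio(keypoints):
    """R = (Ymax - Ymin) / (Xmax - Xmin) of the human body calibration frame."""
    xs = [x for x, y in keypoints]
    ys = [y for x, y in keypoints]
    return (max(ys) - min(ys)) / max(max(xs) - min(xs), 1e-8)

def classify_state(theta1, theta2, theta3, R, alpha=45.0, beta=135.0, J=1.0):
    """Apply the angle and aspect-ratio rules above (alpha, beta, J are example values)."""
    def tilted(t):  # angle outside the normal range (alpha, beta)
        return 0.0 < t < alpha or beta < t < 180.0
    legs_tilted = tilted(theta2) or tilted(theta3)
    if tilted(theta1) and legs_tilted and R < J:
        return "fall"
    if not tilted(theta1) and not tilted(theta2) and not tilted(theta3):
        return "standing"
    if tilted(theta1) and not tilted(theta2) and not tilted(theta3):
        return "stooping"
    if not tilted(theta1) and legs_tilted:
        return "squatting or sitting"
    return "normal"
```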
As shown in fig. 2, the detection method of the present invention is embedded in a cloud server, and the purpose of multi-scene simultaneous detection is achieved by using the computing power of the cloud server.
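As a rough sketch of the IoT arrangement in Fig. 2, a terminal could offload frames to a cloud service as shown below; the endpoint URL, payload format and server API here are purely hypothetical and are not part of the patent.

```python
import cv2
import requests

CLOUD_ENDPOINT = "https://example-cloud-server/api/fall-detect"  # hypothetical URL

def stream_to_cloud(camera_index=0):
    """Capture frames locally and let the cloud server run pose extraction
    and fall discrimination (illustrative client only)."""
    cap = cv2.VideoCapture(camera_index)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        _, jpeg = cv2.imencode(".jpg", frame)
        resp = requests.post(CLOUD_ENDPOINT, data=jpeg.tobytes(),
                             headers={"Content-Type": "image/jpeg"}, timeout=5)
        if resp.ok and resp.json().get("state") == "fall":
            print("Fall detected - raise alarm")
    cap.release()
```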
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A fall detection method based on a convolutional neural network and multi-discriminant features is characterized by comprising the following steps:
step 1) feature extraction: processing the original video sequence frame by frame using a VGG-19 network, inputting the resulting feature maps into an OpenPose model, and obtaining the coordinates of human body joint points and the posture information of the human skeleton;
step 2) fall discrimination: based on the posture information of the human skeleton, connecting the obtained human joint point coordinates to obtain 3 key feature vectors, taking the aspect ratio of a constructed human body calibration frame and the angles between the 3 key feature vectors and the horizontal direction as discriminant features, and comparing these discriminant features with thresholds to detect and discriminate a human fall.
2. The method for fall detection based on convolutional neural network and multi-discriminant features as claimed in claim 1, wherein the feature extraction comprises the following steps:
step 1.1: processing an original video sequence frame by using the first 10 layers of the VGG-19 network to generate a feature map which is used as the input of an OpenPose model;
step 1.2: the OpenPose model performs 2 stages of processing on the feature map, wherein the first stage is used for labeling a confidence map of a joint point: predicting the coordinates of the joint points according to the feature map; the second stage is used to generate a partial affinity domain vector field: connecting the predicted joint point coordinates pairwise to form a limb vector, namely predicting a vector diagram between the joint points as human skeleton posture information;
step 1.3: and (5) iterating the step 1.2 until the set iteration times are reached, and obtaining the coordinates of the human body joint points and the posture information of the human body skeleton.
3. The method as claimed in claim 2, wherein a loss function is added after each stage and is spatially weighted to solve the problem that the data set cannot be completely labeled; the loss function of the partial affinity field prediction branch at stage $t_i$, $f_{\mathbf{L}}^{t_i}$, and the loss function of the joint confidence map branch, $f_{\mathbf{S}}^{t_i}$, are respectively expressed as:

$$f_{\mathbf{L}}^{t_i}=\sum_{c=1}^{C}\sum_{p}W(p)\cdot\left\|\mathbf{L}_{c}^{t_i}(p)-\mathbf{L}_{c}^{*}(p)\right\|_{2}^{2}$$

$$f_{\mathbf{S}}^{t_i}=\sum_{j=1}^{J}\sum_{p}W(p)\cdot\left\|\mathbf{S}_{j}^{t_i}(p)-\mathbf{S}_{j}^{*}(p)\right\|_{2}^{2}$$

wherein $\mathbf{L}_{c}^{*}(p)$ is the ground-truth partial affinity field at any point $p$ in the image, $C$ is the number of limb types, $\mathbf{L}_{c}^{t_i}(p)$ is the partial affinity field generated at point $p$ in stage $t_i$, $\mathbf{S}_{j}^{*}(p)$ is the ground-truth confidence map at point $p$, $J$ is the number of joint point types, $\mathbf{S}_{j}^{t_i}(p)$ is the confidence map generated at point $p$ in stage $t_i$, and $W(p)$ is a binary mask with $W(p)=0$ when the label at pixel point $p$ is missing and $W(p)=1$ otherwise.
4. The fall detection method based on the convolutional neural network and the multi-discriminant feature as claimed in claim 2, wherein the first stage is configured to label a confidence map of the joint points, specifically: if only one person exists in the feature map, a part confidence map is generated for each joint point of that person; if multiple persons exist in the feature map, a peak is formed at each visible joint point $j$ of each person $k$ to represent the most accurately predicted true joint position; the confidence map of person $k$ for joint $j$ at any point $p$ in the feature map, $\mathbf{S}_{j,k}^{*}(p)$, is expressed as:

$$\mathbf{S}_{j,k}^{*}(p)=\exp\left(-\frac{\left\|p-x_{j,k}\right\|_{2}^{2}}{\sigma^{2}}\right)$$

wherein $x_{j,k}\in\mathbb{R}^{2}$ is the true position of joint $j$ of person $k$ in the feature map and $\sigma$ controls the spread of the peak, i.e. the probability distribution of the corresponding joint point; the score of each predicted joint point is then obtained, and a set of discrete part candidate positions is obtained by performing non-maximum suppression, namely:

$$D_{j}=\left\{d_{j}^{m}:j\in\{1,\ldots,J\},\;m\in\{1,\ldots,N_{j}\}\right\}$$

wherein $D_{j}$ represents the set of type-$j$ joint point candidates and $d_{j}^{m}$ is the $m$-th candidate.
5. The fall detection method based on the convolutional neural network and the multi-discriminant features as claimed in claim 2, wherein the second stage is configured to generate the partial affinity field vector field, specifically: candidate joint points are combined pairwise to form candidate limb vectors, the line integral of the partial affinity field is calculated, and the connection reliability, i.e. the integral value $E$, is detected:

$$E=\int_{u=0}^{u=1}\mathbf{L}_{c}\bigl(p(u)\bigr)\cdot\frac{d_{j_{2}}-d_{j_{1}}}{\left\|d_{j_{2}}-d_{j_{1}}\right\|_{2}}\,du$$

wherein $p(u)=(1-u)\,d_{j_{1}}+u\,d_{j_{2}}$ is the sampling point interpolated between the candidate joint positions $d_{j_{1}}$ and $d_{j_{2}}$, and $\frac{d_{j_{2}}-d_{j_{1}}}{\|d_{j_{2}}-d_{j_{1}}\|_{2}}$ is the unit vector pointing from joint $j_{1}$ to joint $j_{2}$;

the set of all candidate joint points is:

$$D_{j}=\left\{d_{j}^{m}:j\in\{1,\ldots,J\},\;m\in\{1,\ldots,N_{j}\}\right\}$$

wherein $d_{j}^{m}$ represents the $m$-th candidate position of joint type $j$; the possible limb connections between every two parts are determined simultaneously and defined by $z_{j_{1}j_{2}}^{mn}\in\{0,1\}$, which indicates whether the joint points $d_{j_{1}}^{m}$ and $d_{j_{2}}^{n}$ are connected by a limb, taking the value 1 if they are connected and 0 otherwise; the candidate limbs are connected by bipartite graph matching in which no two edges may share a node, and the edges with the maximum integral value are finally found as the selected limbs representing the limb positions:

$$\max_{W_{c}}E_{c}=\max_{W_{c}}\sum_{m\in Q_{j_{1}}}\sum_{n\in Q_{j_{2}}}E_{mn}\cdot z_{j_{1}j_{2}}^{mn}$$

$$\forall m\in Q_{j_{1}},\quad\sum_{n\in Q_{j_{2}}}z_{j_{1}j_{2}}^{mn}\le 1$$

$$\forall n\in Q_{j_{2}},\quad\sum_{m\in Q_{j_{1}}}z_{j_{1}j_{2}}^{mn}\le 1$$

wherein $E_{mn}$ is the integral value of the partial affinity field between the $m$-th node of type $j_{1}$ and the $n$-th node of type $j_{2}$, $E_{c}$ is the total connection weight of the class-$C$ limbs, $W_{c}$ is the set of all possible connections of the class-$C$ limbs, $Q_{j_{1}}$ is the set of type-$j_{1}$ joint points, and $Q_{j_{2}}$ is the set of type-$j_{2}$ joint points.
6. A fall detection method based on convolutional neural network and multi-discriminant features as claimed in claim 5, wherein 25 joint points are detected in the second stage and are labeled with the numbers 0 to 24: 0 is the nose, 1 the neck, 5 and 2 the left and right shoulders, 6 and 3 the left and right elbows, 7 and 4 the left and right wrists, 8 the center-of-gravity point, 12 and 9 the left and right hips, 13 and 10 the left and right knees, 14 and 11 the left and right ankles, 16 and 15 the left and right eyes, 18 and 17 the left and right ears, 19 and 21 the left and right big toes, 20 and 22 the left and right little toes, and 24 and 23 the left and right heels; the joint point coordinates are defined as: nose $(x_{0},y_{0})$, neck $(x_{1},y_{1})$, ..., right heel $(x_{23},y_{23})$, left heel $(x_{24},y_{24})$.
7. A method for fall detection based on convolutional neural network and multi-discriminant features as claimed in claim 1, wherein the fall discrimination comprises the following steps:
step 2.1: generating corresponding characteristic vectors by connecting adjacent joint points, and respectively representing the positions and the directions of the spine, the left lower leg and the right lower leg of the human body;
step 2.2: constructing a human body calibration frame;
step 2.3: and respectively comparing the characteristic vector and the human body calibration frame with a threshold value to judge whether the detected target falls down.
8. The method for fall detection based on convolutional neural network and multi-discriminant features as claimed in claim 6 or 7, wherein step 2.1 specifically comprises: letting the feature vectors of the spine, the left lower leg and the right lower leg of the human body be $\vec{V}_{1}$, $\vec{V}_{2}$ and $\vec{V}_{3}$, respectively; using the coordinates of the neck $(x_{1},y_{1})$, the center-of-gravity point $(x_{8},y_{8})$, the right knee $(x_{10},y_{10})$, the right ankle $(x_{11},y_{11})$, the left knee $(x_{13},y_{13})$ and the left ankle $(x_{14},y_{14})$, they are expressed as:

$$\vec{V}_{1}=(x_{8}-x_{1},\;y_{8}-y_{1})$$

$$\vec{V}_{2}=(x_{14}-x_{13},\;y_{14}-y_{13})$$

$$\vec{V}_{3}=(x_{11}-x_{10},\;y_{11}-y_{10})$$

a unit vector of the x-axis, parallel to the ground and taking the right direction of the feature map as the positive direction, is denoted $\vec{e}_{x}$; the angles between $\vec{V}_{1}$, $\vec{V}_{2}$, $\vec{V}_{3}$ and $\vec{e}_{x}$ are then formulated as:

$$\theta_{i}=\arccos\frac{\vec{V}_{i}\cdot\vec{e}_{x}}{\left\|\vec{V}_{i}\right\|\left\|\vec{e}_{x}\right\|},\qquad i=1,2,3$$

wherein $\theta_{1}$ represents the angle between the human spine and the ground, $\theta_{2}$ represents the angle between the left lower leg and the ground, and $\theta_{3}$ represents the angle between the right lower leg and the ground.
9. The method for fall detection based on convolutional neural network and multi-discriminant features as claimed in claim 7, wherein step 2.2 specifically comprises: constructing a human body calibration frame using $X_{max}$, $Y_{max}$, $X_{min}$, $Y_{min}$, wherein $(X_{min},Y_{min})$ and $(X_{max},Y_{max})$ are the points with the minimum and maximum joint-point coordinate values of the person in each frame of the image, $X_{max}-X_{min}$ represents the width of the human body calibration frame and $Y_{max}-Y_{min}$ represents the height of the human body calibration frame, so the aspect ratio of the human body calibration frame is:

$$R=\frac{Y_{max}-Y_{min}}{X_{max}-X_{min}}$$
10. the method for fall detection based on convolutional neural network and multi-discriminant features as claimed in claim 7, wherein the step 2.3 specifically comprises:
setting the threshold range of the inclination angle to 0 DEG < alpha < thetaiBeta is less than 180 degrees and alpha is less than beta, and the length-width ratio threshold of the calibration frame is set as J;
the discrimination result is divided into a falling state and a normal state, wherein:
(1) falling state
When theta is1Satisfies the conditions of more than 0 DEG and less than alpha, or more than beta and less than 180 DEG and theta2Or theta3When at least one of the conditions is more than 0 degrees and less than alpha degrees or more than beta and less than 180 degrees, and R is less than J, falling occurs;
(2) normal state
When theta is1、θ2、θ3When alpha is larger than or smaller than beta, the person is in a standing state;
when theta is1Satisfies the conditions of more than 0 DEG and less than alpha, or more than beta and less than 180 DEG and theta2And theta3When alpha is larger than or smaller than beta, the figure is in a stooping state;
when theta is1Satisfies the conditions that alpha is larger than beta and theta is smaller than beta2Or theta3When at least one of the degrees is more than 0 degrees and less than alpha degrees or more than beta and less than 180 degrees, the person is in a squatting or sitting state;
r is more than J when the person is in a normal state.
CN202111398864.XA 2021-11-24 2021-11-24 Fall detection method based on convolutional neural network and multi-discriminant features Pending CN114202722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111398864.XA CN114202722A (en) 2021-11-24 2021-11-24 Fall detection method based on convolutional neural network and multi-discriminant features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111398864.XA CN114202722A (en) 2021-11-24 2021-11-24 Fall detection method based on convolutional neural network and multi-discriminant features

Publications (1)

Publication Number Publication Date
CN114202722A true CN114202722A (en) 2022-03-18

Family

ID=80648614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111398864.XA Pending CN114202722A (en) 2021-11-24 2021-11-24 Fall detection method based on convolutional neural network and multi-discriminant features

Country Status (1)

Country Link
CN (1) CN114202722A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273243A (en) * 2022-09-27 2022-11-01 深圳比特微电子科技有限公司 Fall detection method and device, electronic equipment and computer readable storage medium
CN116863500A (en) * 2023-06-14 2023-10-10 中国人民解放军总医院第一医学中心 Patient out-of-bed monitoring method and system
CN116863500B (en) * 2023-06-14 2024-05-10 中国人民解放军总医院第一医学中心 Patient out-of-bed monitoring method and system

Similar Documents

Publication Publication Date Title
CN110135375B (en) Multi-person attitude estimation method based on global information integration
Zhang et al. Martial arts, dancing and sports dataset: A challenging stereo and multi-view dataset for 3d human pose estimation
CN110222665A (en) Human motion recognition method in a kind of monitoring based on deep learning and Attitude estimation
CN110969114A (en) Human body action function detection system, detection method and detector
Chaudhari et al. Yog-guru: Real-time yoga pose correction system using deep learning methods
CN114724241A (en) Motion recognition method, device, equipment and storage medium based on skeleton point distance
CN114067358A (en) Human body posture recognition method and system based on key point detection technology
CN108052896A (en) Human bodys&#39; response method based on convolutional neural networks and support vector machines
CN114202722A (en) Fall detection method based on convolutional neural network and multi-discriminant features
CN108875586B (en) Functional limb rehabilitation training detection method based on depth image and skeleton data multi-feature fusion
CN112084878B (en) Method for judging operator gesture standardization degree
Chen et al. Fall detection system based on real-time pose estimation and SVM
CN112528812A (en) Pedestrian tracking method, pedestrian tracking device and pedestrian tracking system
CN106815855A (en) Based on the human body motion tracking method that production and discriminate combine
CN112800892B (en) Human body posture recognition method based on openposition
WO2020107847A1 (en) Bone point-based fall detection method and fall detection device therefor
CN113111767A (en) Fall detection method based on deep learning 3D posture assessment
Yang et al. Human exercise posture analysis based on pose estimation
Yan et al. Human-object interaction recognition using multitask neural network
Muhammad et al. Mono camera-based human skeletal tracking for squat exercise Abnormality detection using double Exponential smoothing
CN102156994A (en) Joint positioning method of single-view unmarked human motion tracking
Liu et al. Key algorithm for human motion recognition in virtual reality video sequences based on hidden markov model
WO2020147797A1 (en) Image processing method and apparatus, image device, and storage medium
Kishore et al. Smart yoga instructor for guiding and correcting yoga postures in real time
CN111353347B (en) Action recognition error correction method, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination