CN110532850B - Fall detection method based on video joint points and hybrid classifier - Google Patents
- Publication number: CN110532850B
- Application number: CN201910589503.XA
- Authority: CN (China)
- Prior art keywords: human body, image, fall, video, layer
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2411—Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
The invention discloses a fall detection method based on video joint points and a hybrid classifier. Traditional video-based fall detection algorithms rely on manually extracted fall features and detect falls with a linear discriminant classifier; the model is simple but its accuracy is low. The method of the invention is as follows: 1. Extract each frame image of the detected video clip. 2. Acquire a human body joint data matrix. 3. Construct a plurality of behavior matrices. 4. Calculate the temporal and spatial feature parameters. 5. Perform primary classification. 6. Perform secondary classification. By extracting human skeletal joint points, the method overcomes the inability of traditional methods, which extract the human aspect ratio, projection area and the like, to accurately estimate the human posture. The invention constructs behavior matrices with a fixed-size sliding window, so it can model along the time and space axes simultaneously and fully express the characteristics of the fall behavior.
Description
Technical Field
The invention belongs to the technical field of fall detection, and particularly relates to a fall detection method based on video joint points and a hybrid classifier.
Background
Scholars at home and abroad have carried out much research on falls of the elderly. Fall detection methods fall into three main types: fall detection based on wearable sensors, fall detection based on environmental sensors, and detection based on video images. Fall detection based on wearable sensors mainly sets a threshold on the data collected by the sensors to detect a fall; if the threshold is set inaccurately, the final detection result is affected. Fall detection based on environmental sensors mainly judges a fall from data obtained by pressure sensors or sound detection devices on the ground; if the environmental noise is too large, the data can be abnormal. Video-based fall detection mainly uses an ordinary camera or a depth camera installed in the daily environment to acquire video data and then determines a fall by image recognition. Compared with detection algorithms based on wearable sensors and environmental sensors, video-based fall detection does not require the elderly to wear equipment, is less easily affected by the environment, and has wider practical application. However, traditional video-based fall detection algorithms rely on manually extracted fall features and detect falls with a linear discriminant classifier; the model is simple but its accuracy is low. Existing deep-learning-based fall detection algorithms have complex models, and it is difficult to reduce their detection time.
Disclosure of Invention
The invention aims to provide a fall detection method based on video joint points and a mixed classifier.
The method comprises the following specific steps:
step 1, extracting each frame image of the detected video clip.
Step 2. Collect the human body joint point data of each frame image in the detected video clip with human body node acquisition software to obtain a human body joint data matrix.
Step 3. Construct s behavior matrices from the human body joint data matrix obtained in step 2; s is obtained from m and W by an expression using the round-down operator ⌊·⌋; m is the number of images; W is the total frame number of the behavior matrix, and takes a value of 8 to 15.
The i-th behavior matrix K_i,matrix is expressed by formula (1), i = 1, 2, …, s.
K_i,matrix = (K_2i-1, K_2i, …, K_2i+W-1)^T (1)
In formula (1), K_t is the skeleton vector of the t-th image, expressed by formula (2); t = 1, 2, …, m.
K_t = (K_t,0, K_t,1, …, K_t,N-1) (2)
In formula (2), K_t,j denotes the parameter in the t-th row and the (j+1)-th column of the human body joint data matrix, j = 0, 1, …, N-1. N is the number of human body joint points in one frame image.
Step 4. Calculate the temporal feature parameters and spatial feature parameters.
4-1. Calculate the human body height information h_t in the m images as shown in formula (3); t = 1, 2, …, m;
h_t = h_t,head − h_t,foot (3)
In formula (3), h_t,head is the head ordinate of the human body in the t-th image, and h_t,foot is the foot ordinate of the human body in the t-th image.
4-2. For each behavior matrix, extract the maximum and minimum of the human height information h_t over its corresponding images as the spatial maximum h_i,max and spatial minimum h_i,min; i = 1, 2, …, s.
4-3. Starting from the second image, calculate the adjacent-frame height difference Δh_t,one of each image as shown in formulas (4) and (5); starting from the sixth image, calculate the five-frame height difference Δh_t,five of each image;
Δh_t,one = h_t − h_t-1 (4)
Δh_t,five = h_t − h_t-5 (5)
4-4. For each behavior matrix, extract the maximum and minimum of the adjacent-frame height difference Δh_t,one and of the five-frame height difference Δh_t,five over its corresponding images; these are the short-time maximum h_i,one,max, short-time minimum h_i,one,min, long-time maximum h_i,five,max and long-time minimum h_i,five,min.
4-5. Construct the fall feature vector F_i,SVM = (h_i,max, h_i,min, h_i,one,max, h_i,one,min, h_i,five,max, h_i,five,min); i = 1, 2, …, s.
Step 5. Input the s fall feature vectors F_i,SVM into the trained primary classifier to obtain s fall confidence values P_i. Set the threshold interval to [P_min, P_max].
If all fall confidence values P_i are less than P_min, it is determined that the pedestrian in the video has not fallen, and the fall detection ends. If at least one fall confidence value P_i is greater than P_max, it is determined that the pedestrian in the video has fallen, and the fall detection ends. Otherwise, a fall-like situation is determined to have occurred, and the method proceeds to step 6.
Step 6. Input the fall feature vectors F_i,SVM whose fall confidence values P_i lie within the threshold interval [P_min, P_max] into the trained secondary classifier; the secondary classifier outputs whether the pedestrian in the video has fallen.
Preferably, in step 1, Gaussian denoising is performed on each frame image of the detected video clip.
Preferably, in step 1, grayscale processing is performed on each frame image of the detected video clip.
Preferably, in step 2, the human body node acquisition software is OpenPose software. The total number of detected human body joint points is 18: nose-0, neck-1, right shoulder-2, right elbow-3, right wrist-4, left shoulder-5, left elbow-6, left wrist-7, right hip-8, right knee-9, right ankle-10, left hip-11, left knee-12, left ankle-13, right eye-14, left eye-15, right ear-16 and left ear-17. Each human body joint point contains three data (x, y, score): x is the abscissa of the joint point on the image, y is the ordinate of the joint point on the image, and score is the confidence of the joint point. Each row of the human body joint data matrix holds the eighteen joint point parameters of one frame image.
Preferably, in step 2, data filling is performed for the human body joint points missing in each frame image, as follows: if the next human body joint point after the missing joint point is not missing, fill the missing joint point with that next joint point; if the next joint point is also missing but the previous joint point is not, fill the missing joint point with the previous joint point; otherwise, delete the image with the missing joint point.
Preferably, in step 2, the human body joint points detected by the human body node acquisition software include a right ankle, a left ankle, a right eye, a left eye, a right ear, and a left ear.
In step 4-1, the expression of h_t,head is shown in formula (6), and the expression of h_t,foot is shown in formula (7).
In formulas (6) and (7), y_t,14, y_t,15, y_t,16, y_t,17 are the ordinates of the right-eye, left-eye, right-ear and left-ear joint points in the t-th image, and y_t,10, y_t,13 are the ordinates of the right ankle and left ankle in the t-th image.
Preferably, after step 4-5 is performed, the elements of each fall feature vector F_i,SVM are normalized according to formula (8):
data'_p = (data_p − data_min) / (data_max − data_min) (8)
In formula (8), data'_p is the p-th element after normalization of the fall feature vector; data_p is the p-th element before normalization; data_min is the minimum value of the fall feature vector before normalization; data_max is the maximum value of the fall feature vector before normalization.
Preferably, in step 5, the primary classifier uses a support vector machine, with P_max = 0.8 and P_min = 0.2.
Preferably, in step 6, the secondary classifier comprises a plurality of alternately connected convolutional and pooling layers followed by three fully-connected layers. The padding parameter of the convolutions is set to SAME. The convolution kernels in the convolutional layers are 3×3. The pooling operator in the pooling layers is 2×2 with a stride of 1; each pooling layer is connected to the next convolutional layer through a ReLU activation function. The last convolutional layer is connected to the first fully-connected layer through a ReLU activation function, and the three fully-connected layers are connected in turn through ReLU activation functions. The dropout ratio of the fully-connected layers is set to 0.5. The first fully-connected layer has 1024 neurons, the second 512 neurons, and the third 2 neurons.
Preferably, in step 6, the secondary classifier comprises one convolutional layer, one pooling layer and three fully-connected layers connected in sequence through ReLU activation functions. The padding parameter of the convolution is set to 'valid'. The convolutional layer contains four convolution kernels of sizes 3×3, 5×5, 7×7 and 9×9; the pooling layer contains four pooling operators of sizes 8×1, 6×1, 4×1 and 2×1; the 3×3, 5×5, 7×7 and 9×9 convolution kernels are connected to the 8×1, 6×1, 4×1 and 2×1 pooling operators, respectively. The stride of the pooling layer is 1. The dropout ratio of the fully-connected layers is set to 0.5. The first fully-connected layer has 1024 neurons, the second 512 neurons, and the third 2 neurons.
The invention has the beneficial effects that:
1. By extracting human skeletal joint points, the method overcomes the inability of traditional methods, which extract the human aspect ratio, projection area and the like, to accurately estimate the human posture.
2. Judging a fall from a single video frame loses time-axis information. The invention constructs behavior matrices with a fixed-size sliding window, so it can model along the time and space axes simultaneously and fully express the characteristics of the fall behavior.
3. The fall features are first input into the primary classifier, and only ambiguous cases are passed to the secondary classifier. This overcomes both the low detection accuracy of traditional machine-learning methods and the complex models and long detection times of existing deep-learning methods, reducing detection time without reducing accuracy.
Drawings
FIG. 1 is a flow chart of example 1 of the present invention;
FIG. 2 is a schematic diagram of behavior matrix generation in example 1 and example 2 of the present invention;
FIG. 3 is a flowchart of embodiment 2 of the present invention;
FIG. 4 is a histogram of the recognition accuracy of the present invention;
FIG. 5 is a histogram of recognition durations of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, a fall detection method based on video joint points and a hybrid classifier specifically includes the following steps:
step 1, collecting videos through a camera. When a person passes by, a detected video segment containing the person is intercepted in the video.
And 2, performing Gaussian denoising treatment on each frame image of the detected video segment.
Step 3, collecting human body joint point data of each frame of image in the detected video segment by using human body node collection software, and filling data in the missing human body joint points to obtain a human body joint data matrix; each row of the human body joint data matrix is eighteen joint point parameters on one frame of image; the row numbers of the human body joint data matrix are arranged according to the time sequence.
The human body node acquisition software is OpenPose software. OpenPose can use one of three models: the MPI model, the COCO model and the BODY25 model. This embodiment adopts the COCO model, in which the total number of detected human body joint points is 18: nose-0, neck-1, right shoulder-2, right elbow-3, right wrist-4, left shoulder-5, left elbow-6, left wrist-7, right hip-8, right knee-9, right ankle-10, left hip-11, left knee-12, left ankle-13, right eye-14, left eye-15, right ear-16 and left ear-17. Each human body joint point contains three data (x, y, score): x is the abscissa of the joint point on the image, y is the ordinate of the joint point on the image, and score is the confidence of the joint point.
The method for filling data in the missing human body joint points comprises the following steps:
Due to lighting and occlusion, individual joint points may be missing from the human body joint points acquired by the OpenPose software. If the next human body joint point after the missing joint point (in the order of step 3, the joint point after left ear-17 is nose-0) is not missing, fill the missing joint point with that next joint point; if the next joint point is also missing but the previous joint point is not, fill the missing joint point with the previous joint point; otherwise, delete the frame image.
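The filling rule above can be sketched in a few lines of Python. This is a minimal sketch: the function name, the use of `None` for undetected joints, and the cyclic neighbour indexing (next of left ear-17 is nose-0, as stated above) are illustrative assumptions, not part of the patent text.

```python
def fill_missing_joints(frame_joints):
    """Fill missing joints per the rule above, or return None to drop the frame.

    frame_joints: list of 18 entries, each an (x, y, score) tuple, or None
    if the joint was not detected by the acquisition software.
    """
    n = len(frame_joints)
    filled = list(frame_joints)
    for j, joint in enumerate(frame_joints):
        if joint is not None:
            continue
        nxt = frame_joints[(j + 1) % n]  # next joint; next of left ear-17 is nose-0
        prv = frame_joints[(j - 1) % n]  # previous joint
        if nxt is not None:
            filled[j] = nxt
        elif prv is not None:
            filled[j] = prv
        else:
            return None                  # neighbours also missing: drop this frame
    return filled
```

Whether the neighbour test should look at the original or the already-filled values is not specified in the text; the sketch checks the original detections.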
Step 4. As shown in fig. 1, construct s behavior matrices with a sliding window from the human body joint data matrix obtained in step 3; s is obtained from m and W by an expression using the round-down operator ⌊·⌋, and m is the number of images in which human body joint points were detected.
As shown in fig. 2, the sliding window is a fixed-size window for storing time-series data: as time goes on, the window moves along the time axis, newly arrived data is stored at the bottom of the window, and the data at the top is removed. The invention uses a sliding window to construct the behavior matrices. The total frame number W of the behavior matrix is set to 10 and the total joint point number N is 18, so the size of the behavior matrix is [10, 18×3], i.e. the length of the sliding window is 54 and its width is 10. The sliding step is set to 2, so that a number of behavior matrices of size 10×54 are finally constructed.
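The sliding-window construction can be sketched as follows. This is a minimal sketch assuming the per-frame skeleton vectors have already been assembled into an m × (N·3) array; the function name is illustrative.

```python
import numpy as np

def build_behavior_matrices(joint_matrix, W=10, step=2):
    """Slide a W-frame window with the given step over the skeleton vectors.

    joint_matrix: array of shape (m, N*3) -- one row per frame, each row
    the N joints' (x, y, score) values flattened.
    Returns a list of (W, N*3) behavior matrices.
    """
    m = joint_matrix.shape[0]
    matrices = []
    for start in range(0, m - W + 1, step):
        # each behavior matrix is W consecutive skeleton vectors
        matrices.append(joint_matrix[start:start + W])
    return matrices
```

For example, m = 20 frames with W = 10 and step 2 yields 6 behavior matrices of shape (10, 54), consecutive matrices overlapping by 8 frames.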
The i-th behavior matrix K_i,matrix is expressed by formula (1), i = 1, 2, …, s; each row of the behavior matrix corresponds to the skeleton vector of one frame image.
K_i,matrix = (K_2i-1, K_2i, …, K_2i+W-1)^T (1)
In formula (1), K_t is the skeleton vector of the t-th image, expressed by formula (2); t = 1, 2, …, m.
K_t = (K_t,0, K_t,1, …, K_t,N-1) (2)
In formula (2), K_t,j denotes the parameter in the t-th row and the (j+1)-th column of the human body joint data matrix, j = 0, 1, …, N-1. Each parameter contains three data (x, y, score), so the size of each behavior matrix is [W, N×3].
Step 5. Calculate the temporal feature parameters and spatial feature parameters.
In order to fully express the fall behavior, the invention extracts temporal features and spatial features from each behavior matrix simultaneously, as follows:
5-1. Calculate the human body height information h_t in the m images as shown in formula (3); t = 1, 2, …, m.
h_t = h_t,head − h_t,foot (3)
In formula (3), h_t,head is the head ordinate of the human body in the t-th image and is expressed by formula (4); h_t,foot is the foot ordinate of the human body in the t-th image and is expressed by formula (5).
In formulas (4) and (5), y_t,14, y_t,15, y_t,16, y_t,17 are the ordinates of the joint points right eye-14, left eye-15, right ear-16 and left ear-17 in the t-th image; y_t,10, y_t,13 are the ordinates of the right ankle-10 and left ankle-13 in the t-th image.
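The bodies of formulas (4) and (5) are not reproduced in this text, so the exact combination of the eye/ear and ankle ordinates is unknown. The sketch below ASSUMES the mean of the available ordinates in each group; the function name and index constants are illustrative, and the patent's image-coordinate convention (whether y grows upward or downward) is likewise not specified here.

```python
import numpy as np

# Joint indices in the COCO layout listed above.
R_ANKLE, L_ANKLE = 10, 13
R_EYE, L_EYE, R_EAR, L_EAR = 14, 15, 16, 17

def body_height(frame_joints):
    """h_t = h_head - h_foot for one frame (formula (3)).

    frame_joints: (18, 3) array of (x, y, score). h_head and h_foot are
    ASSUMED to be the mean ordinates of the eye/ear joints and of the
    two ankles, since formulas (4)/(5) are elided in the source.
    """
    y = frame_joints[:, 1]
    h_head = y[[R_EYE, L_EYE, R_EAR, L_EAR]].mean()
    h_foot = y[[R_ANKLE, L_ANKLE]].mean()
    return h_head - h_foot
```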
5-2. For each behavior matrix, extract the maximum and minimum of the human height information over its corresponding images as the spatial maximum h_i,max and the spatial minimum h_i,min; i = 1, 2, …, s.
5-3. As shown in formulas (6) and (7), starting from the second image, calculate the adjacent-frame height difference Δh_t,one of each image; starting from the sixth image, calculate the five-frame height difference Δh_t,five of each image. Δh_t,one represents the height change between adjacent frames, i.e. the speed at which the body descends during a fall; Δh_t,five represents the height change across five frames before and after a fall.
Δh_t,one = h_t − h_t-1 (6)
Δh_t,five = h_t − h_t-5 (7)
5-4. For each behavior matrix, extract the maximum and minimum of Δh_t,one and of Δh_t,five over its corresponding images; these are the short-time maximum h_i,one,max, short-time minimum h_i,one,min, long-time maximum h_i,five,max and long-time minimum h_i,five,min.
5-5. Fuse h_i,max, h_i,min, h_i,one,max, h_i,one,min, h_i,five,max, h_i,five,min into the fall feature vector F_i,SVM = (h_i,max, h_i,min, h_i,one,max, h_i,one,min, h_i,five,max, h_i,five,min); i = 1, 2, …, s.
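Steps 5-2 to 5-5 can be sketched as follows. This is a minimal sketch assuming the per-frame heights h_t have already been computed; the function name and the convention for which height differences "belong" to a window (those whose later frame falls inside it) are illustrative assumptions.

```python
import numpy as np

def fall_features(heights, W=10, step=2):
    """Six-element fall feature vector per sliding window, from heights h_t.

    heights: 1-D sequence of h_t for all m frames.
    Returns an (s, 6) array: (h_max, h_min, dh1_max, dh1_min, dh5_max, dh5_min).
    """
    h = np.asarray(heights, dtype=float)
    d1 = np.diff(h)          # dh_one[t] = h_{t+1} - h_t, defined from the 2nd frame
    d5 = h[5:] - h[:-5]      # dh_five, defined from the 6th frame
    feats = []
    for start in range(0, len(h) - W + 1, step):
        win = h[start:start + W]
        # differences whose later frame lies inside the window
        w1 = d1[max(start - 1, 0):start + W - 1]
        w5 = d5[max(start - 5, 0):start + W - 5]
        feats.append([win.max(), win.min(),
                      w1.max(), w1.min(),
                      w5.max(), w5.min()])
    return np.array(feats)
```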
5-6. According to formula (9), normalize the elements of each fall feature vector F_i,SVM, further eliminating recognition errors caused by individual differences such as posture and distance in the video.
data'_p = (data_p − data_min) / (data_max − data_min) (9)
In formula (9), data'_p is the p-th element after normalization of the fall feature vector; data_p is the p-th element before normalization; data_min is the minimum value of the fall feature vector before normalization; data_max is the maximum value of the fall feature vector before normalization.
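The min-max normalization of step 5-6 can be sketched as follows (a minimal sketch; the guard against a constant vector is an added assumption, since the text does not say how that degenerate case is handled):

```python
import numpy as np

def normalize_feature_vector(f):
    """Min-max normalize one fall feature vector:
    data'_p = (data_p - data_min) / (data_max - data_min).
    """
    f = np.asarray(f, dtype=float)
    fmin, fmax = f.min(), f.max()
    if fmax == fmin:               # degenerate vector: avoid division by zero
        return np.zeros_like(f)
    return (f - fmin) / (fmax - fmin)
```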
Step 6. Input the normalized s fall feature vectors F_i,SVM into the trained primary classifier to obtain s fall confidence values P_i. The primary classifier is a support vector machine (SVM). The threshold interval is set to [P_min, P_max], with P_max = 0.8 and P_min = 0.2.
If all fall confidence values P_i are less than P_min, it is determined that the pedestrian in the video has not fallen, and the fall detection ends. If at least one fall confidence value P_i is greater than P_max, it is determined that the pedestrian in the video has fallen, and the fall detection ends. Otherwise, a fall-like situation is determined to have occurred, and the method proceeds to step 7.
Step 7. Input the fall feature vectors F_i,SVM whose fall confidence values P_i lie within the threshold interval [P_min, P_max] into the trained secondary classifier; the secondary classifier outputs whether the pedestrian in the video has fallen.
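The three-way decision rule of steps 6 and 7 can be sketched as follows (a minimal sketch; the function name and return values are illustrative):

```python
def fall_decision(confidences, p_min=0.2, p_max=0.8):
    """Three-way decision on the primary classifier's confidences P_i.

    Returns 'no_fall' if every P_i < p_min, 'fall' if any P_i > p_max,
    and otherwise the indices of the windows whose confidence lies inside
    [p_min, p_max] and must be passed to the secondary classifier.
    """
    if all(p < p_min for p in confidences):
        return "no_fall"
    if any(p > p_max for p in confidences):
        return "fall"
    return [i for i, p in enumerate(confidences) if p_min <= p <= p_max]
```

Note the ordering: a single confidence above p_max decides "fall" immediately, so the costlier secondary classifier only runs on the ambiguous, fall-like cases.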
The secondary classifier is a CNN convolutional neural network comprising a plurality of alternately connected convolutional and pooling layers followed by three fully-connected layers. The padding parameter of the convolutions is set to SAME. The convolution kernels in the convolutional layers are 3×3. The pooling operator in the pooling layers is 2×2 with a stride of 1; each pooling layer is connected to the next convolutional layer through a ReLU (Rectified Linear Unit) activation function. The last convolutional layer is connected to the first fully-connected layer through a ReLU activation function, and the three fully-connected layers are connected in turn through ReLU activation functions. The dropout ratio of the fully-connected layers is set to 0.5. The first fully-connected layer has 1024 neurons and the second 512 neurons. The third fully-connected layer has 2 neurons and outputs the judgment result.
Example 2
As shown in fig. 3, a fall detection method based on video joint points and a hybrid classifier differs from embodiment 1 in that:
In step 2, each frame image of the detected video segment is subjected to grayscale processing instead of Gaussian denoising.
In step 6, the secondary classifier is different from that in embodiment 1, and the secondary classifier in this embodiment is a multi-scale convolutional neural network (denoted as multiconn) and includes a convolutional layer, a pooling layer, and three fully-connected layers, which are sequentially connected by an activation function Relu. In the convolution method of the convolutional neural network, padding parameter is set to 'valid'. Four convolution kernels with the scales of 3 × 3, 5 × 5, 7 × 7 and 9 × 9 are arranged in the convolution layer; four kinds of pooling operators with the sizes of 8 multiplied by 1, 6 multiplied by 1, 4 multiplied by 1 and 2 multiplied by 1 are arranged in the pooling layer; the convolution kernels of 3 × 3, 5 × 5, 7 × 7 and 9 × 9 are correspondingly connected with the pooling operators of 8 × 1, 6 × 1, 4 × 1 and 2 × 1 respectively. The step size of the pooling layer is 1. The dropout specific gravity of the fully connected layer was set to 0.5. 1024 neurons are arranged in the first fully connected layer, and 512 neurons are arranged in the second fully connected layer. The third fully connected layer is provided with 2 neurons.
This secondary classifier differs from a traditional convolutional neural network in that convolution kernels of different sizes are placed in the convolutional layer, and the different kernels are pooled by different pooling operators. Since the behavior matrices constructed in step 4 have size 10×54, four different convolution kernels are set, with scales 3×3, 5×5, 7×7 and 9×9; the feature maps obtained from the different kernels are input into pooling operators of sizes 8×1, 6×1, 4×1 and 2×1, respectively, yielding different fall features. These features are flattened, input into the three fully-connected layers, and the fall detection result is output.
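The pairing of kernel and pooling sizes above is not arbitrary: with 'valid' padding, each branch collapses the 10-row behavior matrix to a single row before flattening, as simple shape arithmetic shows (a sketch under the stated sizes; the helper name is illustrative, and only heights are tracked since the pooling operators are p×1):

```python
def branch_output_height(in_h=10, kernel=3, pool=8):
    """Output height of one multi-scale branch: a 'valid' k x k convolution
    followed by a (pool x 1) pooling with stride 1."""
    conv_h = in_h - kernel + 1      # 'valid' convolution shrinks by k-1
    return conv_h - pool + 1        # stride-1 pooling shrinks by pool-1

# Each kernel size is paired with the pooling operator that reduces the
# 10-row behavior matrix to exactly one row.
pairs = [(3, 8), (5, 6), (7, 4), (9, 2)]
heights = [branch_output_height(10, k, p) for k, p in pairs]
```

Every entry of `heights` is 1, so all four branches produce single-row feature maps that can be flattened and concatenated before the fully-connected layers.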
To verify the accuracy and efficiency of the method, tests of recognition accuracy and recognition duration were carried out for the SVM classifier used alone, the CNN classifier used alone, the MultiCNN classifier used alone, and the methods of embodiment 1 and embodiment 2. The recognition accuracy results are compared in the histogram of FIG. 4; as can be seen from FIG. 4, the CNN classifier alone, the MultiCNN classifier alone, and embodiments 1 and 2 all achieve high accuracy.
The recognition duration results are compared in the histogram of FIG. 5; as can be seen from FIG. 5, the recognition durations of embodiments 1 and 2 are significantly shorter than those of the CNN classifier alone or the MultiCNN classifier alone. Therefore, on the premise of guaranteeing fall recognition accuracy, the invention greatly shortens the fall recognition time, reduces the amount of computation and improves computational efficiency.
Claims (10)
1. A fall detection method based on video joint points and a hybrid classifier, characterized by the following steps:
Step 1. Extract each frame image of the detected video clip;
Step 2. Collect the human body joint point data of each frame image in the detected video clip with human body node acquisition software to obtain a human body joint data matrix;
Step 3. Construct s behavior matrices from the human body joint data matrix obtained in step 2; s is obtained from m and W by an expression using the round-down operator ⌊·⌋; m is the number of images; W is the total frame number of the behavior matrix, and takes a value of 8 to 15;
The i-th behavior matrix K_i,matrix is expressed by formula (1), i = 1, 2, …, s;
K_i,matrix = (K_2i-1, K_2i, …, K_2i+W-1)^T (1)
In formula (1), K_t is the skeleton vector of the t-th image, expressed by formula (2); t = 1, 2, …, m;
K_t = (K_t,0, K_t,1, …, K_t,N-1) (2)
In formula (2), K_t,j denotes the parameter in the t-th row and the (j+1)-th column of the human body joint data matrix, j = 0, 1, …, N-1; N is the number of human body joint points in one frame image;
Step 4. Calculate the temporal feature parameters and spatial feature parameters;
4-1. Calculate the human body height information h_t in the m images as shown in formula (3); t = 1, 2, …, m;
h_t = h_t,head − h_t,foot (3)
In formula (3), h_t,head is the head ordinate of the human body in the t-th image, and h_t,foot is the foot ordinate of the human body in the t-th image;
4-2. For each behavior matrix, extract the maximum and minimum of the human height information h_t over its corresponding images as the spatial maximum h_i,max and spatial minimum h_i,min; i = 1, 2, …, s;
4-3. Starting from the second image, calculate the adjacent-frame height difference Δh_t,one of each image as shown in formulas (4) and (5); starting from the sixth image, calculate the five-frame height difference Δh_t,five of each image;
Δh_t,one = h_t − h_t-1 (4)
Δh_t,five = h_t − h_t-5 (5)
4-4. For each behavior matrix, extract the maximum and minimum of the adjacent-frame height difference Δh_t,one and of the five-frame height difference Δh_t,five over its corresponding images; these are the short-time maximum h_i,one,max, short-time minimum h_i,one,min, long-time maximum h_i,five,max and long-time minimum h_i,five,min;
4-5. Construct the fall feature vector F_i,SVM = (h_i,max, h_i,min, h_i,one,max, h_i,one,min, h_i,five,max, h_i,five,min); i = 1, 2, …, s;
Step 5. Input the s fall feature vectors F_i,SVM into the trained primary classifier to obtain s fall confidence values P_i; set the threshold interval to [P_min, P_max];
If all fall confidence values P_i are less than P_min, it is determined that the pedestrian in the video has not fallen, and the fall detection ends; if at least one fall confidence value P_i is greater than P_max, it is determined that the pedestrian in the video has fallen, and the fall detection ends; otherwise, a fall-like situation is determined to have occurred, and the method proceeds to step 6;
Step 6. Input the fall feature vectors F_i,SVM whose fall confidence values P_i lie within the threshold interval [P_min, P_max] into the trained secondary classifier; the secondary classifier outputs whether the pedestrian in the video has fallen.
2. A fall detection method based on video joint points and a hybrid classifier according to claim 1, characterized in that: in step 1, Gaussian denoising is performed on each frame image of the detected video clip.
3. A fall detection method based on video joint points and a hybrid classifier according to claim 1, characterized in that: in step 1, grayscale processing is performed on each frame image of the detected video clip.
4. A fall detection method based on video joint points and a hybrid classifier according to claim 1, characterized in that: in step 2, the human body node acquisition software is OpenPose software; the total number of detected human body joint points is 18: nose-0, neck-1, right shoulder-2, right elbow-3, right wrist-4, left shoulder-5, left elbow-6, left wrist-7, right hip-8, right knee-9, right ankle-10, left hip-11, left knee-12, left ankle-13, right eye-14, left eye-15, right ear-16 and left ear-17; each human body joint point contains three data (x, y, score): x is the abscissa of the joint point on the image, y is the ordinate of the joint point on the image, and score is the confidence of the joint point; each row of the human body joint data matrix holds the eighteen joint point parameters of one frame image.
5. A fall detection method based on video joint points and hybrid classifiers according to claim 4, characterized in that: in step 2, data filling is performed for missing human body joint points in each frame image, as follows: if the joint point following the missing joint point is not missing, fill the missing joint point with that following joint point; if the following joint point is also missing but the preceding joint point is not, fill the missing joint point with the preceding joint point; otherwise, delete the image containing the missing joint point.
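The gap-filling rule of claim 5 can be sketched directly; here a missing joint is represented as `None`, which is an assumption of this sketch (OpenPose typically signals missing detections with zero confidence).

```python
# Sketch of the gap-filling rule in claim 5: a missing joint (None here)
# is copied from the next joint in the same frame if present, otherwise
# from the previous joint; if neither exists, the frame is discarded.
def fill_missing_joints(frame):
    """Return a filled copy of the frame, or None if the frame is dropped."""
    joints = list(frame)
    for i, joint in enumerate(joints):
        if joint is not None:
            continue
        nxt = joints[i + 1] if i + 1 < len(joints) else None
        prv = joints[i - 1] if i > 0 else None
        if nxt is not None:
            joints[i] = nxt          # fill with the following joint
        elif prv is not None:
            joints[i] = prv          # fall back to the preceding joint
        else:
            return None              # delete the image with the missing joint
    return joints
```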
6. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 2, the human body joint points detected by the human body node acquisition software comprise a right ankle, a left ankle, a right eye, a left eye, a right ear and a left ear;
in step 4-1, h_{t,head} is expressed by formula (6), and h_{t,foot} by formula (7);
in formulas (6) and (7), y_{t,14}, y_{t,15}, y_{t,16} and y_{t,17} are the ordinates of the right eye, left eye, right ear and left ear joint points in the t-th image, respectively; y_{t,10} and y_{t,13} are the ordinates of the right ankle and left ankle in the t-th image, respectively.
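Formulas (6) and (7) themselves are not reproduced in this excerpt. As a clearly hypothetical stand-in, the sketch below averages the listed ordinates (eyes and ears for the head height, ankles for the foot height); the actual patent formulas may combine these ordinates differently.

```python
# Illustrative stand-in for formulas (6)-(7), which are not shown in this
# excerpt: h_{t,head} as the mean eye/ear ordinate, h_{t,foot} as the mean
# ankle ordinate. Joint indices follow claim 4's numbering.
def head_foot_heights(frame_y):
    """frame_y maps joint index -> y ordinate in the t-th image."""
    head_idx = (14, 15, 16, 17)   # right eye, left eye, right ear, left ear
    foot_idx = (10, 13)           # right ankle, left ankle
    h_head = sum(frame_y[i] for i in head_idx) / len(head_idx)
    h_foot = sum(frame_y[i] for i in foot_idx) / len(foot_idx)
    return h_head, h_foot
```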
7. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: after step 4-5 is performed, the elements within each fall feature vector F_{i,SVM} are normalized according to formula (8);
in formula (8), data'_p is the p-th element after normalization of the fall feature vector; data_p is the p-th element before normalization; data_min is the minimum value of the fall feature vector before normalization; data_max is the maximum value of the fall feature vector before normalization.
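The quantities named in formula (8) describe standard min-max normalization, data'_p = (data_p − data_min) / (data_max − data_min), which can be sketched as:

```python
# Min-max normalization of formula (8): each element of a fall feature
# vector is rescaled to [0, 1] using the vector's own min and max.
def normalize_feature_vector(vec):
    lo, hi = min(vec), max(vec)
    if hi == lo:                      # degenerate vector: avoid division by zero
        return [0.0 for _ in vec]
    return [(x - lo) / (hi - lo) for x in vec]
```

The degenerate-vector guard is an addition of this sketch; the claim does not state how a constant feature vector is handled.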
8. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 5, the primary classifier adopts a support vector machine; P_max = 0.8, P_min = 0.2.
9. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 6, the secondary classifier comprises a plurality of alternately connected convolution and pooling layers followed by three fully-connected layers; in the convolution mode of the convolutional neural network, the padding parameter is set to SAME; the convolution kernels in the convolution layers are 3×3; the pooling operators in the pooling layers are 2×2 with stride 1; each pooling layer is connected to the next convolution layer through a ReLU activation function; the three fully-connected layers are connected sequentially through ReLU activation functions; the dropout proportion of the fully-connected layers is set to 0.5; the last convolution layer is connected to the first fully-connected layer through a ReLU activation function; the first fully-connected layer has 1024 neurons, the second fully-connected layer has 512 neurons, and the third fully-connected layer has 2 neurons.
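The claim-9 architecture can be summarized as a layer configuration list. This is a configuration description, not a trained model; the framework glue (e.g. Keras or PyTorch) is omitted, and the number of conv/pool pairs ("a plurality") is shown as two only for illustration.

```python
# Layer-by-layer description of the claim-9 secondary classifier.
# Kernel sizes, pooling sizes, strides, dropout and unit counts follow
# the claim; the count of conv/pool pairs is illustrative.
SECONDARY_CLASSIFIER = [
    {"layer": "conv", "kernel": (3, 3), "padding": "SAME"},
    {"layer": "pool", "size": (2, 2), "stride": 1, "activation": "relu"},
    {"layer": "conv", "kernel": (3, 3), "padding": "SAME"},
    {"layer": "pool", "size": (2, 2), "stride": 1, "activation": "relu"},
    # three fully-connected layers joined by ReLU, dropout 0.5
    {"layer": "fc", "units": 1024, "activation": "relu", "dropout": 0.5},
    {"layer": "fc", "units": 512, "activation": "relu", "dropout": 0.5},
    {"layer": "fc", "units": 2},      # 2-way fall / no-fall output
]
```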
10. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 6, the secondary classifier comprises a convolution layer, a pooling layer and three fully-connected layers connected sequentially through ReLU activation functions; in the convolution mode of the convolutional neural network, the padding parameter is set to 'valid'; the convolution layer contains four convolution kernels of scale 3×3, 5×5, 7×7 and 9×9; the pooling layer contains four pooling operators of size 8×1, 6×1, 4×1 and 2×1; the 3×3, 5×5, 7×7 and 9×9 convolution kernels are connected to the 8×1, 6×1, 4×1 and 2×1 pooling operators, respectively; the stride of the pooling layer is 1; the dropout proportion of the fully-connected layers is set to 0.5; the first fully-connected layer has 1024 neurons, the second fully-connected layer has 512 neurons, and the third fully-connected layer has 2 neurons.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910589503.XA CN110532850B (en) | 2019-07-02 | 2019-07-02 | Fall detection method based on video joint points and hybrid classifier |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110532850A CN110532850A (en) | 2019-12-03 |
CN110532850B true CN110532850B (en) | 2021-11-02 |
Family
ID=68659847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910589503.XA Active CN110532850B (en) | 2019-07-02 | 2019-07-02 | Fall detection method based on video joint points and hybrid classifier |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110532850B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111243229A (en) * | 2019-12-31 | 2020-06-05 | 浙江大学 | Old people falling risk assessment method and system |
CN111832412B (en) * | 2020-06-09 | 2024-04-09 | 北方工业大学 | Sounding training correction method and system |
CN112215185B (en) * | 2020-10-21 | 2022-08-05 | 成都信息工程大学 | System and method for detecting falling behavior from monitoring video |
CN112541424A (en) * | 2020-12-07 | 2021-03-23 | 南京工程学院 | Real-time detection method for pedestrian falling under complex environment |
CN113204989B (en) * | 2021-03-19 | 2022-07-29 | 南京邮电大学 | Human body posture space-time feature extraction method for tumble analysis |
CN113095295B (en) * | 2021-05-08 | 2023-08-18 | 广东工业大学 | Fall detection method based on improved key frame extraction |
CN118015520A (en) * | 2024-03-15 | 2024-05-10 | 上海摩象网络科技有限公司 | Vision-based nursing detection system and method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564005A (en) * | 2018-03-26 | 2018-09-21 | 电子科技大学 | A kind of human body tumble discrimination method based on convolutional neural networks |
CN109920208A (en) * | 2019-01-31 | 2019-06-21 | 深圳绿米联创科技有限公司 | Tumble prediction technique, device, electronic equipment and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8179268B2 (en) * | 2008-03-10 | 2012-05-15 | Ramot At Tel-Aviv University Ltd. | System for automatic fall detection for elderly people |
- 2019-07-02 CN CN201910589503.XA patent/CN110532850B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564005A (en) * | 2018-03-26 | 2018-09-21 | 电子科技大学 | A kind of human body tumble discrimination method based on convolutional neural networks |
CN109920208A (en) * | 2019-01-31 | 2019-06-21 | 深圳绿米联创科技有限公司 | Tumble prediction technique, device, electronic equipment and system |
Non-Patent Citations (2)
Title |
---|
Video-based Fall Detection for Seniors with Human Pose Estimation; Zhanyuan Huang et al.; IEEE; 2019-02-14; pp. 1-4 *
Depth-camera-based fall monitoring system for the elderly; Shen Daiyou et al.; Chinese Journal of Medical Physics (《中国医学物理学杂志》); February 2019; Vol. 36, No. 2; pp. 223-228 *
Also Published As
Publication number | Publication date |
---|---|
CN110532850A (en) | 2019-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110532850B (en) | Fall detection method based on video joint points and hybrid classifier | |
CN108830252B (en) | Convolutional neural network human body action recognition method fusing global space-time characteristics | |
CN103942577B (en) | Based on the personal identification method for establishing sample database and composite character certainly in video monitoring | |
CN111401144B (en) | Escalator passenger behavior identification method based on video monitoring | |
CN109543526B (en) | True and false facial paralysis recognition system based on depth difference characteristics | |
CN103955699B (en) | A kind of real-time fall events detection method based on monitor video | |
CN105956582A (en) | Face identifications system based on three-dimensional data | |
CN107506692A (en) | A kind of dense population based on deep learning counts and personnel's distribution estimation method | |
CN107220603A (en) | Vehicle checking method and device based on deep learning | |
CN106529442A (en) | Pedestrian identification method and apparatus | |
CN105160400A (en) | L21 norm based method for improving convolutional neural network generalization capability | |
CN105160310A (en) | 3D (three-dimensional) convolutional neural network based human body behavior recognition method | |
CN109271918B (en) | Method for distinguishing people with balance ability disorder based on gravity center shift model | |
CN108596087B (en) | Driving fatigue degree detection regression model based on double-network result | |
CN107944399A (en) | A kind of pedestrian's recognition methods again based on convolutional neural networks target's center model | |
CN111488850B (en) | Neural network-based old people falling detection method | |
CN110263728A (en) | Anomaly detection method based on improved pseudo- three-dimensional residual error neural network | |
CN110705468B (en) | Eye movement range identification method and system based on image analysis | |
KR20190105180A (en) | Apparatus for Lesion Diagnosis Based on Convolutional Neural Network and Method thereof | |
CN104298974A (en) | Human body behavior recognition method based on depth video sequence | |
CN113610046B (en) | Behavior recognition method based on depth video linkage characteristics | |
CN105160285A (en) | Method and system for recognizing human body tumble automatically based on stereoscopic vision | |
CN114492634B (en) | Fine granularity equipment picture classification and identification method and system | |
CN111797705A (en) | Action recognition method based on character relation modeling | |
CN107967944A (en) | A kind of outdoor environment big data measuring of human health method and platform based on Hadoop |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||