CN110532850B - Fall detection method based on video joint points and hybrid classifier

Fall detection method based on video joint points and hybrid classifier

Info

Publication number
CN110532850B
CN110532850B, CN201910589503.XA, CN201910589503A
Authority
CN
China
Prior art keywords
human body
image
fall
video
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910589503.XA
Other languages
Chinese (zh)
Other versions
CN110532850A (en)
Inventor
蔡文郁
郑雪晨
郭嘉豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201910589503.XA priority Critical patent/CN110532850B/en
Publication of CN110532850A publication Critical patent/CN110532850A/en
Application granted granted Critical
Publication of CN110532850B publication Critical patent/CN110532850B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fall detection method based on video joint points and a hybrid classifier. Traditional video-based fall detection algorithms rely on manually extracted fall features and detect falls with a linear discriminant classifier; the model is simple but its accuracy is low. The invention proceeds as follows: 1. Extract each frame image of the detected video clip. 2. Acquire a human body joint data matrix. 3. Construct a plurality of behavior matrices. 4. Calculate the temporal and spatial characteristic parameters. 5. Perform primary classification. 6. Perform secondary classification. By extracting human skeletal joint points, the method overcomes the problem that traditional approaches, which extract the human aspect ratio, projection area and the like, cannot accurately estimate human posture. The invention uses a fixed-size sliding window to construct behavior matrices, which allows modeling along the time and space axes simultaneously and fully expresses the characteristics of falling behavior.

Description

Fall detection method based on video joint points and hybrid classifier
Technical Field
The invention belongs to the technical field of fall detection, and particularly relates to a fall detection method based on video joint points and a hybrid classifier.
Background
Researchers at home and abroad have studied falls among the elderly extensively. The main fall detection methods fall into three categories: fall detection based on wearable sensors, fall detection based on environmental sensors, and detection based on video images. Wearable-sensor-based fall detection sets a threshold on the data collected by the wearable sensors; if the threshold is set inaccurately, the final detection result is affected. Environmental-sensor-based fall detection judges falls mainly from the data of pressure sensors or sound detection devices on the ground; excessive environmental noise can make the data abnormal. Video-based fall detection uses an ordinary camera or a depth camera installed in the daily environment to acquire video data and then determines falls through image recognition. Compared with detection algorithms based on wearable and environmental sensors, video-based fall detection does not require the elderly to wear equipment, is less susceptible to environmental influence, and is more widely applicable in practice. However, traditional video-based fall detection algorithms rely on manual fall feature extraction and detect falls with a linear discriminant classifier; the model is simple but the accuracy is low. Existing deep-learning-based fall detection algorithms have complex models, and it is difficult to reduce their detection time.
Disclosure of Invention
The invention aims to provide a fall detection method based on video joint points and a hybrid classifier.
The method comprises the following specific steps:
step 1, extracting each frame image of the detected video clip.
Step 2, collecting human body joint point data of each frame of image in the detected video clip by using human body node collection software to obtain a human body joint data matrix;
Step 3, constructing s behavior matrices from the human body joint data matrix obtained in step 2;

[Equation image: expression for the number s of behavior matrices, using the round-down operator ⌊·⌋]

⌊·⌋ is the round-down operator; m is the number of images; W is the total frame number of the behavior matrix, and W takes a value of 8-15.

The i-th behavior matrix K_{i,matrix} is expressed by formula (1), i = 1, 2, …, s.

K_{i,matrix} = (K_{2i-1}, K_{2i}, …, K_{2i+W-1})^T   (1)

In formula (1), K_t is the skeletal vector of the t-th image, expressed by formula (2); t = 1, 2, …, m.

K_t = (K_{t,0}, K_{t,1}, …, K_{t,N-1})   (2)

In formula (2), K_{t,j} is the parameter in the t-th row and (j+1)-th column of the human body joint data matrix, j = 0, 1, …, N-1. N is the number of human body joint points in one frame image.
Step 4, calculating the temporal characteristic parameters and the spatial characteristic parameters.

4-1. Calculate the human body height information h_t in the m images, as shown in formula (3); t = 1, 2, …, m;

h_t = h_{t,head} - h_{t,foot}   (3)

In formula (3), h_{t,head} is the head ordinate of the human body in the t-th image, and h_{t,foot} is the foot ordinate of the human body in the t-th image.

4-2. For the images corresponding to each behavior matrix, extract the maximum and minimum of the human body height information h_t as the spatial maximum h_{i,max} and the spatial minimum h_{i,min}; i = 1, 2, …, s.

4-3. As shown in formulas (4) and (5), starting from the second image, calculate the adjacent-frame height difference Δh_{t,one} of each image; starting from the sixth image, calculate the five-frame height difference Δh_{t,five} of each image.

Δh_{t,one} = h_t - h_{t-1}   (4)
Δh_{t,five} = h_t - h_{t-5}   (5)

4-4. For the images corresponding to each behavior matrix, extract the maximum and minimum of the adjacent-frame height difference Δh_{t,one} and of the five-frame height difference Δh_{t,five}, respectively denoted the short-time maximum h_{i,one,max}, short-time minimum h_{i,one,min}, long-time maximum h_{i,five,max} and long-time minimum h_{i,five,min}.

4-5. Construct the fall feature vector F_{i,SVM} = (h_{i,max}, h_{i,min}, h_{i,one,max}, h_{i,one,min}, h_{i,five,max}, h_{i,five,min}); i = 1, 2, …, s.

Step 5, input the s fall feature vectors F_{i,SVM} into the trained primary classifier to obtain s fall confidences P_i. Set the threshold interval to [P_{min}, P_{max}].

If all fall confidences P_i are less than P_{min}, it is judged that the pedestrian in the video has not fallen, and the fall detection ends; if at least one fall confidence P_i is greater than P_{max}, it is judged that the pedestrian in the video has fallen, and the fall detection ends; otherwise, a fall-like situation is judged to have occurred, and step 6 is entered.

Step 6, input the fall feature vectors F_{i,SVM} whose fall confidence P_i lies within the threshold interval [P_{min}, P_{max}] into the trained secondary classifier, and the secondary classifier outputs whether the pedestrian in the video has fallen.
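For illustration only, the two-stage decision of steps 5 and 6 can be sketched as follows. The sketch assumes a scikit-learn SVM trained with probability estimates as the primary classifier, a hypothetical helper cnn_predict_fall wrapping the trained secondary classifier, and routes the 10 × 54 behavior matrices of the ambiguous windows to that classifier (matching the network input size in the embodiments below); since step 6 as worded refers to the fall feature vectors, treat that routing as an assumption.

# Minimal sketch of the two-stage classification in steps 5-6 (not the patented
# implementation). `svm` is assumed to be an sklearn.svm.SVC trained with
# probability=True on the 6-element fall feature vectors; `cnn_predict_fall`
# is a hypothetical wrapper around the trained secondary classifier that takes
# one behavior matrix and returns True for "fall".
import numpy as np

P_MIN, P_MAX = 0.2, 0.8   # threshold interval [P_min, P_max] of the preferred embodiment

def detect_fall(fall_features, behavior_matrices, svm, cnn_predict_fall):
    # Step 5: the primary classifier yields a fall confidence P_i per behavior matrix
    p = svm.predict_proba(fall_features)[:, 1]          # probability of the "fall" class

    if np.all(p < P_MIN):                               # no window resembles a fall
        return False
    if np.any(p > P_MAX):                               # at least one clear fall
        return True

    # Step 6: only the "fall-like" windows (P_min <= P_i <= P_max) go to the secondary classifier
    ambiguous = (p >= P_MIN) & (p <= P_MAX)
    return any(cnn_predict_fall(bm) for bm in behavior_matrices[ambiguous])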
Preferably, in step 1, Gaussian denoising is performed on each frame image of the detected video segment.
Preferably, in step 1, the image of each frame of the detected video segment is subjected to a graying process.
Preferably, in step 2, the human body node acquisition software is OpenPose. The total number of detected human body joint points is 18, namely nose-0, neck-1, right shoulder-2, right elbow-3, right wrist-4, left shoulder-5, left elbow-6, left wrist-7, right hip-8, right knee-9, right ankle-10, left hip-11, left knee-12, left ankle-13, right eye-14, left eye-15, right ear-16 and left ear-17. Each human body joint point contains three data values (x, y, score): x is the abscissa of the joint point on the image, y is the ordinate of the joint point on the image, and score is the confidence of the joint point. Each row of the human body joint data matrix holds the eighteen joint point parameters of one frame image.
Preferably, in step 2, the data filling is performed on the human body joint points missing in each frame of image, and the specific method is as follows: if the next human body joint point of the missing joint point is not missing, filling the missing joint point with the next human body joint point; if the next human body joint point of the missing node is missing and the previous human body joint point is not missing, filling the missing node with the previous human body joint point; otherwise, deleting the image of the missing node.
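A minimal sketch of this filling rule is given below; the wrap-around (the joint after left ear-17 is nose-0) follows the joint ordering of step 2, and treating a zero confidence score as "missing" is an assumption about the OpenPose output.

# Hedged sketch of the joint-filling rule above; not the patented implementation.
import numpy as np

N_JOINTS = 18

def is_missing(joint):
    return joint[2] == 0.0                       # (x, y, score) with score 0 => not detected (assumption)

def fill_missing_joints(frame):
    """frame: (18, 3) array of (x, y, score); returns a filled copy, or None to drop the frame."""
    filled = frame.copy()
    for j in range(N_JOINTS):
        if not is_missing(frame[j]):
            continue
        nxt, prv = (j + 1) % N_JOINTS, (j - 1) % N_JOINTS
        if not is_missing(frame[nxt]):           # fill with the next joint if it is present
            filled[j] = frame[nxt]
        elif not is_missing(frame[prv]):         # otherwise fill with the previous joint
            filled[j] = frame[prv]
        else:
            return None                          # delete the image containing this missing joint
    return filled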
Preferably, in step 2, the human body joint points detected by the human body node acquisition software include a right ankle, a left ankle, a right eye, a left eye, a right ear, and a left ear.
In step 4-1, the expression for h_{t,head} is shown in formula (6); the expression for h_{t,foot} is shown in formula (7);

[Equation images (6) and (7): expressions for h_{t,head} and h_{t,foot}, not reproduced in the text]

In formulas (6) and (7), y_{t,14}, y_{t,15}, y_{t,16}, y_{t,17} are the ordinates of the right-eye, left-eye, right-ear and left-ear joint points in the t-th image; y_{t,10}, y_{t,13} are the ordinates of the right ankle and left ankle in the t-th image.
Preferably, after step 4-5, the elements in each fall feature vector F_{i,SVM} are normalized according to formula (8).

data'_p = (data_p - data_{min}) / (data_{max} - data_{min})   (8)

In formula (8), data'_p is the p-th element after normalization of the fall feature vector; data_p is the p-th element before normalization; data_{min} is the minimum value of the fall feature vector before normalization; data_{max} is the maximum value of the fall feature vector before normalization.
Preferably, in step 5, the primary classifier uses a support vector machine, with P_{max} = 0.8 and P_{min} = 0.2.
Preferably, in step 6, the secondary classifier comprises a plurality of alternately connected convolutional layers and pooling layers followed by three fully-connected layers. In the convolution mode of the convolutional neural network, the padding parameter is set to SAME. The convolution kernels in the convolutional layers are 3 × 3. The pooling operator in the pooling layers is 2 × 2 with a stride of 1; each pooling layer is connected to the next convolutional layer through the activation function ReLU. The three fully-connected layers are connected in turn through the activation function ReLU. The dropout rate of the fully-connected layers is set to 0.5. The last convolutional layer is connected to the first fully-connected layer through the activation function ReLU. The first fully-connected layer has 1024 neurons, the second fully-connected layer has 512 neurons, and the third fully-connected layer has 2 neurons.
Preferably, in step 6, the secondary classifier comprises one convolutional layer, one pooling layer and three fully-connected layers connected in sequence through the activation function ReLU. In the convolution mode of the convolutional neural network, the padding parameter is set to 'valid'. Four convolution kernels with scales 3 × 3, 5 × 5, 7 × 7 and 9 × 9 are arranged in the convolutional layer; four pooling operators with sizes 8 × 1, 6 × 1, 4 × 1 and 2 × 1 are arranged in the pooling layer; the 3 × 3, 5 × 5, 7 × 7 and 9 × 9 convolution kernels are connected to the 8 × 1, 6 × 1, 4 × 1 and 2 × 1 pooling operators, respectively. The stride of the pooling layer is 1. The dropout rate of the fully-connected layers is set to 0.5. The first fully-connected layer has 1024 neurons, the second has 512 neurons, and the third has 2 neurons.
The invention has the beneficial effects that:
1. By extracting human skeletal joint points, the method solves the problem that traditional approaches, which extract the human aspect ratio, projection area and the like, cannot accurately estimate human posture.
2. Traditional methods judge a fall from a single video frame only and therefore lose time-axis information; the invention constructs behavior matrices with a fixed-size sliding window, which allows modeling along the time and space axes simultaneously and fully expresses the fall behavior characteristics.
3. The behavior matrix is first input into the primary classifier and only then, if needed, into the secondary classifier. This overcomes the low detection accuracy of traditional machine learning methods and the long detection time of existing deep learning methods, which are accurate but have complex models, so detection time is reduced without reducing accuracy.
Drawings
FIG. 1 is a flow chart of example 1 of the present invention;
FIG. 2 is a schematic diagram of behavior matrix generation in example 1 and example 2 of the present invention;
FIG. 3 is a flowchart of embodiment 2 of the present invention;
FIG. 4 is a histogram of the recognition accuracy of the present invention;
FIG. 5 is a histogram of recognition durations of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, a fall detection method based on video joints and a hybrid classifier specifically includes the following steps:
step 1, collecting videos through a camera. When a person passes by, a detected video segment containing the person is intercepted in the video.
Step 2, performing Gaussian denoising processing on each frame image of the detected video segment.
Step 3, collecting human body joint point data of each frame of image in the detected video segment by using human body node collection software, and filling data in the missing human body joint points to obtain a human body joint data matrix; each row of the human body joint data matrix is eighteen joint point parameters on one frame of image; the row numbers of the human body joint data matrix are arranged according to the time sequence.
The human body node acquisition software adopts OpenPose. OpenPose offers three models: the MPI model, the COCO model and the BODY25 model. This embodiment adopts the COCO model, so the total number of detected human body joint points is 18, namely nose-0, neck-1, right shoulder-2, right elbow-3, right wrist-4, left shoulder-5, left elbow-6, left wrist-7, right hip-8, right knee-9, right ankle-10, left hip-11, left knee-12, left ankle-13, right eye-14, left eye-15, right ear-16 and left ear-17. Each human body joint point contains three data values (x, y, score): x is the abscissa of the joint point on the image, y is the ordinate of the joint point on the image, and score is the confidence of the joint point.
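For reference, one frame of OpenPose COCO output can be turned into a row of the joint data matrix roughly as in the sketch below. The JSON key names ("people", "pose_keypoints_2d") match recent OpenPose releases but should be treated as assumptions for the installed version.

# Illustrative sketch (not part of the patent): read one OpenPose per-frame JSON
# file and return the 54-element row (x, y, score for the 18 COCO joints) of the
# human body joint data matrix.
import json
import numpy as np

def frame_to_row(json_path):
    with open(json_path) as f:
        data = json.load(f)
    people = data.get("people", [])
    if not people:
        return None                                   # nobody detected in this frame
    # flat list [x0, y0, s0, ..., x17, y17, s17] for the first detected person
    return np.asarray(people[0]["pose_keypoints_2d"], dtype=np.float32)

# Joint data matrix: one row per frame, rows ordered by time, e.g.
# rows = [frame_to_row(p) for p in sorted(frame_json_paths)]
# joint_matrix = np.stack([r for r in rows if r is not None])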
The method for filling data in the missing human body joint points comprises the following steps:
Due to lighting and occlusion, individual joint points may be missing from the human body joint points acquired by the OpenPose software. If the joint point following the missing joint point (in the order given in step 3, the joint point following left ear-17 is nose-0) is not missing, the missing joint point is filled with that following joint point; if the following joint point is also missing but the preceding joint point is not, the missing joint point is filled with the preceding joint point; otherwise, the frame image is deleted.
Step 4, as shown in fig. 1, s behavior matrices are constructed with a sliding window from the human body joint data matrix obtained in step 3;

[Equation image: expression for the number s of behavior matrices, using the round-down operator ⌊·⌋]

⌊·⌋ is the round-down operator; m is the number of images in which human body joint points are detected.

As shown in fig. 2, the sliding window is a fixed-size window for storing time-series data. As time goes on, the sliding window moves along the time axis; newly arriving data is stored at the bottom of the window and the data at the top is removed. The invention constructs the behavior matrices with a sliding window. The total frame number W of the behavior matrix is set to 10 and the total joint point number N is 18, so the size of the behavior matrix is [10, 18 × 3], i.e. the length of the sliding window is 54 and its width is 10. The sliding step is set to 2, and finally a number of behavior matrices of size 10 × 54 are constructed.

The i-th behavior matrix K_{i,matrix} is expressed by formula (1), i = 1, 2, …, s; each row of a behavior matrix corresponds to the skeletal vector of one frame image;

K_{i,matrix} = (K_{2i-1}, K_{2i}, …, K_{2i+W-1})^T   (1)

In formula (1), K_t is the skeletal vector of the t-th image, expressed by formula (2); t = 1, 2, …, m.

K_t = (K_{t,0}, K_{t,1}, …, K_{t,N-1})   (2)

In formula (2), K_{t,j} is the parameter in the t-th row and (j+1)-th column of the human body joint data matrix, j = 0, 1, …, N-1. Each parameter contains three data values (x, y, score). The size of each behavior matrix is therefore [W, N × 3].
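A compact sketch of this sliding-window construction is given below, using W = 10 and step 2 as described above; the resulting count s is stated from those two values and should be treated as an assumption, since the patent's equation image is not reproduced in the text.

# Sketch of the sliding-window behavior matrix construction. With W = 10 frames
# and a step of 2 over an m x 54 joint data matrix this yields
# s = (m - W) // 2 + 1 behavior matrices of size 10 x 54 (count assumed from the
# window/step values).
import numpy as np

def build_behavior_matrices(joint_matrix, W=10, step=2):
    """joint_matrix: (m, 54) array; returns an (s, W, 54) stack of behavior matrices."""
    m = joint_matrix.shape[0]
    starts = range(0, m - W + 1, step)               # window start indices along the time axis
    return np.stack([joint_matrix[t:t + W] for t in starts])

# Example: m = 60 frames gives (60 - 10) // 2 + 1 = 26 windows of shape (10, 54)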
Step 5, calculating the temporal characteristic parameters and the spatial characteristic parameters.

In order to fully express the falling behavior, the invention simultaneously extracts temporal and spatial characteristics from each behavior matrix, as follows:

5-1. Calculate the human body height information h_t in the m images, as shown in formula (3); t = 1, 2, …, m.

h_t = h_{t,head} - h_{t,foot}   (3)

In formula (3), h_{t,head} is the head ordinate of the human body in the t-th image, with the expression shown in formula (4); h_{t,foot} is the foot ordinate of the human body in the t-th image, with the expression shown in formula (5).

[Equation images (4) and (5): expressions for h_{t,head} and h_{t,foot}, not reproduced in the text]

In formulas (4) and (5), y_{t,14}, y_{t,15}, y_{t,16}, y_{t,17} are the ordinates of the joint points right eye-14, left eye-15, right ear-16 and left ear-17 in the t-th image; y_{t,10}, y_{t,13} are the ordinates of right ankle-10 and left ankle-13 in the t-th image.

5-2. For the images corresponding to each behavior matrix, extract the maximum and minimum of the human body height information as the spatial maximum h_{i,max} and the spatial minimum h_{i,min}; i = 1, 2, …, s.

5-3. As shown in formulas (6) and (7), starting from the second image, calculate the adjacent-frame height difference Δh_{t,one} of each image; starting from the sixth image, calculate the five-frame height difference Δh_{t,five} of each image. Δh_{t,one} represents the height change between adjacent frames, i.e. how fast the body descends during a fall; Δh_{t,five} represents the height change across five frames before and after a fall.

Δh_{t,one} = h_t - h_{t-1}   (6)
Δh_{t,five} = h_t - h_{t-5}   (7)

5-4. For the images corresponding to each behavior matrix, extract the maximum and minimum of the adjacent-frame height difference Δh_{t,one} and of the five-frame height difference Δh_{t,five}, respectively denoted the short-time maximum h_{i,one,max}, short-time minimum h_{i,one,min}, long-time maximum h_{i,five,max} and long-time minimum h_{i,five,min}.

5-5. Fuse h_{i,max}, h_{i,min}, h_{i,one,max}, h_{i,one,min}, h_{i,five,max}, h_{i,five,min} into the fall feature vector F_{i,SVM} = (h_{i,max}, h_{i,min}, h_{i,one,max}, h_{i,one,min}, h_{i,five,max}, h_{i,five,min}); i = 1, 2, …, s.

5-6. Normalize the elements of each fall feature vector F_{i,SVM} according to formula (9), further eliminating recognition errors caused by individual differences such as posture and distance in the video.

data'_p = (data_p - data_{min}) / (data_{max} - data_{min})   (9)

In formula (9), data'_p is the p-th element after normalization of the fall feature vector; data_p is the p-th element before normalization; data_{min} is the minimum value of the fall feature vector before normalization; data_{max} is the maximum value of the fall feature vector before normalization.
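The feature computation of steps 5-1 to 5-6 can be sketched per behavior matrix as below. Averaging the eye/ear ordinates for the head height and the two ankle ordinates for the foot height is an assumption, since formulas (4) and (5) are only available as equation images; computing the height differences inside each window is likewise a simplification of the per-image computation described above.

# Hedged sketch of the fall feature vector F_i,SVM for one behavior matrix; not the
# patented implementation. The min-max normalization follows formula (9).
import numpy as np

HEAD_IDX, FOOT_IDX = [14, 15, 16, 17], [10, 13]        # right/left eye and ear; right/left ankle

def fall_feature_vector(behavior_matrix):
    """behavior_matrix: (W, 54) window; returns the normalized 6-element feature vector."""
    y = behavior_matrix.reshape(behavior_matrix.shape[0], 18, 3)[:, :, 1]     # ordinates, (W, 18)
    h = y[:, HEAD_IDX].mean(axis=1) - y[:, FOOT_IDX].mean(axis=1)             # h_t = h_head - h_foot (assumed averaging)
    dh1 = h[1:] - h[:-1]                                # adjacent-frame height difference
    dh5 = h[5:] - h[:-5]                                # five-frame height difference
    f = np.array([h.max(), h.min(), dh1.max(), dh1.min(), dh5.max(), dh5.min()])
    return (f - f.min()) / (f.max() - f.min())          # min-max normalization, formula (9)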
Step 6: Input the s normalized fall feature vectors F_{i,SVM} into the trained primary classifier to obtain s fall confidences P_i. The primary classifier is a Support Vector Machine (SVM). The threshold interval is set to [P_{min}, P_{max}], with P_{max} = 0.8 and P_{min} = 0.2.

If all fall confidences P_i are less than P_{min}, it is judged that the pedestrian in the video has not fallen, and the fall detection ends; if at least one fall confidence P_i is greater than P_{max}, it is judged that the pedestrian in the video has fallen, and the fall detection ends; otherwise, a fall-like situation is judged to have occurred, and step 7 is entered.

Step 7: Input the fall feature vectors F_{i,SVM} whose fall confidence P_i lies within the threshold interval [P_{min}, P_{max}] into the trained secondary classifier, and the secondary classifier outputs whether the pedestrian in the video has fallen.
The secondary classifier adopts a CNN (convolutional neural network) comprising a plurality of alternately connected convolutional layers and pooling layers followed by three fully-connected layers. In the convolution mode of the convolutional neural network, the padding parameter is set to SAME. The convolution kernels in the convolutional layers are 3 × 3. The pooling operator in the pooling layers is 2 × 2 with a stride of 1; each pooling layer is connected to the next convolutional layer through the activation function ReLU (Rectified Linear Unit). The three fully-connected layers are connected in turn through the activation function ReLU. The dropout rate of the fully-connected layers is set to 0.5. The last convolutional layer is connected to the first fully-connected layer through the activation function ReLU. The first fully-connected layer has 1024 neurons, the second fully-connected layer has 512 neurons, and the third fully-connected layer has 2 neurons, which output the judgment result.
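A minimal Keras sketch of this secondary classifier is given below; the number of convolution/pooling blocks and the number of filters per layer are not specified in the text and are assumptions.

# Sketch (assumption-laden) of the Embodiment 1 secondary classifier: SAME-padded
# 3x3 convolutions alternating with 2x2 / stride-1 max pooling over the 10 x 54 x 1
# behavior matrix, then 1024-512-2 fully connected layers with dropout 0.5.
from tensorflow.keras import layers, models

def build_secondary_cnn(input_shape=(10, 54, 1), num_blocks=2, filters=32):
    model = models.Sequential()
    model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu",
                            input_shape=input_shape))
    model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=1))
    for _ in range(num_blocks - 1):                     # "a plurality" of conv/pool blocks (count assumed)
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=1))
    model.add(layers.Flatten())
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(2, activation="softmax"))    # fall / no-fall
    return model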
Example 2
As shown in fig. 3, a fall detection method based on video joint points and a hybrid classifier differs from embodiment 1 in that:

In step 2, graying processing is performed on each frame image of the detected video segment instead of Gaussian denoising.

In step 6, the secondary classifier differs from that of embodiment 1. The secondary classifier in this embodiment is a multi-scale convolutional neural network (denoted MultiCNN) comprising one convolutional layer, one pooling layer and three fully-connected layers connected in sequence through the activation function ReLU. In the convolution mode of the convolutional neural network, the padding parameter is set to 'valid'. Four convolution kernels with scales 3 × 3, 5 × 5, 7 × 7 and 9 × 9 are arranged in the convolutional layer; four pooling operators with sizes 8 × 1, 6 × 1, 4 × 1 and 2 × 1 are arranged in the pooling layer; the 3 × 3, 5 × 5, 7 × 7 and 9 × 9 convolution kernels are connected to the 8 × 1, 6 × 1, 4 × 1 and 2 × 1 pooling operators, respectively. The stride of the pooling layer is 1. The dropout rate of the fully-connected layers is set to 0.5. The first fully-connected layer has 1024 neurons, the second has 512 neurons, and the third has 2 neurons.

This secondary classifier differs from a traditional convolutional neural network in that convolution kernels of different sizes are arranged in the convolutional layer, and the different convolution kernels are pooled by different pooling operators. Since the behavior matrix constructed in step 4 has size 10 × 54, four convolution kernels with scales 3 × 3, 5 × 5, 7 × 7 and 9 × 9 are set; the feature maps obtained by the different convolution kernels are input into pooling operators of size 8 × 1, 6 × 1, 4 × 1 and 2 × 1, respectively, yielding different fall features, which are flattened, input into the three fully-connected layers, and the fall detection result is output.
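A functional-API sketch of this multi-scale network follows; the number of filters per branch is an assumption.

# Sketch of the Embodiment 2 multi-scale secondary classifier (MultiCNN): four
# parallel 'valid' convolutions (3x3, 5x5, 7x7, 9x9) over the 10 x 54 x 1 behavior
# matrix, each paired with a stride-1 pooling (8x1, 6x1, 4x1, 2x1), flattened,
# concatenated and fed to the 1024-512-2 fully connected head.
from tensorflow.keras import layers, models

def build_multicnn(input_shape=(10, 54, 1), filters=32):
    inp = layers.Input(shape=input_shape)
    branches = []
    for kernel, pool in zip([3, 5, 7, 9], [8, 6, 4, 2]):         # paired kernel / pooling sizes
        x = layers.Conv2D(filters, (kernel, kernel), padding="valid", activation="relu")(inp)
        x = layers.MaxPooling2D(pool_size=(pool, 1), strides=1)(x)
        branches.append(layers.Flatten()(x))
    x = layers.Concatenate()(branches)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    out = layers.Dense(2, activation="softmax")(x)               # fall detection result
    return models.Model(inp, out)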
To verify the accuracy and efficiency of the method, recognition accuracy and recognition duration were tested for the SVM classifier used alone, the CNN classifier used alone, the MultiCNN classifier used alone, the method of embodiment 1, and the method of embodiment 2. The recognition accuracy results are compared in the histogram of fig. 4; it can be seen that the CNN classifier alone, the MultiCNN classifier alone, and embodiments 1 and 2 all achieve high accuracy.

The recognition duration results are compared in the histogram of fig. 5; it can be seen that the recognition durations of embodiments 1 and 2 are significantly shorter than those of the CNN classifier alone or the MultiCNN classifier alone. Therefore, on the premise of ensuring fall recognition accuracy, the invention greatly reduces the fall recognition time, reduces the computational load, and improves computational efficiency.

Claims (10)

1. A fall detection method based on video joint points and a hybrid classifier is characterized in that: step 1, extracting each frame image of a detected video clip;
step 2, collecting human body joint data of each frame of image in the detected video segment by using human body node collection software to obtain a human body joint data matrix;
step 3, constructing s behavior matrices from the human body joint data matrix obtained in step 2;

[Equation image: expression for the number s of behavior matrices, using the round-down operator ⌊·⌋]

⌊·⌋ is the round-down operator; m is the number of images; W is the total frame number of the behavior matrix, and W takes a value of 8-15;

the i-th behavior matrix K_{i,matrix} is expressed by formula (1), i = 1, 2, …, s;

K_{i,matrix} = (K_{2i-1}, K_{2i}, …, K_{2i+W-1})^T   (1)

in formula (1), K_t is the skeletal vector of the t-th image, expressed by formula (2); t = 1, 2, …, m;

K_t = (K_{t,0}, K_{t,1}, …, K_{t,N-1})   (2)

in formula (2), K_{t,j} is the parameter in the t-th row and (j+1)-th column of the human body joint data matrix, j = 0, 1, …, N-1; N is the number of human body joint points in one frame image;

step 4, calculating the temporal characteristic parameters and the spatial characteristic parameters;

4-1. calculating the human body height information h_t in the m images, as shown in formula (3); t = 1, 2, …, m;

h_t = h_{t,head} - h_{t,foot}   (3)

in formula (3), h_{t,head} is the head ordinate of the human body in the t-th image, and h_{t,foot} is the foot ordinate of the human body in the t-th image;

4-2. for the images corresponding to each behavior matrix, extracting the maximum and minimum of the human body height information h_t as the spatial maximum h_{i,max} and the spatial minimum h_{i,min}; i = 1, 2, …, s;

4-3. as shown in formulas (4) and (5), starting from the second image, calculating the adjacent-frame height difference Δh_{t,one} of each image; starting from the sixth image, calculating the five-frame height difference Δh_{t,five} of each image;

Δh_{t,one} = h_t - h_{t-1}   (4)
Δh_{t,five} = h_t - h_{t-5}   (5)

4-4. for the images corresponding to each behavior matrix, extracting the maximum and minimum of the adjacent-frame height difference Δh_{t,one} and of the five-frame height difference Δh_{t,five}, respectively denoted the short-time maximum h_{i,one,max}, short-time minimum h_{i,one,min}, long-time maximum h_{i,five,max} and long-time minimum h_{i,five,min};

4-5. constructing the fall feature vector F_{i,SVM} = (h_{i,max}, h_{i,min}, h_{i,one,max}, h_{i,one,min}, h_{i,five,max}, h_{i,five,min}); i = 1, 2, …, s;

step 5, inputting the s fall feature vectors F_{i,SVM} into the trained primary classifier to obtain s fall confidences P_i; setting the threshold interval to [P_{min}, P_{max}];

if all fall confidences P_i are less than P_{min}, it is judged that the pedestrian in the video has not fallen, and the fall detection ends; if at least one fall confidence P_i is greater than P_{max}, it is judged that the pedestrian in the video has fallen, and the fall detection ends; otherwise, a fall-like situation is judged to have occurred, and step 6 is entered;

step 6, inputting the fall feature vectors F_{i,SVM} whose fall confidence P_i lies within the threshold interval [P_{min}, P_{max}] into the trained secondary classifier, and the secondary classifier outputs whether the pedestrian in the video has fallen.
2. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 1, each frame image of the detected video segment is subjected to Gaussian denoising processing.
3. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 1, graying each frame of image of the detected video clip.
4. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in the step 2, the human body node acquisition software adopts OpenPose software; the total number of detected human body joint points is 18, and the detected human body joint points are respectively nose-0, neck-1, right shoulder-2, right elbow-3, right wrist-4, left shoulder-5, left elbow-6, left wrist-7, right hip-8, right knee-9, right ankle-10, left hip-11, left knee-12, left ankle-13, right eye-14, left eye-15, right ear-16 and left ear-17; the human joint point contains three data (x, y, score); x is the abscissa of the human body joint point on the image, and y is the ordinate of the human body joint point on the image; score is the confidence of the human joint point; each row of the human body joint data matrix is eighteen joint point parameters on one frame image.
5. A fall detection method based on video joint points and hybrid classifiers according to claim 4, characterized in that: in step 2, data filling is performed on missing human body joint points in each frame of image, and the specific method is as follows: if the next human body joint point of the missing joint point is not missing, filling the missing joint point with the next human body joint point; if the next human body joint point of the missing node is missing and the previous human body joint point is not missing, filling the missing node with the previous human body joint point; otherwise, deleting the image of the missing node.
6. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 2, the human body joint points detected by the human body node acquisition software comprise a right ankle, a left ankle, a right eye, a left eye, a right ear and a left ear;
in step 4-1, the expression for h_{t,head} is shown in formula (6); the expression for h_{t,foot} is shown in formula (7);

[Equation images (6) and (7): expressions for h_{t,head} and h_{t,foot}, not reproduced in the text]

in formulas (6) and (7), y_{t,14}, y_{t,15}, y_{t,16}, y_{t,17} are the ordinates of the right-eye, left-eye, right-ear and left-ear joint points in the t-th image; y_{t,10}, y_{t,13} are the ordinates of the right ankle and left ankle in the t-th image.
7. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: after step 4-5, the elements in each fall feature vector F_{i,SVM} are normalized according to formula (8);

data'_p = (data_p - data_{min}) / (data_{max} - data_{min})   (8)

in formula (8), data'_p is the p-th element after normalization of the fall feature vector; data_p is the p-th element before normalization; data_{min} is the minimum value of the fall feature vector before normalization; data_{max} is the maximum value of the fall feature vector before normalization.
8. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 5, a support vector machine is adopted as the primary classifier; P_{max} = 0.8, P_{min} = 0.2.
9. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 6, the secondary classifier comprises a plurality of alternately connected convolutional layers and pooling layers followed by three fully-connected layers; in the convolution mode of the convolutional neural network, the padding parameter is set to SAME; the convolution kernels in the convolutional layers are 3 × 3; the pooling operator in the pooling layers is 2 × 2 with a stride of 1; each pooling layer is connected to the next convolutional layer through the activation function ReLU; the three fully-connected layers are connected in turn through the activation function ReLU; the dropout rate of the fully-connected layers is set to 0.5; the last convolutional layer is connected to the first fully-connected layer through the activation function ReLU; the first fully-connected layer has 1024 neurons, the second fully-connected layer has 512 neurons, and the third fully-connected layer has 2 neurons.
10. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 6, the secondary classifier comprises one convolutional layer, one pooling layer and three fully-connected layers connected in sequence through the activation function ReLU; in the convolution mode of the convolutional neural network, the padding parameter is set to 'valid'; four convolution kernels with scales 3 × 3, 5 × 5, 7 × 7 and 9 × 9 are arranged in the convolutional layer; four pooling operators with sizes 8 × 1, 6 × 1, 4 × 1 and 2 × 1 are arranged in the pooling layer; the 3 × 3, 5 × 5, 7 × 7 and 9 × 9 convolution kernels are connected to the 8 × 1, 6 × 1, 4 × 1 and 2 × 1 pooling operators, respectively; the stride of the pooling layer is 1; the dropout rate of the fully-connected layers is set to 0.5; the first fully-connected layer has 1024 neurons, the second fully-connected layer has 512 neurons, and the third fully-connected layer has 2 neurons.
CN201910589503.XA 2019-07-02 2019-07-02 Fall detection method based on video joint points and hybrid classifier Active CN110532850B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910589503.XA CN110532850B (en) 2019-07-02 2019-07-02 Fall detection method based on video joint points and hybrid classifier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910589503.XA CN110532850B (en) 2019-07-02 2019-07-02 Fall detection method based on video joint points and hybrid classifier

Publications (2)

Publication Number Publication Date
CN110532850A CN110532850A (en) 2019-12-03
CN110532850B true CN110532850B (en) 2021-11-02

Family

ID=68659847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910589503.XA Active CN110532850B (en) 2019-07-02 2019-07-02 Fall detection method based on video joint points and hybrid classifier

Country Status (1)

Country Link
CN (1) CN110532850B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111243229A (en) * 2019-12-31 2020-06-05 浙江大学 Old people falling risk assessment method and system
CN111832412B (en) * 2020-06-09 2024-04-09 北方工业大学 Sounding training correction method and system
CN112215185B (en) * 2020-10-21 2022-08-05 成都信息工程大学 System and method for detecting falling behavior from monitoring video
CN112541424A (en) * 2020-12-07 2021-03-23 南京工程学院 Real-time detection method for pedestrian falling under complex environment
CN113204989B (en) * 2021-03-19 2022-07-29 南京邮电大学 Human body posture space-time feature extraction method for tumble analysis
CN113095295B (en) * 2021-05-08 2023-08-18 广东工业大学 Fall detection method based on improved key frame extraction
CN118015520A (en) * 2024-03-15 2024-05-10 上海摩象网络科技有限公司 Vision-based nursing detection system and method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564005A (en) * 2018-03-26 2018-09-21 电子科技大学 A kind of human body tumble discrimination method based on convolutional neural networks
CN109920208A (en) * 2019-01-31 2019-06-21 深圳绿米联创科技有限公司 Tumble prediction technique, device, electronic equipment and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8179268B2 (en) * 2008-03-10 2012-05-15 Ramot At Tel-Aviv University Ltd. System for automatic fall detection for elderly people

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564005A (en) * 2018-03-26 2018-09-21 电子科技大学 A kind of human body tumble discrimination method based on convolutional neural networks
CN109920208A (en) * 2019-01-31 2019-06-21 深圳绿米联创科技有限公司 Tumble prediction technique, device, electronic equipment and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Video-based Fall Detection for Seniors with Human Pose Estimation; Zhanyuan Huang et al.; IEEE; 2019-02-14; pp. 1-4 *
Depth-camera-based fall monitoring system for the elderly; Shen Daiyou et al.; Chinese Journal of Medical Physics; 2019-02-28; Vol. 36, No. 2; pp. 223-228 *

Also Published As

Publication number Publication date
CN110532850A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN110532850B (en) Fall detection method based on video joint points and hybrid classifier
CN108830252B (en) Convolutional neural network human body action recognition method fusing global space-time characteristics
CN103942577B (en) Based on the personal identification method for establishing sample database and composite character certainly in video monitoring
CN111401144B (en) Escalator passenger behavior identification method based on video monitoring
CN109543526B (en) True and false facial paralysis recognition system based on depth difference characteristics
CN103955699B (en) A kind of real-time fall events detection method based on monitor video
CN105956582A (en) Face identifications system based on three-dimensional data
CN107506692A (en) A kind of dense population based on deep learning counts and personnel's distribution estimation method
CN107220603A (en) Vehicle checking method and device based on deep learning
CN106529442A (en) Pedestrian identification method and apparatus
CN105160400A (en) L21 norm based method for improving convolutional neural network generalization capability
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN109271918B (en) Method for distinguishing people with balance ability disorder based on gravity center shift model
CN108596087B (en) Driving fatigue degree detection regression model based on double-network result
CN107944399A (en) A kind of pedestrian's recognition methods again based on convolutional neural networks target's center model
CN111488850B (en) Neural network-based old people falling detection method
CN110263728A (en) Anomaly detection method based on improved pseudo- three-dimensional residual error neural network
CN110705468B (en) Eye movement range identification method and system based on image analysis
KR20190105180A (en) Apparatus for Lesion Diagnosis Based on Convolutional Neural Network and Method thereof
CN104298974A (en) Human body behavior recognition method based on depth video sequence
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN105160285A (en) Method and system for recognizing human body tumble automatically based on stereoscopic vision
CN114492634B (en) Fine granularity equipment picture classification and identification method and system
CN111797705A (en) Action recognition method based on character relation modeling
CN107967944A (en) A kind of outdoor environment big data measuring of human health method and platform based on Hadoop

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant