CN110532850B - Fall detection method based on video joint points and hybrid classifier - Google Patents
- Publication number: CN110532850B
- Application number: CN201910589503.XA
- Authority: CN (China)
- Prior art keywords: human body, image, fall, video, layer
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2411—Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
The invention discloses a fall detection method based on video joint points and a hybrid classifier. Traditional video-based fall detection algorithms rely on manually extracted fall features and detect falls with a linear discriminant classifier; the model is simple but its accuracy is low. The method of the invention is as follows: 1. Extract each frame image of the detected video clip. 2. Acquire a human body joint data matrix. 3. Construct a plurality of behavior matrices. 4. Calculate the temporal and spatial feature parameters. 5. Perform primary classification. 6. Perform secondary classification. By extracting human skeletal joint points, the method overcomes the inability of traditional methods, which extract the human aspect ratio, projection area and the like, to accurately estimate the human posture. The invention constructs behavior matrices with a fixed-size sliding window, so it can model along the time and space axes simultaneously and fully express the characteristics of the fall behavior.
Description
Technical Field
The invention belongs to the technical field of fall detection, and particularly relates to a fall detection method based on video joint points and a hybrid classifier.
Background
Scholars at home and abroad have carried out much research on falls of the elderly. Fall detection methods fall into three main types: fall detection based on wearable sensors, fall detection based on environmental sensors, and detection based on video images. Fall detection based on wearable sensors mainly sets a threshold on the data collected by the sensors to detect a fall; if the threshold is set inaccurately, the final detection result is affected. Fall detection based on environmental sensors mainly judges a fall from data obtained by pressure sensors or sound detection devices on the ground; if the environmental noise is too large, the data can be abnormal. Video-based fall detection mainly uses an ordinary camera or a depth camera installed in the daily environment to acquire video data and then determines a fall by image recognition. Compared with detection algorithms based on wearable sensors and environmental sensors, video-based fall detection does not require the elderly to wear equipment, is less easily affected by the environment, and has wider practical application. However, traditional video-based fall detection algorithms rely on manually extracted fall features and detect falls with a linear discriminant classifier; the model is simple but its accuracy is low. Existing deep-learning-based fall detection algorithms have complex models, and it is difficult to reduce their detection time.
Disclosure of Invention
The invention aims to provide a fall detection method based on video joint points and a mixed classifier.
The method comprises the following specific steps:
step 1, extracting each frame image of the detected video clip.
Step 2. Collect the human body joint point data of each frame image in the detected video clip with human body node acquisition software to obtain a human body joint data matrix.
Step 3. Construct s behavior matrices from the human body joint data matrix obtained in step 2; s is obtained from m and W by an expression using the round-down operator ⌊·⌋; m is the number of images; W is the total frame number of the behavior matrix, and takes a value of 8 to 15.
The i-th behavior matrix K_i,matrix is expressed by formula (1), i = 1, 2, …, s.
K_i,matrix = (K_2i-1, K_2i, …, K_2i+W-1)^T (1)
In formula (1), K_t is the skeleton vector of the t-th image, expressed by formula (2); t = 1, 2, …, m.
K_t = (K_t,0, K_t,1, …, K_t,N-1) (2)
In formula (2), K_t,j denotes the parameter in the t-th row and the (j+1)-th column of the human body joint data matrix, j = 0, 1, …, N-1. N is the number of human body joint points in one frame image.
Step 4. Calculate the temporal feature parameters and spatial feature parameters.
4-1. Calculate the human body height information h_t in the m images as shown in formula (3); t = 1, 2, …, m;
h_t = h_t,head − h_t,foot (3)
In formula (3), h_t,head is the head ordinate of the human body in the t-th image, and h_t,foot is the foot ordinate of the human body in the t-th image.
4-2. For each behavior matrix, extract the maximum and minimum of the human height information h_t over its corresponding images as the spatial maximum h_i,max and spatial minimum h_i,min; i = 1, 2, …, s.
4-3. Starting from the second image, calculate the adjacent-frame height difference Δh_t,one of each image as shown in formulas (4) and (5); starting from the sixth image, calculate the five-frame height difference Δh_t,five of each image;
Δh_t,one = h_t − h_t-1 (4)
Δh_t,five = h_t − h_t-5 (5)
4-4. For each behavior matrix, extract the maximum and minimum of the adjacent-frame height difference Δh_t,one and of the five-frame height difference Δh_t,five over its corresponding images; these are the short-time maximum h_i,one,max, short-time minimum h_i,one,min, long-time maximum h_i,five,max and long-time minimum h_i,five,min.
4-5. Construct the fall feature vector F_i,SVM = (h_i,max, h_i,min, h_i,one,max, h_i,one,min, h_i,five,max, h_i,five,min); i = 1, 2, …, s.
Step 5. Input the s fall feature vectors F_i,SVM into the trained primary classifier to obtain s fall confidence values P_i. Set the threshold interval to [P_min, P_max].
If all fall confidence values P_i are less than P_min, it is determined that the pedestrian in the video has not fallen, and the fall detection ends. If at least one fall confidence value P_i is greater than P_max, it is determined that the pedestrian in the video has fallen, and the fall detection ends. Otherwise, a fall-like situation is determined to have occurred, and the method proceeds to step 6.
Step 6. Input the fall feature vectors F_i,SVM whose fall confidence values P_i lie within the threshold interval [P_min, P_max] into the trained secondary classifier; the secondary classifier outputs whether the pedestrian in the video has fallen.
Preferably, in step 1, Gaussian denoising is performed on each frame image of the detected video clip.
Preferably, in step 1, grayscale processing is performed on each frame image of the detected video clip.
Preferably, in step 2, the human body node acquisition software is OpenPose software. The total number of detected human body joint points is 18: nose-0, neck-1, right shoulder-2, right elbow-3, right wrist-4, left shoulder-5, left elbow-6, left wrist-7, right hip-8, right knee-9, right ankle-10, left hip-11, left knee-12, left ankle-13, right eye-14, left eye-15, right ear-16 and left ear-17. Each human body joint point contains three data (x, y, score): x is the abscissa of the joint point on the image, y is the ordinate of the joint point on the image, and score is the confidence of the joint point. Each row of the human body joint data matrix holds the eighteen joint point parameters of one frame image.
Preferably, in step 2, data filling is performed for the human body joint points missing in each frame image, as follows: if the next human body joint point after the missing joint point is not missing, fill the missing joint point with that next joint point; if the next joint point is also missing but the previous joint point is not, fill the missing joint point with the previous joint point; otherwise, delete the image with the missing joint point.
Preferably, in step 2, the human body joint points detected by the human body node acquisition software include a right ankle, a left ankle, a right eye, a left eye, a right ear, and a left ear.
In step 4-1, the expression of h_t,head is shown in formula (6), and the expression of h_t,foot is shown in formula (7).
In formulas (6) and (7), y_t,14, y_t,15, y_t,16, y_t,17 are the ordinates of the right-eye, left-eye, right-ear and left-ear joint points in the t-th image, and y_t,10, y_t,13 are the ordinates of the right ankle and left ankle in the t-th image.
Preferably, after step 4-5 is performed, the elements of each fall feature vector F_i,SVM are normalized according to formula (8):
data'_p = (data_p − data_min) / (data_max − data_min) (8)
In formula (8), data'_p is the p-th element after normalization of the fall feature vector; data_p is the p-th element before normalization; data_min is the minimum value of the fall feature vector before normalization; data_max is the maximum value of the fall feature vector before normalization.
Preferably, in step 5, the primary classifier uses a support vector machine, with P_max = 0.8 and P_min = 0.2.
Preferably, in step 6, the secondary classifier comprises a plurality of alternately connected convolutional and pooling layers followed by three fully-connected layers. The padding parameter of the convolutions is set to SAME. The convolution kernels in the convolutional layers are 3×3. The pooling operator in the pooling layers is 2×2 with a stride of 1; each pooling layer is connected to the next convolutional layer through a ReLU activation function. The last convolutional layer is connected to the first fully-connected layer through a ReLU activation function, and the three fully-connected layers are connected in turn through ReLU activation functions. The dropout ratio of the fully-connected layers is set to 0.5. The first fully-connected layer has 1024 neurons, the second 512 neurons, and the third 2 neurons.
Preferably, in step 6, the secondary classifier comprises one convolutional layer, one pooling layer and three fully-connected layers connected in sequence through ReLU activation functions. The padding parameter of the convolution is set to 'valid'. The convolutional layer contains four convolution kernels of sizes 3×3, 5×5, 7×7 and 9×9; the pooling layer contains four pooling operators of sizes 8×1, 6×1, 4×1 and 2×1; the 3×3, 5×5, 7×7 and 9×9 convolution kernels are connected to the 8×1, 6×1, 4×1 and 2×1 pooling operators, respectively. The stride of the pooling layer is 1. The dropout ratio of the fully-connected layers is set to 0.5. The first fully-connected layer has 1024 neurons, the second 512 neurons, and the third 2 neurons.
The invention has the beneficial effects that:
1. By extracting human skeletal joint points, the method overcomes the inability of traditional methods, which extract the human aspect ratio, projection area and the like, to accurately estimate the human posture.
2. Judging a fall from a single video frame loses time-axis information. The invention constructs behavior matrices with a fixed-size sliding window, so it can model along the time and space axes simultaneously and fully express the characteristics of the fall behavior.
3. The fall features are first input into the primary classifier, and only ambiguous cases are passed to the secondary classifier. This overcomes both the low detection accuracy of traditional machine-learning methods and the complex models and long detection times of existing deep-learning methods, reducing detection time without reducing accuracy.
Drawings
FIG. 1 is a flow chart of example 1 of the present invention;
FIG. 2 is a schematic diagram of behavior matrix generation in example 1 and example 2 of the present invention;
FIG. 3 is a flowchart of embodiment 2 of the present invention;
FIG. 4 is a histogram of the recognition accuracy of the present invention;
FIG. 5 is a histogram of recognition durations of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, a fall detection method based on video joint points and a hybrid classifier specifically includes the following steps:
step 1, collecting videos through a camera. When a person passes by, a detected video segment containing the person is intercepted in the video.
And 2, performing Gaussian denoising treatment on each frame image of the detected video segment.
Step 3, collecting human body joint point data of each frame of image in the detected video segment by using human body node collection software, and filling data in the missing human body joint points to obtain a human body joint data matrix; each row of the human body joint data matrix is eighteen joint point parameters on one frame of image; the row numbers of the human body joint data matrix are arranged according to the time sequence.
The human body node acquisition software is OpenPose software. OpenPose can use one of three models: the MPI model, the COCO model and the BODY25 model. This embodiment adopts the COCO model, in which the total number of detected human body joint points is 18: nose-0, neck-1, right shoulder-2, right elbow-3, right wrist-4, left shoulder-5, left elbow-6, left wrist-7, right hip-8, right knee-9, right ankle-10, left hip-11, left knee-12, left ankle-13, right eye-14, left eye-15, right ear-16 and left ear-17. Each human body joint point contains three data (x, y, score): x is the abscissa of the joint point on the image, y is the ordinate of the joint point on the image, and score is the confidence of the joint point.
The method for filling data in the missing human body joint points comprises the following steps:
Due to lighting and occlusion, individual joint points may be missing from the human body joint points acquired by the OpenPose software. If the next human body joint point after the missing joint point (in the order of step 3, the joint point after left ear-17 is nose-0) is not missing, fill the missing joint point with that next joint point; if the next joint point is also missing but the previous joint point is not, fill the missing joint point with the previous joint point; otherwise, delete the frame image.
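The filling rule above can be sketched in a few lines of Python. This is a minimal sketch: the function name, the use of `None` for undetected joints, and the cyclic neighbour indexing (next of left ear-17 is nose-0, as stated above) are illustrative assumptions, not part of the patent text.

```python
def fill_missing_joints(frame_joints):
    """Fill missing joints per the rule above, or return None to drop the frame.

    frame_joints: list of 18 entries, each an (x, y, score) tuple, or None
    if the joint was not detected by the acquisition software.
    """
    n = len(frame_joints)
    filled = list(frame_joints)
    for j, joint in enumerate(frame_joints):
        if joint is not None:
            continue
        nxt = frame_joints[(j + 1) % n]  # next joint; next of left ear-17 is nose-0
        prv = frame_joints[(j - 1) % n]  # previous joint
        if nxt is not None:
            filled[j] = nxt
        elif prv is not None:
            filled[j] = prv
        else:
            return None                  # neighbours also missing: drop this frame
    return filled
```

Whether the neighbour test should look at the original or the already-filled values is not specified in the text; the sketch checks the original detections.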
Step 4. As shown in fig. 1, construct s behavior matrices with a sliding window from the human body joint data matrix obtained in step 3; s is obtained from m and W by an expression using the round-down operator ⌊·⌋, and m is the number of images in which human body joint points were detected.
As shown in fig. 2, the sliding window is a fixed-size window for storing time-series data: as time goes on, the window moves along the time axis, newly arrived data is stored at the bottom of the window, and the data at the top is removed. The invention uses a sliding window to construct the behavior matrices. The total frame number W of the behavior matrix is set to 10 and the total joint point number N is 18, so the size of the behavior matrix is [10, 18×3], i.e. the length of the sliding window is 54 and its width is 10. The sliding step is set to 2, so that a number of behavior matrices of size 10×54 are finally constructed.
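The sliding-window construction can be sketched as follows. This is a minimal sketch assuming the per-frame skeleton vectors have already been assembled into an m × (N·3) array; the function name is illustrative.

```python
import numpy as np

def build_behavior_matrices(joint_matrix, W=10, step=2):
    """Slide a W-frame window with the given step over the skeleton vectors.

    joint_matrix: array of shape (m, N*3) -- one row per frame, each row
    the N joints' (x, y, score) values flattened.
    Returns a list of (W, N*3) behavior matrices.
    """
    m = joint_matrix.shape[0]
    matrices = []
    for start in range(0, m - W + 1, step):
        # each behavior matrix is W consecutive skeleton vectors
        matrices.append(joint_matrix[start:start + W])
    return matrices
```

For example, m = 20 frames with W = 10 and step 2 yields 6 behavior matrices of shape (10, 54), consecutive matrices overlapping by 8 frames.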
The i-th behavior matrix K_i,matrix is expressed by formula (1), i = 1, 2, …, s; each row of the behavior matrix corresponds to the skeleton vector of one frame image.
K_i,matrix = (K_2i-1, K_2i, …, K_2i+W-1)^T (1)
In formula (1), K_t is the skeleton vector of the t-th image, expressed by formula (2); t = 1, 2, …, m.
K_t = (K_t,0, K_t,1, …, K_t,N-1) (2)
In formula (2), K_t,j denotes the parameter in the t-th row and the (j+1)-th column of the human body joint data matrix, j = 0, 1, …, N-1. Each parameter contains three data (x, y, score), so the size of each behavior matrix is [W, N×3].
Step 5. Calculate the temporal feature parameters and spatial feature parameters.
In order to fully express the fall behavior, the invention extracts temporal features and spatial features from each behavior matrix simultaneously, as follows:
5-1. Calculate the human body height information h_t in the m images as shown in formula (3); t = 1, 2, …, m.
h_t = h_t,head − h_t,foot (3)
In formula (3), h_t,head is the head ordinate of the human body in the t-th image and is expressed by formula (4); h_t,foot is the foot ordinate of the human body in the t-th image and is expressed by formula (5).
In formulas (4) and (5), y_t,14, y_t,15, y_t,16, y_t,17 are the ordinates of the joint points right eye-14, left eye-15, right ear-16 and left ear-17 in the t-th image; y_t,10, y_t,13 are the ordinates of the right ankle-10 and left ankle-13 in the t-th image.
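The bodies of formulas (4) and (5) are not reproduced in this text, so the exact combination of the eye/ear and ankle ordinates is unknown. The sketch below ASSUMES the mean of the available ordinates in each group; the function name and index constants are illustrative, and the patent's image-coordinate convention (whether y grows upward or downward) is likewise not specified here.

```python
import numpy as np

# Joint indices in the COCO layout listed above.
R_ANKLE, L_ANKLE = 10, 13
R_EYE, L_EYE, R_EAR, L_EAR = 14, 15, 16, 17

def body_height(frame_joints):
    """h_t = h_head - h_foot for one frame (formula (3)).

    frame_joints: (18, 3) array of (x, y, score). h_head and h_foot are
    ASSUMED to be the mean ordinates of the eye/ear joints and of the
    two ankles, since formulas (4)/(5) are elided in the source.
    """
    y = frame_joints[:, 1]
    h_head = y[[R_EYE, L_EYE, R_EAR, L_EAR]].mean()
    h_foot = y[[R_ANKLE, L_ANKLE]].mean()
    return h_head - h_foot
```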
5-2. For each behavior matrix, extract the maximum and minimum of the human height information over its corresponding images as the spatial maximum h_i,max and the spatial minimum h_i,min; i = 1, 2, …, s.
5-3. As shown in formulas (6) and (7), starting from the second image, calculate the adjacent-frame height difference Δh_t,one of each image; starting from the sixth image, calculate the five-frame height difference Δh_t,five of each image. Δh_t,one represents the height change between adjacent frames, i.e. the speed at which the body descends during a fall; Δh_t,five represents the height change across five frames before and after a fall.
Δh_t,one = h_t − h_t-1 (6)
Δh_t,five = h_t − h_t-5 (7)
5-4. For each behavior matrix, extract the maximum and minimum of Δh_t,one and of Δh_t,five over its corresponding images; these are the short-time maximum h_i,one,max, short-time minimum h_i,one,min, long-time maximum h_i,five,max and long-time minimum h_i,five,min.
5-5. Fuse h_i,max, h_i,min, h_i,one,max, h_i,one,min, h_i,five,max, h_i,five,min into the fall feature vector F_i,SVM = (h_i,max, h_i,min, h_i,one,max, h_i,one,min, h_i,five,max, h_i,five,min); i = 1, 2, …, s.
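Steps 5-2 to 5-5 can be sketched as follows. This is a minimal sketch assuming the per-frame heights h_t have already been computed; the function name and the convention for which height differences "belong" to a window (those whose later frame falls inside it) are illustrative assumptions.

```python
import numpy as np

def fall_features(heights, W=10, step=2):
    """Six-element fall feature vector per sliding window, from heights h_t.

    heights: 1-D sequence of h_t for all m frames.
    Returns an (s, 6) array: (h_max, h_min, dh1_max, dh1_min, dh5_max, dh5_min).
    """
    h = np.asarray(heights, dtype=float)
    d1 = np.diff(h)          # dh_one[t] = h_{t+1} - h_t, defined from the 2nd frame
    d5 = h[5:] - h[:-5]      # dh_five, defined from the 6th frame
    feats = []
    for start in range(0, len(h) - W + 1, step):
        win = h[start:start + W]
        # differences whose later frame lies inside the window
        w1 = d1[max(start - 1, 0):start + W - 1]
        w5 = d5[max(start - 5, 0):start + W - 5]
        feats.append([win.max(), win.min(),
                      w1.max(), w1.min(),
                      w5.max(), w5.min()])
    return np.array(feats)
```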
5-6. According to formula (9), normalize the elements of each fall feature vector F_i,SVM, further eliminating recognition errors caused by individual differences such as posture and distance in the video.
data'_p = (data_p − data_min) / (data_max − data_min) (9)
In formula (9), data'_p is the p-th element after normalization of the fall feature vector; data_p is the p-th element before normalization; data_min is the minimum value of the fall feature vector before normalization; data_max is the maximum value of the fall feature vector before normalization.
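The min-max normalization of step 5-6 can be sketched as follows (a minimal sketch; the guard against a constant vector is an added assumption, since the text does not say how that degenerate case is handled):

```python
import numpy as np

def normalize_feature_vector(f):
    """Min-max normalize one fall feature vector:
    data'_p = (data_p - data_min) / (data_max - data_min).
    """
    f = np.asarray(f, dtype=float)
    fmin, fmax = f.min(), f.max()
    if fmax == fmin:               # degenerate vector: avoid division by zero
        return np.zeros_like(f)
    return (f - fmin) / (fmax - fmin)
```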
Step 6. Input the normalized s fall feature vectors F_i,SVM into the trained primary classifier to obtain s fall confidence values P_i. The primary classifier is a support vector machine (SVM). The threshold interval is set to [P_min, P_max], with P_max = 0.8 and P_min = 0.2.
If all fall confidence values P_i are less than P_min, it is determined that the pedestrian in the video has not fallen, and the fall detection ends. If at least one fall confidence value P_i is greater than P_max, it is determined that the pedestrian in the video has fallen, and the fall detection ends. Otherwise, a fall-like situation is determined to have occurred, and the method proceeds to step 7.
Step 7. Input the fall feature vectors F_i,SVM whose fall confidence values P_i lie within the threshold interval [P_min, P_max] into the trained secondary classifier; the secondary classifier outputs whether the pedestrian in the video has fallen.
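The three-way decision rule of steps 6 and 7 can be sketched as follows (a minimal sketch; the function name and return values are illustrative):

```python
def fall_decision(confidences, p_min=0.2, p_max=0.8):
    """Three-way decision on the primary classifier's confidences P_i.

    Returns 'no_fall' if every P_i < p_min, 'fall' if any P_i > p_max,
    and otherwise the indices of the windows whose confidence lies inside
    [p_min, p_max] and must be passed to the secondary classifier.
    """
    if all(p < p_min for p in confidences):
        return "no_fall"
    if any(p > p_max for p in confidences):
        return "fall"
    return [i for i, p in enumerate(confidences) if p_min <= p <= p_max]
```

Note the ordering: a single confidence above p_max decides "fall" immediately, so the costlier secondary classifier only runs on the ambiguous, fall-like cases.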
The secondary classifier is a CNN convolutional neural network comprising a plurality of alternately connected convolutional and pooling layers followed by three fully-connected layers. The padding parameter of the convolutions is set to SAME. The convolution kernels in the convolutional layers are 3×3. The pooling operator in the pooling layers is 2×2 with a stride of 1; each pooling layer is connected to the next convolutional layer through a ReLU (Rectified Linear Unit) activation function. The last convolutional layer is connected to the first fully-connected layer through a ReLU activation function, and the three fully-connected layers are connected in turn through ReLU activation functions. The dropout ratio of the fully-connected layers is set to 0.5. The first fully-connected layer has 1024 neurons and the second 512 neurons. The third fully-connected layer has 2 neurons and outputs the judgment result.
Example 2
As shown in fig. 3, a fall detection method based on video joint points and a hybrid classifier differs from embodiment 1 in that:
In step 2, each frame image of the detected video segment is subjected to grayscale processing instead of Gaussian denoising.
In step 6, the secondary classifier is different from that in embodiment 1, and the secondary classifier in this embodiment is a multi-scale convolutional neural network (denoted as multiconn) and includes a convolutional layer, a pooling layer, and three fully-connected layers, which are sequentially connected by an activation function Relu. In the convolution method of the convolutional neural network, padding parameter is set to 'valid'. Four convolution kernels with the scales of 3 × 3, 5 × 5, 7 × 7 and 9 × 9 are arranged in the convolution layer; four kinds of pooling operators with the sizes of 8 multiplied by 1, 6 multiplied by 1, 4 multiplied by 1 and 2 multiplied by 1 are arranged in the pooling layer; the convolution kernels of 3 × 3, 5 × 5, 7 × 7 and 9 × 9 are correspondingly connected with the pooling operators of 8 × 1, 6 × 1, 4 × 1 and 2 × 1 respectively. The step size of the pooling layer is 1. The dropout specific gravity of the fully connected layer was set to 0.5. 1024 neurons are arranged in the first fully connected layer, and 512 neurons are arranged in the second fully connected layer. The third fully connected layer is provided with 2 neurons.
This secondary classifier differs from a traditional convolutional neural network in that convolution kernels of different sizes are placed in the convolutional layer, and the different kernels are pooled by different pooling operators. Since the behavior matrices constructed in step 4 have size 10×54, four different convolution kernels are set, with scales 3×3, 5×5, 7×7 and 9×9; the feature maps obtained from the different kernels are input into pooling operators of sizes 8×1, 6×1, 4×1 and 2×1, respectively, yielding different fall features. These features are flattened, input into the three fully-connected layers, and the fall detection result is output.
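The pairing of kernel and pooling sizes above is not arbitrary: with 'valid' padding, each branch collapses the 10-row behavior matrix to a single row before flattening, as simple shape arithmetic shows (a sketch under the stated sizes; the helper name is illustrative, and only heights are tracked since the pooling operators are p×1):

```python
def branch_output_height(in_h=10, kernel=3, pool=8):
    """Output height of one multi-scale branch: a 'valid' k x k convolution
    followed by a (pool x 1) pooling with stride 1."""
    conv_h = in_h - kernel + 1      # 'valid' convolution shrinks by k-1
    return conv_h - pool + 1        # stride-1 pooling shrinks by pool-1

# Each kernel size is paired with the pooling operator that reduces the
# 10-row behavior matrix to exactly one row.
pairs = [(3, 8), (5, 6), (7, 4), (9, 2)]
heights = [branch_output_height(10, k, p) for k, p in pairs]
```

Every entry of `heights` is 1, so all four branches produce single-row feature maps that can be flattened and concatenated before the fully-connected layers.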
To verify the accuracy and efficiency of the method, tests of recognition accuracy and recognition duration were carried out for the SVM classifier used alone, the CNN classifier used alone, the MultiCNN classifier used alone, and the methods of embodiment 1 and embodiment 2. The recognition accuracy results are compared in the histogram of FIG. 4; as can be seen from FIG. 4, the CNN classifier alone, the MultiCNN classifier alone, and embodiments 1 and 2 all achieve high accuracy.
The recognition duration results are compared in the histogram of FIG. 5; as can be seen from FIG. 5, the recognition durations of embodiments 1 and 2 are significantly shorter than those of the CNN classifier alone or the MultiCNN classifier alone. Therefore, on the premise of guaranteeing fall recognition accuracy, the invention greatly shortens the fall recognition time, reduces the amount of computation and improves computational efficiency.
Claims (10)
1. A fall detection method based on video joint points and a hybrid classifier, characterized by the following steps:
Step 1. Extract each frame image of the detected video clip;
Step 2. Collect the human body joint point data of each frame image in the detected video clip with human body node acquisition software to obtain a human body joint data matrix;
Step 3. Construct s behavior matrices from the human body joint data matrix obtained in step 2; s is obtained from m and W by an expression using the round-down operator ⌊·⌋; m is the number of images; W is the total frame number of the behavior matrix, and takes a value of 8 to 15;
The i-th behavior matrix K_i,matrix is expressed by formula (1), i = 1, 2, …, s;
K_i,matrix = (K_2i-1, K_2i, …, K_2i+W-1)^T (1)
In formula (1), K_t is the skeleton vector of the t-th image, expressed by formula (2); t = 1, 2, …, m;
K_t = (K_t,0, K_t,1, …, K_t,N-1) (2)
In formula (2), K_t,j denotes the parameter in the t-th row and the (j+1)-th column of the human body joint data matrix, j = 0, 1, …, N-1; N is the number of human body joint points in one frame image;
Step 4. Calculate the temporal feature parameters and spatial feature parameters;
4-1. Calculate the human body height information h_t in the m images as shown in formula (3); t = 1, 2, …, m;
h_t = h_t,head − h_t,foot (3)
In formula (3), h_t,head is the head ordinate of the human body in the t-th image, and h_t,foot is the foot ordinate of the human body in the t-th image;
4-2. For each behavior matrix, extract the maximum and minimum of the human height information h_t over its corresponding images as the spatial maximum h_i,max and spatial minimum h_i,min; i = 1, 2, …, s;
4-3. Starting from the second image, calculate the adjacent-frame height difference Δh_t,one of each image as shown in formulas (4) and (5); starting from the sixth image, calculate the five-frame height difference Δh_t,five of each image;
Δh_t,one = h_t − h_t-1 (4)
Δh_t,five = h_t − h_t-5 (5)
4-4. For each behavior matrix, extract the maximum and minimum of the adjacent-frame height difference Δh_t,one and of the five-frame height difference Δh_t,five over its corresponding images; these are the short-time maximum h_i,one,max, short-time minimum h_i,one,min, long-time maximum h_i,five,max and long-time minimum h_i,five,min;
4-5. Construct the fall feature vector F_i,SVM = (h_i,max, h_i,min, h_i,one,max, h_i,one,min, h_i,five,max, h_i,five,min); i = 1, 2, …, s;
Step 5. Input the s fall feature vectors F_i,SVM into the trained primary classifier to obtain s fall confidence values P_i; set the threshold interval to [P_min, P_max];
If all fall confidence values P_i are less than P_min, it is determined that the pedestrian in the video has not fallen, and the fall detection ends; if at least one fall confidence value P_i is greater than P_max, it is determined that the pedestrian in the video has fallen, and the fall detection ends; otherwise, a fall-like situation is determined to have occurred, and the method proceeds to step 6;
Step 6. Input the fall feature vectors F_i,SVM whose fall confidence values P_i lie within the threshold interval [P_min, P_max] into the trained secondary classifier; the secondary classifier outputs whether the pedestrian in the video has fallen.
2. A fall detection method based on video joint points and a hybrid classifier according to claim 1, characterized in that: in step 1, Gaussian denoising is performed on each frame image of the detected video clip.
3. A fall detection method based on video joint points and a hybrid classifier according to claim 1, characterized in that: in step 1, grayscale processing is performed on each frame image of the detected video clip.
4. A fall detection method based on video joint points and a hybrid classifier according to claim 1, characterized in that: in step 2, the human body node acquisition software is OpenPose software; the total number of detected human body joint points is 18: nose-0, neck-1, right shoulder-2, right elbow-3, right wrist-4, left shoulder-5, left elbow-6, left wrist-7, right hip-8, right knee-9, right ankle-10, left hip-11, left knee-12, left ankle-13, right eye-14, left eye-15, right ear-16 and left ear-17; each human body joint point contains three data (x, y, score): x is the abscissa of the joint point on the image, y is the ordinate of the joint point on the image, and score is the confidence of the joint point; each row of the human body joint data matrix holds the eighteen joint point parameters of one frame image.
5. A fall detection method based on video joint points and hybrid classifiers according to claim 4, characterized in that: in step 2, data filling is performed for missing human body joint points in each frame image, as follows: if the joint point following the missing joint point is not missing, fill the missing joint point with that following joint point; if the following joint point is also missing but the preceding joint point is not, fill the missing joint point with the preceding joint point; otherwise, delete the image containing the missing joint point.
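The gap-filling rule of claim 5 can be sketched directly; here a missing joint is represented as `None`, which is an assumption of this sketch (OpenPose typically signals missing detections with zero confidence).

```python
# Sketch of the gap-filling rule in claim 5: a missing joint (None here)
# is copied from the next joint in the same frame if present, otherwise
# from the previous joint; if neither exists, the frame is discarded.
def fill_missing_joints(frame):
    """Return a filled copy of the frame, or None if the frame is dropped."""
    joints = list(frame)
    for i, joint in enumerate(joints):
        if joint is not None:
            continue
        nxt = joints[i + 1] if i + 1 < len(joints) else None
        prv = joints[i - 1] if i > 0 else None
        if nxt is not None:
            joints[i] = nxt          # fill with the following joint
        elif prv is not None:
            joints[i] = prv          # fall back to the preceding joint
        else:
            return None              # delete the image with the missing joint
    return joints
```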
6. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 2, the human body joint points detected by the human body node acquisition software comprise a right ankle, a left ankle, a right eye, a left eye, a right ear and a left ear;
in step 4-1, h_{t,head} is expressed by formula (6), and h_{t,foot} by formula (7);
in formulas (6) and (7), y_{t,14}, y_{t,15}, y_{t,16} and y_{t,17} are the ordinates of the right eye, left eye, right ear and left ear joint points in the t-th image, respectively; y_{t,10} and y_{t,13} are the ordinates of the right ankle and left ankle in the t-th image, respectively.
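Formulas (6) and (7) themselves are not reproduced in this excerpt. As a clearly hypothetical stand-in, the sketch below averages the listed ordinates (eyes and ears for the head height, ankles for the foot height); the actual patent formulas may combine these ordinates differently.

```python
# Illustrative stand-in for formulas (6)-(7), which are not shown in this
# excerpt: h_{t,head} as the mean eye/ear ordinate, h_{t,foot} as the mean
# ankle ordinate. Joint indices follow claim 4's numbering.
def head_foot_heights(frame_y):
    """frame_y maps joint index -> y ordinate in the t-th image."""
    head_idx = (14, 15, 16, 17)   # right eye, left eye, right ear, left ear
    foot_idx = (10, 13)           # right ankle, left ankle
    h_head = sum(frame_y[i] for i in head_idx) / len(head_idx)
    h_foot = sum(frame_y[i] for i in foot_idx) / len(foot_idx)
    return h_head, h_foot
```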
7. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: after step 4-5 is performed, the elements within each fall feature vector F_{i,SVM} are normalized according to formula (8);
in formula (8), data'_p is the p-th element after normalization of the fall feature vector; data_p is the p-th element before normalization; data_min is the minimum value of the fall feature vector before normalization; data_max is the maximum value of the fall feature vector before normalization.
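The quantities named in formula (8) describe standard min-max normalization, data'_p = (data_p − data_min) / (data_max − data_min), which can be sketched as:

```python
# Min-max normalization of formula (8): each element of a fall feature
# vector is rescaled to [0, 1] using the vector's own min and max.
def normalize_feature_vector(vec):
    lo, hi = min(vec), max(vec)
    if hi == lo:                      # degenerate vector: avoid division by zero
        return [0.0 for _ in vec]
    return [(x - lo) / (hi - lo) for x in vec]
```

The degenerate-vector guard is an addition of this sketch; the claim does not state how a constant feature vector is handled.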
8. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 5, the primary classifier adopts a support vector machine; P_max = 0.8, P_min = 0.2.
9. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 6, the secondary classifier comprises a plurality of alternately connected convolution and pooling layers followed by three fully-connected layers; in the convolution mode of the convolutional neural network, the padding parameter is set to SAME; the convolution kernels in the convolution layers are 3×3; the pooling operators in the pooling layers are 2×2 with stride 1; each pooling layer is connected to the next convolution layer through a ReLU activation function; the three fully-connected layers are connected sequentially through ReLU activation functions; the dropout proportion of the fully-connected layers is set to 0.5; the last convolution layer is connected to the first fully-connected layer through a ReLU activation function; the first fully-connected layer has 1024 neurons, the second fully-connected layer has 512 neurons, and the third fully-connected layer has 2 neurons.
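The claim-9 architecture can be summarized as a layer configuration list. This is a configuration description, not a trained model; the framework glue (e.g. Keras or PyTorch) is omitted, and the number of conv/pool pairs ("a plurality") is shown as two only for illustration.

```python
# Layer-by-layer description of the claim-9 secondary classifier.
# Kernel sizes, pooling sizes, strides, dropout and unit counts follow
# the claim; the count of conv/pool pairs is illustrative.
SECONDARY_CLASSIFIER = [
    {"layer": "conv", "kernel": (3, 3), "padding": "SAME"},
    {"layer": "pool", "size": (2, 2), "stride": 1, "activation": "relu"},
    {"layer": "conv", "kernel": (3, 3), "padding": "SAME"},
    {"layer": "pool", "size": (2, 2), "stride": 1, "activation": "relu"},
    # three fully-connected layers joined by ReLU, dropout 0.5
    {"layer": "fc", "units": 1024, "activation": "relu", "dropout": 0.5},
    {"layer": "fc", "units": 512, "activation": "relu", "dropout": 0.5},
    {"layer": "fc", "units": 2},      # 2-way fall / no-fall output
]
```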
10. A fall detection method based on video joint points and hybrid classifiers according to claim 1, characterized in that: in step 6, the secondary classifier comprises a convolution layer, a pooling layer and three fully-connected layers connected sequentially through ReLU activation functions; in the convolution mode of the convolutional neural network, the padding parameter is set to 'valid'; the convolution layer contains four convolution kernels of scale 3×3, 5×5, 7×7 and 9×9; the pooling layer contains four pooling operators of size 8×1, 6×1, 4×1 and 2×1; the 3×3, 5×5, 7×7 and 9×9 convolution kernels are connected to the 8×1, 6×1, 4×1 and 2×1 pooling operators, respectively; the stride of the pooling layer is 1; the dropout proportion of the fully-connected layers is set to 0.5; the first fully-connected layer has 1024 neurons, the second fully-connected layer has 512 neurons, and the third fully-connected layer has 2 neurons.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910589503.XA CN110532850B (en) | 2019-07-02 | 2019-07-02 | Fall detection method based on video joint points and hybrid classifier |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110532850A CN110532850A (en) | 2019-12-03 |
CN110532850B true CN110532850B (en) | 2021-11-02 |
Family
ID=68659847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910589503.XA Active CN110532850B (en) | 2019-07-02 | 2019-07-02 | Fall detection method based on video joint points and hybrid classifier |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110532850B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111243229A (en) * | 2019-12-31 | 2020-06-05 | 浙江大学 | Old people falling risk assessment method and system |
CN111832412B (en) * | 2020-06-09 | 2024-04-09 | 北方工业大学 | Sounding training correction method and system |
CN112215185B (en) * | 2020-10-21 | 2022-08-05 | 成都信息工程大学 | System and method for detecting falling behavior from monitoring video |
CN112541424A (en) * | 2020-12-07 | 2021-03-23 | 南京工程学院 | Real-time detection method for pedestrian falling under complex environment |
CN113204989B (en) * | 2021-03-19 | 2022-07-29 | 南京邮电大学 | Human body posture space-time feature extraction method for tumble analysis |
CN113095295B (en) * | 2021-05-08 | 2023-08-18 | 广东工业大学 | Fall detection method based on improved key frame extraction |
CN118015520A (en) * | 2024-03-15 | 2024-05-10 | 上海摩象网络科技有限公司 | Vision-based nursing detection system and method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564005A (en) * | 2018-03-26 | 2018-09-21 | 电子科技大学 | A kind of human body tumble discrimination method based on convolutional neural networks |
CN109920208A (en) * | 2019-01-31 | 2019-06-21 | 深圳绿米联创科技有限公司 | Tumble prediction technique, device, electronic equipment and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8179268B2 (en) * | 2008-03-10 | 2012-05-15 | Ramot At Tel-Aviv University Ltd. | System for automatic fall detection for elderly people |
- 2019-07-02 CN CN201910589503.XA patent/CN110532850B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564005A (en) * | 2018-03-26 | 2018-09-21 | 电子科技大学 | A kind of human body tumble discrimination method based on convolutional neural networks |
CN109920208A (en) * | 2019-01-31 | 2019-06-21 | 深圳绿米联创科技有限公司 | Tumble prediction technique, device, electronic equipment and system |
Non-Patent Citations (2)
Title |
---|
Video-based Fall Detection for Seniors with Human Pose Estimation; Zhanyuan Huang et al.; IEEE; 2019-02-14; pp. 1-4 *
Depth-camera-based fall monitoring system for the elderly; Shen Daiyou et al.; Chinese Journal of Medical Physics (《中国医学物理学杂志》); February 2019; Vol. 36, No. 2; pp. 223-228 *
Also Published As
Publication number | Publication date |
---|---|
CN110532850A (en) | 2019-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110532850B (en) | Fall detection method based on video joint points and hybrid classifier | |
CN108830252B (en) | Convolutional neural network human body action recognition method fusing global space-time characteristics | |
CN103942577B (en) | Based on the personal identification method for establishing sample database and composite character certainly in video monitoring | |
CN111401144B (en) | Escalator passenger behavior identification method based on video monitoring | |
CN109543526B (en) | True and false facial paralysis recognition system based on depth difference characteristics | |
CN103955699B (en) | A kind of real-time fall events detection method based on monitor video | |
CN105956582A (en) | Face identifications system based on three-dimensional data | |
CN107506692A (en) | A kind of dense population based on deep learning counts and personnel's distribution estimation method | |
CN107220603A (en) | Vehicle checking method and device based on deep learning | |
CN106529442A (en) | Pedestrian identification method and apparatus | |
CN105160400A (en) | L21 norm based method for improving convolutional neural network generalization capability | |
CN105160310A (en) | 3D (three-dimensional) convolutional neural network based human body behavior recognition method | |
CN109271918B (en) | Method for distinguishing people with balance ability disorder based on gravity center shift model | |
CN108596087B (en) | Driving fatigue degree detection regression model based on double-network result | |
CN107944399A (en) | A kind of pedestrian's recognition methods again based on convolutional neural networks target's center model | |
CN111488850B (en) | Neural network-based old people falling detection method | |
CN110263728A (en) | Anomaly detection method based on improved pseudo- three-dimensional residual error neural network | |
CN110705468B (en) | Eye movement range identification method and system based on image analysis | |
KR20190105180A (en) | Apparatus for Lesion Diagnosis Based on Convolutional Neural Network and Method thereof | |
CN104298974A (en) | Human body behavior recognition method based on depth video sequence | |
CN113610046B (en) | Behavior recognition method based on depth video linkage characteristics | |
CN105160285A (en) | Method and system for recognizing human body tumble automatically based on stereoscopic vision | |
CN114492634B (en) | Fine granularity equipment picture classification and identification method and system | |
CN111797705A (en) | Action recognition method based on character relation modeling | |
CN107967944A (en) | A kind of outdoor environment big data measuring of human health method and platform based on Hadoop |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||