CN114627427A - Fall detection method, system, storage medium and equipment based on spatio-temporal information - Google Patents
Fall detection method, system, storage medium and equipment based on spatio-temporal information
- Publication number
- CN114627427A (application CN202210536743.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/214 — Pattern recognition; analysing; design or setup of recognition systems: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/044 — Neural network architectures: recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural network architectures: combinations of networks
- G06N3/08 — Neural networks: learning methods
Abstract
The invention relates to the technical field of image recognition and provides a fall detection method, system, storage medium and device based on spatio-temporal information. The method comprises the following steps: acquiring a video that includes a target to be detected; detecting each frame of the video to obtain a human body detection box and extracting human skeleton keypoints; taking the human skeleton keypoints of all frames in a sliding window as one sample; extracting the spatial features of the sample with an adaptive keypoint attention network; extracting the spatio-temporal features of the sample with a long short-term memory network, based on the spatial features; and obtaining the fall detection result for the target with a classification network, based on the spatio-temporal features. The method enhances the recognition of fall events and improves the accuracy of fall detection.
Description
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a fall detection method, system, storage medium and device based on spatio-temporal information.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Among the problems affecting the daily safety of the elderly, falls have become one of the leading causes of injury and death. If a fall is detected as early as possible, its serious consequences can be mitigated, and prompt fall detection and rescue services can protect the elderly as far as possible; the development of intelligent fall detection and protection systems has therefore become a research focus.
Existing fall detection methods fall mainly into two categories: methods based on sensor devices and methods based on computer vision. For the elderly, sensor-based methods have the drawbacks of being inconvenient to wear and easy to forget. Computer-vision-based methods mostly collect data from fixed equipment, use a neural network to extract features such as the human body contour and the positions and speeds of specific body parts, and train a classification network on those features.
At present, most computer-vision-based fall detection methods rely on pose estimation, extract features with neural networks, and train classification networks on those features. However, existing computer-vision-based methods suffer from high false detection rates, low accuracy and, for some algorithms, poor real-time performance, making them difficult to apply to practical fall detection.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a fall detection method, system, storage medium and device based on spatio-temporal information, in which spatio-temporal features are obtained through an adaptive keypoint attention network and a Long Short-Term Memory (LSTM) network, enhancing the fall detection model's ability to recognize fall events and improving the accuracy of fall detection.
In order to achieve the purpose, the invention adopts the following technical scheme:
a first aspect of the invention provides a method of fall detection based on spatiotemporal information, comprising:
acquiring a video including a target to be detected;
detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
using the human skeleton key points of all images in the sliding window as a sample;
extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
based on the spatial features of the sample, extracting the spatio-temporal features of the sample with a long short-term memory network;
and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
Further, the samples are normalized by a batch normalization layer before being input into the adaptive keypoint attention network.
Further, the step of extracting the spatial features of the sample is as follows:
respectively carrying out global average pooling and global maximum pooling on the samples by adopting a global average pooling layer and a global maximum pooling layer to obtain a global average pooling result and a global maximum pooling result;
adding the global average pooling result and the global maximum pooling result, and passing through a full-connection layer to obtain the weights of all human skeleton key points;
and multiplying the weight by the sample to obtain the spatial characteristics of the sample.
Further, the spatial features of the sample are permuted and reshaped before being input into the long short-term memory network.
Furthermore, the long short-term memory network comprises a plurality of long short-term memory units connected in sequence;
each long short-term memory unit comprises a lower-layer storage unit, a middle-layer storage unit and an upper-layer storage unit;
within one long short-term memory unit, the output of the lower-layer storage unit serves as the input of the middle-layer storage unit, and the output of the middle-layer storage unit serves as the input of the upper-layer storage unit.
Further, for two adjacent long short-term memory units, the output of each layer's storage unit in the former unit serves as the input of the same layer's storage unit in the latter unit.
Furthermore, the classification network comprises a plurality of fully connected layers connected in sequence, and every fully connected layer except the last is followed by a Dropout layer and an activation function.
A second aspect of the invention provides a fall detection system based on spatiotemporal information, comprising:
a video acquisition module configured to: acquiring a video including a target to be detected;
a keypoint extraction module configured to: detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
a window sliding module configured to: using the human skeleton key points of all the images in the sliding window as a sample;
a spatial feature extraction module configured to: extracting the spatial features of the samples by adopting a self-adaptive key point attention network;
a spatiotemporal feature extraction module configured to: based on the spatial features of the sample, extract the spatio-temporal features of the sample with a long short-term memory network;
a classification module configured to: and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
A third aspect of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps in the method for fall detection based on spatiotemporal information as described above.
A fourth aspect of the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the steps in the method for fall detection based on spatiotemporal information as described above.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a tumble detection method based on space-time information, which respectively extracts dynamic space attention characteristics of key points of a human body and dynamic time sequence characteristics of the key points of the human body through a self-adaptive key point attention network and a long-time and short-time memory network to obtain space-time characteristics, thereby enhancing the recognition capability of a tumble detection model on tumble events, improving the accuracy of tumble detection, having strong applicability and having certain real-time processing speed.
The invention provides a fall detection method based on spatio-temporal information in which a batch normalization layer is added before the sample is input into the adaptive keypoint attention network, normalizing the sample and accelerating the convergence of the fall detection model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
Fig. 1 is a flowchart of a fall detection method based on spatiotemporal information according to a first embodiment of the present invention;
FIG. 2 is a key point detail diagram of the first embodiment of the present invention;
FIG. 3 is a flow chart of spatial feature extraction of a sample according to a first embodiment of the present invention;
Fig. 4(a) is a structural diagram of the long short-term memory network according to the first embodiment of the present invention;
Fig. 4(b) is a schematic diagram of a single storage unit in the long short-term memory network according to the first embodiment of the present invention;
FIG. 5 is a flow chart of spatiotemporal feature extraction and classification of samples according to a first embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
The embodiment provides a fall detection method based on spatiotemporal information, which specifically includes the following steps as shown in fig. 1:
Step 1, acquiring a video including the target to be detected. Each frame of the video is an RGB image; the target to be detected is an elderly person.
Step 2, applying a target detection model to each frame of the video to obtain a human body detection box, and inputting the detection box into a pose estimation model to extract the human skeleton keypoints.
The target detection model is YOLOv3 (You Only Look Once, version 3). The pose estimation model is a Multi-Stage Pose estimation Network (MSPN).
Specifically, each RGB frame is input into YOLOv3, which detects the human body in the image and returns an image with a human body detection box, discarding frames that contain no person; the detection box is resized to 256 × 192. With YOLOv3 as the detector, the resized detection box is input into MSPN, which computes the two-dimensional coordinates of the skeleton keypoints.
MSPN uses the backbone of several cascaded pyramid networks, repeatedly down-sampling and up-sampling the feature map to extract feature information. It uses cross-stage feature aggregation: at each stage, the down-sampling and up-sampling paths each feed feature information into the next stage through a 1 × 1 convolution, preventing feature loss during repeated sampling. MSPN also applies a coarse-to-fine strategy, using Gaussian convolution kernels of different sizes at different stages: stages closer to the input use larger kernels and stages farther from the input use smaller ones. This improves the accuracy of keypoint estimation, realizing the estimation of the human keypoints and outputting the skeleton keypoints of the person in the detection-box image.
After performing pose estimation on the detection-box image of each RGB frame, the pose estimation model outputs 17 human skeleton keypoints in txt format, each consisting of X and Y coordinates. Fig. 2 shows the details of the detected keypoints. The 17 keypoints are numbered 1 to 17 and represent the neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right foot, left hip, left knee, left foot, right eye, left eye, right ear and left ear, respectively.
Step 3, taking the human skeleton keypoints of all frames in the sliding window as one sample.
A fall is a process; using only an isolated frame of the video would lose the dynamic information of the fall. This embodiment therefore introduces a sliding window over the skeleton keypoints, which helps the subsequent adaptive keypoint attention network extract the dynamic features of the keypoints. That is, several consecutive frames of the same video are used for fall detection, and the fall detection result is obtained for the target in the last frame.
The length of the sliding window is 30. Grouping with the sliding window means that the skeleton keypoints of the current frame and of the preceding 29 consecutive frames form one group, i.e. one sample, and the label of the last frame's keypoints serves as the label of the sample (the fall detection result). Each sample therefore has dimension W × 17 × H, i.e. 30 × 17 × 2, where W is the window length, 17 is the number of skeleton keypoints per frame, and H is the dimension of one keypoint (2, because a keypoint consists of X and Y coordinates).
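The sliding-window grouping above can be sketched in a few lines of plain Python. All names here are illustrative, not from the patent; each sample collects the keypoints of 30 consecutive frames and would carry the last frame's label.

```python
# Hypothetical sketch of the sliding-window sample construction: the keypoints
# of the current frame plus the preceding 29 frames form one 30 x 17 x 2 sample.
W = 30          # sliding-window length (frames)
K = 17          # skeleton keypoints per frame
D = 2           # (X, Y) coordinates per keypoint

def make_samples(frames):
    """frames: list of per-frame keypoint lists, each K x D.
    Returns one W x K x D sample per window position."""
    samples = []
    for end in range(W - 1, len(frames)):
        samples.append(frames[end - W + 1:end + 1])
    return samples

# toy input: 40 frames of zeroed keypoints
frames = [[[0.0, 0.0] for _ in range(K)] for _ in range(40)]
samples = make_samples(frames)
```

With 40 input frames the window can end at frames 30 through 40, yielding 11 samples.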
Step 4, inputting the sample into the fall detection model to obtain the fall detection result for the target. The fall detection model comprises an adaptive keypoint attention network, a long short-term memory network and a classification network connected in sequence. Specifically, the method comprises the following steps:
step 401, extracting spatial features (dynamic spatial attention features) of the samples by using a self-adaptive key point attention network, that is, calculating the weight of each human skeleton key point in the samples to obtain samples with spatial attention weights.
In fall detection, some keypoints strongly influence the result, such as keypoints on the limbs, while others matter less, such as the ears and eyes on the head. An attention mechanism is therefore introduced to assign different weights to different keypoints, giving larger weights to the keypoints most important for fall detection.
The spatial feature extraction process of the sample, as shown in fig. 3, specifically includes:
step 40101, before the sample is input into the adaptive keypoint attention network, a batch normalization layer is adopted to normalize the sample, and a normalized sample is obtained:
wherein,human skeletal key points representing batch data,represents the mean of human skeletal key points of the batch data,represents the variance of the human skeletal key points of the batch data,is a variable added to prevent the denominator from appearing zero,andto learn the parameters, 1 and 0 are typically taken, respectively. In training, the batch data may be 64 samples, and thus, the size of the batch data is 64 × 30 × 17 × 2. The size of the batch data of the normalized sample composition was 64 × 17 × 30 × 2. Can useRepresents the firstjFirst of frame imageiThe X-coordinate of the individual's skeletal key points,represents the firstjFirst of frame imageiY-coordinates of key points of the individual's body bones. The batch normalization layer is beneficial to accelerating the convergence speed of the fall detection model. The batch data is one sample at the time of testing.
Step 40102, inputting the normalized samples into an adaptive key point attention network.
(1) The adaptive keypoint attention network first applies a global average pooling layer and a global max pooling layer to the normalized sample, obtaining a global average pooling result F_avg and a global max pooling result F_max. If the batch contains 64 samples, both results have size 64 × 17 × 2 × 1.
(2) The global average pooling result F_avg and the global max pooling result F_max are added, and the importance of each channel is predicted through a fully connected layer, giving the weights M of all skeleton keypoints:

M = σ(W₁ · ReLU(W₀ · (F_avg + F_max)))

where σ denotes the Sigmoid activation function, ReLU the ReLU activation function, and W₀ and W₁ the two fully connected operations. The reduction ratio is set to 1, ensuring that the number of channels is never reduced while the fully connected layers process the sample, so that the final weights always remain in one-to-one correspondence with the skeleton keypoints.
(3) After the weight of each skeleton keypoint is obtained through the fully connected layers, the weights are multiplied with the normalized sample to obtain the sample with spatial attention weights (the spatial features of the sample):

F' = M ⊗ F

where F denotes the normalized sample and ⊗ denotes element-wise multiplication.
After the sample passes through the adaptive keypoint attention network, the weights of the keypoints across the 30 consecutive frames, i.e. the importance of each keypoint in the fall detection process, have been obtained, completing the spatial feature extraction. If the batch contains 64 samples, the spatial features of all samples have size 64 × 17 × 30 × 2.
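The attention step can be sketched in plain Python. As a simplifying assumption, the two fully connected layers (reduction ratio 1) are replaced by an identity mapping, so the Sigmoid acts directly on the sum of the two pooling results; function names and toy data are illustrative, not the patent's trained network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative adaptive keypoint attention: global average and global max
# pooling per keypoint, a sigmoid standing in for the fully connected layers,
# and a per-keypoint reweighting of the sample.
def keypoint_attention(sample):
    """sample: K x T list, T features per keypoint across the window."""
    weights = []
    for feats in sample:
        avg_pool = sum(feats) / len(feats)   # global average pooling
        max_pool = max(feats)                # global max pooling
        weights.append(sigmoid(avg_pool + max_pool))
    weighted = [[w * f for f in feats] for w, feats in zip(weights, sample)]
    return weighted, weights

sample = [[0.1 * k] * 4 for k in range(17)]  # toy 17-keypoint sample
weighted, w = keypoint_attention(sample)
```

Each keypoint receives a weight in (0, 1); keypoints with larger pooled activations receive larger weights, matching the intended "more important keypoints get larger weights" behavior.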
Step 402, extracting the spatio-temporal features of the sample with the long short-term memory network, based on the spatial features of the sample. The spatial features are permuted and reshaped before being input into the network.
The LSTM network has great advantages in processing time-series problems and a strong ability to link context. In fall detection, the video frames form a temporal sequence, and a fall is a process closely related to the preceding and following frames; the LSTM network is therefore introduced so that the current frame is related to its neighboring frames rather than treated as an independent individual.
The sample with spatial attention weights F' is permuted and reshaped into the sequence N = {N_t, N_{t+1}, …, N_{t+29}}, where the permutation swaps the keypoint and time dimensions of the spatial features and the reshaping merges the keypoint and coordinate dimensions. If the batch contains 64 samples, the permuted result has size 64 × 30 × 17 × 2 and the reshaped result has size 64 × 30 × 34. The sequence N is input into the LSTM network to extract the spatio-temporal features of the sample (keypoints with spatio-temporal features).
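The permute-and-reshape step above, for a single sample, can be sketched as follows; the nested-list layout and names are ours, chosen to mirror the 17 × 30 × 2 → 30 × 34 shapes described in the text.

```python
# Sketch of the permute-and-reshape step: a 17 x 30 x 2 spatial-feature
# tensor is reordered time-first and the keypoint and coordinate axes are
# flattened into a 30 x 34 sequence for the LSTM.
K, W, D = 17, 30, 2

def to_sequence(feat):
    """feat: K x W x D nested lists -> W x (K*D) sequence."""
    seq = []
    for t in range(W):
        step = []
        for k in range(K):
            step.extend(feat[k][t])   # append (X, Y) of keypoint k at time t
        seq.append(step)
    return seq

# toy features: keypoint index as X, time index as Y
feat = [[[float(k), float(t)] for t in range(W)] for k in range(K)]
seq = to_sequence(feat)
```

Each of the 30 sequence elements packs the 17 (X, Y) pairs of one frame into a 34-dimensional vector, which is the per-timestep input the LSTM expects.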
As shown in fig. 5, the LSTM network comprises a plurality of sequentially connected long short-term memory (LSTM) units in a three-layer, many-to-one structure. As shown in fig. 4(a), each LSTM unit mainly comprises three sequentially connected storage units: a lower-layer storage unit, a middle-layer storage unit and an upper-layer storage unit. Within an LSTM unit, the output of the lower-layer storage unit is the input of the middle-layer storage unit, and the output of the middle-layer storage unit is the input of the upper-layer storage unit. For two adjacent LSTM units, the output of each layer's storage unit in the former unit is the input of the same layer's storage unit in the latter unit. The output of the upper-layer storage unit in the last LSTM unit is the spatio-temporal feature L of the sample extracted by the LSTM network. If the batch contains 64 samples, the LSTM network outputs 128 neurons.
Each storage unit consists of three gate structures and a cell state, as shown in fig. 4(b), which depicts the storage unit enclosed by the dashed line in fig. 4(a). The input gate controls which information is written to the cell state, the forget gate decides how much of the previous cell state is retained, and the output gate controls which parts of the cell state are emitted. The storage unit is described by the following formulas:

f_{t+j} = σ(W_f · [h_{t+j−1}, N*_{t+j}] + b_f) (2)

i_{t+j} = σ(W_i · [h_{t+j−1}, N*_{t+j}] + b_i) (3)

o_{t+j} = σ(W_o · [h_{t+j−1}, N*_{t+j}] + b_o) (4)

C_{t+j} = f_{t+j} ∗ C_{t+j−1} + i_{t+j} ∗ tanh(W_C · [h_{t+j−1}, N*_{t+j}] + b_C) (5)

h_{t+j} = o_{t+j} ∗ tanh(C_{t+j}) (6)

where W_f, W_i, W_o and W_C are weight matrices, b_f, b_i, b_o and b_C are bias vectors, and f_{t+j}, i_{t+j} and o_{t+j} are the outputs of the forget gate, input gate and output gate of the storage unit at time t+j, with j = 0, 1, 2, …, 29. C_{t+j} is the cell state of the storage unit at time t+j. N*_{t+j} is the input vector of the storage unit at time t+j: for a lower-layer storage unit it is the (j+1)-th element N_{t+j} of the permuted and reshaped sequence, i.e. the 17 keypoints with spatial attention information of the (j+1)-th frame; for a middle-layer storage unit it is the output vector of the lower-layer storage unit; for an upper-layer storage unit it is the output vector of the middle-layer storage unit. h_{t+j} is the output vector of the storage unit at time t+j, and σ denotes the Sigmoid function.
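A single step of such a storage unit can be sketched with scalar state and scalar input for readability. The weights are toy values and the concatenation [h, x] is collapsed to a sum as a simplifying assumption; this is an illustration of the gate equations, not the patent's trained network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Minimal single-step LSTM storage unit following the standard gate formulas.
def lstm_step(x, h_prev, c_prev, wf=0.5, wi=0.5, wo=0.5, wc=0.5,
              bf=0.0, bi=0.0, bo=0.0, bc=0.0):
    z = h_prev + x                                  # [h, x] collapsed to a sum
    f = sigmoid(wf * z + bf)                        # forget gate
    i = sigmoid(wi * z + bi)                        # input gate
    o = sigmoid(wo * z + bo)                        # output gate
    c = f * c_prev + i * math.tanh(wc * z + bc)     # new cell state
    h = o * math.tanh(c)                            # output
    return h, c

h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:   # a short toy input sequence
    h, c = lstm_step(x, h, c)
```

The output h is bounded in (−1, 1) by the tanh/Sigmoid combination, while the cell state c carries information forward across time steps.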
Step 403, obtaining the fall detection result for the target with the classification network, based on the spatio-temporal features of the sample.
The classification network performs the binary classification of fall detection: fall or normal. As shown in fig. 5, it comprises a plurality of sequentially connected fully connected layers; every fully connected layer except the last is followed by a Dropout layer and an activation function (the Sigmoid function). To prevent overfitting of the fall detection model and improve its generalization, a Dropout layer with rate 0.3 is placed between each fully connected layer and its activation function for random deactivation. Specifically, as shown in fig. 5, the classification network comprises 4 sequentially connected fully connected layers, i.e. 3 fully connected layers followed by Dropout and activation, plus the final fully connected layer.
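The classification head can be sketched in plain Python. Layer sizes and weights here are assumptions for illustration (the text does not give them), and only two of the four fully connected layers are shown to keep the sketch short; Dropout is shown disabled at inference, as is conventional.

```python
import math, random

# Illustrative sketch of the classification head: fully connected layers with
# Sigmoid activation and Dropout (rate 0.3), ending in a fall / normal decision.
def dense(x, w, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj
            for row, bj in zip(w, b)]

def dropout(x, rate=0.3, training=False):
    if not training:
        return x    # dropout is disabled at inference time
    return [0.0 if random.random() < rate else xi / (1 - rate) for xi in x]

def sigmoid_vec(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

feat = [0.1] * 8                             # toy spatio-temporal feature vector
w1 = [[0.1] * 8 for _ in range(4)]; b1 = [0.0] * 4
w2 = [[0.1] * 4 for _ in range(2)]; b2 = [0.0] * 2

hidden = sigmoid_vec(dropout(dense(feat, w1, b1)))
logits = dense(hidden, w2, b2)               # two outputs: fall vs. normal
pred = "fall" if logits[0] > logits[1] else "normal"
```

With these symmetric toy weights the two logits tie, so the sketch defaults to "normal"; a trained network would separate the two classes.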
Aiming at the problem of falls among the elderly, the dynamic spatial attention features and the dynamic temporal features of the human keypoints are extracted by the adaptive keypoint attention network and the long short-term memory network, respectively, to obtain spatio-temporal features. This enhances the fall detection model's ability to recognize fall events, improves the accuracy of fall detection, offers strong applicability, and achieves a degree of real-time processing speed.
A batch normalization layer is added before the sample is input into the adaptive keypoint attention network, normalizing the sample and accelerating the convergence of the fall detection model.
Example two
The embodiment provides a fall detection system based on spatio-temporal information, which specifically comprises the following modules:
a video acquisition module configured to: acquiring a video including a target to be detected;
a keypoint extraction module configured to: detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
a window sliding module configured to: using the human skeleton key points of all the images in the sliding window as a sample;
a spatial feature extraction module configured to: extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
a spatiotemporal feature extraction module configured to: based on the spatial characteristics of the sample, extracting the spatiotemporal characteristics of the sample by adopting a long-short time memory network;
a classification module configured to: and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
Before the samples are input into the adaptive key point attention network, a batch normalization layer is adopted to carry out normalization processing on the samples.
The spatial feature extraction module extracts the spatial features of the sample as follows: a global average pooling layer and a global maximum pooling layer are applied to the sample to obtain a global average pooling result and a global maximum pooling result; the two pooling results are added and passed through a fully connected layer to obtain the weights of all human skeleton key points; and the weights are multiplied with the sample to obtain the spatial features of the sample.
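The three pooling/weighting steps above can be sketched as follows. The sample layout (batch × 17 keypoints × per-keypoint feature vector), the pooling axis, and the random fully connected weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sample layout: (batch, K keypoints, per-keypoint features).
B, K, D = 4, 17, 60
sample = rng.standard_normal((B, K, D))

# Global average pooling and global maximum pooling over the feature axis
# yield one descriptor per keypoint.
avg_pool = sample.mean(axis=2)          # (B, K)
max_pool = sample.max(axis=2)           # (B, K)

# The sum of the two pooling results passes through a fully connected layer
# (random weights here) to produce one weight per keypoint.
w_fc = rng.standard_normal((K, K)) * 0.1
weights = sigmoid((avg_pool + max_pool) @ w_fc)   # (B, K), values in (0, 1)

# Multiplying the weights back onto the sample amplifies the keypoints the
# attention network deems important.
spatial_features = sample * weights[:, :, None]   # (B, K, D)
print(spatial_features.shape)  # (4, 17, 60)
```

The effect is a per-keypoint attention mask: keypoints with weights near 1 pass through nearly unchanged, while low-weight keypoints are suppressed.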
The spatial features of the sample are transformed and reshaped before being input into the long short-term memory network.
The long short-term memory network comprises a plurality of long short-term memory units connected in sequence; each unit comprises a lower-layer memory cell, a middle-layer memory cell and an upper-layer memory cell. Within one unit, the output of the lower-layer memory cell serves as the input of the middle-layer memory cell, and the output of the middle-layer memory cell serves as the input of the upper-layer memory cell. Between two adjacent units, the output of each layer's memory cell in the preceding unit serves as the input of the same layer's memory cell in the following unit.
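This three-layer structure can be sketched as a stacked LSTM, with illustrative sizes assumed (34-dimensional inputs, 64 hidden units, 30 frames). Within a time step each layer's output feeds the layer above; each layer's state carries over to the same layer at the next time step:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """One memory cell: gates computed from the current input and previous state."""
    def __init__(self, d_in, d_hid, rng):
        self.W = rng.standard_normal((d_in + d_hid, 4 * d_hid)) * 0.1
        self.b = np.zeros(4 * d_hid)

    def step(self, x, h, c):
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)   # update the cell (long-term) state
        h = o * np.tanh(c)           # emit the hidden (short-term) state
        return h, c

# Three stacked layers: lower -> middle -> upper, with hypothetical sizes.
d_in, d_hid, T = 34, 64, 30
layers = [LSTMCell(d_in, d_hid, rng),
          LSTMCell(d_hid, d_hid, rng),
          LSTMCell(d_hid, d_hid, rng)]

h = [np.zeros(d_hid) for _ in layers]
c = [np.zeros(d_hid) for _ in layers]
seq = rng.standard_normal((T, d_in))  # reshaped spatial features, one vector per frame

for x in seq:                     # adjacent units pass state layer-to-layer over time
    for k, cell in enumerate(layers):
        h[k], c[k] = cell.step(x, h[k], c[k])
        x = h[k]                  # a layer's output feeds the layer above it

print(x.shape)  # final upper-layer output after the last frame: (64,)
```

The upper layer's final hidden state summarizes the whole sliding window and would be the input to the classification network.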
The classification network comprises a plurality of fully-connected layers which are connected in sequence, all fully-connected layers except the last fully-connected layer in the classification network have activation functions, and a Dropout layer exists between the fully-connected layers and the activation functions.
It should be noted that, each module in the present embodiment corresponds to each step in the first embodiment one to one, and the specific implementation process is the same, which is not described herein again.
EXAMPLE III
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps in the method for fall detection based on spatiotemporal information as described in the first embodiment above.
Example four
This embodiment provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method for fall detection based on spatiotemporal information as described in the first embodiment above when executing the program.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method for fall detection based on spatio-temporal information is characterized by comprising the following steps:
acquiring a video including a target to be detected;
detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
using the human skeleton key points of all the images in the sliding window as a sample;
extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
based on the spatial characteristics of the sample, extracting the space-time characteristics of the sample by adopting a long-time and short-time memory network;
and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
2. The spatiotemporal information-based fall detection method according to claim 1, wherein the samples are normalized using a batch normalization layer before being input into the adaptive keypoint attention network.
3. A method for fall detection based on spatiotemporal information as claimed in claim 1, characterized in that the step of extracting the spatial features of the samples is:
respectively carrying out global average pooling and global maximum pooling on the samples by adopting a global average pooling layer and a global maximum pooling layer to obtain a global average pooling result and a global maximum pooling result;
adding the global average pooling result and the global maximum pooling result, and passing through a full-connection layer to obtain the weights of all human skeleton key points;
and multiplying the weight by the sample to obtain the spatial characteristics of the sample.
4. The method for fall detection based on spatiotemporal information as claimed in claim 1, wherein the spatial features of the sample are transformed and reshaped and then input into the long-short term memory network.
5. The method for fall detection based on spatiotemporal information as claimed in claim 1, wherein the long-short term memory network comprises a plurality of long-short term memory cells connected in sequence;
each long-time memory unit comprises a lower-layer memory unit, a middle-layer memory unit and an upper-layer memory unit;
in one long-and-short memory cell, the output of the lower memory cell is used as the input of the middle memory cell, and the output of the middle memory cell is used as the input of the upper memory cell.
6. The method as claimed in claim 5, wherein for two adjacent long short-term memory units, the output of a layer's memory cell in a previous long short-term memory unit is used as the input of the same layer's memory cell in a subsequent long short-term memory unit.
7. A spatio-temporal information-based fall detection method according to claim 1, characterized in that the classification network comprises several fully connected layers connected in sequence, and in that there are Dropout layers and activation functions connected in sequence for all fully connected layers in the classification network except for the last fully connected layer.
8. Fall detection system based on spatio-temporal information, characterized by comprising:
a video acquisition module configured to: acquiring a video including a target to be detected;
a keypoint extraction module configured to: detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
a window sliding module configured to: using the human skeleton key points of all the images in the sliding window as a sample;
a spatial feature extraction module configured to: extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
a spatiotemporal feature extraction module configured to: based on the spatial characteristics of the sample, extracting the space-time characteristics of the sample by adopting a long-time and short-time memory network;
a classification module configured to: and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps in the spatio-temporal information-based fall detection method as defined in any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor when executing the program realizes the steps in the spatiotemporal information based fall detection method as claimed in any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210536743.5A CN114627427B (en) | 2022-05-18 | 2022-05-18 | Fall detection method, system, storage medium and equipment based on spatio-temporal information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114627427A true CN114627427A (en) | 2022-06-14 |
CN114627427B CN114627427B (en) | 2022-09-23 |
Family
ID=81906991
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210536743.5A Expired - Fee Related CN114627427B (en) | 2022-05-18 | 2022-05-18 | Fall detection method, system, storage medium and equipment based on spatio-temporal information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114627427B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110059662A (en) * | 2019-04-26 | 2019-07-26 | 山东大学 | A kind of deep video Activity recognition method and system |
CN111401177A (en) * | 2020-03-09 | 2020-07-10 | 山东大学 | End-to-end behavior recognition method and system based on adaptive space-time attention mechanism |
CN112686211A (en) * | 2021-01-25 | 2021-04-20 | 广东工业大学 | Fall detection method and device based on attitude estimation |
CN112998697A (en) * | 2021-02-22 | 2021-06-22 | 电子科技大学 | Tumble injury degree prediction method and system based on skeleton data and terminal |
CN113111865A (en) * | 2021-05-13 | 2021-07-13 | 广东工业大学 | Fall behavior detection method and system based on deep learning |
CN114387666A (en) * | 2021-12-28 | 2022-04-22 | 大连理工大学 | Graph convolution network falling detection method based on human body key points |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20221216 Address after: Room 3115, No. 135, Ward Avenue, Ping'an Street, Changqing District, Jinan, Shandong 250300 Patentee after: Shandong Jiqing Technology Service Co.,Ltd. Address before: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501 Patentee before: Qilu University of Technology |
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20220923 |