CN114627427A - Fall detection method, system, storage medium and equipment based on spatio-temporal information - Google Patents
Fall detection method, system, storage medium and equipment based on spatio-temporal information
- Publication number
- CN114627427A (application CN202210536743.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/214 — Pattern recognition; analysing; design or setup of recognition systems: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/044 — Neural network architectures: recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural network architectures: combinations of networks
- G06N3/08 — Neural networks: learning methods
Abstract
The invention relates to the technical field of image recognition and provides a fall detection method, system, storage medium and device based on spatio-temporal information. The method comprises the following steps: acquiring a video that includes a target to be detected; detecting each frame of the video to obtain a human body detection box and extracting human skeleton keypoints; taking the human skeleton keypoints of all frames in a sliding window as one sample; extracting the spatial features of the sample with an adaptive keypoint attention network; extracting the spatio-temporal features of the sample with a long short-term memory network, based on the spatial features; and obtaining the fall detection result for the target with a classification network, based on the spatio-temporal features. The method enhances the recognition of fall events and improves the accuracy of fall detection.
Description
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a fall detection method, system, storage medium and device based on spatio-temporal information.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Among the problems affecting the daily safety of the elderly, falls have become one of the leading causes of injury and death. If a fall is detected as early as possible, its serious consequences can be mitigated, and prompt fall detection and rescue services can protect the elderly as far as possible; the development of intelligent fall detection and protection systems has therefore become a research focus.
Existing fall detection methods fall mainly into two categories: methods based on sensor devices and methods based on computer vision. For the elderly, sensor-based methods have the drawbacks of being inconvenient to wear and easy to forget. Computer-vision-based methods mostly collect data from fixed equipment, use a neural network to extract features such as the human body contour and the positions and speeds of specific body parts, and train a classification network on those features.
At present, most computer-vision-based fall detection methods rely on pose estimation, extract features with neural networks, and train classification networks on those features. However, existing computer-vision-based methods suffer from high false detection rates, low accuracy and, for some algorithms, poor real-time performance, making them difficult to apply to practical fall detection.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a fall detection method, system, storage medium and device based on spatio-temporal information, in which spatio-temporal features are obtained through an adaptive keypoint attention network and a Long Short-Term Memory (LSTM) network, enhancing the fall detection model's ability to recognize fall events and improving the accuracy of fall detection.
In order to achieve the purpose, the invention adopts the following technical scheme:
a first aspect of the invention provides a method of fall detection based on spatiotemporal information, comprising:
acquiring a video including a target to be detected;
detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
using the human skeleton key points of all images in the sliding window as a sample;
extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
based on the spatial features of the sample, extracting the spatio-temporal features of the sample with a long short-term memory network;
and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
Further, the samples are normalized by a batch normalization layer before being input into the adaptive keypoint attention network.
Further, the step of extracting the spatial features of the sample is as follows:
respectively carrying out global average pooling and global maximum pooling on the samples by adopting a global average pooling layer and a global maximum pooling layer to obtain a global average pooling result and a global maximum pooling result;
adding the global average pooling result and the global maximum pooling result, and passing through a full-connection layer to obtain the weights of all human skeleton key points;
and multiplying the weight by the sample to obtain the spatial characteristics of the sample.
Further, the spatial features of the sample are permuted and reshaped before being input into the long short-term memory network.
Furthermore, the long short-term memory network comprises a plurality of long short-term memory units connected in sequence;
each long short-term memory unit comprises a lower-layer storage unit, a middle-layer storage unit and an upper-layer storage unit;
within one long short-term memory unit, the output of the lower-layer storage unit serves as the input of the middle-layer storage unit, and the output of the middle-layer storage unit serves as the input of the upper-layer storage unit.
Further, for two adjacent long short-term memory units, the output of each layer's storage unit in the former unit serves as the input of the same layer's storage unit in the latter unit.
Furthermore, the classification network comprises a plurality of fully connected layers connected in sequence, and every fully connected layer except the last is followed by a Dropout layer and an activation function.
A second aspect of the invention provides a fall detection system based on spatiotemporal information, comprising:
a video acquisition module configured to: acquiring a video including a target to be detected;
a keypoint extraction module configured to: detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
a window sliding module configured to: using the human skeleton key points of all the images in the sliding window as a sample;
a spatial feature extraction module configured to: extracting the spatial features of the samples by adopting a self-adaptive key point attention network;
a spatiotemporal feature extraction module configured to: based on the spatial features of the sample, extract the spatio-temporal features of the sample with a long short-term memory network;
a classification module configured to: and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
A third aspect of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps in the method for fall detection based on spatiotemporal information as described above.
A fourth aspect of the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the steps in the method for fall detection based on spatiotemporal information as described above.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a tumble detection method based on space-time information, which respectively extracts dynamic space attention characteristics of key points of a human body and dynamic time sequence characteristics of the key points of the human body through a self-adaptive key point attention network and a long-time and short-time memory network to obtain space-time characteristics, thereby enhancing the recognition capability of a tumble detection model on tumble events, improving the accuracy of tumble detection, having strong applicability and having certain real-time processing speed.
The invention provides a fall detection method based on spatio-temporal information in which a batch normalization layer is added before the sample is input into the adaptive keypoint attention network, normalizing the sample and accelerating the convergence of the fall detection model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
Fig. 1 is a flowchart of a fall detection method based on spatiotemporal information according to a first embodiment of the present invention;
FIG. 2 is a key point detail diagram of the first embodiment of the present invention;
FIG. 3 is a flow chart of spatial feature extraction of a sample according to a first embodiment of the present invention;
Fig. 4(a) is a structural diagram of the long short-term memory network according to the first embodiment of the present invention;
Fig. 4(b) is a schematic diagram of a single storage unit in the long short-term memory network according to the first embodiment of the present invention;
FIG. 5 is a flow chart of spatiotemporal feature extraction and classification of samples according to a first embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
The embodiment provides a fall detection method based on spatiotemporal information, which specifically includes the following steps as shown in fig. 1:
Step 1, acquiring a video including the target to be detected. Each frame of the video is an RGB image; the target to be detected is an elderly person.
Step 2, applying a target detection model to each frame of the video to obtain a human body detection box, and inputting the detection box into a pose estimation model to extract the human skeleton keypoints.
The target detection model is YOLOv3 (You Only Look Once, version 3). The pose estimation model is a Multi-Stage Pose estimation Network (MSPN).
Specifically, each RGB frame is input into YOLOv3, which detects the human body in the image and returns an image with a human body detection box, discarding frames that contain no person; the detection box is resized to 256 × 192. With YOLOv3 as the detector, the resized detection box is input into MSPN, which computes the two-dimensional coordinates of the skeleton keypoints.
MSPN uses the backbone of several cascaded pyramid networks, repeatedly down-sampling and up-sampling the feature map to extract feature information. It uses cross-stage feature aggregation: at each stage, the down-sampling and up-sampling paths each feed feature information into the next stage through a 1 × 1 convolution, preventing feature loss during repeated sampling. MSPN also applies a coarse-to-fine strategy, using Gaussian convolution kernels of different sizes at different stages: stages closer to the input use larger kernels and stages farther from the input use smaller ones. This improves the accuracy of keypoint estimation, realizing the estimation of the human keypoints and outputting the skeleton keypoints of the person in the detection-box image.
After performing pose estimation on the detection-box image of each RGB frame, the pose estimation model outputs 17 human skeleton keypoints in txt format, each consisting of X and Y coordinates. Fig. 2 shows the details of the detected keypoints. The 17 keypoints are numbered 1 to 17 and represent the neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right foot, left hip, left knee, left foot, right eye, left eye, right ear and left ear, respectively.
Step 3, taking the human skeleton keypoints of all frames in the sliding window as one sample.
A fall is a process; using only an isolated frame of the video would lose the dynamic information of the fall. This embodiment therefore introduces a sliding window over the skeleton keypoints, which helps the subsequent adaptive keypoint attention network extract the dynamic features of the keypoints. That is, several consecutive frames of the same video are used for fall detection, and the fall detection result is obtained for the target in the last frame.
The length of the sliding window is 30. Grouping with the sliding window means that the skeleton keypoints of the current frame and of the preceding 29 consecutive frames form one group, i.e. one sample, and the label of the last frame's keypoints serves as the label of the sample (the fall detection result). Each sample therefore has dimension W × 17 × H, i.e. 30 × 17 × 2, where W is the window length, 17 is the number of skeleton keypoints per frame, and H is the dimension of one keypoint (2, because a keypoint consists of X and Y coordinates).
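The sliding-window grouping above can be sketched in a few lines of plain Python. All names here are illustrative, not from the patent; each sample collects the keypoints of 30 consecutive frames and would carry the last frame's label.

```python
# Hypothetical sketch of the sliding-window sample construction: the keypoints
# of the current frame plus the preceding 29 frames form one 30 x 17 x 2 sample.
W = 30          # sliding-window length (frames)
K = 17          # skeleton keypoints per frame
D = 2           # (X, Y) coordinates per keypoint

def make_samples(frames):
    """frames: list of per-frame keypoint lists, each K x D.
    Returns one W x K x D sample per window position."""
    samples = []
    for end in range(W - 1, len(frames)):
        samples.append(frames[end - W + 1:end + 1])
    return samples

# toy input: 40 frames of zeroed keypoints
frames = [[[0.0, 0.0] for _ in range(K)] for _ in range(40)]
samples = make_samples(frames)
```

With 40 input frames the window can end at frames 30 through 40, yielding 11 samples.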
Step 4, inputting the sample into the fall detection model to obtain the fall detection result for the target. The fall detection model comprises an adaptive keypoint attention network, a long short-term memory network and a classification network connected in sequence. Specifically, the method comprises the following steps:
step 401, extracting spatial features (dynamic spatial attention features) of the samples by using a self-adaptive key point attention network, that is, calculating the weight of each human skeleton key point in the samples to obtain samples with spatial attention weights.
In fall detection, some keypoints strongly influence the result, such as keypoints on the limbs, while others matter less, such as the ears and eyes on the head. An attention mechanism is therefore introduced to assign different weights to different keypoints, giving larger weights to the keypoints most important for fall detection.
The spatial feature extraction process of the sample, as shown in fig. 3, specifically includes:
step 40101, before the sample is input into the adaptive keypoint attention network, a batch normalization layer is adopted to normalize the sample, and a normalized sample is obtained:
wherein,human skeletal key points representing batch data,represents the mean of human skeletal key points of the batch data,represents the variance of the human skeletal key points of the batch data,is a variable added to prevent the denominator from appearing zero,andto learn the parameters, 1 and 0 are typically taken, respectively. In training, the batch data may be 64 samples, and thus, the size of the batch data is 64 × 30 × 17 × 2. The size of the batch data of the normalized sample composition was 64 × 17 × 30 × 2. Can useRepresents the firstjFirst of frame imageiThe X-coordinate of the individual's skeletal key points,represents the firstjFirst of frame imageiY-coordinates of key points of the individual's body bones. The batch normalization layer is beneficial to accelerating the convergence speed of the fall detection model. The batch data is one sample at the time of testing.
Step 40102, inputting the normalized samples into an adaptive key point attention network.
(1) The adaptive keypoint attention network first applies a global average pooling layer and a global max pooling layer to the normalized sample, obtaining a global average pooling result F_avg and a global max pooling result F_max. If the batch contains 64 samples, both results have size 64 × 17 × 2 × 1.
(2) The global average pooling result F_avg and the global max pooling result F_max are added, and the importance of each channel is predicted through a fully connected layer, giving the weights M of all skeleton keypoints:

M = σ(W₁ · ReLU(W₀ · (F_avg + F_max)))

where σ denotes the Sigmoid activation function, ReLU the ReLU activation function, and W₀ and W₁ the two fully connected operations. The reduction ratio is set to 1, ensuring that the number of channels is never reduced while the fully connected layers process the sample, so that the final weights always remain in one-to-one correspondence with the skeleton keypoints.
(3) After the weight of each skeleton keypoint is obtained through the fully connected layers, the weights are multiplied with the normalized sample to obtain the sample with spatial attention weights (the spatial features of the sample):

F' = M ⊗ F

where F denotes the normalized sample and ⊗ denotes element-wise multiplication.
After the sample passes through the adaptive keypoint attention network, the weights of the keypoints across the 30 consecutive frames, i.e. the importance of each keypoint in the fall detection process, have been obtained, completing the spatial feature extraction. If the batch contains 64 samples, the spatial features of all samples have size 64 × 17 × 30 × 2.
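The attention step can be sketched in plain Python. As a simplifying assumption, the two fully connected layers (reduction ratio 1) are replaced by an identity mapping, so the Sigmoid acts directly on the sum of the two pooling results; function names and toy data are illustrative, not the patent's trained network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative adaptive keypoint attention: global average and global max
# pooling per keypoint, a sigmoid standing in for the fully connected layers,
# and a per-keypoint reweighting of the sample.
def keypoint_attention(sample):
    """sample: K x T list, T features per keypoint across the window."""
    weights = []
    for feats in sample:
        avg_pool = sum(feats) / len(feats)   # global average pooling
        max_pool = max(feats)                # global max pooling
        weights.append(sigmoid(avg_pool + max_pool))
    weighted = [[w * f for f in feats] for w, feats in zip(weights, sample)]
    return weighted, weights

sample = [[0.1 * k] * 4 for k in range(17)]  # toy 17-keypoint sample
weighted, w = keypoint_attention(sample)
```

Each keypoint receives a weight in (0, 1); keypoints with larger pooled activations receive larger weights, matching the intended "more important keypoints get larger weights" behavior.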
Step 402, extracting the spatio-temporal features of the sample with the long short-term memory network, based on the spatial features of the sample. The spatial features are permuted and reshaped before being input into the network.
The LSTM network has great advantages in processing time-series problems and a strong ability to link context. In fall detection, the video frames form a temporal sequence, and a fall is a process closely related to the preceding and following frames; the LSTM network is therefore introduced so that the current frame is related to its neighboring frames rather than treated as an independent individual.
The sample with spatial attention weights F' is permuted and reshaped into the sequence N = {N_t, N_{t+1}, …, N_{t+29}}, where the permutation swaps the keypoint and time dimensions of the spatial features and the reshaping merges the keypoint and coordinate dimensions. If the batch contains 64 samples, the permuted result has size 64 × 30 × 17 × 2 and the reshaped result has size 64 × 30 × 34. The sequence N is input into the LSTM network to extract the spatio-temporal features of the sample (keypoints with spatio-temporal features).
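The permute-and-reshape step above, for a single sample, can be sketched as follows; the nested-list layout and names are ours, chosen to mirror the 17 × 30 × 2 → 30 × 34 shapes described in the text.

```python
# Sketch of the permute-and-reshape step: a 17 x 30 x 2 spatial-feature
# tensor is reordered time-first and the keypoint and coordinate axes are
# flattened into a 30 x 34 sequence for the LSTM.
K, W, D = 17, 30, 2

def to_sequence(feat):
    """feat: K x W x D nested lists -> W x (K*D) sequence."""
    seq = []
    for t in range(W):
        step = []
        for k in range(K):
            step.extend(feat[k][t])   # append (X, Y) of keypoint k at time t
        seq.append(step)
    return seq

# toy features: keypoint index as X, time index as Y
feat = [[[float(k), float(t)] for t in range(W)] for k in range(K)]
seq = to_sequence(feat)
```

Each of the 30 sequence elements packs the 17 (X, Y) pairs of one frame into a 34-dimensional vector, which is the per-timestep input the LSTM expects.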
As shown in fig. 5, the LSTM network comprises a plurality of sequentially connected long short-term memory (LSTM) units in a three-layer, many-to-one structure. As shown in fig. 4(a), each LSTM unit mainly comprises three sequentially connected storage units: a lower-layer storage unit, a middle-layer storage unit and an upper-layer storage unit. Within an LSTM unit, the output of the lower-layer storage unit is the input of the middle-layer storage unit, and the output of the middle-layer storage unit is the input of the upper-layer storage unit. For two adjacent LSTM units, the output of each layer's storage unit in the former unit is the input of the same layer's storage unit in the latter unit. The output of the upper-layer storage unit in the last LSTM unit is the spatio-temporal feature L of the sample extracted by the LSTM network. If the batch contains 64 samples, the LSTM network outputs 128 neurons.
Each storage unit consists of three gate structures and a cell state, as shown in fig. 4(b), which depicts the storage unit enclosed by the dashed line in fig. 4(a). The input gate controls which information is written to the cell state, the forget gate decides how much of the previous cell state is retained, and the output gate controls which parts of the cell state are emitted. The storage unit is described by the following formulas:

f_{t+j} = σ(W_f · [h_{t+j−1}, N*_{t+j}] + b_f) (2)

i_{t+j} = σ(W_i · [h_{t+j−1}, N*_{t+j}] + b_i) (3)

o_{t+j} = σ(W_o · [h_{t+j−1}, N*_{t+j}] + b_o) (4)

C_{t+j} = f_{t+j} ∗ C_{t+j−1} + i_{t+j} ∗ tanh(W_C · [h_{t+j−1}, N*_{t+j}] + b_C) (5)

h_{t+j} = o_{t+j} ∗ tanh(C_{t+j}) (6)

where W_f, W_i, W_o and W_C are weight matrices, b_f, b_i, b_o and b_C are bias vectors, and f_{t+j}, i_{t+j} and o_{t+j} are the outputs of the forget gate, input gate and output gate of the storage unit at time t+j, with j = 0, 1, 2, …, 29. C_{t+j} is the cell state of the storage unit at time t+j. N*_{t+j} is the input vector of the storage unit at time t+j: for a lower-layer storage unit it is the (j+1)-th element N_{t+j} of the permuted and reshaped sequence, i.e. the 17 keypoints with spatial attention information of the (j+1)-th frame; for a middle-layer storage unit it is the output vector of the lower-layer storage unit; for an upper-layer storage unit it is the output vector of the middle-layer storage unit. h_{t+j} is the output vector of the storage unit at time t+j, and σ denotes the Sigmoid function.
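A single step of such a storage unit can be sketched with scalar state and scalar input for readability. The weights are toy values and the concatenation [h, x] is collapsed to a sum as a simplifying assumption; this is an illustration of the gate equations, not the patent's trained network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Minimal single-step LSTM storage unit following the standard gate formulas.
def lstm_step(x, h_prev, c_prev, wf=0.5, wi=0.5, wo=0.5, wc=0.5,
              bf=0.0, bi=0.0, bo=0.0, bc=0.0):
    z = h_prev + x                                  # [h, x] collapsed to a sum
    f = sigmoid(wf * z + bf)                        # forget gate
    i = sigmoid(wi * z + bi)                        # input gate
    o = sigmoid(wo * z + bo)                        # output gate
    c = f * c_prev + i * math.tanh(wc * z + bc)     # new cell state
    h = o * math.tanh(c)                            # output
    return h, c

h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:   # a short toy input sequence
    h, c = lstm_step(x, h, c)
```

The output h is bounded in (−1, 1) by the tanh/Sigmoid combination, while the cell state c carries information forward across time steps.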
Step 403, obtaining the fall detection result for the target with the classification network, based on the spatio-temporal features of the sample.
The classification network performs the binary classification of fall detection: fall or normal. As shown in fig. 5, it comprises a plurality of sequentially connected fully connected layers; every fully connected layer except the last is followed by a Dropout layer and an activation function (the Sigmoid function). To prevent overfitting of the fall detection model and improve its generalization, a Dropout layer with rate 0.3 is placed between each fully connected layer and its activation function for random deactivation. Specifically, as shown in fig. 5, the classification network comprises 4 sequentially connected fully connected layers, i.e. 3 fully connected layers followed by Dropout and activation, plus the final fully connected layer.
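The classification head can be sketched in plain Python. Layer sizes and weights here are assumptions for illustration (the text does not give them), and only two of the four fully connected layers are shown to keep the sketch short; Dropout is shown disabled at inference, as is conventional.

```python
import math, random

# Illustrative sketch of the classification head: fully connected layers with
# Sigmoid activation and Dropout (rate 0.3), ending in a fall / normal decision.
def dense(x, w, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj
            for row, bj in zip(w, b)]

def dropout(x, rate=0.3, training=False):
    if not training:
        return x    # dropout is disabled at inference time
    return [0.0 if random.random() < rate else xi / (1 - rate) for xi in x]

def sigmoid_vec(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

feat = [0.1] * 8                             # toy spatio-temporal feature vector
w1 = [[0.1] * 8 for _ in range(4)]; b1 = [0.0] * 4
w2 = [[0.1] * 4 for _ in range(2)]; b2 = [0.0] * 2

hidden = sigmoid_vec(dropout(dense(feat, w1, b1)))
logits = dense(hidden, w2, b2)               # two outputs: fall vs. normal
pred = "fall" if logits[0] > logits[1] else "normal"
```

With these symmetric toy weights the two logits tie, so the sketch defaults to "normal"; a trained network would separate the two classes.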
Aiming at the problem of falls among the elderly, the dynamic spatial attention features and the dynamic temporal features of the human keypoints are extracted by the adaptive keypoint attention network and the long short-term memory network, respectively, to obtain spatio-temporal features. This enhances the fall detection model's ability to recognize fall events, improves the accuracy of fall detection, offers strong applicability, and achieves a degree of real-time processing speed.
A batch normalization layer is added before the sample is input into the adaptive keypoint attention network, normalizing the sample and accelerating the convergence of the fall detection model.
Example two
The embodiment provides a fall detection system based on spatio-temporal information, which specifically comprises the following modules:
a video acquisition module configured to: acquiring a video including a target to be detected;
a keypoint extraction module configured to: detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
a window sliding module configured to: using the human skeleton key points of all the images in the sliding window as a sample;
a spatial feature extraction module configured to: extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
a spatiotemporal feature extraction module configured to: based on the spatial characteristics of the sample, extracting the spatiotemporal characteristics of the sample by adopting a long-short time memory network;
a classification module configured to: and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
Before the samples are input into the adaptive key point attention network, a batch normalization layer is adopted to carry out normalization processing on the samples.
The spatial feature extraction module extracts the spatial features of the sample as follows: a global average pooling layer and a global maximum pooling layer are applied to the sample to obtain a global average pooling result and a global maximum pooling result; the two pooling results are added and passed through a fully connected layer to obtain the weights of all human skeleton key points; and the weights are multiplied with the sample to obtain the spatial features of the sample.
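The three pooling/weighting steps above can be sketched as follows. The sample layout (batch × 17 keypoints × per-keypoint feature vector), the pooling axis, and the random fully connected weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sample layout: (batch, K keypoints, per-keypoint features).
B, K, D = 4, 17, 60
sample = rng.standard_normal((B, K, D))

# Global average pooling and global maximum pooling over the feature axis
# yield one descriptor per keypoint.
avg_pool = sample.mean(axis=2)          # (B, K)
max_pool = sample.max(axis=2)           # (B, K)

# The sum of the two pooling results passes through a fully connected layer
# (random weights here) to produce one weight per keypoint.
w_fc = rng.standard_normal((K, K)) * 0.1
weights = sigmoid((avg_pool + max_pool) @ w_fc)   # (B, K), values in (0, 1)

# Multiplying the weights back onto the sample amplifies the keypoints the
# attention network deems important.
spatial_features = sample * weights[:, :, None]   # (B, K, D)
print(spatial_features.shape)  # (4, 17, 60)
```

The effect is a per-keypoint attention mask: keypoints with weights near 1 pass through nearly unchanged, while low-weight keypoints are suppressed.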
The spatial features of the sample are transformed and reshaped before being input into the long short-term memory network.
The long short-term memory network comprises a plurality of long short-term memory units connected in sequence; each unit comprises a lower-layer memory cell, a middle-layer memory cell and an upper-layer memory cell. Within one unit, the output of the lower-layer memory cell serves as the input of the middle-layer memory cell, and the output of the middle-layer memory cell serves as the input of the upper-layer memory cell. Between two adjacent units, the output of each layer's memory cell in the preceding unit serves as the input of the same layer's memory cell in the following unit.
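This three-layer structure can be sketched as a stacked LSTM, with illustrative sizes assumed (34-dimensional inputs, 64 hidden units, 30 frames). Within a time step each layer's output feeds the layer above; each layer's state carries over to the same layer at the next time step:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """One memory cell: gates computed from the current input and previous state."""
    def __init__(self, d_in, d_hid, rng):
        self.W = rng.standard_normal((d_in + d_hid, 4 * d_hid)) * 0.1
        self.b = np.zeros(4 * d_hid)

    def step(self, x, h, c):
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)   # update the cell (long-term) state
        h = o * np.tanh(c)           # emit the hidden (short-term) state
        return h, c

# Three stacked layers: lower -> middle -> upper, with hypothetical sizes.
d_in, d_hid, T = 34, 64, 30
layers = [LSTMCell(d_in, d_hid, rng),
          LSTMCell(d_hid, d_hid, rng),
          LSTMCell(d_hid, d_hid, rng)]

h = [np.zeros(d_hid) for _ in layers]
c = [np.zeros(d_hid) for _ in layers]
seq = rng.standard_normal((T, d_in))  # reshaped spatial features, one vector per frame

for x in seq:                     # adjacent units pass state layer-to-layer over time
    for k, cell in enumerate(layers):
        h[k], c[k] = cell.step(x, h[k], c[k])
        x = h[k]                  # a layer's output feeds the layer above it

print(x.shape)  # final upper-layer output after the last frame: (64,)
```

The upper layer's final hidden state summarizes the whole sliding window and would be the input to the classification network.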
The classification network comprises a plurality of fully-connected layers which are connected in sequence, all fully-connected layers except the last fully-connected layer in the classification network have activation functions, and a Dropout layer exists between the fully-connected layers and the activation functions.
It should be noted that, each module in the present embodiment corresponds to each step in the first embodiment one to one, and the specific implementation process is the same, which is not described herein again.
EXAMPLE III
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps in the method for fall detection based on spatiotemporal information as described in the first embodiment above.
Example four
This embodiment provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method for fall detection based on spatiotemporal information as described in the first embodiment above when executing the program.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method for fall detection based on spatio-temporal information is characterized by comprising the following steps:
acquiring a video including a target to be detected;
detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
using the human skeleton key points of all the images in the sliding window as a sample;
extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
based on the spatial characteristics of the sample, extracting the space-time characteristics of the sample by adopting a long-time and short-time memory network;
and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
2. The spatiotemporal information-based fall detection method according to claim 1, wherein the samples are normalized using a batch normalization layer before being input into the adaptive keypoint attention network.
3. A method for fall detection based on spatiotemporal information as claimed in claim 1, characterized in that the step of extracting the spatial features of the samples is:
respectively carrying out global average pooling and global maximum pooling on the samples by adopting a global average pooling layer and a global maximum pooling layer to obtain a global average pooling result and a global maximum pooling result;
adding the global average pooling result and the global maximum pooling result, and passing through a full-connection layer to obtain the weights of all human skeleton key points;
and multiplying the weight by the sample to obtain the spatial characteristics of the sample.
4. The method for fall detection based on spatiotemporal information as claimed in claim 1, wherein the spatial features of the sample are transformed and reshaped and then input into the long-short term memory network.
5. The method for fall detection based on spatiotemporal information as claimed in claim 1, wherein the long-short term memory network comprises a plurality of long-short term memory cells connected in sequence;
each long-time memory unit comprises a lower-layer memory unit, a middle-layer memory unit and an upper-layer memory unit;
in one long-and-short memory cell, the output of the lower memory cell is used as the input of the middle memory cell, and the output of the middle memory cell is used as the input of the upper memory cell.
6. The method as claimed in claim 5, wherein for two adjacent long short-term memory units, the output of a layer's memory cell in a previous long short-term memory unit is used as the input of the same layer's memory cell in a subsequent long short-term memory unit.
7. A spatio-temporal information-based fall detection method according to claim 1, characterized in that the classification network comprises several fully connected layers connected in sequence, and in that there are Dropout layers and activation functions connected in sequence for all fully connected layers in the classification network except for the last fully connected layer.
8. Fall detection system based on spatio-temporal information, characterized by comprising:
a video acquisition module configured to: acquiring a video including a target to be detected;
a keypoint extraction module configured to: detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
a window sliding module configured to: using the human skeleton key points of all the images in the sliding window as a sample;
a spatial feature extraction module configured to: extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
a spatiotemporal feature extraction module configured to: based on the spatial characteristics of the sample, extracting the space-time characteristics of the sample by adopting a long-time and short-time memory network;
a classification module configured to: and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps in the spatio-temporal information-based fall detection method as defined in any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor when executing the program realizes the steps in the spatiotemporal information based fall detection method as claimed in any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210536743.5A CN114627427B (en) | 2022-05-18 | 2022-05-18 | Fall detection method, system, storage medium and equipment based on spatio-temporal information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114627427A true CN114627427A (en) | 2022-06-14 |
CN114627427B CN114627427B (en) | 2022-09-23 |
Family
ID=81906991
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210536743.5A Expired - Fee Related CN114627427B (en) | 2022-05-18 | 2022-05-18 | Fall detection method, system, storage medium and equipment based on spatio-temporal information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114627427B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110059662A (en) * | 2019-04-26 | 2019-07-26 | 山东大学 | A kind of deep video Activity recognition method and system |
CN111401177A (en) * | 2020-03-09 | 2020-07-10 | 山东大学 | End-to-end behavior recognition method and system based on adaptive space-time attention mechanism |
CN112686211A (en) * | 2021-01-25 | 2021-04-20 | 广东工业大学 | Fall detection method and device based on attitude estimation |
CN112998697A (en) * | 2021-02-22 | 2021-06-22 | 电子科技大学 | Tumble injury degree prediction method and system based on skeleton data and terminal |
CN113111865A (en) * | 2021-05-13 | 2021-07-13 | 广东工业大学 | Fall behavior detection method and system based on deep learning |
CN114387666A (en) * | 2021-12-28 | 2022-04-22 | 大连理工大学 | Graph convolution network falling detection method based on human body key points |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20221216 Address after: Room 3115, No. 135, Ward Avenue, Ping'an Street, Changqing District, Jinan, Shandong 250300 Patentee after: Shandong Jiqing Technology Service Co.,Ltd. Address before: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501 Patentee before: Qilu University of Technology |
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20220923 |