CN114627427A - Fall detection method, system, storage medium and equipment based on spatio-temporal information - Google Patents


Info

Publication number
CN114627427A
CN114627427A (application CN202210536743.5A)
Authority
CN
China
Prior art keywords
sample
fall detection
network
layer
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210536743.5A
Other languages
Chinese (zh)
Other versions
CN114627427B (en)
Inventor
张友梅
李江娇
李彬
高梦奇
智昱旻
周大正
张明亮
张瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jiqing Technology Service Co ltd
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202210536743.5A priority Critical patent/CN114627427B/en
Publication of CN114627427A publication Critical patent/CN114627427A/en
Application granted granted Critical
Publication of CN114627427B publication Critical patent/CN114627427B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image recognition and provides a fall detection method, system, storage medium and device based on spatio-temporal information. The method comprises the following steps: acquiring a video containing a target to be detected; detecting each frame of the video to obtain a human body detection frame and extracting the human skeleton keypoints; taking the human skeleton keypoints of all frames within a sliding window as one sample; extracting the spatial features of the sample with an adaptive keypoint attention network; based on the spatial features, extracting the spatio-temporal features of the sample with a long short-term memory network; and, based on the spatio-temporal features, obtaining the fall detection result for the target with a classification network. The method enhances the recognition of fall events and improves the accuracy of fall detection.

Description

Fall detection method, system, storage medium and equipment based on spatio-temporal information
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a method, a system, a storage medium and equipment for fall detection based on spatio-temporal information.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Among the problems affecting the daily safety of the elderly, falling has become one of the main causes of injury and death. If a fall is detected as early as possible, however, its serious consequences can be reduced, and corresponding fall detection and rescue services can keep the elderly as safe as possible; the development of intelligent fall detection and protection systems has therefore become a focus of research.
Existing fall detection methods fall mainly into two categories: methods based on sensor devices and methods based on computer vision. For the elderly, sensor-based methods have the drawbacks of being inconvenient to wear and easy to forget. Computer-vision-based methods mostly collect data with fixed equipment, use neural networks to extract features such as the human body contour and the positions and speeds of specific body parts, and train a classification network on these features.
At present, most computer-vision-based fall detection methods rely on pose estimation, use neural networks to extract features, and train classification networks on those features. However, existing computer-vision-based methods suffer from high false detection rates, low accuracy and, for some algorithms, poor real-time performance, making them difficult to apply to practical fall detection.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a fall detection method, system, storage medium and device based on spatio-temporal information, in which spatio-temporal features are obtained through an adaptive keypoint attention network and a Long Short-Term Memory (LSTM) network, enhancing the fall detection model's ability to recognize fall events and improving the accuracy of fall detection.
In order to achieve the purpose, the invention adopts the following technical scheme:
a first aspect of the invention provides a method of fall detection based on spatiotemporal information, comprising:
acquiring a video including a target to be detected;
detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
using the human skeleton key points of all images in the sliding window as a sample;
extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
based on the spatial features of the sample, extracting the spatio-temporal features of the sample with a long short-term memory network;
and, based on the spatio-temporal features of the sample, obtaining the fall detection result of the target to be detected with a classification network.
Further, the samples are normalized by a batch normalization layer before being input into the adaptive keypoint attention network.
Further, the step of extracting the spatial features of the sample is as follows:
respectively carrying out global average pooling and global maximum pooling on the samples by adopting a global average pooling layer and a global maximum pooling layer to obtain a global average pooling result and a global maximum pooling result;
adding the global average pooling result and the global maximum pooling result, and passing through a full-connection layer to obtain the weights of all human skeleton key points;
and multiplying the weight by the sample to obtain the spatial characteristics of the sample.
Further, the spatial features of the sample are transformed and reshaped before being input into the improved long short-term memory network.
Furthermore, the long short-term memory network comprises a plurality of sequentially connected long short-term memory units;
each long short-term memory unit comprises a lower-layer memory unit, a middle-layer memory unit and an upper-layer memory unit;
within one long short-term memory unit, the output of the lower-layer memory unit serves as the input of the middle-layer memory unit, and the output of the middle-layer memory unit serves as the input of the upper-layer memory unit.
Further, for two adjacent long short-term memory units, the output of each layer's memory unit in the former serves as the input of the corresponding layer's memory unit in the latter.
Furthermore, the classification network comprises a plurality of sequentially connected fully connected layers; every fully connected layer except the last is followed by a Dropout layer and an activation function.
A second aspect of the invention provides a fall detection system based on spatiotemporal information, comprising:
a video acquisition module configured to: acquiring a video including a target to be detected;
a keypoint extraction module configured to: detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
a window sliding module configured to: using the human skeleton key points of all the images in the sliding window as a sample;
a spatial feature extraction module configured to: extracting the spatial features of the samples by adopting a self-adaptive key point attention network;
a spatiotemporal feature extraction module configured to: based on the spatial features of the sample, extract the spatio-temporal features of the sample with a long short-term memory network;
a classification module configured to: and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
A third aspect of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps in the method for fall detection based on spatiotemporal information as described above.
A fourth aspect of the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the steps in the method for fall detection based on spatiotemporal information as described above.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a tumble detection method based on space-time information, which respectively extracts dynamic space attention characteristics of key points of a human body and dynamic time sequence characteristics of the key points of the human body through a self-adaptive key point attention network and a long-time and short-time memory network to obtain space-time characteristics, thereby enhancing the recognition capability of a tumble detection model on tumble events, improving the accuracy of tumble detection, having strong applicability and having certain real-time processing speed.
The invention provides a falling detection method based on space-time information, which is characterized in that a batch normalization layer is added before a sample is input into a self-adaptive key point attention network, so that the normalization processing of the sample is realized, and the convergence speed of a falling detection model is accelerated.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain the invention without limiting it.
Fig. 1 is a flowchart of a fall detection method based on spatiotemporal information according to a first embodiment of the present invention;
FIG. 2 is a key point detail diagram of the first embodiment of the present invention;
FIG. 3 is a flow chart of spatial feature extraction of a sample according to a first embodiment of the present invention;
Fig. 4(a) is a structural diagram of the long short-term memory network according to the first embodiment of the present invention;
FIG. 4(b) is a schematic diagram of a single memory unit in the long short-term memory network according to the first embodiment of the present invention;
FIG. 5 is a flow chart of spatiotemporal feature extraction and classification of samples according to a first embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
The embodiment provides a fall detection method based on spatiotemporal information, which specifically includes the following steps as shown in fig. 1:
step 1, obtaining a video containing a target to be detected.
Each frame of the video is an RGB image; the target to be detected is an elderly person.
Step 2: for each frame of the video, a target detection model is used to detect a human body detection frame, which is input into a pose estimation model to extract the human skeleton keypoints.
The target detection model is YOLOv3 (You Only Look Once, version 3). The pose estimation model is a Multi-Stage Pose Network (MSPN).
Specifically, each RGB frame is input into YOLOv3, which detects any human body in the image and returns the image with a human body detection frame, discarding frames that contain no person; the human body detection frame is resized to 256 × 192. With YOLOv3 as the detector, the resized detection frame is input into MSPN, which computes the two-dimensional coordinates of the skeleton keypoints.
MSPN uses the backbone of several cascaded pyramid networks to repeatedly downsample and upsample the feature map and extract feature information. It uses cross-stage feature aggregation: at each stage, the downsampling and upsampling paths each pass feature information to the next stage after a 1 × 1 convolution, preventing feature loss during repeated sampling. MSPN also follows a coarse-to-fine strategy, applying Gaussian kernels of different sizes at different stages: stages closer to the input use larger kernels, stages farther from the input use smaller ones. This improves the accuracy of keypoint estimation, and the network outputs the human skeleton keypoints of the detection-frame image.
After pose estimation on the human body detection frame of each RGB frame, the pose estimation model outputs 17 human skeleton keypoints in txt format, each consisting of an X and a Y coordinate. Details of the detected keypoints are given in FIG. 2; the 17 keypoints, numbered 1 to 17, represent the neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right foot, left hip, left knee, left foot, right eye, left eye, right ear and left ear, respectively.
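The keypoint numbering above can be captured in a small lookup table. The sketch below is illustrative only: the name `KEYPOINT_NAMES` is not from the patent, and the eye/ear ordering for indices 14-17 follows the usual convention, which the translation leaves ambiguous.

```python
# Hypothetical index-to-name table for the 17 skeleton keypoints
# enumerated in the description.
KEYPOINT_NAMES = {
    1: "neck", 2: "right shoulder", 3: "right elbow", 4: "right wrist",
    5: "left shoulder", 6: "left elbow", 7: "left wrist",
    8: "right hip", 9: "right knee", 10: "right foot",
    11: "left hip", 12: "left knee", 13: "left foot",
    14: "right eye", 15: "left eye", 16: "right ear", 17: "left ear",
}
```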
Step 3: the human skeleton keypoints of all frames within the sliding window are taken as one sample.
Falling is a process; using only isolated frames of the video would lose the dynamic information of the fall, so this embodiment introduces a sliding window over the skeleton keypoints, which helps the subsequent adaptive keypoint attention network extract the dynamic features of the keypoints. That is, several consecutive frames of the same video are used for fall detection, yielding the fall detection result for the target in the last frame.
The sliding window has length 30, and grouping is done by sliding the window: the skeleton keypoints of the current frame and of the preceding 29 consecutive frames form one group, i.e. one sample, and the label of the last frame's keypoints serves as the sample's label (the fall detection result). Each sample therefore has dimension W × 17 × H, i.e. 30 × 17 × 2, where W is the window length, 17 is the number of skeleton keypoints per frame, and H is the dimension of one keypoint (2, since a keypoint consists of X and Y coordinates).
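As a minimal sketch of this grouping step (assuming NumPy; the function name is illustrative, not from the patent), the sliding-window construction could look like:

```python
import numpy as np

def make_samples(keypoints, labels, window=30):
    """Group per-frame skeleton keypoints with a sliding window.

    keypoints: array of shape (T, 17, 2); labels: length-T sequence.
    Each sample holds the keypoints of `window` consecutive frames and
    carries the label of its last frame, as described above."""
    samples, sample_labels = [], []
    for end in range(window, len(keypoints) + 1):
        samples.append(keypoints[end - window:end])   # (window, 17, 2)
        sample_labels.append(labels[end - 1])         # label of the last frame
    return np.stack(samples), np.array(sample_labels)
```

A 40-frame clip, for instance, would yield 11 overlapping samples of shape 30 × 17 × 2.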
Step 4: the sample is input into the fall detection model to obtain the fall detection result of the target to be detected. The fall detection model comprises an adaptive keypoint attention network, a long short-term memory network and a classification network connected in sequence. Specifically:
Step 401: the adaptive keypoint attention network extracts the spatial features (dynamic spatial attention features) of the sample, i.e. computes the weight of each skeleton keypoint in the sample to obtain the sample with spatial attention weights.
In fall detection, some keypoints strongly influence the result, such as the keypoints on the limbs, while others have little influence, such as the ears and eyes on the head. An attention mechanism is therefore introduced to assign different weights to different keypoints, giving larger weights to those important for fall detection.
The spatial feature extraction process of the sample, as shown in fig. 3, specifically includes:
step 40101, before the sample is input into the adaptive keypoint attention network, a batch normalization layer is adopted to normalize the sample, and a normalized sample is obtained:
Figure 371508DEST_PATH_IMAGE001
(1)
wherein,
Figure 325689DEST_PATH_IMAGE002
human skeletal key points representing batch data,
Figure 197567DEST_PATH_IMAGE003
represents the mean of human skeletal key points of the batch data,
Figure 844581DEST_PATH_IMAGE004
represents the variance of the human skeletal key points of the batch data,
Figure 169120DEST_PATH_IMAGE005
is a variable added to prevent the denominator from appearing zero,
Figure 864675DEST_PATH_IMAGE006
and
Figure 591060DEST_PATH_IMAGE007
to learn the parameters, 1 and 0 are typically taken, respectively. In training, the batch data may be 64 samples, and thus, the size of the batch data is 64 × 30 × 17 × 2. The size of the batch data of the normalized sample composition was 64 × 17 × 30 × 2. Can use
Figure 408975DEST_PATH_IMAGE008
Represents the firstjFirst of frame imageiThe X-coordinate of the individual's skeletal key points,
Figure 220811DEST_PATH_IMAGE009
represents the firstjFirst of frame imageiY-coordinates of key points of the individual's body bones. The batch normalization layer is beneficial to accelerating the convergence speed of the fall detection model. The batch data is one sample at the time of testing.
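Eq. (1) amounts to standard batch normalization over the batch axis; a minimal NumPy sketch (with γ = 1 and β = 0, the typical values stated above) is:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize x over the batch (first) axis as in Eq. (1).

    eps keeps the denominator from being zero; gamma and beta are the
    learnable scale and shift, taken as 1 and 0 here."""
    mu = x.mean(axis=0)    # per-feature mean of the batch
    var = x.var(axis=0)    # per-feature variance of the batch
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```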
Step 40102, inputting the normalized samples into an adaptive key point attention network.
(1) The adaptive keypoint attention network first applies a global average pooling layer and a global max pooling layer to the (normalized) samples, yielding a global average pooling result F_avg and a global max pooling result F_max. If the batch contains 64 samples, both results have size 64 × 17 × 2 × 1.
(2) F_avg and F_max are added, and the importance of each channel is predicted through fully connected layers to obtain the weights of all human skeleton keypoints:

M = σ( W₁ · ReLU( W₀ · (F_avg + F_max) ) )    (2)

where σ denotes the Sigmoid activation function, ReLU the ReLU activation function, and W₀ and W₁ the two fully connected operations; F_avg and F_max denote the global average pooling and global max pooling results, respectively. The decay factor of the fully connected layers is set to 1, ensuring the number of channels is never reduced while the fully connected layers process the sample, so the final weights remain in one-to-one correspondence with the skeleton keypoints.
(3) The weight of each skeleton keypoint obtained from the fully connected layers is multiplied with the (normalized) sample to give the sample with spatial attention weights, i.e. the spatial features F′:

F′ = M ⊗ N̂    (3)
After the sample passes through the adaptive keypoint attention network, the weights of the keypoints of the 30 consecutive frames are obtained, i.e. the importance of each keypoint in the fall detection process, completing the spatial feature extraction. If the batch contains 64 samples, the spatial features of all samples have size 64 × 17 × 30 × 2.
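Steps (1)-(3) can be sketched as follows for a single sample, assuming NumPy and treating each keypoint's 30 × 2 trajectory as a flattened feature vector; the names `keypoint_attention`, `w0` and `w1` are illustrative, not from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def keypoint_attention(x, w0, w1):
    """Adaptive keypoint attention per Eqs. (2)-(3).

    x: (17, D) normalized features, one row per skeleton keypoint
    (D flattens the 30 frames x 2 coordinates). Global average and max
    pooling over D are summed, passed through two fully connected
    layers with decay factor 1 (17 -> 17 -> 17) and a sigmoid, giving
    one weight per keypoint, which then scales the sample."""
    f_avg = x.mean(axis=1)    # global average pooling
    f_max = x.max(axis=1)     # global max pooling
    m = sigmoid(w1 @ np.maximum(w0 @ (f_avg + f_max), 0.0))  # Eq. (2)
    return m[:, None] * x     # Eq. (3): weights multiply the sample
```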
Step 402: based on the spatial features of the sample, the long short-term memory network extracts the spatio-temporal features of the sample. The spatial features are transformed and reshaped before being input into the network.
The LSTM network has great advantages in processing time-series problems and a strong ability to model context. In fall detection, the video frames form a temporal sequence, and a fall is a process strongly related to the preceding and following frames; an LSTM network is therefore introduced so that the current frame is related to its neighboring frames rather than treated as an independent sample.
The sample with spatial attention weights F′ is transformed and reshaped to obtain the sequence N*, where the transform swaps the keypoint and frame dimensions of the spatial features and the reshape merges the keypoint and coordinate dimensions. If the batch contains 64 samples, the transformed result has size 64 × 30 × 17 × 2 and the reshaped result has size 64 × 30 × 34.
The transformed and reshaped sequence N* is input into the LSTM network, which extracts the spatio-temporal features of the sample (keypoints with spatio-temporal characteristics).
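In NumPy terms, the transform-and-reshape step reduces to an axis swap followed by a merge of the last two axes, with the sizes stated above:

```python
import numpy as np

# Batch of spatial features: 64 samples x 17 keypoints x 30 frames x 2
# coordinates, matching the sizes given in the description.
feat = np.arange(64 * 17 * 30 * 2, dtype=np.float32).reshape(64, 17, 30, 2)

seq = feat.transpose(0, 2, 1, 3)  # transform: swap keypoint/frame axes -> (64, 30, 17, 2)
seq = seq.reshape(64, 30, 34)     # reshape: merge keypoints and coordinates -> (64, 30, 34)
```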
As shown in fig. 5, the LSTM network comprises a plurality of sequentially connected long short-term memory (LSTM) units, each using a three-layer, many-to-one structure. As shown in fig. 4(a), each LSTM unit consists of three sequentially connected memory units: a lower-layer memory unit, a middle-layer memory unit and an upper-layer memory unit. Within one LSTM unit, the output of the lower-layer memory unit is the input of the middle-layer memory unit, and the output of the middle-layer memory unit is the input of the upper-layer memory unit. For two adjacent LSTM units, the output of each layer's memory unit in the former serves as the input of the corresponding layer's memory unit in the latter. The output of the upper-layer memory unit in the last LSTM unit is the spatio-temporal feature L of the sample extracted by the LSTM network. If the batch contains 64 samples, the LSTM network's output has 128 neurons.
Each memory unit consists of three gate structures and a cell state, as shown in fig. 4(b), which depicts the memory unit enclosed by the dashed line in fig. 4(a): the input gate controls which information is saved into the cell state, the forget gate determines how much of the previous cell state is retained, and the output gate controls which parts of the cell state are output. The update performed by a memory unit can be expressed as:

f_{t+j} = σ( W_f · [h_{t+j−1}, N*_{t+j}] + b_f )
i_{t+j} = σ( W_i · [h_{t+j−1}, N*_{t+j}] + b_i )
o_{t+j} = σ( W_o · [h_{t+j−1}, N*_{t+j}] + b_o )    (4)

C_{t+j} = f_{t+j} ∗ C_{t+j−1} + i_{t+j} ∗ tanh( W_C · [h_{t+j−1}, N*_{t+j}] + b_C )    (5)

h_{t+j} = o_{t+j} ∗ tanh( C_{t+j} )    (6)

where W_f, W_i, W_o and W_C are weight matrices; b_f, b_i, b_o and b_C are bias vectors; f_{t+j}, i_{t+j} and o_{t+j} are the outputs of the forget gate, input gate and output gate of the memory unit at time t+j, with j = 0, 1, 2, …, 29; C_{t+j} is the cell state of the memory unit at time t+j; N*_{t+j} is the input vector of the memory unit at time t+j (for a lower-layer memory unit, this is the (j+1)-th element of the transformed and reshaped sequence, i.e. the 17 keypoints with spatial attention information of the (j+1)-th frame; for a middle-layer memory unit, the output vector of the lower-layer memory unit; for an upper-layer memory unit, the output vector of the middle-layer memory unit); h_{t+j} is the output vector of the memory unit at time t+j; and σ denotes the Sigmoid function.
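A single memory-unit update following Eqs. (4)-(6) can be sketched in NumPy as follows; packing the four weight matrices into one array is an implementation convenience, not something the patent specifies:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One memory-unit update per Eqs. (4)-(6).

    W stacks the forget, input, output and candidate weight matrices
    row-wise, shape (4H, D+H); b stacks the biases, shape (4H,)."""
    H = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b    # all four affine maps at once
    f = sigmoid(z[:H])                         # forget gate, Eq. (4)
    i = sigmoid(z[H:2 * H])                    # input gate
    o = sigmoid(z[2 * H:3 * H])                # output gate
    c = f * c_prev + i * np.tanh(z[3 * H:])    # cell state, Eq. (5)
    h = o * np.tanh(c)                         # output vector, Eq. (6)
    return h, c
```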
Step 403: based on the spatio-temporal features of the sample, the classification network produces the fall detection result of the target to be detected.
The classification network performs the binary classification of fall detection: fall or normal. As shown in fig. 5, it comprises several sequentially connected fully connected layers, and every fully connected layer except the last is followed by a Dropout layer and an activation function (the Sigmoid function). To prevent overfitting and improve the generalization of the fall detection model, the Dropout layer between each fully connected layer and its activation function randomly deactivates neurons, with a Dropout rate of 0.3. Specifically, as shown in fig. 5, the classification network comprises 4 sequentially connected fully connected layers, i.e. 3 fully connected layers in addition to the last one.
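A minimal sketch of such a classifier head (assuming NumPy; the hidden-layer widths and function name are illustrative, only the two output classes come from the description) is:

```python
import numpy as np

def classifier_forward(x, weights, train=True, drop=0.3, rng=None):
    """Forward pass of the described classifier: fully connected
    layers in sequence, each but the last followed by a Dropout layer
    (rate 0.3) and a Sigmoid activation, in that order."""
    if rng is None:
        rng = np.random.default_rng(0)
    for k, W in enumerate(weights):
        x = W @ x                                  # fully connected layer
        if k < len(weights) - 1:
            if train:                              # Dropout (inverted scaling)
                x = x * (rng.random(x.shape) >= drop) / (1.0 - drop)
            x = 1.0 / (1.0 + np.exp(-x))           # Sigmoid activation
    return x

# Four fully connected layers, ending in the two classes fall/normal.
weights = [np.zeros((64, 128)), np.zeros((32, 64)),
           np.zeros((16, 32)), np.zeros((2, 16))]
```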
For the problem of falls among the elderly, the adaptive keypoint attention network and the long short-term memory network respectively extract the dynamic spatial attention features and the dynamic temporal features of the human keypoints, yielding spatio-temporal features that enhance the fall detection model's ability to recognize fall events, improve the accuracy of fall detection, provide strong applicability and achieve a degree of real-time processing speed.
Adding a batch normalization layer before the samples are input into the adaptive keypoint attention network normalizes the samples and accelerates the convergence of the fall detection model.
Example two
The embodiment provides a fall detection system based on spatio-temporal information, which specifically comprises the following modules:
a video acquisition module configured to: acquiring a video including a target to be detected;
a keypoint extraction module configured to: detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
a window sliding module configured to: using the human skeleton key points of all the images in the sliding window as a sample;
a spatial feature extraction module configured to: extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
a spatiotemporal feature extraction module configured to: based on the spatial features of the sample, extract the spatio-temporal features of the sample with a long short-term memory network;
a classification module configured to: and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
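The window sliding module above can be illustrated with a short sketch (pure Python; the per-frame keypoint representation and the window stride are assumptions for illustration — the patent does not fix a stride):

```python
def sliding_window_samples(frames_keypoints, window_size, stride=1):
    """frames_keypoints: per-frame human skeleton keypoints, one entry per
    video frame. Each sample groups the keypoints of all frames that fall
    inside one position of the sliding window."""
    samples = []
    for start in range(0, len(frames_keypoints) - window_size + 1, stride):
        samples.append(frames_keypoints[start:start + window_size])
    return samples
```

Each returned sample is then fed through the attention, memory, and classification modules in turn.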
Before the samples are input into the adaptive key point attention network, a batch normalization layer is adopted to carry out normalization processing on the samples.
The spatial features of the sample are extracted by the adaptive keypoint attention network as follows: a global average pooling layer and a global max pooling layer are applied to the sample to obtain a global average pooling result and a global max pooling result, respectively; the two results are added element-wise and passed through a fully connected layer to obtain the weights of the human skeleton keypoints; finally, the weights are multiplied with the sample to obtain the spatial features of the sample.
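The three steps above can be sketched in NumPy as follows (a minimal sketch; the `(T, K, C)` sample layout — frames × keypoints × coordinates — the pooling axes, and the Sigmoid applied to the weights are assumptions made for illustration):

```python
import numpy as np

def keypoint_attention(sample, W, b):
    """sample: array of shape (T, K, C) — T frames, K skeleton keypoints,
    C coordinates per keypoint (assumed layout). W, b: fully connected
    layer mapping the K pooled values to K keypoint weights."""
    avg_pool = sample.mean(axis=(0, 2))           # global average pooling -> (K,)
    max_pool = sample.max(axis=(0, 2))            # global max pooling     -> (K,)
    z = avg_pool + max_pool                       # element-wise addition
    weights = 1.0 / (1.0 + np.exp(-(W @ z + b)))  # FC layer (+ assumed Sigmoid)
    return sample * weights[None, :, None]        # re-weight each keypoint
```

The multiplication broadcasts one learned weight per keypoint over all frames and coordinates, so informative joints are emphasized in every frame of the window.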
The spatial features of the sample are transformed and reshaped before being input into the long short-term memory network.
The long short-term memory network comprises a plurality of long short-term memory units connected in sequence. Each unit comprises a lower-layer memory cell, a middle-layer memory cell, and an upper-layer memory cell; within one unit, the output of the lower-layer cell serves as the input of the middle-layer cell, and the output of the middle-layer cell serves as the input of the upper-layer cell. Between two adjacent units, the output of each layer's memory cell in the earlier unit serves as the input of the same layer's memory cell in the later unit.
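The lower/middle/upper arrangement described above is the standard three-layer stacked LSTM pattern: within a time step the layers feed upward, and each layer carries its own hidden and cell state across time steps. A minimal NumPy sketch (the gate ordering, initialization, and dimensions are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    def __init__(self, in_dim, hid_dim, rng):
        s = 1.0 / np.sqrt(hid_dim)
        # one weight matrix for the 4 gates, acting on [input; hidden]
        self.W = rng.uniform(-s, s, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)
        self.hid = hid_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)            # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

def run_stacked_lstm(seq, cells):
    """seq: (T, in_dim); cells: [lower, middle, upper] memory cells."""
    states = [(np.zeros(c.hid), np.zeros(c.hid)) for c in cells]
    for x in seq:
        inp = x
        for k, cell in enumerate(cells):
            h, c = cell.step(inp, *states[k])
            states[k] = (h, c)
            inp = h                            # lower output feeds the layer above
    return states[-1][0]                       # final hidden state of the upper layer
```

Chaining the per-time-step states across the loop is exactly the "output of a layer in the previous unit feeds the same layer in the next unit" connection of the patent.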
The classification network comprises a plurality of fully connected layers connected in sequence; every fully connected layer except the last has an activation function, with a Dropout layer between the fully connected layer and its activation function.
It should be noted that, each module in the present embodiment corresponds to each step in the first embodiment one to one, and the specific implementation process is the same, which is not described herein again.
EXAMPLE III
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps in the method for fall detection based on spatiotemporal information as described in the first embodiment above.
Example four
This embodiment provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method for fall detection based on spatiotemporal information as described in the first embodiment above when executing the program.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for fall detection based on spatio-temporal information is characterized by comprising the following steps:
acquiring a video including a target to be detected;
detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
using the human skeleton key points of all the images in the sliding window as a sample;
extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
based on the spatial features of the sample, extracting the spatio-temporal features of the sample by adopting a long short-term memory network;
and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
2. The spatiotemporal information-based fall detection method according to claim 1, wherein the samples are normalized using a batch normalization layer before being input into the adaptive keypoint attention network.
3. A method for fall detection based on spatiotemporal information as claimed in claim 1, characterized in that the step of extracting the spatial features of the samples is:
respectively carrying out global average pooling and global maximum pooling on the samples by adopting a global average pooling layer and a global maximum pooling layer to obtain a global average pooling result and a global maximum pooling result;
adding the global average pooling result and the global maximum pooling result, and passing through a full-connection layer to obtain the weights of all human skeleton key points;
and multiplying the weight by the sample to obtain the spatial characteristics of the sample.
4. The method for fall detection based on spatiotemporal information as claimed in claim 1, wherein the spatial features of the sample are transformed and reshaped and then input into the long short-term memory network.
5. The method for fall detection based on spatiotemporal information as claimed in claim 1, wherein the long short-term memory network comprises a plurality of long short-term memory units connected in sequence;
each long short-term memory unit comprises a lower-layer memory cell, a middle-layer memory cell and an upper-layer memory cell;
in one long short-term memory unit, the output of the lower-layer memory cell is used as the input of the middle-layer memory cell, and the output of the middle-layer memory cell is used as the input of the upper-layer memory cell.
6. The method as claimed in claim 5, wherein, for two adjacent long short-term memory units, the output of a layer of memory cells in the previous unit is used as the input of the same layer of memory cells in the subsequent unit.
7. A spatio-temporal information-based fall detection method according to claim 1, characterized in that the classification network comprises several fully connected layers connected in sequence, and in that each fully connected layer except the last in the classification network is followed by a Dropout layer and an activation function connected in sequence.
8. Fall detection system based on temporal and spatial information, characterized by comprising:
a video acquisition module configured to: acquiring a video including a target to be detected;
a keypoint extraction module configured to: detecting each frame of image in the video to obtain a human body detection frame, and extracting human body skeleton key points;
a window sliding module configured to: using the human skeleton key points of all the images in the sliding window as a sample;
a spatial feature extraction module configured to: extracting the spatial features of the sample by adopting a self-adaptive key point attention network;
a spatiotemporal feature extraction module configured to: based on the spatial features of the sample, extracting the spatio-temporal features of the sample by adopting a long short-term memory network;
a classification module configured to: and obtaining a falling detection result of the target to be detected by adopting a classification network based on the space-time characteristics of the sample.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps in the spatio-temporal information-based fall detection method as defined in any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor when executing the program realizes the steps in the spatiotemporal information based fall detection method as claimed in any of claims 1-7.
CN202210536743.5A 2022-05-18 2022-05-18 Fall detection method, system, storage medium and equipment based on spatio-temporal information Expired - Fee Related CN114627427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210536743.5A CN114627427B (en) 2022-05-18 2022-05-18 Fall detection method, system, storage medium and equipment based on spatio-temporal information

Publications (2)

Publication Number Publication Date
CN114627427A true CN114627427A (en) 2022-06-14
CN114627427B CN114627427B (en) 2022-09-23

Family

ID=81906991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210536743.5A Expired - Fee Related CN114627427B (en) 2022-05-18 2022-05-18 Fall detection method, system, storage medium and equipment based on spatio-temporal information

Country Status (1)

Country Link
CN (1) CN114627427B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059662A (en) * 2019-04-26 2019-07-26 山东大学 A kind of deep video Activity recognition method and system
CN111401177A (en) * 2020-03-09 2020-07-10 山东大学 End-to-end behavior recognition method and system based on adaptive space-time attention mechanism
CN112686211A (en) * 2021-01-25 2021-04-20 广东工业大学 Fall detection method and device based on attitude estimation
CN112998697A (en) * 2021-02-22 2021-06-22 电子科技大学 Tumble injury degree prediction method and system based on skeleton data and terminal
CN113111865A (en) * 2021-05-13 2021-07-13 广东工业大学 Fall behavior detection method and system based on deep learning
CN114387666A (en) * 2021-12-28 2022-04-22 大连理工大学 Graph convolution network falling detection method based on human body key points


Also Published As

Publication number Publication date
CN114627427B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
CN111666857B (en) Human behavior recognition method, device and storage medium based on environment semantic understanding
CN108875708A (en) Behavior analysis method, device, equipment, system and storage medium based on video
Chatrath et al. Real time human face detection and tracking
CN110991513B (en) Image target recognition system and method with continuous learning ability of human-like
CN111680550B (en) Emotion information identification method and device, storage medium and computer equipment
Santhalingam et al. Sign language recognition analysis using multimodal data
CN110633624A (en) Machine vision human body abnormal behavior identification method based on multi-feature fusion
CN111104925A (en) Image processing method, image processing apparatus, storage medium, and electronic device
Badhe et al. Artificial neural network based indian sign language recognition using hand crafted features
Zhou et al. A study on attention-based LSTM for abnormal behavior recognition with variable pooling
CN112906520A (en) Gesture coding-based action recognition method and device
CN112307984A (en) Safety helmet detection method and device based on neural network
CN116312512A (en) Multi-person scene-oriented audiovisual fusion wake-up word recognition method and device
Hong et al. Characterizing subtle facial movements via Riemannian manifold
CN113239866B (en) Face recognition method and system based on space-time feature fusion and sample attention enhancement
CN110807380A (en) Human body key point detection method and device
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN111274854A (en) Human body action recognition method and vision enhancement processing system
CN117894065A (en) Multi-person scene behavior recognition method based on skeleton key points
CN114627427B (en) Fall detection method, system, storage medium and equipment based on spatio-temporal information
CN115205750B (en) Motion real-time counting method and system based on deep learning model
Hassan et al. Enhanced dynamic sign language recognition using slowfast networks
CN114360058B (en) Cross-view gait recognition method based on walking view prediction
Yoshihara et al. Automatic feature point detection using deep convolutional networks for quantitative evaluation of facial paralysis
Suriani et al. Sudden fall classification using motion features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221216

Address after: Room 3115, No. 135, Ward Avenue, Ping'an Street, Changqing District, Jinan, Shandong 250300

Patentee after: Shandong Jiqing Technology Service Co.,Ltd.

Address before: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501

Patentee before: Qilu University of Technology

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220923