CN112101235B - Old people behavior identification and detection method based on old people behavior characteristics - Google Patents

Old people behavior identification and detection method based on old people behavior characteristics

Info

Publication number
CN112101235B
CN112101235B
Authority
CN
China
Prior art keywords
extraction channel
layer
network
feature extraction
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010977729.XA
Other languages
Chinese (zh)
Other versions
CN112101235A (en)
Inventor
冯志全
孔丹
徐涛
杨晓晖
田京兰
范雪
郭庆北
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan
Priority to CN202010977729.XA
Publication of CN112101235A
Application granted
Publication of CN112101235B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01P MEASURING LINEAR OR ANGULAR SPEED, ACCELERATION, DECELERATION, OR SHOCK; INDICATING PRESENCE, ABSENCE, OR DIRECTION, OF MOVEMENT
    • G01P15/00 Measuring acceleration; Measuring deceleration; Measuring shock, i.e. sudden change of acceleration
    • G01P15/18 Measuring acceleration; Measuring deceleration; Measuring shock, i.e. sudden change of acceleration in two or more dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an old people behavior identification and detection method based on old people behavior characteristics, which comprises the following steps: an acceleration sensor is worn on the body of the elderly person, who is asked to perform specified actions; data are acquired through a Kinect device and the acceleration sensor, with an action video obtained through the Kinect device and body acceleration data obtained through the acceleration sensor; action frames are extracted from the action video; spatial features and temporal features of the action frames are extracted by a first feature extraction channel; numerical features and temporal features of the body acceleration data are extracted by a second feature extraction channel; and the output results of the first and second feature extraction channels are fused by a feedforward neural network to obtain a target result. The invention identifies elderly behavior by exploiting the timing differences between elderly and young people's movements, and performs behavior recognition from both the action video and the body acceleration data, the two sources complementing each other for more accurate recognition.

Description

Old people behavior identification and detection method based on old people behavior characteristics
Technical Field
The invention relates to the technical field of neural network identification, in particular to an old people behavior identification and detection method based on old people behavior characteristics.
Background
The gradual aging of the population and care of the elderly have become problems faced worldwide. An aging population increases the demand for health and social services, making intelligent companion services and timely monitoring of the elderly new research hotspots. In research on timely monitoring of the elderly, recognizing their behaviors in a timely manner is of great significance.
At present, in a study of human posture across all age groups, Bangli Liu et al. acquired 3D human skeleton joints through an RGBD sensor, extracted skeleton features, aggregated them with geometric features, and constructed GBSW to identify human behaviors; the model can recognize multi-person and interactive actions, but inaccurate or missing skeleton joint points degrade its overall performance, so its accuracy is only 0.65. As another example, an Intelligent Fuzzy Inference (IFI) algorithm detects human activity with an accuracy of 0.71 by acquiring acceleration and angular velocity data of the upper and lower leg through a smart shoe equipped with an inertial measurement unit (IMU). In addition, human motion detection has been performed on the fluctuations of wireless local area network signals: useful information is extracted from the Channel State Information (CSI) stream based on principal component analysis, and human behaviors are then classified with a deep forest decision (FD) method, with an accuracy of 0.75.
In addition, existing research on recognizing the daily-life behaviors of the elderly relies on image features, three-dimensional positions of skeleton points, or sensor data. Typically, one of these three types of data is collected and the data features of different actions are extracted to distinguish behavior categories. For example, G. Anitha and S. Baghavathi Priya proposed a detection and recognition system, based on a dynamic Bayesian network, for five abnormal behaviors of the elderly (leaning forward, chest pain, walking backward, headache, and vomiting); it adopts image features, finding differences among behaviors in the geometric features of the elderly person's posture in the acquired images and then judging the behavior by a threshold. Bangli Liu et al. acquired 3D human skeleton joints through an RGBD sensor, extracted skeleton features, and constructed GBSW by aggregating skeleton and geometric features to identify human behaviors, but inaccurate or missing skeleton joint points reduce the method's overall performance. For recognizing elderly behaviors, sensor data are most commonly used; for example, Vepakomma et al. designed a wrist-worn sensor that classifies fine and complex behaviors in Activities of Daily Living (ADL) through innovative multi-modal perception and deep-learning data analysis to recognize abnormal behaviors. However, the sensor approach recognizes poorly for behaviors that are similar in posture but different in meaning. In summary, these methods generally collect one kind of data and extract the data features of different actions to distinguish behavior categories; once affected by external factors, the influence cannot be compensated by other means, so the recognition rate is low.
Disclosure of Invention
The invention provides an old people behavior identification and detection method based on old people behavior characteristics, aiming to solve the problem of low accuracy in recognizing the actions of the elderly in the prior art.
In order to achieve the above object, the present invention provides a method for identifying and detecting behavior of an elderly person based on behavior characteristics of the elderly person, comprising:
wearing an acceleration sensor on the body of the elderly person, and having the elderly person perform specified actions;
data acquisition is carried out through Kinect equipment and the acceleration sensor, action videos are obtained through the Kinect equipment, and body acceleration data are obtained through the acceleration sensor;
extracting a motion frame from the motion video; training a Convolutional Neural Network (CNN) and a long-short term memory (LSTM) network by using the processed action frame to obtain a first feature extraction channel, and extracting spatial features and temporal features of the action frame by using the first feature extraction channel;
training a three-layer long-short term memory (LSTM) network by using the body acceleration data to obtain a second feature extraction channel, wherein the second feature extraction channel extracts numerical features and time features of the body acceleration data;
and fusing the output results of the first characteristic extraction channel and the second characteristic extraction channel through a feedforward neural network to obtain a target result.
Preferably, the acceleration sensor is a triaxial acceleration sensor, and the acceleration sensor is arranged on the neck, left shoulder, right shoulder, left upper arm, right upper arm, left forearm, right forearm, left hand, right hand, spine, hip joint, left thigh, right thigh, left calf and right calf of the elderly.
Preferably, extracting motion frames from the motion video comprises:
extracting frames capable of representing actions from video frames of the action video at equal set time intervals to serve as target frames;
and preprocessing the target frame to obtain the action frame.
Preferably, obtaining the action frame by preprocessing the target frame comprises:
mapping the dynamic skeleton point model provided by the Kinect SDK onto the video frame of the action video at the same moment, by utilizing the skeleton tracking technology in the Kinect SDK matched with the Kinect device;
obtaining the range of the dynamic skeleton point model in the target frame;
cropping the target frame according to the range of the dynamic skeleton point model to obtain the action frame containing the elderly person's action;
and uniformly processing the action frames into a 64 × 64 × 3 format, followed by normalization.
Preferably, the convolutional neural network CNN adopts a VGG-16 network structure. The CNN comprises a plurality of CNN input layers; each input layer is connected to a convolutional layer, each convolutional layer is connected to a pooling layer, and the convolutional and pooling layers are arranged in a multi-stage combined configuration, with a BatchNormalization layer added after each pooling layer. The final pooling layers are each connected to a first fully connected layer, and each first fully connected layer is connected to the input end of the long-short term memory LSTM network.
Preferably, the output end of the long-short term memory LSTM network is connected with a second full connection layer, and the second full connection layer is connected with the input end of the feedforward neural network.
Preferably, the input layer of the three-layer long-short term memory LSTM network is connected with a hidden layer, the hidden layer is connected with a third full-connection layer, the third full-connection layer is connected with the feedforward neural network, and the hidden layer is of a three-layer structure formed by LSTM network units.
Preferably, training the convolutional neural network CNN and the long-short term memory LSTM network with the processed action frame to obtain a first feature extraction channel includes:
taking the action frames of a plurality of sets of actions as a first training set, and taking the action frames of a further plurality of sets of actions as a first verification set;
circularly training the convolutional neural network CNN and the long-short term memory LSTM network through the first training set to obtain a first feature extraction channel, wherein an optimization function of the convolutional neural network CNN and the long-short term memory LSTM network is SGD;
and applying the first verification set to the first feature extraction channel, acquiring and judging whether the loss function value of the first feature extraction channel meets the requirement, and if not, continuing to train the convolutional neural network CNN and the long-short term memory LSTM network.
Preferably, training the three-layer long-short term memory LSTM network with the body acceleration data to obtain a second feature extraction channel comprises:
taking body acceleration data of a plurality of sets of actions as a second training set, and taking action frames of the plurality of sets of actions as a second verification set;
circularly training the three-layer long-short term memory LSTM network through the second training set to obtain a second feature extraction channel;
and inputting the second verification set into the second feature extraction channel, acquiring and judging whether the loss function value of the second feature extraction channel meets the requirement, and if not, continuing to train the three-layer long-short term memory LSTM network.
Preferably, the output results of the first feature extraction channel and the second feature extraction channel for all the specified actions are taken to train the feedforward neural network, and the parameters of the feedforward neural network are adjusted until it correctly classifies all the actions.
The method for identifying and detecting the behavior of the old people based on the behavior characteristics of the old people, which is provided by the application, has the following beneficial effects:
An action video captured while a specified action is performed is processed by deep learning with the convolutional neural network CNN and the long-short term memory LSTM network to extract the spatial and temporal features of the action frames; body acceleration data captured during the specified action are processed by deep learning with the three-layer long-short term memory LSTM network to extract their numerical and temporal features; both sets of features are then input to the feedforward neural network for learning and classification.
Actions performed by the elderly are slower than those performed by the young, so the temporal features of the action frames and of the acceleration data differ significantly between an elderly and a young person performing the same specified action; the feedforward neural network is trained to distinguish whether an elderly or a young person performed the specified action by recognizing this difference.
The specified action is recognized from both the action video and the body acceleration data; the two sources complement each other, ensuring high recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an elderly behavior recognition detection method based on elderly behavior characteristics according to an embodiment of the present invention;
fig. 2 is a diagram of a neural network architecture of an elderly behavior recognition detection method based on elderly behavior characteristics according to an embodiment of the present invention;
FIG. 3 is an architecture diagram of a convolutional neural network CNN and a long-short term memory LSTM network in an embodiment of the present invention;
FIG. 4 is a diagram of the architecture of a three-layer long short term memory LSTM network in an embodiment of the present invention;
FIG. 5 is an architecture diagram of an LSTM network element in an embodiment of the present invention; .
FIG. 6 is a schematic diagram of hip acceleration during performance of a prescribed motion by an elderly person in accordance with an embodiment of the present invention;
FIG. 7 is a diagram illustrating left hand acceleration during performance of an action by an elderly person in accordance with an embodiment of the present invention;
figure 8 is a schematic representation of the hip acceleration of an elderly person walking in an embodiment of the present invention;
figure 9 is a schematic representation of the hip acceleration of a young person walking in an embodiment of the present invention.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention is based on the following observations: the movement speed and reactions of the elderly are obviously slower than those of the young, and one action takes longer; the sway characteristics of the elderly in upright walking vary individually, but overall the body sway of the elderly is larger than that of the young, the knees are bent, hip-joint acceleration is small while head acceleration is large, the elderly tend to lean forward when walking, and ankle flexion is weak; in terms of spatio-temporal variables, stride speed decreases with age; when standing up and sitting down, the movement speed and forward-lean angle of the lower leg are obviously higher in the elderly than in the young, the elderly descend weakly, cannot sit on a low stool or chair, and need to support themselves on surrounding objects while sitting down. When lying down and getting up, the elderly rely more on the trunk, elbows, and hips to complete the corresponding actions. Referring to Fig. 6 and Fig. 7, Fig. 6 shows the hip acceleration of an elderly person performing the actions of standing, walking, sitting down, lying, drinking water, squatting, bending over, looking at a mobile phone, reading a newspaper, watering flowers, stretching, and yawning, and Fig. 7 shows the left-hand acceleration for the same actions.
Referring to fig. 1 and fig. 2 in combination, the present invention provides a method for identifying and detecting behavior of an elderly person based on behavior characteristics of the elderly person, including:
s100, wearing an acceleration sensor on the body of the old person to enable the old person to perform specified actions;
s200, data acquisition is carried out through the Kinect equipment and the acceleration sensor, a motion video is obtained through the Kinect equipment, and body acceleration data are obtained through the acceleration sensor;
s300, extracting motion frames from the motion video; training a Convolutional Neural Network (CNN) and a long-short term memory (LSTM) network by using the processed action frame to obtain a first feature extraction channel, wherein the first feature extraction channel extracts spatial features and temporal features of the action frame; since older people tend to be slow in action, it is often possible to distinguish whether a given action is performed by a young person or an older person by the temporal characteristics of the action frame.
S400, training the three-layer long-short term memory LSTM network with the body acceleration data to obtain a second feature extraction channel, wherein the second feature extraction channel extracts the numerical and temporal features of the body acceleration data; the temporal features of the body acceleration data can likewise often distinguish whether a given action was performed by a young or an elderly person. Referring specifically to Fig. 8 and Fig. 9, the pattern of hip acceleration when walking is clearly different between the elderly and the young.
And S500, fusing the output results of the first characteristic extraction channel and the second characteristic extraction channel through a feedforward neural network to obtain a target result.
In a specific implementation, elderly subjects aged approximately 60-80 and free of physical disease are selected as samples.
The specific wearing positions of the acceleration sensors on the body of the elderly person are as follows: three-axis acceleration sensors are worn on the neck, left shoulder, right shoulder, left upper arm, right upper arm, left forearm, right forearm, left hand, right hand, hip joint, left thigh, right thigh, left calf, and right calf of each elderly person, and four three-axis acceleration sensors are worn along the spine.
The elderly person is asked to perform specified actions, including but not limited to standing, walking, sitting down, lying, drinking water, squatting, bending over, looking at a mobile phone, reading a newspaper, watering flowers, stretching, and yawning; each action is repeated about 10 times, and the actions should be natural and unconstrained.
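For reference in the illustrative sketches below, these twelve specified actions can be collected as a label list; the variable name ACTIONS and the exact English labels are assumptions made for illustration, not part of the patent.

```python
ACTIONS = [
    "standing", "walking", "sitting down", "lying", "drinking water",
    "squatting", "bending over", "looking at a mobile phone",
    "reading a newspaper", "watering flowers", "stretching", "yawning",
]  # 12 specified action classes; num_classes=12 in the sketches below
```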
Data of the specified actions are acquired through the Kinect device and the acceleration sensors: the action video is obtained through the Kinect device, and the body acceleration data of the elderly person performing the specified actions are obtained through the acceleration sensors.
The process of extracting action frames from the action video is as follows: frames capable of representing the action are extracted from the video frames of the action video at equal set time intervals to serve as target frames. In one possible implementation, for each specified action, 12 target frames are taken at equal time intervals from the video frames corresponding to that action, as in the sketch below.
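As an illustration only, the following minimal sketch shows this equal-interval sampling; the function name sample_target_frames and the use of NumPy are assumptions.

```python
import numpy as np

def sample_target_frames(frames, num_target=12):
    """Pick num_target frames at equal time intervals from one action's video.

    frames: sequence of decoded video frames for a single specified action.
    Returns the 12 target frames described in the embodiment above.
    """
    indices = np.linspace(0, len(frames) - 1, num=num_target, dtype=int)
    return [frames[i] for i in indices]
```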
The target frames are then preprocessed to obtain the action frames. Using the skeleton tracking technology in the Kinect SDK matched with the Kinect device, the dynamic skeleton point model provided by the Kinect SDK is mapped onto the video frame of the action video at the same moment; specifically, in the WPF project, the dynamic skeleton point model adds a marker shape to every skeleton point supported by the skeleton tracking technology and sets JointTrackingState to Tracked, so that the pixels of the skeleton point marker shapes are visible in the video frames of the action video;
obtaining the range of the dynamic skeleton point model in the target frame; specifically, determining the range of the bone point mark shape in the target frame;
the target frame is cropped according to the range of the dynamic skeleton point model to obtain the action frame containing the elderly person's action; specifically, the target frame is cropped by the rectangle containing all the skeleton point marker shapes, and the image within that rectangle is retained as the action frame, so the action frame contains only the elderly person's action. The number of action frames representing the same specified action is then increased through data augmentation to enlarge the sample.
The action frames are then uniformly processed into a 64 × 64 × 3 format and normalized, as in the preprocessing sketch below.
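A minimal preprocessing sketch under the above description follows: it crops the target frame to the bounding rectangle of the mapped skeleton points, resizes to 64 × 64 × 3, and normalizes. The function name, the OpenCV resize call, and scaling by 255 are assumptions; the patent does not specify the normalization formula.

```python
import cv2
import numpy as np

def preprocess_action_frame(frame, skeleton_xy):
    """Crop to the skeleton bounding box, resize to 64x64x3, normalize.

    frame: H x W x 3 image of the target frame.
    skeleton_xy: (N, 2) array of skeleton-point pixel coordinates mapped
    into this frame by the Kinect SDK skeleton tracking.
    """
    x_min, y_min = skeleton_xy.min(axis=0).astype(int)
    x_max, y_max = skeleton_xy.max(axis=0).astype(int)
    crop = frame[max(y_min, 0):y_max + 1, max(x_min, 0):x_max + 1]
    crop = cv2.resize(crop, (64, 64))        # uniform 64 x 64 x 3 format
    return crop.astype(np.float32) / 255.0   # assumed min-max normalization
```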
The convolutional neural network CNN and the long-short term memory LSTM network are trained with the processed action frames to obtain the first feature extraction channel, which extracts the spatial and temporal features of the action frames. Specifically, referring to Fig. 3, the CNN adopts a VGG-16 network structure and comprises a plurality of CNN input layers; each input layer is connected to a convolutional layer (conv in the figure), each convolutional layer to a pooling layer (pooling in the figure), and the convolutional and pooling layers are arranged in a 4-stage combined configuration. The first stage uses 3 × 3 convolution kernels, 32 kernels, and a 2 × 2 pooling stride; the second stage 3 × 3 kernels, 64 kernels, and a 2 × 2 pooling stride; the third stage 3 × 3 kernels, 128 kernels, and a 2 × 2 pooling stride; and the fourth stage 3 × 3 kernels, 128 kernels, and a 2 × 2 pooling stride. A BatchNormalization layer is added after each pooling layer to avoid slower and slower convergence as gradients vanish. The final pooling layers are each connected to a first fully connected layer (FC1 in the figure), and each first fully connected layer is connected to the input of the LSTM network. The output of the LSTM network is connected to a second fully connected layer (FC2 in the figure), which is connected to the input of the feedforward neural network.
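The following Keras sketch reflects this first feature extraction channel as described (four conv/pooling stages with 32/64/128/128 kernels of size 3 × 3, 2 × 2 pooling strides, BatchNormalization after each pooling layer, FC1, an LSTM, and FC2). The LSTM width, the FC1 width, and the use of tensorflow.keras are assumptions made for illustration.

```python
from tensorflow.keras import layers, models, optimizers

def build_first_channel(num_frames=12, num_classes=12):
    # Per-frame VGG-style CNN: four conv/pool stages with 32/64/128/128
    # kernels of size 3x3, 2x2 pooling, and BatchNormalization after
    # each pooling layer, followed by FC1.
    cnn = models.Sequential(name="frame_cnn")
    cnn.add(layers.Input(shape=(64, 64, 3)))
    for filters in (32, 64, 128, 128):
        cnn.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        cnn.add(layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        cnn.add(layers.BatchNormalization())
    cnn.add(layers.Flatten())
    cnn.add(layers.Dense(256, activation="relu"))  # FC1; width is an assumption

    # LSTM over the 12 per-frame feature vectors, then FC2 as classifier.
    frames_in = layers.Input(shape=(num_frames, 64, 64, 3))
    x = layers.TimeDistributed(cnn)(frames_in)     # apply the CNN to each frame
    x = layers.LSTM(128)(x)                        # temporal features; width assumed
    out = layers.Dense(num_classes, activation="softmax")(x)  # FC2

    model = models.Model(frames_in, out)
    model.compile(optimizer=optimizers.SGD(), loss="categorical_crossentropy")
    return model
```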
The specific training process for the convolutional neural network CNN and the long-short term memory LSTM network comprises the following steps:
training the convolutional neural network CNN and the long-short term memory LSTM network by using the action frame to acquire a first feature extraction channel comprises the following steps:
taking 72% of action frames of any action as a first training set, and taking 8% of action frames of the action as a first verification set;
circularly training the convolutional neural network CNN and the long-short term memory LSTM network through the first training set to obtain the first feature extraction channel; specifically, the 12 processed action frames are taken as the input of the CNN and the LSTM network for iterative training, with the parameters and the optimization function adjusted, wherein the optimization function of the CNN and the LSTM network is SGD (stochastic gradient descent, in which the parameters are updated once for each training sample);
and applying the first verification set to the first feature extraction channel, obtaining the loss function value of the first feature extraction channel and judging whether it meets the requirement (for example, whether the loss function value is smaller than a set threshold); if not, training of the convolutional neural network CNN and the long-short term memory LSTM network continues. The loss function of the first feature extraction channel is cross entropy.
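A minimal training loop matching this description follows, reusing build_first_channel from the sketch above and assuming x_train/y_train (the 72% training split) and x_val/y_val (the 8% verification split) are already prepared as arrays. The concrete loss threshold and batch size are assumptions, since the patent only requires that the validation loss meet the requirement.

```python
LOSS_THRESHOLD = 0.05   # assumed value for "meets the requirement"

model = build_first_channel()
while True:
    # SGD with cross-entropy loss, as configured in build_first_channel.
    model.fit(x_train, y_train, batch_size=12, epochs=1, verbose=0)
    val_loss = model.evaluate(x_val, y_val, verbose=0)  # cross entropy on the verification set
    if val_loss < LOSS_THRESHOLD:
        break           # loss function value meets the requirement
```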
The three-layer long-short term memory LSTM network is trained with the body acceleration data to obtain the second feature extraction channel, which extracts the numerical features and temporal features of the body acceleration data. Specifically, referring to Fig. 4, the input layer of the three-layer LSTM network is connected to a hidden layer, the hidden layer is connected to a third fully connected layer, the third fully connected layer is connected to the feedforward neural network, and the hidden layer is a three-layer structure formed by LSTM network units. Referring to Fig. 5, the LSTM network unit comprises a forget gate, an input gate, a candidate gate, an output gate, an update state, and a hidden state. The forget gate, the first step of the LSTM, selects which information to discard from the cell state:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f).
The input gate and the candidate gate decide how much new information to admit; the input gate is
i_t = σ(W_i · [h_{t-1}, x_t] + b_i),
and the candidate gate is
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C).
The update state replaces the cell state at time t-1 with the cell state at time t:
C_t = f_t · C_{t-1} + i_t · C̃_t.
The output gate decides which information is output at time t:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o),
and the hidden state, needed at the next moment, is
h_t = o_t · tanh(C_t).
Here σ(x) = 1/(1 + e^(-x)) is the sigmoid activation function, tanh(x) = (e^x - e^(-x))/(e^x + e^(-x)) is the hyperbolic tangent activation function, and W_k and b_k denote the weight matrices and bias vectors, respectively.
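For clarity, the gate equations above can be written as a single NumPy time step; this is a direct transcription of the formulas, with the weight and bias containers W and b being illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step implementing the gate equations above.

    W and b hold the weight matrices W_f, W_i, W_C, W_o and bias vectors
    b_f, b_i, b_C, b_o keyed by gate name; x_t is the input at time t,
    h_prev/c_prev are the previous hidden and cell states.
    """
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])     # candidate gate
    c_t = f_t * c_prev + i_t * c_tilde         # update state
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate
    h_t = o_t * np.tanh(c_t)                   # hidden state
    return h_t, c_t
```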
Specifically, training the three-layer long-short term memory LSTM network with the body acceleration data to obtain a second feature extraction channel includes:
taking the body acceleration data of 72% of the actions as a second training set, and the body acceleration data of 8% of the actions as a second verification set; specifically, since the average time taken to complete one action corresponds to about 199 samples, each action is represented by 200 time-series body acceleration data points.
The 200 time-series body acceleration data points are taken as the input of the three-layer long-short term memory LSTM network for training, wherein the learning rate of the three-layer LSTM network is set to 0.0001 and the loss function is a cross-entropy loss function with L1 regularization;
and inputting the second verification set into the second feature extraction channel, acquiring and judging whether the loss function value of the second feature extraction channel meets the requirement (if the loss function value is smaller than a set second feature extraction channel loss function threshold), and if not, continuing to train the three-layer long-short term memory LSTM network.
The feedforward neural network used for obtaining the target result comprises an input layer, three hidden layers and an output layer.
The training process of the feedforward neural network for obtaining the target result comprises: taking the output results of the first feature extraction channel and the second feature extraction channel for all the specified actions to train the feedforward neural network, and adjusting the parameters of the feedforward neural network;
and testing the feedforward neural network by using the rest 20% of action frames and body acceleration data, acquiring and judging whether the loss function value of the feedforward neural network meets the requirement, and if not, continuing to train the feedforward neural network.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An old person behavior identification and detection method based on old person behavior characteristics is characterized by comprising the following steps:
wearing an acceleration sensor on the body of the elderly person, and having the elderly person perform specified actions;
data acquisition is carried out through a Kinect device and the acceleration sensor, a motion video is obtained through the Kinect device, and body acceleration data are obtained through the acceleration sensor;
extracting action frames from the action video; training a convolutional neural network CNN and a long-short term memory LSTM network by using the processed action frames to obtain a first feature extraction channel, and extracting spatial features and temporal features of the action frames by using the first feature extraction channel; the convolutional neural network CNN comprises a plurality of CNN input layers, each CNN input layer is respectively connected with a convolutional layer, each convolutional layer is connected with a pooling layer, the convolutional layers and the pooling layers are connected in a multi-stage combined configuration, a BatchNormalization layer is added after each pooling layer, the final pooling layers are respectively connected with first fully connected layers, and each first fully connected layer is respectively connected with the input end of the long-short term memory LSTM network;
training a three-layer long-short term memory (LSTM) network by using the body acceleration data to obtain a second feature extraction channel, wherein the second feature extraction channel extracts numerical features and time features of the body acceleration data;
and fusing the output results of the first characteristic extraction channel and the second characteristic extraction channel through a feedforward neural network to obtain a target result.
2. The method for identifying and detecting behavior of an elderly person based on behavior characteristics of the elderly person as claimed in claim 1, wherein the acceleration sensor is a three-axis acceleration sensor, and the acceleration sensor is disposed on the neck, left shoulder, right shoulder, left upper arm, right upper arm, left forearm, right forearm, left hand, right hand, spine, hip joint, left thigh, right thigh, left calf, and right calf of the elderly person.
3. The method according to claim 1, wherein the extracting of the action frame from the action video comprises:
extracting frames capable of representing actions from video frames of the action video at equal set time intervals to serve as target frames;
and obtaining the action frame by preprocessing the target frame.
4. The method according to claim 3, wherein the obtaining the action frame by preprocessing the target frame comprises:
mapping the dynamic skeleton point model provided by the Kinect SDK onto the video frame of the action video at the same moment by utilizing the skeleton tracking technology in the Kinect SDK matched with the Kinect device;
obtaining the range of the dynamic skeleton point model in the target frame;
intercepting the target frame according to the range of the dynamic skeleton point model to obtain the action frame containing the actions of the old;
and uniformly processing the action frames into a 64 × 64 × 3 format, followed by normalization.
5. The method for identifying and detecting the behavior of the elderly based on the behavior characteristics of the elderly according to claim 1, wherein the Convolutional Neural Network (CNN) adopts a VGG-16 network structure.
6. The method for identifying and detecting the behavior of the elderly based on the behavior characteristics of the elderly as claimed in claim 1, wherein the output terminal of the long-short term memory (LSTM) network is connected to a second full connection layer, and the second full connection layer is connected to the input terminal of the feedforward neural network.
7. The method for identifying and detecting the behavior of the elderly based on the behavior characteristics of the elderly according to claim 1, wherein the input layer of the three-layer long-short term memory (LSTM) network is connected with a hidden layer, the hidden layer is connected with a third full-connection layer, the third full-connection layer is connected with the feedforward neural network, and the hidden layer is a three-layer structure formed by LSTM network units.
8. The method of claim 1, wherein training the Convolutional Neural Network (CNN) and the long-short term memory (LSTM) network with the processed action frame to obtain a first feature extraction channel comprises:
taking the action frames of a plurality of sets of actions as a first training set, and taking the action frames of a further plurality of sets of actions as a first verification set;
circularly training the convolutional neural network CNN and the long-short term memory LSTM network through the first training set to obtain a first feature extraction channel, wherein an optimization function of the convolutional neural network CNN and the long-short term memory LSTM network is SGD;
and applying the first verification set to the first feature extraction channel, acquiring and judging whether the loss function value of the first feature extraction channel meets the requirement, and if not, continuing to train the convolutional neural network CNN and the long-short term memory LSTM network.
9. The method for identifying and detecting behavior of the elderly based on behavior characteristics of the elderly as claimed in claim 1, wherein training the three-layer long-short term memory (LSTM) network with the body acceleration data to obtain the second feature extraction channel comprises:
taking body acceleration data of a plurality of sets of actions as a second training set, and taking action frames of the plurality of sets of actions as a second verification set;
circularly training the three-layer long-short term memory LSTM network through the second training set to obtain a second feature extraction channel;
and inputting the second verification set into the second feature extraction channel, acquiring and judging whether the loss function value of the second feature extraction channel meets the requirement, and if not, continuing to train the three-layer long-short term memory LSTM network.
10. The method for identifying and detecting the behavior of the elderly based on the behavior features of the elderly according to claim 1, wherein the feedforward neural network is trained by taking output results of the first feature extraction channel and the second feature extraction channel of all the designated actions, and parameters of the feedforward neural network are adjusted until a loss function value of the feedforward neural network meets requirements.
CN202010977729.XA 2020-09-16 2020-09-16 Old people behavior identification and detection method based on old people behavior characteristics Active CN112101235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010977729.XA CN112101235B (en) 2020-09-16 2020-09-16 Old people behavior identification and detection method based on old people behavior characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010977729.XA CN112101235B (en) 2020-09-16 2020-09-16 Old people behavior identification and detection method based on old people behavior characteristics

Publications (2)

Publication Number Publication Date
CN112101235A CN112101235A (en) 2020-12-18
CN112101235B true CN112101235B (en) 2022-09-23

Family

ID=73758717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010977729.XA Active CN112101235B (en) 2020-09-16 2020-09-16 Old people behavior identification and detection method based on old people behavior characteristics

Country Status (1)

Country Link
CN (1) CN112101235B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818927A (en) * 2021-02-26 2021-05-18 上海交通大学 Real-time classification method and system for human body lower limb movement modes
CN112990153A (en) * 2021-05-11 2021-06-18 创新奇智(成都)科技有限公司 Multi-target behavior identification method and device, storage medium and electronic equipment
CN113837122B (en) * 2021-09-28 2023-07-25 重庆邮电大学 Wi-Fi channel state information-based contactless human body behavior recognition method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170089688A (en) * 2016-01-27 2017-08-04 동서대학교산학협력단 System for providing Information Technology karaoke based on audiance's action, and method thereof
CN110801227A (en) * 2019-12-09 2020-02-18 中国科学院计算技术研究所 Method and system for testing three-dimensional color block obstacle based on wearable equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092894A (en) * 2017-04-28 2017-08-25 孙恩泽 A kind of motor behavior recognition methods based on LSTM models
CN108182410A (en) * 2017-12-28 2018-06-19 南通大学 A kind of joint objective zone location and the tumble recognizer of depth characteristic study
CN109670396B (en) * 2018-11-06 2023-06-27 华南理工大学 Fall detection method for indoor old people
CN110633736A (en) * 2019-08-27 2019-12-31 电子科技大学 Human body falling detection method based on multi-source heterogeneous data fusion
CN111199202B (en) * 2019-12-30 2024-04-26 南京师范大学 Human body action recognition method and recognition device based on circulating attention network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170089688A (en) * 2016-01-27 2017-08-04 동서대학교산학협력단 System for providing Information Technology karaoke based on audiance's action, and method thereof
CN110801227A (en) * 2019-12-09 2020-02-18 中国科学院计算技术研究所 Method and system for testing three-dimensional color block obstacle based on wearable equipment

Also Published As

Publication number Publication date
CN112101235A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
CN112101235B (en) Old people behavior identification and detection method based on old people behavior characteristics
WO2021057810A1 (en) Data processing method, data training method, data identifying method and device, and storage medium
CN113496216B (en) Multi-angle falling high-risk identification method and system based on skeleton key points
CN109726672B (en) Tumbling detection method based on human body skeleton sequence and convolutional neural network
Luo et al. Intelligent carpet: Inferring 3d human pose from tactile signals
US20150320343A1 (en) Motion information processing apparatus and method
CN110633736A (en) Human body falling detection method based on multi-source heterogeneous data fusion
CN112668531A (en) Motion posture correction method based on motion recognition
Xu et al. Elders’ fall detection based on biomechanical features using depth camera
CN109271918A (en) The method for distinguishing balanced capacity obstacle crowd based on centre-of gravity shift model
CN115346272A (en) Real-time tumble detection method based on depth image sequence
Liu et al. A method to recognize sleeping position using an CNN model based on human body pressure image
US20220051145A1 (en) Machine learning based activity detection utilizing reconstructed 3d arm postures
Zhang et al. Multimodal data-based deep learning model for sitting posture recognition toward office workers’ health promotion
Nouredanesh et al. Chasing feet in the wild: a proposed egocentric motion-aware gait assessment tool
CN114550299A (en) System and method for evaluating daily life activity ability of old people based on video
CN112101094B (en) Suicide risk assessment method based on limb language
CN111951940A (en) Intelligent medical rehabilitation assisting method
CN115019233B (en) Mental retardation judging method based on gesture detection
Wahla et al. Visual fall detection from activities of daily living for assistive living
CN113271848B (en) Body health state image analysis device, method and system
Chen et al. A Novel CNN-BiLSTM Ensemble Model With Attention Mechanism for Sit-to-Stand Phase Identification Using Wearable Inertial Sensors
Yusuf et al. Upper gait analysis for human identification using convolutional–recurrent neural network
CN114360060B (en) Human body action recognition and counting method
CN111144171A (en) Abnormal crowd information identification method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant