CN112101235B - Old people behavior identification and detection method based on old people behavior characteristics - Google Patents

Old people behavior identification and detection method based on old people behavior characteristics

Info

Publication number
CN112101235B
CN112101235B
Authority
CN
China
Prior art keywords
extraction channel
layer
network
feature extraction
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010977729.XA
Other languages
Chinese (zh)
Other versions
CN112101235A (en)
Inventor
冯志全
孔丹
徐涛
杨晓晖
田京兰
范雪
郭庆北
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan
Priority to CN202010977729.XA
Publication of CN112101235A
Application granted
Publication of CN112101235B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01P MEASURING LINEAR OR ANGULAR SPEED, ACCELERATION, DECELERATION, OR SHOCK; INDICATING PRESENCE, ABSENCE, OR DIRECTION, OF MOVEMENT
    • G01P15/00 Measuring acceleration; Measuring deceleration; Measuring shock, i.e. sudden change of acceleration
    • G01P15/18 Measuring acceleration; Measuring deceleration; Measuring shock, i.e. sudden change of acceleration in two or more dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an old people behavior identification and detection method based on old people behavior characteristics, which comprises the following steps: an acceleration sensor is worn on the body of the elderly person, who is asked to perform specified actions; data are acquired through a Kinect device and the acceleration sensor, with an action video obtained through the Kinect device and body acceleration data obtained through the acceleration sensor; action frames are extracted from the action video; spatial features and temporal features of the action frames are extracted by a first feature extraction channel; numerical features and temporal features of the body acceleration data are extracted by a second feature extraction channel; and the output results of the first and second feature extraction channels are fused by a feedforward neural network to obtain a target result. The invention identifies elderly behavior by exploiting the timing differences between elderly and young people's movements, and performs behavior recognition from both the action video and the body acceleration data, the two sources complementing each other for more accurate recognition.

Description

Old people behavior identification and detection method based on old people behavior characteristics
Technical Field
The invention relates to the technical field of neural network identification, in particular to an old people behavior identification and detection method based on old people behavior characteristics.
Background
The gradual aging of the population and care of the elderly have become problems faced worldwide. An aging population increases the demand for health and social services, making intelligent companion services and timely monitoring of the elderly new research hotspots. In research on timely monitoring of the elderly, recognizing their behaviors in a timely manner is of great significance.
At present, in a study of human posture across all age groups, Bangli Liu et al. acquired 3D human skeleton joints through an RGBD sensor, extracted skeleton features, aggregated them with geometric features, and constructed GBSW to identify human behaviors; the model can recognize multi-person and interactive actions, but inaccurate or missing skeleton joint points degrade its overall performance, so its accuracy is only 0.65. As another example, an Intelligent Fuzzy Inference (IFI) algorithm detects human activity with an accuracy of 0.71 by acquiring acceleration and angular velocity data of the upper and lower leg through a smart shoe equipped with an inertial measurement unit (IMU). In addition, human motion detection has been performed on the fluctuations of wireless local area network signals: useful information is extracted from the Channel State Information (CSI) stream based on principal component analysis, and human behaviors are then classified with a deep forest decision (FD) method, with an accuracy of 0.75.
In addition, existing research on recognizing the daily-life behaviors of the elderly relies on image features, three-dimensional positions of skeleton points, or sensor data. Typically, one of these three types of data is collected and the data features of different actions are extracted to distinguish behavior categories. For example, G. Anitha and S. Baghavathi Priya proposed a detection and recognition system, based on a dynamic Bayesian network, for five abnormal behaviors of the elderly (leaning forward, chest pain, walking backward, headache, and vomiting); it adopts image features, finding differences among behaviors in the geometric features of the elderly person's posture in the acquired images and then judging the behavior by a threshold. Bangli Liu et al. acquired 3D human skeleton joints through an RGBD sensor, extracted skeleton features, and constructed GBSW by aggregating skeleton and geometric features to identify human behaviors, but inaccurate or missing skeleton joint points reduce the method's overall performance. For recognizing elderly behaviors, sensor data are most commonly used; for example, Vepakomma et al. designed a wrist-worn sensor that classifies fine and complex behaviors in Activities of Daily Living (ADL) through innovative multi-modal perception and deep-learning data analysis to recognize abnormal behaviors. However, the sensor approach recognizes poorly for behaviors that are similar in posture but different in meaning. In summary, these methods generally collect one kind of data and extract the data features of different actions to distinguish behavior categories; once affected by external factors, the influence cannot be compensated by other means, so the recognition rate is low.
Disclosure of Invention
The invention provides an old people behavior identification and detection method based on old people behavior characteristics, aiming to solve the problem of low accuracy in recognizing the actions of the elderly in the prior art.
In order to achieve the above object, the present invention provides a method for identifying and detecting behavior of an elderly person based on behavior characteristics of the elderly person, comprising:
wearing an acceleration sensor on the body of the elderly person, and having the elderly person perform specified actions;
data acquisition is carried out through Kinect equipment and the acceleration sensor, action videos are obtained through the Kinect equipment, and body acceleration data are obtained through the acceleration sensor;
extracting a motion frame from the motion video; training a Convolutional Neural Network (CNN) and a long-short term memory (LSTM) network by using the processed action frame to obtain a first feature extraction channel, and extracting spatial features and temporal features of the action frame by using the first feature extraction channel;
training a three-layer long-short term memory (LSTM) network by using the body acceleration data to obtain a second feature extraction channel, wherein the second feature extraction channel extracts numerical features and time features of the body acceleration data;
and fusing the output results of the first characteristic extraction channel and the second characteristic extraction channel through a feedforward neural network to obtain a target result.
Preferably, the acceleration sensor is a triaxial acceleration sensor, and the acceleration sensor is arranged on the neck, left shoulder, right shoulder, left upper arm, right upper arm, left forearm, right forearm, left hand, right hand, spine, hip joint, left thigh, right thigh, left calf and right calf of the elderly.
Preferably, extracting motion frames from the motion video comprises:
extracting frames capable of representing actions from video frames of the action video at equal set time intervals to serve as target frames;
and preprocessing the target frame to obtain the action frame.
Preferably, obtaining the action frame by preprocessing the target frame comprises:
mapping the dynamic skeleton point model provided by the Kinect SDK onto the video frame of the action video at the same moment, by utilizing the skeleton tracking technology in the Kinect SDK matched with the Kinect device;
obtaining the range of the dynamic skeleton point model in the target frame;
cropping the target frame according to the range of the dynamic skeleton point model to obtain the action frame containing the elderly person's action;
and uniformly processing the action frames into a 64 × 64 × 3 format, followed by normalization.
Preferably, the convolutional neural network CNN adopts a VGG-16 network structure. The CNN comprises a plurality of CNN input layers; each input layer is connected to a convolutional layer, each convolutional layer is connected to a pooling layer, and the convolutional and pooling layers are arranged in a multi-stage combined configuration, with a BatchNormalization layer added after each pooling layer. The final pooling layers are each connected to a first fully connected layer, and each first fully connected layer is connected to the input end of the long-short term memory LSTM network.
Preferably, the output end of the long-short term memory LSTM network is connected with a second full connection layer, and the second full connection layer is connected with the input end of the feedforward neural network.
Preferably, the input layer of the three-layer long-short term memory LSTM network is connected with a hidden layer, the hidden layer is connected with a third full-connection layer, the third full-connection layer is connected with the feedforward neural network, and the hidden layer is of a three-layer structure formed by LSTM network units.
Preferably, training the convolutional neural network CNN and the long-short term memory LSTM network with the processed action frame to obtain a first feature extraction channel includes:
taking the action frames of a plurality of sets of actions as a first training set, and taking the action frames of a further plurality of sets of actions as a first verification set;
circularly training the convolutional neural network CNN and the long-short term memory LSTM network through the first training set to obtain a first feature extraction channel, wherein an optimization function of the convolutional neural network CNN and the long-short term memory LSTM network is SGD;
and applying the first verification set to the first feature extraction channel, acquiring and judging whether the loss function value of the first feature extraction channel meets the requirement, and if not, continuing to train the convolutional neural network CNN and the long-short term memory LSTM network.
Preferably, training the three-layer long-short term memory LSTM network with the body acceleration data to obtain a second feature extraction channel comprises:
taking body acceleration data of a plurality of sets of actions as a second training set, and taking action frames of the plurality of sets of actions as a second verification set;
circularly training the three-layer long-short term memory LSTM network through the second training set to obtain a second feature extraction channel;
and inputting the second verification set into the second feature extraction channel, acquiring and judging whether the loss function value of the second feature extraction channel meets the requirement, and if not, continuing to train the three-layer long-short term memory LSTM network.
Preferably, the output results of the first feature extraction channel and the second feature extraction channel for all the specified actions are taken to train the feedforward neural network, and the parameters of the feedforward neural network are adjusted until it correctly classifies all the actions.
The method for identifying and detecting the behavior of the old people based on the behavior characteristics of the old people, which is provided by the application, has the following beneficial effects:
An action video captured while a specified action is performed is processed by deep learning with the convolutional neural network CNN and the long-short term memory LSTM network to extract the spatial and temporal features of the action frames; body acceleration data captured during the specified action are processed by deep learning with the three-layer long-short term memory LSTM network to extract their numerical and temporal features; both sets of features are then input to the feedforward neural network for learning and classification.
Actions performed by the elderly are slower than those performed by the young, so the temporal features of the action frames and of the acceleration data differ significantly between an elderly and a young person performing the same specified action; the feedforward neural network is trained to distinguish whether an elderly or a young person performed the specified action by recognizing this difference.
The specified action is recognized from both the action video and the body acceleration data; the two sources complement each other, ensuring high recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an elderly behavior recognition detection method based on elderly behavior characteristics according to an embodiment of the present invention;
fig. 2 is a diagram of a neural network architecture of an elderly behavior recognition detection method based on elderly behavior characteristics according to an embodiment of the present invention;
FIG. 3 is an architecture diagram of a convolutional neural network CNN and a long-short term memory LSTM network in an embodiment of the present invention;
FIG. 4 is a diagram of the architecture of a three-layer long short term memory LSTM network in an embodiment of the present invention;
FIG. 5 is an architecture diagram of an LSTM network element in an embodiment of the present invention; .
FIG. 6 is a schematic diagram of hip acceleration during performance of a prescribed motion by an elderly person in accordance with an embodiment of the present invention;
FIG. 7 is a diagram illustrating left hand acceleration during performance of an action by an elderly person in accordance with an embodiment of the present invention;
figure 8 is a schematic representation of the hip acceleration of an elderly person walking in an embodiment of the present invention;
figure 9 is a schematic representation of the hip acceleration of a young person walking in an embodiment of the present invention.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention is based on the following observations: the movement speed and reactions of the elderly are obviously slower than those of the young, and one action takes longer; the sway characteristics of the elderly in upright walking vary individually, but overall the body sway of the elderly is larger than that of the young, the knees are bent, hip-joint acceleration is small while head acceleration is large, the elderly tend to lean forward when walking, and ankle flexion is weak; in terms of spatio-temporal variables, stride speed decreases with age; when standing up and sitting down, the movement speed and forward-lean angle of the lower leg are obviously higher in the elderly than in the young, the elderly descend weakly, cannot sit on a low stool or chair, and need to support themselves on surrounding objects while sitting down. When lying down and getting up, the elderly rely more on the trunk, elbows, and hips to complete the corresponding actions. Referring to Fig. 6 and Fig. 7, Fig. 6 shows the hip acceleration of an elderly person performing the actions of standing, walking, sitting down, lying, drinking water, squatting, bending over, looking at a mobile phone, reading a newspaper, watering flowers, stretching, and yawning, and Fig. 7 shows the left-hand acceleration for the same actions.
Referring to fig. 1 and fig. 2 in combination, the present invention provides a method for identifying and detecting behavior of an elderly person based on behavior characteristics of the elderly person, including:
s100, wearing an acceleration sensor on the body of the old person to enable the old person to perform specified actions;
s200, data acquisition is carried out through the Kinect equipment and the acceleration sensor, a motion video is obtained through the Kinect equipment, and body acceleration data are obtained through the acceleration sensor;
s300, extracting motion frames from the motion video; training a Convolutional Neural Network (CNN) and a long-short term memory (LSTM) network by using the processed action frame to obtain a first feature extraction channel, wherein the first feature extraction channel extracts spatial features and temporal features of the action frame; since older people tend to be slow in action, it is often possible to distinguish whether a given action is performed by a young person or an older person by the temporal characteristics of the action frame.
S400, training the three-layer long-short term memory LSTM network with the body acceleration data to obtain a second feature extraction channel, wherein the second feature extraction channel extracts the numerical and temporal features of the body acceleration data; the temporal features of the body acceleration data can likewise often distinguish whether a given action was performed by a young or an elderly person. Referring specifically to Fig. 8 and Fig. 9, the pattern of hip acceleration when walking is clearly different between the elderly and the young.
And S500, fusing the output results of the first characteristic extraction channel and the second characteristic extraction channel through a feedforward neural network to obtain a target result.
In a specific implementation, elderly subjects aged approximately 60-80 and free of physical disease are selected as samples.
The specific wearing positions of the acceleration sensors on the body of the elderly person are as follows: three-axis acceleration sensors are worn on the neck, left shoulder, right shoulder, left upper arm, right upper arm, left forearm, right forearm, left hand, right hand, hip joint, left thigh, right thigh, left calf, and right calf of each elderly person, and four three-axis acceleration sensors are worn along the spine.
The elderly person is asked to perform specified actions, including but not limited to standing, walking, sitting down, lying, drinking water, squatting, bending over, looking at a mobile phone, reading a newspaper, watering flowers, stretching, and yawning; each action is repeated about 10 times, and the actions should be natural and unconstrained.
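For reference in the illustrative sketches below, these twelve specified actions can be collected as a label list; the variable name ACTIONS and the exact English labels are assumptions made for illustration, not part of the patent.

```python
ACTIONS = [
    "standing", "walking", "sitting down", "lying", "drinking water",
    "squatting", "bending over", "looking at a mobile phone",
    "reading a newspaper", "watering flowers", "stretching", "yawning",
]  # 12 specified action classes; num_classes=12 in the sketches below
```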
Data of the specified actions are acquired through the Kinect device and the acceleration sensors: the action video is obtained through the Kinect device, and the body acceleration data of the elderly person performing the specified actions are obtained through the acceleration sensors.
The process of extracting action frames from the action video is as follows: frames capable of representing the action are extracted from the video frames of the action video at equal set time intervals to serve as target frames. In one possible implementation, for each specified action, 12 target frames are taken at equal time intervals from the video frames corresponding to that action, as in the sketch below.
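As an illustration only, the following minimal sketch shows this equal-interval sampling; the function name sample_target_frames and the use of NumPy are assumptions.

```python
import numpy as np

def sample_target_frames(frames, num_target=12):
    """Pick num_target frames at equal time intervals from one action's video.

    frames: sequence of decoded video frames for a single specified action.
    Returns the 12 target frames described in the embodiment above.
    """
    indices = np.linspace(0, len(frames) - 1, num=num_target, dtype=int)
    return [frames[i] for i in indices]
```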
The target frames are then preprocessed to obtain the action frames. Using the skeleton tracking technology in the Kinect SDK matched with the Kinect device, the dynamic skeleton point model provided by the Kinect SDK is mapped onto the video frame of the action video at the same moment; specifically, in the WPF project, the dynamic skeleton point model adds a marker shape to every skeleton point supported by the skeleton tracking technology and sets JointTrackingState to Tracked, so that the pixels of the skeleton point marker shapes are visible in the video frames of the action video;
obtaining the range of the dynamic skeleton point model in the target frame; specifically, determining the range of the bone point mark shape in the target frame;
the target frame is cropped according to the range of the dynamic skeleton point model to obtain the action frame containing the elderly person's action; specifically, the target frame is cropped by the rectangle containing all the skeleton point marker shapes, and the image within that rectangle is retained as the action frame, so the action frame contains only the elderly person's action. The number of action frames representing the same specified action is then increased through data augmentation to enlarge the sample.
The action frames are then uniformly processed into a 64 × 64 × 3 format and normalized, as in the preprocessing sketch below.
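A minimal preprocessing sketch under the above description follows: it crops the target frame to the bounding rectangle of the mapped skeleton points, resizes to 64 × 64 × 3, and normalizes. The function name, the OpenCV resize call, and scaling by 255 are assumptions; the patent does not specify the normalization formula.

```python
import cv2
import numpy as np

def preprocess_action_frame(frame, skeleton_xy):
    """Crop to the skeleton bounding box, resize to 64x64x3, normalize.

    frame: H x W x 3 image of the target frame.
    skeleton_xy: (N, 2) array of skeleton-point pixel coordinates mapped
    into this frame by the Kinect SDK skeleton tracking.
    """
    x_min, y_min = skeleton_xy.min(axis=0).astype(int)
    x_max, y_max = skeleton_xy.max(axis=0).astype(int)
    crop = frame[max(y_min, 0):y_max + 1, max(x_min, 0):x_max + 1]
    crop = cv2.resize(crop, (64, 64))        # uniform 64 x 64 x 3 format
    return crop.astype(np.float32) / 255.0   # assumed min-max normalization
```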
The convolutional neural network CNN and the long-short term memory LSTM network are trained with the processed action frames to obtain the first feature extraction channel, which extracts the spatial and temporal features of the action frames. Specifically, referring to Fig. 3, the CNN adopts a VGG-16 network structure and comprises a plurality of CNN input layers; each input layer is connected to a convolutional layer (conv in the figure), each convolutional layer to a pooling layer (pooling in the figure), and the convolutional and pooling layers are arranged in a 4-stage combined configuration. The first stage uses 3 × 3 convolution kernels, 32 kernels, and a 2 × 2 pooling stride; the second stage 3 × 3 kernels, 64 kernels, and a 2 × 2 pooling stride; the third stage 3 × 3 kernels, 128 kernels, and a 2 × 2 pooling stride; and the fourth stage 3 × 3 kernels, 128 kernels, and a 2 × 2 pooling stride. A BatchNormalization layer is added after each pooling layer to avoid slower and slower convergence as gradients vanish. The final pooling layers are each connected to a first fully connected layer (FC1 in the figure), and each first fully connected layer is connected to the input of the LSTM network. The output of the LSTM network is connected to a second fully connected layer (FC2 in the figure), which is connected to the input of the feedforward neural network.
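The following Keras sketch reflects this first feature extraction channel as described (four conv/pooling stages with 32/64/128/128 kernels of size 3 × 3, 2 × 2 pooling strides, BatchNormalization after each pooling layer, FC1, an LSTM, and FC2). The LSTM width, the FC1 width, and the use of tensorflow.keras are assumptions made for illustration.

```python
from tensorflow.keras import layers, models, optimizers

def build_first_channel(num_frames=12, num_classes=12):
    # Per-frame VGG-style CNN: four conv/pool stages with 32/64/128/128
    # kernels of size 3x3, 2x2 pooling, and BatchNormalization after
    # each pooling layer, followed by FC1.
    cnn = models.Sequential(name="frame_cnn")
    cnn.add(layers.Input(shape=(64, 64, 3)))
    for filters in (32, 64, 128, 128):
        cnn.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        cnn.add(layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        cnn.add(layers.BatchNormalization())
    cnn.add(layers.Flatten())
    cnn.add(layers.Dense(256, activation="relu"))  # FC1; width is an assumption

    # LSTM over the 12 per-frame feature vectors, then FC2 as classifier.
    frames_in = layers.Input(shape=(num_frames, 64, 64, 3))
    x = layers.TimeDistributed(cnn)(frames_in)     # apply the CNN to each frame
    x = layers.LSTM(128)(x)                        # temporal features; width assumed
    out = layers.Dense(num_classes, activation="softmax")(x)  # FC2

    model = models.Model(frames_in, out)
    model.compile(optimizer=optimizers.SGD(), loss="categorical_crossentropy")
    return model
```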
The specific training process for the convolutional neural network CNN and the long-short term memory LSTM network comprises the following steps:
training the convolutional neural network CNN and the long-short term memory LSTM network by using the action frame to acquire a first feature extraction channel comprises the following steps:
taking 72% of action frames of any action as a first training set, and taking 8% of action frames of the action as a first verification set;
circularly training the convolutional neural network CNN and the long-short term memory LSTM network through the first training set to obtain the first feature extraction channel; specifically, the 12 processed action frames are taken as the input of the CNN and the LSTM network for iterative training, with the parameters and the optimization function adjusted, wherein the optimization function of the CNN and the LSTM network is SGD (stochastic gradient descent, in which the parameters are updated once for each training sample);
and applying the first verification set to the first feature extraction channel, obtaining the loss function value of the first feature extraction channel and judging whether it meets the requirement (for example, whether the loss function value is smaller than a set threshold); if not, training of the convolutional neural network CNN and the long-short term memory LSTM network continues. The loss function of the first feature extraction channel is cross entropy.
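A minimal training loop matching this description follows, reusing build_first_channel from the sketch above and assuming x_train/y_train (the 72% training split) and x_val/y_val (the 8% verification split) are already prepared as arrays. The concrete loss threshold and batch size are assumptions, since the patent only requires that the validation loss meet the requirement.

```python
LOSS_THRESHOLD = 0.05   # assumed value for "meets the requirement"

model = build_first_channel()
while True:
    # SGD with cross-entropy loss, as configured in build_first_channel.
    model.fit(x_train, y_train, batch_size=12, epochs=1, verbose=0)
    val_loss = model.evaluate(x_val, y_val, verbose=0)  # cross entropy on the verification set
    if val_loss < LOSS_THRESHOLD:
        break           # loss function value meets the requirement
```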
The three-layer long-short term memory LSTM network is trained with the body acceleration data to obtain the second feature extraction channel, which extracts the numerical features and temporal features of the body acceleration data. Specifically, referring to Fig. 4, the input layer of the three-layer LSTM network is connected to a hidden layer, the hidden layer is connected to a third fully connected layer, the third fully connected layer is connected to the feedforward neural network, and the hidden layer is a three-layer structure formed by LSTM network units. Referring to Fig. 5, the LSTM network unit comprises a forget gate, an input gate, a candidate gate, an output gate, an update state, and a hidden state. The forget gate, the first step of the LSTM, selects which information to discard from the cell state:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f).
The input gate and the candidate gate decide how much new information to admit; the input gate is
i_t = σ(W_i · [h_{t-1}, x_t] + b_i),
and the candidate gate is
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C).
The update state replaces the cell state at time t-1 with the cell state at time t:
C_t = f_t · C_{t-1} + i_t · C̃_t.
The output gate decides which information is output at time t:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o),
and the hidden state, needed at the next moment, is
h_t = o_t · tanh(C_t).
Here σ(x) = 1/(1 + e^(-x)) is the sigmoid activation function, tanh(x) = (e^x - e^(-x))/(e^x + e^(-x)) is the hyperbolic tangent activation function, and W_k and b_k denote the weight matrices and bias vectors, respectively.
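For clarity, the gate equations above can be written as a single NumPy time step; this is a direct transcription of the formulas, with the weight and bias containers W and b being illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step implementing the gate equations above.

    W and b hold the weight matrices W_f, W_i, W_C, W_o and bias vectors
    b_f, b_i, b_C, b_o keyed by gate name; x_t is the input at time t,
    h_prev/c_prev are the previous hidden and cell states.
    """
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])     # candidate gate
    c_t = f_t * c_prev + i_t * c_tilde         # update state
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate
    h_t = o_t * np.tanh(c_t)                   # hidden state
    return h_t, c_t
```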
Specifically, training the three-layer long-short term memory LSTM network with the body acceleration data to obtain a second feature extraction channel includes:
taking the body acceleration data of 72% of the actions as a second training set, and the body acceleration data of 8% of the actions as a second verification set; specifically, since the average time taken to complete one action corresponds to about 199 samples, each action is represented by 200 time-series body acceleration data points.
The 200 time-series body acceleration data points are taken as the input of the three-layer long-short term memory LSTM network for training, wherein the learning rate of the three-layer LSTM network is set to 0.0001 and the loss function is a cross-entropy loss function with L1 regularization;
and inputting the second verification set into the second feature extraction channel, acquiring and judging whether the loss function value of the second feature extraction channel meets the requirement (if the loss function value is smaller than a set second feature extraction channel loss function threshold), and if not, continuing to train the three-layer long-short term memory LSTM network.
The feedforward neural network used for obtaining the target result comprises an input layer, three hidden layers and an output layer.
The training process of the feedforward neural network for obtaining the target result comprises: taking the output results of the first feature extraction channel and the second feature extraction channel for all the specified actions to train the feedforward neural network, and adjusting the parameters of the feedforward neural network;
and testing the feedforward neural network by using the rest 20% of action frames and body acceleration data, acquiring and judging whether the loss function value of the feedforward neural network meets the requirement, and if not, continuing to train the feedforward neural network.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An old person behavior identification and detection method based on old person behavior characteristics is characterized by comprising the following steps:
wearing an acceleration sensor on the body of the elderly person, and having the elderly person perform specified actions;
data acquisition is carried out through a Kinect device and the acceleration sensor, a motion video is obtained through the Kinect device, and body acceleration data are obtained through the acceleration sensor;
extracting action frames from the action video; training a convolutional neural network CNN and a long-short term memory LSTM network by using the processed action frames to obtain a first feature extraction channel, and extracting spatial features and temporal features of the action frames by using the first feature extraction channel; the convolutional neural network CNN comprises a plurality of CNN input layers, each CNN input layer is respectively connected with a convolutional layer, each convolutional layer is connected with a pooling layer, the convolutional layers and the pooling layers are connected in a multi-stage combined configuration, a BatchNormalization layer is added after each pooling layer, the final pooling layers are respectively connected with first fully connected layers, and each first fully connected layer is respectively connected with the input end of the long-short term memory LSTM network;
training a three-layer long-short term memory (LSTM) network by using the body acceleration data to obtain a second feature extraction channel, wherein the second feature extraction channel extracts numerical features and time features of the body acceleration data;
and fusing the output results of the first characteristic extraction channel and the second characteristic extraction channel through a feedforward neural network to obtain a target result.
2. The method for identifying and detecting behavior of an elderly person based on behavior characteristics of the elderly person as claimed in claim 1, wherein the acceleration sensor is a three-axis acceleration sensor, and the acceleration sensor is disposed on the neck, left shoulder, right shoulder, left upper arm, right upper arm, left forearm, right forearm, left hand, right hand, spine, hip joint, left thigh, right thigh, left calf, and right calf of the elderly person.
3. The method according to claim 1, wherein the extracting of the action frame from the action video comprises:
extracting frames capable of representing actions from video frames of the action video at equal set time intervals to serve as target frames;
and obtaining the action frame by preprocessing the target frame.
4. The method according to claim 3, wherein the obtaining the action frame by preprocessing the target frame comprises:
mapping the dynamic skeleton point model provided by the Kinect SDK onto the video frame of the action video at the same moment by utilizing the skeleton tracking technology in the Kinect SDK matched with the Kinect device;
obtaining the range of the dynamic skeleton point model in the target frame;
intercepting the target frame according to the range of the dynamic skeleton point model to obtain the action frame containing the actions of the old;
and uniformly processing the action frames into a 64 × 64 × 3 format, followed by normalization.
5. The method for identifying and detecting the behavior of the elderly based on the behavior characteristics of the elderly according to claim 1, wherein the Convolutional Neural Network (CNN) adopts a VGG-16 network structure.
6. The method for identifying and detecting the behavior of the elderly based on the behavior characteristics of the elderly as claimed in claim 1, wherein the output terminal of the long-short term memory (LSTM) network is connected to a second full connection layer, and the second full connection layer is connected to the input terminal of the feedforward neural network.
7. The method for identifying and detecting the behavior of the elderly based on the behavior characteristics of the elderly according to claim 1, wherein the input layer of the three-layer long-short term memory (LSTM) network is connected with a hidden layer, the hidden layer is connected with a third full-connection layer, the third full-connection layer is connected with the feedforward neural network, and the hidden layer is a three-layer structure formed by LSTM network units.
8. The method of claim 1, wherein training the Convolutional Neural Network (CNN) and the long-short term memory (LSTM) network with the processed action frame to obtain a first feature extraction channel comprises:
taking the action frames of a plurality of sets of actions as a first training set, and taking the action frames of a further plurality of sets of actions as a first verification set;
circularly training the convolutional neural network CNN and the long-short term memory LSTM network through the first training set to obtain a first feature extraction channel, wherein an optimization function of the convolutional neural network CNN and the long-short term memory LSTM network is SGD;
and applying the first verification set to the first feature extraction channel, acquiring and judging whether the loss function value of the first feature extraction channel meets the requirement, and if not, continuing to train the convolutional neural network CNN and the long-short term memory LSTM network.
9. The method for identifying and detecting behavior of the elderly based on behavior characteristics of the elderly as claimed in claim 1, wherein training the three-layer long-short term memory (LSTM) network with the body acceleration data to obtain the second feature extraction channel comprises:
taking body acceleration data of a plurality of sets of actions as a second training set, and taking action frames of the plurality of sets of actions as a second verification set;
circularly training the three-layer long-short term memory LSTM network through the second training set to obtain a second feature extraction channel;
and inputting the second verification set into the second feature extraction channel, acquiring and judging whether the loss function value of the second feature extraction channel meets the requirement, and if not, continuing to train the three-layer long-short term memory LSTM network.
10. The method for identifying and detecting the behavior of the elderly based on the behavior features of the elderly according to claim 1, wherein the feedforward neural network is trained by taking output results of the first feature extraction channel and the second feature extraction channel of all the designated actions, and parameters of the feedforward neural network are adjusted until a loss function value of the feedforward neural network meets requirements.
CN202010977729.XA 2020-09-16 2020-09-16 Old people behavior identification and detection method based on old people behavior characteristics Active CN112101235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010977729.XA CN112101235B (en) 2020-09-16 2020-09-16 Old people behavior identification and detection method based on old people behavior characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010977729.XA CN112101235B (en) 2020-09-16 2020-09-16 Old people behavior identification and detection method based on old people behavior characteristics

Publications (2)

Publication Number Publication Date
CN112101235A CN112101235A (en) 2020-12-18
CN112101235B true CN112101235B (en) 2022-09-23

Family

ID=73758717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010977729.XA Active CN112101235B (en) 2020-09-16 2020-09-16 Old people behavior identification and detection method based on old people behavior characteristics

Country Status (1)

Country Link
CN (1) CN112101235B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818927A (en) * 2021-02-26 2021-05-18 上海交通大学 Real-time classification method and system for human body lower limb movement modes
CN112990153A (en) * 2021-05-11 2021-06-18 创新奇智(成都)科技有限公司 Multi-target behavior identification method and device, storage medium and electronic equipment
CN113837122B (en) * 2021-09-28 2023-07-25 重庆邮电大学 Wi-Fi channel state information-based contactless human body behavior recognition method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170089688A (en) * 2016-01-27 2017-08-04 동서대학교산학협력단 System for providing Information Technology karaoke based on audiance's action, and method thereof
CN110801227A (en) * 2019-12-09 2020-02-18 中国科学院计算技术研究所 Method and system for testing three-dimensional color block obstacle based on wearable equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092894A (en) * 2017-04-28 2017-08-25 孙恩泽 A kind of motor behavior recognition methods based on LSTM models
CN108182410A (en) * 2017-12-28 2018-06-19 南通大学 A kind of joint objective zone location and the tumble recognizer of depth characteristic study
CN109670396B (en) * 2018-11-06 2023-06-27 华南理工大学 Fall detection method for indoor old people
CN110633736A (en) * 2019-08-27 2019-12-31 电子科技大学 Human body falling detection method based on multi-source heterogeneous data fusion
CN111199202B (en) * 2019-12-30 2024-04-26 南京师范大学 Human body action recognition method and recognition device based on circulating attention network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170089688A (en) * 2016-01-27 2017-08-04 동서대학교산학협력단 System for providing Information Technology karaoke based on audiance's action, and method thereof
CN110801227A (en) * 2019-12-09 2020-02-18 中国科学院计算技术研究所 Method and system for testing three-dimensional color block obstacle based on wearable equipment

Also Published As

Publication number Publication date
CN112101235A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
CN112101235B (en) Old people behavior identification and detection method based on old people behavior characteristics
WO2021057810A1 (en) Data processing method, data training method, data identifying method and device, and storage medium
CN113496216B (en) Multi-angle falling high-risk identification method and system based on skeleton key points
CN109726672B (en) Tumbling detection method based on human body skeleton sequence and convolutional neural network
Luo et al. Intelligent carpet: Inferring 3d human pose from tactile signals
US20150320343A1 (en) Motion information processing apparatus and method
CN110633736A (en) Human body falling detection method based on multi-source heterogeneous data fusion
CN112668531A (en) Motion posture correction method based on motion recognition
Xu et al. Elders’ fall detection based on biomechanical features using depth camera
CN109271918A (en) The method for distinguishing balanced capacity obstacle crowd based on centre-of gravity shift model
CN115346272A (en) Real-time tumble detection method based on depth image sequence
Liu et al. A method to recognize sleeping position using an CNN model based on human body pressure image
US20220051145A1 (en) Machine learning based activity detection utilizing reconstructed 3d arm postures
Zhang et al. Multimodal data-based deep learning model for sitting posture recognition toward office workers’ health promotion
Nouredanesh et al. Chasing feet in the wild: a proposed egocentric motion-aware gait assessment tool
CN114550299A (en) System and method for evaluating daily life activity ability of old people based on video
CN112101094B (en) Suicide risk assessment method based on limb language
CN111951940A (en) Intelligent medical rehabilitation assisting method
CN115019233B (en) Mental retardation judging method based on gesture detection
Wahla et al. Visual fall detection from activities of daily living for assistive living
CN113271848B (en) Body health state image analysis device, method and system
Chen et al. A Novel CNN-BiLSTM Ensemble Model With Attention Mechanism for Sit-to-Stand Phase Identification Using Wearable Inertial Sensors
Yusuf et al. Upper gait analysis for human identification using convolutional–recurrent neural network
CN114360060B (en) Human body action recognition and counting method
CN111144171A (en) Abnormal crowd information identification method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant