CN110197235B - Human body activity recognition method based on unique attention mechanism - Google Patents

Human body activity recognition method based on unique attention mechanism Download PDF

Info

Publication number
CN110197235B
Authority
CN
China
Prior art keywords
layer
data
attention mechanism
unique
activity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910570941.1A
Other languages
Chinese (zh)
Other versions
CN110197235A (en
Inventor
郑增威
石利飞
孙霖
陈丹
霍梅梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University City College ZUCC
Original Assignee
Zhejiang University City College ZUCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University City College ZUCC filed Critical Zhejiang University City College ZUCC
Priority to CN201910570941.1A priority Critical patent/CN110197235B/en
Publication of CN110197235A publication Critical patent/CN110197235A/en
Application granted granted Critical
Publication of CN110197235B publication Critical patent/CN110197235B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/103 Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B5/1118 Determining activity level
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Surgery (AREA)
  • Physiology (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Medical Informatics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Pathology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Dentistry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Fuzzy Systems (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a human body activity recognition method based on a unique attention mechanism, which comprises the following steps: 1) preprocessing the data; 2) designing an LSTM neural network model with a unique attention mechanism, in which the whole network is divided into 3 parts: an LSTM layer, a uniqueness attention mechanism layer and a SOFTMAX full-connection layer; the input of the network model is a time sequence segment that has undergone data preprocessing, and the output is the activity category to be identified; 3) training and predicting. The invention has the beneficial effects that: by calculating the importance of the basic actions, the model can focus on the more important basic actions and thereby identify human activities more accurately; the invention can judge which moments are more important for identifying an activity while it is being performed; by calculating the attention weight value at each moment, the model can pay more attention to the parts that are important for identifying the activity, further improving the performance of human activity recognition.

Description

Human body activity recognition method based on unique attention mechanism
Technical Field
The invention relates to the field of human body activity recognition, in particular to a human body activity recognition method based on a unique attention mechanism.
Background
Deep learning allows researchers to obtain superior features without specialized background knowledge. RNNs (Recurrent Neural Networks) are a deep learning method that can process time-series data element by element over time while maintaining a "state vector" in the hidden-layer units; this state vector contains information about the past elements of the input data. In addition, human activity data is a kind of time-series data, so RNNs are well suited to processing sensor data of human body activity. "Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition", Francisco Javier Ordóñez et al., Sensors, 2016, discloses a method of combining Convolutional Neural Networks (CNNs) with RNNs. The method can fuse data from a plurality of sensors and can model the temporal dynamics of the data well. "Daily activity recognition method research based on sensor data and deep learning", Lijiazhen, 2018, published a deep neural network combining CNNs and RNNs, with empirical research on migrating deep models between different people. "Deep Recurrent Neural Networks for Human Activity Recognition", Abdulmajid Murad et al., published in Sensors in 2017 a method using only RNNs. This approach can capture long-term dependencies in the data and, unlike other methods, does not require the length of its input data to be fixed.
The RNNs-based methods have fully demonstrated their potential in dealing with the human activity recognition problem. However, in human activity recognition only a small part of the time-series data is related to the activity being performed, while the dominant part is unrelated to it, which makes it difficult for RNNs to capture the important parts of human activities. The main purpose of the invention is to combine an attention mechanism with LSTM (Long Short-Term Memory, a kind of RNN) so that the model can concentrate on the more important parts of the human activity data. This not only helps to understand human activity better, but also significantly improves the accuracy of human activity recognition.
Human activity recognition uses data acquired from sensors worn on a person to identify the person's current activity. This research can provide personalized support for many application areas, such as intelligent monitoring, smart homes and intelligent elderly care, and it has broad application prospects. Because cheap, small devices that can communicate and compute are very easy to obtain, an environment full of such devices can be built to conveniently provide rich services to the people in it. In a room set up in this way, people can interact with the related devices through their actions, and the devices can provide corresponding services by recognizing those actions: for example, if children are at home, they can be reminded to avoid dangerous activities by identifying their activities, and parents can be informed when they are injured. In addition, the aging of the Chinese population is accelerating and the problem of elderly care is becoming more prominent; elderly people living alone face potential safety hazards, but their children cannot obtain their health information in time, so the concept of intelligent elderly care through smart devices has become valuable, and the premise for realizing it is to recognize the activities of the elderly. Activity recognition can also be used to detect a patient's condition: for example, "Wearable Assistant for Parkinson's Disease Patients With the Freezing of Gait Symptom", Marc Bächlin et al., 2010, collected sensor data from Parkinson's disease patients; when the disease manifests, the patient's gait changes, and medical staff can be informed in time by recognizing these changes. Similar work can be done for other diseases, reducing the monitoring cost for medical personnel.
Early human activity recognition research mostly used traditional machine learning techniques for training, but these methods cannot process raw data directly: a researcher needs to extract features from the raw data and then train on those features, a process that requires a deep understanding of human activity recognition and some feature engineering skill. With the advent of deep learning techniques, researchers can be freed from the task of designing features. Deep learning has strong expressive power and can extract features automatically. Most importantly, results show that methods using deep learning can achieve better performance than traditional machine learning methods, which has led to an increasing number of studies using this technology for human activity recognition.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provide a human body activity recognition method based on a unique attention mechanism.
The human body activity recognition method based on the unique attention mechanism comprises the following steps:
1) preprocessing the data;
2) designing an LSTM neural network model with a unique attention mechanism, in which the whole network is divided into 3 parts: an LSTM layer, a uniqueness attention mechanism layer and a SOFTMAX full-connection layer; the input of the network model is a time sequence segment that has undergone data preprocessing, and the output is the activity category to be identified;
3) training and predicting.
Preferably, the step 1) specifically comprises the following steps:
1.1) the human activity data is data collected by a number of sensors attached to the human body and is continuous time-series data; since the input required by the model has a fixed length, the original data is segmented: a time window is set, and data of a specific time-period length is intercepted from the original data as the input data for training;
1.2) the missing data in the data set is interpolated and the data is normalized: the data is processed to have a mean of 0 and a variance of 1.
Preferably, the step 2) specifically comprises the following steps:
2.1) encoding the input data with an LSTM layer, wherein the input data at each moment correspondingly outputs a state vector: the LSTM layer is the first layer of the LSTM neural network model with the unique attention mechanism, the input data is the time sequence segment that has undergone data preprocessing, and the size of the data input to the LSTM layer is (T, F), wherein T is the number of time points included in the input data and F represents the number of data values collected at each time point; the layer outputs a "state vector" for the input data at each time point, with h_t representing the state vector at time t and L_h representing the length of the "state vector", so the output of the LSTM layer has size (T, L_h);
2.2) automatically calculating a unique attention weight value for the state vector output by the LSTM layer at each moment by using a uniqueness attention mechanism layer, and taking the result as the output of the uniqueness attention mechanism layer; the uniqueness attention mechanism layer is the second layer of the LSTM neural network model with the unique attention mechanism, and if the basic action represented by a state vector belongs to a specific activity to a high degree, the LSTM neural network model with the unique attention mechanism pays more attention to that state vector;
2.3) processing the state vector H obtained by the uniqueness attention mechanism layer with the SOFTMAX layer to obtain a final result vector y_p of length C, wherein the index of the maximum value of the vector is the predicted activity type; the SOFTMAX layer here and the SOFTMAX layer used by the uniqueness attention mechanism layer are the same layer, namely the third layer of the LSTM neural network model with the unique attention mechanism, and the result obtained by the uniqueness attention mechanism layer has exactly the same size as a state vector.
Preferably, the specific calculation method in step 2.2) is as follows:
firstly, each state vector is processed by a SOFTMAX full-connection layer, and the following results are obtained:
prob_t = softmax(h_t)
prob_t represents the probability that the current activity belongs to each activity class; with the number of activity classes denoted by C, each state vector yields a result vector of length C, i.e. the length of prob_t is C, and the attention weight is calculated using uniq():
att_t = uniq(prob_t)
uniq() is computed by taking the result of subtracting the second-largest value from the largest value in the prob_t vector as the score of the state vector:
score_t = max1(prob_t) - max2(prob_t)
the score represents the degree to which each basic action is unique to a specific activity; the higher this degree, the larger the amount of information it contains. Each state vector is thus given a score, and the scores are normalized to obtain the attention weight values:
att_t = score_t / Σ_{i=1}^{T} score_i
wherein T is the length of the time window, the attention weights of the T state vectors are obtained through calculation, and all the state vectors are subjected to weighted accumulation according to the attention weights to obtain a state vector H:
H = Σ_{t=1}^{T} att_t · h_t
preferably, the step 3) specifically comprises the following steps:
3.1) each data segment X intercepted from the original data corresponds to a label y; one-hot coding is applied to the label y to obtain y_r. model() denotes the LSTM neural network model with the unique attention mechanism; model() maps the human activity data X to the human activity class label y_p, namely:
yp=model(X)
using cross entropy as a loss function:
Loss(X, y_r) = -y_r log(model(X)) = -y_r log(y_p);
3.2) using Adam (adaptive moment estimation) as the optimization algorithm of the LSTM neural network model with the unique attention mechanism: after training, the LSTM neural network model with the unique attention mechanism is used directly for prediction; the data acquired from the sensors is preprocessed in the same way as during training and input into the model, which returns a vector of length C whose maximum-value index is the activity type predicted by the model.
The invention has the beneficial effects that:
(1) Since human activities are composed of many basic actions, and each basic action has a different degree of importance for recognizing the current activity category, calculating the importance of the basic actions allows the model to focus on the more important ones and thus recognize human activities more accurately.
(2) Deep learning has strong expressive power; through a deep neural network, a user can automatically obtain features and complete the training process without a thorough understanding of domain knowledge or feature engineering skills. However, deep learning is also a black box, and people often cannot understand the training and learning process. The invention can determine which moments are more important for identifying an activity while it is being performed. The results show that the deep neural network can accurately judge the importance of each moment. By visualizing this importance, one can understand more intuitively how the deep neural network works.
(3) When a person is performing an activity, most of the time is spent on actions that are not relevant to that activity. Conventional methods do not address this problem and process the input data at each moment without distinction. Another aim of the invention is to make the model focus more on the parts that are important for recognizing the activity by calculating the attention weight value at each moment, further improving the performance of human activity recognition. Human activities can also be understood more deeply by observing the importance of each moment of an activity.
Drawings
FIG. 1 is a flow chart of a human activity recognition method based on a unique attention mechanism;
FIG. 2 is a diagram of the LSTM model architecture with unique attention;
FIG. 3 is a graph of the unique attention weights of walking on the PAMAP2 data set;
FIG. 4 is a graph of the unique attention weights of running on the PAMAP2 data set.
Detailed Description
The present invention will be further described with reference to the following examples. The following examples are set forth merely to aid in the understanding of the invention. It should be noted that, for a person skilled in the art, several modifications can be made to the invention without departing from the principle of the invention, and these modifications and improvements also fall within the protection scope of the claims of the present invention.
Human activities are composed of many basic movements. Some basic actions occur only during certain activities, while others are common to many human activities. Basic actions that occur only in specific activities carry more information for recognizing human activities and should therefore receive more attention.
The LSTM neural network model structure based on the uniqueness attention mechanism can automatically calculate the uniqueness of each basic action and takes the uniqueness as the attention weight of the basic action. This mechanism allows the model to focus on more important parts, thereby improving the performance of the recognition.
The human body activity recognition method based on the unique attention mechanism LSTM neural network model structure comprises the following steps:
1) preprocessing data
Human activity data is data collected using a number of sensors attached to the human body. It is continuous time-series data, whereas the input required by the model has a fixed length. A time window is set, and data of a specific time-period length is intercepted from the original data as the input data for training. The time window length is denoted by T. The intercepted model input is X = (x_1, x_2, …, x_T), with x_t ∈ R^F, where F represents the number of data values collected by the sensors at each instant. In order to enlarge the training set, there is a certain overlap rate between two adjacent time windows. In addition, the missing data in the data set is interpolated and the data is normalized, i.e. the data is processed to have a mean of 0 and a variance of 1.
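As an illustration of this preprocessing step, the following is a minimal Python sketch of interpolation, normalization and sliding-window segmentation; the window length, overlap rate and array layout are assumptions chosen for the example rather than values prescribed by the invention.

import numpy as np

def preprocess(raw, T=128, overlap=0.5):
    # raw: (num_samples, F) array of sensor readings, with NaN marking missing values.
    data = raw.astype(float).copy()
    # 1) Interpolate missing values channel by channel.
    for c in range(data.shape[1]):
        col = data[:, c]
        nans = np.isnan(col)
        if nans.any() and (~nans).any():
            col[nans] = np.interp(np.flatnonzero(nans),
                                  np.flatnonzero(~nans), col[~nans])
    # 2) Normalize each channel to mean 0 and variance 1.
    data = (data - data.mean(axis=0)) / (data.std(axis=0) + 1e-8)
    # 3) Cut sliding windows of length T with the given overlap rate.
    step = max(1, int(T * (1.0 - overlap)))
    windows = [data[s:s + T] for s in range(0, len(data) - T + 1, step)]
    return np.stack(windows)  # shape: (num_windows, T, F)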
2) Designing LSTM neural network model with unique attention mechanism
Fig. 2 shows the structure of the model; the whole network is divided into 3 parts: the LSTM layer, the uniqueness attention mechanism layer and the SOFTMAX full-connection layer. The oval area on the right side of FIG. 2 represents the expansion of the uniqueness attention mechanism layer, which implements the unique attention mechanism in the method. The input of the network is a time sequence segment that has undergone data preprocessing, and the output is the activity category to be identified. The network uses LSTM to encode the input data, so the input data at each moment correspondingly outputs a state vector. The uniqueness attention mechanism layer then calculates an attention weight value for the state vector at each moment; this layer is the core of the method: by calculating the importance of each basic action during a human activity, the model can focus its attention on the more important parts and thereby improve activity recognition performance. Finally, the state vectors at all moments are weighted and accumulated according to their attention weight values, and the accumulated result is passed through the SOFTMAX full-connection layer to output the classification result.
2.1) LSTM layer
For the first layer, the size of the data input to this layer is (T, F), where T is the number of time points included in the input data and F represents the number of data values collected at each time point. The layer outputs a "state vector" for the input data at each moment. Using h_t to represent the state vector at time t and L_h to represent the length of the "state vector", the output of the LSTM layer has size (T, L_h).
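For illustration, a short PyTorch sketch of this layer is given below; the sizes F = 40 and L_h = 128 are assumptions for the example, not values fixed by the invention.

import torch
import torch.nn as nn

T, F, L_h = 128, 40, 128                 # window length, channels, state size (assumed)
lstm = nn.LSTM(input_size=F, hidden_size=L_h, batch_first=True)
x = torch.randn(1, T, F)                 # one preprocessed window, shape (1, T, F)
h, _ = lstm(x)                           # h has shape (1, T, L_h); h[0, t] is h_t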
2.2) unique attention mechanism layer
For the second layer, as shown in FIG. 2, this layer computes an attention weight for each state vector output by the LSTM. The specific calculation method is as follows: first, each state vector is passed through the SOFTMAX full-connection layer to obtain:
prob_t = softmax(h_t),
prob_t indicates the likelihood that the current activity belongs to each of the activity categories. The number of activity categories is denoted by C, so the length of prob_t is C. The result of subtracting the second-largest value from the largest value in this result vector is taken as the score of the state vector:
score_t = max1(prob_t) - max2(prob_t),
This score characterizes the extent to which each basic action is unique to a particular activity: the higher this degree, the larger the amount of information it contains. Each state vector thus yields a score, and by normalizing these scores the attention weights are obtained:
att_t = score_t / Σ_{i=1}^{T} score_i
where T is the time window length. All the state vectors are then weighted and accumulated according to the attention weights to obtain:
H = Σ_{t=1}^{T} att_t · h_t
This result is taken as the output of the layer.
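The following NumPy sketch illustrates, under the assumption of an already-trained weight matrix W and bias b standing in for the shared SOFTMAX full-connection layer, how the uniqueness scores and attention weights described above could be computed; it is an illustrative sketch rather than the patented implementation.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def unique_attention(states, W, b):
    # states: (T, L_h) state vectors output by the LSTM layer.
    scores = []
    for h_t in states:
        prob_t = softmax(W @ h_t + b)            # length-C class probabilities
        top_two = np.sort(prob_t)[-2:]           # [second largest, largest]
        scores.append(top_two[1] - top_two[0])   # score_t = max1 - max2
    scores = np.asarray(scores)
    att = scores / scores.sum()                  # normalized attention weights att_t
    H = (att[:, None] * states).sum(axis=0)      # weighted state vector H
    return H, att

# Example with random stand-ins for the trained parameters:
T, L_h, C = 128, 128, 12
W, b = np.random.randn(C, L_h), np.zeros(C)
H, att = unique_attention(np.random.randn(T, L_h), W, b)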
2.3) SOFTMAX full connection layer
For the third layer, the SOFTMAX full-connection layer is a fully-connected layer with the SOFTMAX activation function. The result obtained from the previous layer has exactly the same size as a state vector, and it is finally processed by the SOFTMAX layer to obtain the final result vector y_p of length C. The index of the maximum value of this vector is the predicted activity category. The SOFTMAX layer used here is the same layer as the SOFTMAX layer used in the uniqueness attention mechanism. Through training, the SOFTMAX layer acquires the ability to judge the activity type from a state vector, so it can calculate the uniqueness of the state vector at each moment, and the attention weight value of each state vector is then obtained through normalization.
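A possible way to wire the three layers together, including the sharing of the SOFTMAX full-connection layer between the attention computation and the final classification, is sketched below in PyTorch; the layer sizes and class count are illustrative assumptions, not part of the invention.

import torch
import torch.nn as nn

class UniqueAttentionLSTM(nn.Module):
    def __init__(self, num_features=40, hidden=128, num_classes=12):
        super().__init__()
        self.lstm = nn.LSTM(num_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)   # the shared SOFTMAX full-connection layer

    def forward(self, x):                          # x: (batch, T, F)
        h, _ = self.lstm(x)                        # h: (batch, T, L_h)
        prob = torch.softmax(self.fc(h), dim=-1)   # per-step class probabilities prob_t
        top2 = prob.topk(2, dim=-1).values         # largest and second-largest values
        score = top2[..., 0] - top2[..., 1]        # uniqueness score per time step
        att = score / score.sum(dim=1, keepdim=True)
        H = (att.unsqueeze(-1) * h).sum(dim=1)     # weighted state vector H
        return torch.softmax(self.fc(H), dim=-1)   # y_p: (batch, C) class probabilities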
3) Training and prediction
Each data segment X cut from the original data corresponds to a label y. One-hot coding is applied to the label y to obtain y_r. y_r is a vector of length C; if y = c, then y_r takes the value 1 only at position c and 0 elsewhere.
The result obtained after the model processing of X is as follows:
y_p = model(X).
cross entropy is used as a loss function:
Loss(X, y_r) = -y_r log(model(X)) = -y_r log(y_p).
In addition, Adam (adaptive moment estimation) is used as the optimization algorithm for the model. After the model has been trained through the above steps, it can be used directly for prediction. Data acquired from the sensors is preprocessed in the same way as during training and then input into the model for processing. The model returns a vector of length C, and the index of the maximum value of this vector is the activity type predicted by the model.
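A hedged sketch of the training and prediction procedure is given below; it assumes the UniqueAttentionLSTM class from the earlier sketch (or any model returning class probabilities), and the layer sizes, batch size and learning rate are illustrative assumptions.

import torch

model = UniqueAttentionLSTM(num_features=40, hidden=128, num_classes=12)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(X, y_r):
    # X: (batch, T, F) preprocessed windows; y_r: (batch, C) one-hot labels.
    y_p = model(X)                                            # predicted probabilities
    loss = -(y_r * torch.log(y_p + 1e-8)).sum(dim=1).mean()   # Loss = -y_r * log(y_p)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def predict(X):
    with torch.no_grad():
        return model(X).argmax(dim=1)   # index of the maximum value = predicted activity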
The experimental results are as follows:
to verify the effectiveness of the method, the performance of the proposed method was evaluated, comparing the method with other methods on two publicly available data sets. These two datasets are the Opportunity and PAMAP2 datasets (http:// architecture. ics. uci. edu/ml/dates/map 2+ physical + activity + monitoring) (https:// architecture. ics. uu/ml/dates/Opportunity + activity + registration). In order to verify that the present model can actually focus attention on important parts, two reference models very similar to the present model are provided as references. These two reference models differ from the present model only in the focus mechanism: several of these reference models will only focus on the Last moment of data in the input data, called LSTM with Last Attention. Another reference model focuses equally on the input data at each time instant, indiscriminately. It sets the attention weight of the hidden layer state vector at each instant to the same value. Called LSTM with Mean Attention.
Table 1 shows the average F1 score of the different methods on the PAMAP2 data set; the average F1 score is the average of the F1 scores on the test set over all human activity categories. Table 2 shows the weighted F1 scores of the different methods on the Opportunity data set; the weighted F1 score is the weighted accumulation of the F1 scores of the various human activity categories on the test set according to the proportion of each activity in the test set. Since the Opportunity data set has two kinds of labels, Locomotion and Gesture, there are two classification targets for this data set. It can be seen from both tables that the present model always achieves the best performance. In addition, its performance also exceeds that of the two LSTM reference models. Since the present model and the two reference models differ only in the attention mechanism, this result indicates that the attention mechanism of the invention is effective.
Model                           Fm
deepConvLSTM                    0.7480
Temporal Attention              0.8052
LSTM with last attention        0.7922
LSTM with average attention     0.7568
LSTM with unique attention      0.8796
TABLE 1 Performance of each model on the PAMAP2 dataset
Model                           Locomotion    Gesture
LDA                             0.590         0.690
QDA                             0.680         0.530
NCC                             0.540         0.510
LSTM with last attention        0.873         0.894
LSTM with average attention     0.875         0.889
LSTM with unique attention      0.892         0.904
TABLE 2 Performance of each model on the Opportunity dataset
Fig. 3 and Fig. 4 show the visualization of the attention weights for walking and running on the PAMAP2 data set. The first 3 sub-graphs of Fig. 3 and Fig. 4 are acceleration data collected from the arm, chest and ankle of the subject respectively; the last sub-graph shows the unique attention weights for the corresponding sensor data. The horizontal axis of each sub-graph represents the time stamp, and the vertical axis of the first three represents acceleration. As can be seen from the figures, the attention weight value changes periodically over time. In particular, Fig. 4 shows that the attention weight value begins to change periodically from the moment the volunteer starts running, and the period of this variation coincides with the period of the activity. This is consistent with the invention's understanding of the activities: walking and running are periodic, and if a certain basic action occurs only during running, that action also occurs periodically with the running, so its attention weight value should also change periodically.
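A small matplotlib sketch of this kind of visualization is given below; the arrays acc and att are assumed to be produced elsewhere (acceleration data from one sensor and the corresponding attention weights) and are not provided by the patent text.

import matplotlib.pyplot as plt

def plot_attention(acc, att):
    # acc: (T, 3) acceleration from one sensor; att: (T,) uniqueness attention weights.
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 4))
    ax1.plot(acc)
    ax1.set_ylabel("acceleration")
    ax2.plot(att)
    ax2.set_ylabel("attention weight")
    ax2.set_xlabel("time stamp")
    plt.tight_layout()
    plt.show()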
And (4) experimental conclusion:
Based on an understanding of human activities and of the relationship between the basic actions that compose them, the invention provides an LSTM neural network model structure based on a unique attention mechanism and a corresponding human activity recognition method. Comparison with other methods on public data sets shows that the method is indeed effective. Comparison with the two LSTM reference models also shows that the LSTM model with the unique attention mechanism can actually focus the model on the important parts. The attention weights of the running and walking activities were visualized, and the results are consistent with the invention's understanding of these two activities. This enhances the interpretability of the deep neural network to a certain extent, so that it is no longer a black box. In addition, it can also guide a better understanding of human activities: by showing which basic actions of an activity receive higher attention weights, one can intuitively understand which basic actions the activity essentially consists of.

Claims (3)

1. A human body activity recognition method based on a unique attention mechanism is characterized by comprising the following steps:
1) preprocessing the data;
2) designing an LSTM neural network model with a unique attention mechanism, in which the whole network is divided into 3 parts: an LSTM layer, a uniqueness attention mechanism layer and a SOFTMAX full-connection layer; the input of the network model is a time sequence segment that has undergone data preprocessing, and the output is the activity category to be identified;
3) training and predicting;
the step 2) specifically comprises the following steps:
2.1) encoding the input data with an LSTM layer, wherein the input data at each moment correspondingly outputs a state vector: the LSTM layer is the first layer of the LSTM neural network model with the unique attention mechanism, the input data is the time sequence segment that has undergone data preprocessing, and the size of the data input to the LSTM layer is (T, F), wherein T is the number of time points included in the input data and F represents the number of data values collected at each time point; the layer outputs a state vector for the input data at each time point, with h_t representing the state vector at time t and L_h representing the length of the "state vector", so the output of the LSTM layer has size (T, L_h);
2.2) automatically calculating a unique attention weight value for the state vector output by the LSTM layer at each moment by using a uniqueness attention mechanism layer, and taking the result as the output of the uniqueness attention mechanism layer; the uniqueness attention mechanism layer is the second layer of the LSTM neural network model with the unique attention mechanism, and if the basic action represented by a state vector belongs to a specific activity to a high degree, the LSTM neural network model with the unique attention mechanism pays more attention to that state vector;
2.3) processing the state vector H obtained by the uniqueness attention mechanism layer with the SOFTMAX layer to obtain a final result vector y_p of length C, wherein the index of the maximum value of the vector is the predicted activity type; the SOFTMAX layer here and the SOFTMAX layer used by the uniqueness attention mechanism layer are the same layer, namely the third layer of the LSTM neural network model with the unique attention mechanism, and the result obtained by the uniqueness attention mechanism layer has exactly the same size as a state vector;
the specific calculation method of the step 2.2) comprises the following steps:
firstly, each state vector is processed by a SOFTMAX full-connection layer, and the following results are obtained:
prob_t = softmax(h_t)
prob_t represents the probability that the current activity belongs to each activity class; with the number of activity classes denoted by C, each state vector yields a result vector of length C, i.e. the length of prob_t is C, and the attention weight is calculated using uniq():
att_t = uniq(prob_t)
uniq() is computed by taking the result of subtracting the second-largest value from the largest value in the prob_t vector as the score of the state vector:
score_t = max1(prob_t) - max2(prob_t)
the score represents the degree to which each basic action is unique to a specific activity; the higher this degree, the larger the amount of information it contains. Each state vector is thus given a score, and the scores are normalized to obtain the attention weight values:
att_t = score_t / Σ_{i=1}^{T} score_i
wherein T is the length of the time window, the attention weights of the T state vectors are obtained through calculation, and all the state vectors are subjected to weighted accumulation according to the attention weights to obtain a state vector H:
H = Σ_{t=1}^{T} att_t · h_t
2. the human body activity recognition method based on the unique attention mechanism according to claim 1, wherein the step 1) specifically comprises the following steps:
1.1) the human activity data is data collected by a number of sensors attached to the human body and is continuous time-series data; since the input required by the model has a fixed length, the original data is segmented: a time window is set, and data of a specific time-period length is intercepted from the original data as the input data for training;
1.2) the missing data in the data set is interpolated and the data is normalized: the data is processed to have a mean of 0 and a variance of 1.
3. The human body activity recognition method based on the unique attention mechanism according to claim 1, wherein the step 3) specifically comprises the following steps:
3.1) each data segment X intercepted from the original data corresponds to a label y; one-hot coding is applied to the label y to obtain y_r; model() represents the LSTM neural network model with the unique attention mechanism, and model() maps the human activity data X to the human activity class label y_p, namely:
yp=model(X)
using cross entropy as a loss function:
Loss(X, y_r) = -y_r log(model(X)) = -y_r log(y_p);
3.2) using Adam as the optimization algorithm of the LSTM neural network model with the unique attention mechanism: after training the LSTM neural network model with the unique attention mechanism, it is directly used for prediction; the data acquired from the sensors is preprocessed in the same way as during training and input into the LSTM neural network model with the unique attention mechanism, which returns a vector of length C whose maximum-value index is the activity type predicted by the model.
CN201910570941.1A 2019-06-28 2019-06-28 Human body activity recognition method based on unique attention mechanism Active CN110197235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910570941.1A CN110197235B (en) 2019-06-28 2019-06-28 Human body activity recognition method based on unique attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910570941.1A CN110197235B (en) 2019-06-28 2019-06-28 Human body activity recognition method based on unique attention mechanism

Publications (2)

Publication Number Publication Date
CN110197235A CN110197235A (en) 2019-09-03
CN110197235B true CN110197235B (en) 2021-03-30

Family

ID=67755330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910570941.1A Active CN110197235B (en) 2019-06-28 2019-06-28 Human body activity recognition method based on unique attention mechanism

Country Status (1)

Country Link
CN (1) CN110197235B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401270A (en) * 2020-03-19 2020-07-10 南京未艾信息科技有限公司 Human motion posture recognition and evaluation method and system
CN111461455B (en) * 2020-04-16 2022-05-13 武汉大学 Behavior prediction method based on association cycle attention mechanism
CN111583364A (en) * 2020-05-07 2020-08-25 江苏原力数字科技股份有限公司 Group animation generation method based on neural network
CN112168142B (en) * 2020-09-28 2021-10-15 成都中医药大学 Dysmenorrhea traditional Chinese medicine syndrome differentiation system based on DAELA-LSTM neural network
CN112149613B (en) * 2020-10-12 2024-01-05 萱闱(北京)生物科技有限公司 Action pre-estimation evaluation method based on improved LSTM model
CN112906673A (en) * 2021-04-09 2021-06-04 河北工业大学 Lower limb movement intention prediction method based on attention mechanism

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025567A (en) * 2016-02-01 2017-08-08 秒针信息技术有限公司 A kind of data processing method and device
CN107563345A (en) * 2017-09-19 2018-01-09 桂林安维科技有限公司 A kind of human body behavior analysis method based on time and space significance region detection
CN108830157A (en) * 2018-05-15 2018-11-16 华北电力大学(保定) Human bodys' response method based on attention mechanism and 3D convolutional neural networks
CN109101876A (en) * 2018-06-28 2018-12-28 东北电力大学 Human bodys' response method based on long memory network in short-term
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism
CN109670548A (en) * 2018-12-20 2019-04-23 电子科技大学 HAR algorithm is inputted based on the more sizes for improving LSTM-CNN

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025567A (en) * 2016-02-01 2017-08-08 秒针信息技术有限公司 A kind of data processing method and device
CN107563345A (en) * 2017-09-19 2018-01-09 桂林安维科技有限公司 A kind of human body behavior analysis method based on time and space significance region detection
CN108830157A (en) * 2018-05-15 2018-11-16 华北电力大学(保定) Human bodys' response method based on attention mechanism and 3D convolutional neural networks
CN109101876A (en) * 2018-06-28 2018-12-28 东北电力大学 Human bodys' response method based on long memory network in short-term
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism
CN109670548A (en) * 2018-12-20 2019-04-23 电子科技大学 HAR algorithm is inputted based on the more sizes for improving LSTM-CNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Understanding and Improving Recurrent Networks for Human Activity Recognition by Continuous Attention; Ming Zeng et al.; https://arxiv.org/abs/1810.04038; 2018-10-07; 1-8 *

Also Published As

Publication number Publication date
CN110197235A (en) 2019-09-03

Similar Documents

Publication Publication Date Title
CN110197235B (en) Human body activity recognition method based on unique attention mechanism
Yuan et al. Muvan: A multi-view attention network for multivariate temporal data
CN110944577B (en) Method and system for detecting blood oxygen saturation
CN113040711B (en) Cerebral apoplexy incidence risk prediction system, equipment and storage medium
CN109009017B (en) Intelligent health monitoring system and data processing method thereof
CN111134662B (en) Electrocardio abnormal signal identification method and device based on transfer learning and confidence degree selection
CN108778097A (en) Device and method for assessing heart failure
CN111759345B (en) Heart valve abnormality analysis method, system and device based on convolutional neural network
WO2019137538A1 (en) Emotion representative image to derive health rating
CN107609477A (en) It is a kind of that detection method is fallen down with what Intelligent bracelet was combined based on deep learning
CN115316991B (en) Self-adaptive recognition early warning method for irritation emotion
CN110491506A (en) Auricular fibrillation prediction model and its forecasting system
CN113362924A (en) Medical big data-based facial paralysis rehabilitation task auxiliary generation method and system
CN113674767A (en) Depression state identification method based on multi-modal fusion
CN117542474A (en) Remote nursing monitoring system and method based on big data
CN114550299A (en) System and method for evaluating daily life activity ability of old people based on video
CN111916179A (en) Method for carrying out 'customized' diet nourishing model based on artificial intelligence self-adaption individual physical sign
CN112801009B (en) Facial emotion recognition method, device, medium and equipment based on double-flow network
Mo et al. Human daily activity recognition with wearable sensors based on incremental learning
CN109271889A (en) A kind of action identification method based on the double-deck LSTM neural network
Liu et al. Semantic segmentation of qrs complex in single channel ecg with bidirectional lstm networks
CN112914506A (en) Sleep quality detection method, device and computer readable storage medium
CN111723869A (en) Special personnel-oriented intelligent behavior risk early warning method and system
Uslu et al. RAM: Real Time Activity Monitoring with feature extractive training
CN114521900B (en) Arrhythmia classification and identification method based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220719

Address after: 310015 No. 51, Huzhou street, Hangzhou, Zhejiang

Patentee after: Zhejiang University City College

Address before: 310015 No. 50 Huzhou Street, Hangzhou City, Zhejiang Province

Patentee before: Zhejiang University City College