CN114612713A - Human body activity recognition method, system, computer equipment and storage medium - Google Patents

Human body activity recognition method, system, computer equipment and storage medium

Info

Publication number
CN114612713A
Authority
CN
China
Prior art keywords
data
processing
characteristic diagram
feature map
human activity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210208205.3A
Other languages
Chinese (zh)
Inventor
刘鸿宁
邓诗卓
吴刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN202210208205.3A priority Critical patent/CN114612713A/en
Publication of CN114612713A publication Critical patent/CN114612713A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a human body activity recognition method, system, computer equipment and storage medium, wherein the recognition method comprises the following steps: acquiring original human activity data; preprocessing the original human activity data to obtain first data; down-sampling the time dimension of the first data to obtain a down-sampling sequence; thinning the down-sampling sequence to obtain second data; inputting the second data into an LSTM network for data processing to obtain a first feature map; performing full-connection-layer processing on the time dimension of the first feature map to obtain a second feature map; multiplying the second feature map and the first feature map to obtain a weighted fusion feature map; adding the weighted fusion feature map and the first feature map to obtain a time fusion feature map; and classifying the time fusion feature map with a fully connected layer to obtain a classification result of the original human activity data.

Description

Human body activity recognition method, system, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence and deep learning for human activity recognition, and in particular to a human activity recognition method, a human activity recognition system, computer equipment and a storage medium.
Background
With the rapid development of deep learning techniques and the wide application of sensor devices, sensor-based Human Activity Recognition (HAR) has gained more and more attention. Human Activity Recognition (HAR) is a widely studied field concerned with identifying specific movements or actions of a person from sensor data. These movements are typically activities performed indoors, such as walking, speaking, standing and sitting. They may also be more specific activities, such as certain types of activities performed in a kitchen or on a factory floor.
Sensor data may be recorded remotely, for example by video, radar or other wireless methods, or it may be recorded directly on a portable device, for example by carrying custom hardware or a smartphone equipped with an accelerometer and gyroscope. Compared with image-based activity recognition, sensor-based activity recognition has several advantages: flexibility and convenience, better privacy, lower cost, and so on. By wearing sensors on different parts of the body, people can effectively avoid depending on other external equipment when collecting data, which reduces the cost of data acquisition and protects user privacy well. This convenient data acquisition mode allows sensor-based activity recognition to be applied in almost any place, so sensor-based human activity recognition technology has broad application prospects. At present, sensor-based human activity recognition technology has been widely used in a variety of fields.
In the field of medical monitoring, analyzing the data from sensors worn by elderly people makes it possible to infer effectively whether their current daily activities involve potential safety hazards, and to detect and treat certain hidden diseases early. In the field of sports, by placing sensors on an athlete's clothing, a coach can know the athlete's motion state in real time and analyze the athlete's actions, which makes it convenient for the coach to guide the athlete's movements and thus improve training efficiency. In the field of smart homes, sensors can be conveniently arranged at various positions in a room to effectively collect the occupant's daily activity data; the smart home system can then analyze the occupant's daily behavior, bringing a more convenient living experience and giving the whole home a sense of technology.
In early studies, many classical algorithms from the traditional pattern recognition literature were successfully applied to HAR. For example, some researchers used multilayer perceptrons to recognize human activities; some applied hidden Markov models to HAR to explore the influence of sensor placement on the experimental results; and others systematically applied machine learning algorithms such as least squares, K-nearest neighbors and Bayesian methods to the HAR field, carrying out detailed comparison experiments in terms of both recognition effect and computational complexity. Traditional machine learning methods usually require manually extracted features, and these features often require specific domain knowledge, so it is difficult to select suitable features for a specific task, which is very unfavorable for improving the generalization performance of the model. With the development of deep learning technology, more and more researchers have begun to apply deep learning methods to the field of sensor-based human activity recognition.
At present, the use of deep learning methods has made great progress in sensor-based human activity recognition, and the research on sensor-based human activity recognition is becoming mature, but there are still several major problems.
(1) Lack of research into sensor data segmentation
In current deep-learning-based human activity recognition research, most work focuses on the design of the model structure, and the characteristics of the sensor data themselves are rarely studied; besides the inherent time-series characteristics of sensor data, the data segmentation settings for the sensor data cannot be ignored either.
(2) Lack of research into network models capable of bidirectional feature extraction
Compared with a one-dimensional convolution kernel, which can only extract the temporal dependence of a one-dimensional time series, a two-dimensional convolution kernel can extract both the temporal dependence of the time series and the spatial dependence among different sensors. In most studies, however, researchers simply consider the convolvable form of the sensor data and therefore only stitch the sensor data together into a two-dimensional matrix as the input. Although this simple input form, in which the sensor data are arranged together by channel to form a two-dimensional matrix, achieves a better recognition effect than traditional machine-learning-based feature extraction and one-dimensional-convolution-based feature extraction, it does not make good use of the fact that the features of the context information carry different weights.
Disclosure of Invention
In order to solve the problems in the background art, in a first aspect, the present invention provides a method for recognizing human body activity, comprising the steps of: acquiring original human activity data; segmenting the original human body activity data to obtain first data; down-sampling the time dimension of the first data to obtain a down-sampling sequence; thinning the down-sampling sequence to obtain second data; inputting the second data into an LSTM network for data processing to obtain a first characteristic diagram; carrying out full-connection layer processing on the time dimension of the first characteristic diagram to obtain a second characteristic diagram; multiplying the second feature map and the first feature map to obtain a weighted fusion feature map; performing addition operation on the weighted fusion characteristic diagram and the first characteristic diagram to obtain a time fusion characteristic diagram; and classifying the time fusion characteristic graph by adopting a full connection layer to obtain a classification result of the original human activity data.
Further, down-sampling the time dimension of the first data to obtain a down-sampled sequence includes: performing convolution processing on the first data to obtain a third characteristic diagram; sequentially carrying out normalization and activation function processing on the third characteristic diagram to obtain third data; and performing self-attention calculation processing on the third data to obtain the down-sampling sequence.
Further, the convolution kernel size in the convolution processing of the first data is (5,1), and the step size is 1.
Further, down-sampling is performed on the time dimension of the first data for 4 times, so as to obtain a down-sampling sequence.
Further, the original activity data is preprocessed in a sliding window processing mode, the window size of the sliding window processing is 24, and the step length is 12.
Further, in the data processing of the LSTM network, the LSTM network is 3 layers.
In a second aspect, the present invention also provides a human activity recognition system for implementing the above human activity recognition method. The human activity recognition system comprises: a preprocessing module, which is used for acquiring original human activity data and segmenting the original human activity data to obtain first data; a first data processing module, which is used for down-sampling the time dimension of the first data to obtain a down-sampling sequence; a second data processing module, which is used for performing thinning processing on the down-sampling sequence to obtain second data; a third data processing module, which is used for inputting the second data into an LSTM network for data processing to obtain a first feature map; a fourth data processing module, which is used for performing full-connection-layer processing on the time dimension of the first feature map to obtain a second feature map; a fifth data processing module, which is used for multiplying the second feature map and the first feature map to obtain a weighted fusion feature map; a sixth data processing module, which is used for adding the weighted fusion feature map and the first feature map to obtain a time fusion feature map; and a classification module, which is used for classifying the time fusion feature map with a fully connected layer to obtain a classification result of the original human activity data.
Further, the first data processing module includes: a convolution processing unit, which is used for performing convolution processing on the first data to obtain a third feature map; a normalization processing unit, which is used for normalizing the data in the third feature map; an activation function processing unit, which is used for performing activation function processing on the normalized data of the third feature map to obtain third data; and a self-attention calculation processing unit, which is used for performing self-attention calculation processing on the third data to obtain the down-sampling sequence.
In a third aspect, the present invention also provides a computer device, which includes a memory, a processor and a bidirectional feature-extracted human activity recognition program stored on the memory and operable on the processor, wherein the bidirectional feature-extracted human activity recognition program, when executed by the processor, implements the steps of the above-mentioned human activity recognition method.
In a fourth aspect, the present invention further provides a readable storage medium, on which a bidirectional feature-extracted human activity recognition program is stored, which, when executed by a processor, implements the steps of the above-described human activity recognition method.
The invention has the following beneficial effects:
according to the human activity recognition method, based on the segmentation processing and the down-sampling processing of the original activity data, the length of the original data is greatly shortened, and the subsequent processing efficiency of the human activity data is improved; meanwhile, in the human body activity recognition process, model training is carried out based on different weights of the features in the weighted fusion feature map and the time fusion feature map which are obtained by the first feature map and the second feature map, so that the features useful for classification can be enhanced, and the features not useful for classification can be weakened, thereby improving the accuracy of classification results and improving the accuracy of human body activity recognition.
Drawings
FIG. 1 is a flow chart of a method of human activity recognition of the present invention;
FIG. 2 is a schematic diagram of the structure of the human activity recognition system of the present invention;
FIG. 3 is a schematic diagram of a first data processing module of the human activity recognition system of the present invention;
FIG. 4 is a schematic structural diagram of self-attention in the human body activity recognition method and system of the present invention;
FIG. 5 is a schematic diagram of self-attention operation in the method and system for identifying human body activities according to the present invention;
fig. 6 is a diagram of the LSTM operation in the human activity recognition method and system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions.
The method, system, computer device, and storage medium for recognizing human body activity according to the present invention will be described in detail with reference to the accompanying drawings.
Referring to fig. 1, the recognition method of human body activity of the present application includes steps S1-S9. Specifically, S1, obtaining original activity data; s2, carrying out segmentation processing on the original activity data to obtain first data; s3, performing down-sampling on the time dimension of the first data to obtain a down-sampling sequence; s4, thinning the down-sampling sequence to obtain second data; s5, inputting the second data into an LSTM network for data processing to obtain a first characteristic diagram; s6, carrying out full connection layer processing on the time dimension of the first characteristic diagram to obtain a second characteristic diagram; s7, multiplying the second feature map and the first feature map to obtain a weighted fusion feature map; s8, performing addition operation on the weighted fusion feature map and the first feature map to obtain a time fusion feature map; and S9, classifying the time fusion characteristic graph by adopting a full connection layer to obtain a classification result of the original human activity data.
According to the human activity recognition method, based on the segmentation processing and the down-sampling processing of the original activity data, the length of the original data is greatly shortened, and the subsequent processing efficiency of the human activity data is improved; meanwhile, in the human body activity recognition process, model training is carried out based on different weights of the features in the weighted fusion feature map and the time fusion feature map which are obtained by the first feature map and the second feature map, so that the features useful for classification can be enhanced, and the features not useful for classification can be weakened, thereby improving the accuracy of classification results and improving the accuracy of human body activity recognition.
Since the length of the raw activity data collected by the sensors often reaches tens of thousands of time steps and the raw activity data contains multiple activities, these data cannot be directly input into the network. Therefore, in step S2, the raw activity data may be segmented in a sliding-window manner, where the activity category corresponding to the last moment of the window is used as the data label of the segmented sample; in this way, the raw activity data collected by the sensors are converted into first data (i.e., sample data) that can be input into the network. When the data are segmented with a sliding window, there are two very important parameters: the window size of the sliding window and the step size by which the window slides each time. Specifically, the present invention uses a sliding window size of 24, with a sliding step size of 12.
It should be noted that the input data format for the sliding window processing is: batch_size × 1 × window length × feature dimension. Taking the Skoda data set as an example, the sliding window size is 24 and the dimensionality of the data set is 30 (data from 30 sensors are selected for training), so the input data format of the sliding window processing is batch_size × 1 × 24 × 30, and the first data obtained after the sliding window processing are the input data of the down-sampling step.
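As a minimal sketch of this segmentation step (assuming the raw recording is a NumPy array of shape (T, 30) with an aligned per-time-step label array; the function and variable names are illustrative and not taken from the patent):

```python
import numpy as np

def sliding_window_segment(data, labels, window=24, step=12):
    """Segment a raw recording of shape (T, 30) into samples of shape (1, window, 30).

    Each sample is labelled with the activity at the last moment of its window,
    as described above.
    """
    samples, sample_labels = [], []
    for start in range(0, data.shape[0] - window + 1, step):
        end = start + window
        samples.append(data[start:end][np.newaxis, :, :])  # add channel dim -> (1, 24, 30)
        sample_labels.append(labels[end - 1])               # label of the window's last moment
    return np.stack(samples), np.array(sample_labels)       # (N, 1, 24, 30), (N,)
```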
In step S3, the time dimension of the first data is down-sampled to obtain a down-sampled sequence. Referring to fig. 3, the down-sampling process specifically includes the following steps: S31, performing convolution processing on the first data to obtain a third feature map; S32, sequentially performing normalization and activation function processing on the third feature map to obtain third data; and S33, performing self-attention calculation processing on the third data to obtain the down-sampling sequence.
In other words, the down-sampling process is a process of sequentially performing convolution processing, normalization processing, activation function processing, and self-attention calculation processing on the time dimension of the first data, so that the length of a time series subsequently input into the LSTM network is reduced by the down-sampling process.
Preferably, the convolution kernel size may be (5,1) with a step size of 1, and in step S3 the down-sampling processing is performed 4 times to obtain the down-sampling sequence. If the input structure of the data before down-sampling is batch_size × 1 × 24 × 30, then the output structure after the first down-sampling is batch_size × 1 × 20 × 30, after the second down-sampling batch_size × 1 × 16 × 30, after the third down-sampling batch_size × 1 × 12 × 30, and after the fourth down-sampling batch_size × 1 × 8 × 30; that is, the time length of the data output after each down-sampling is reduced by 4.
In step S31, the first data is convolved using a Cov2d layer (two-dimensional convolution function); for example, the convolution kernel size may be (5,1) with a step size of 1. During the convolution operation, each convolution kernel can be regarded as a feature to be extracted, and the kernel slides continuously over the input data (i.e., the first data) to perform the operation, which makes it convenient to capture the features of different parts of the input data. The larger the feature value obtained by the convolution operation on the input data, the better the input data matches the feature represented by the convolution kernel. Compared with a naive multilayer perceptron, the convolutional layer can extract local features of the input data well. In addition, the parameter-sharing mechanism of the convolutional layer can effectively reduce the number of parameters in the network structure, which facilitates model training.
Specifically, the first data is denoted as X, and the data output after convolution processing by the Cov2d layer is denoted as Z (i.e., the third feature map). The element at position (u, v) of the n-th channel of Z, written z^{(n)}_{u,v}, is calculated as shown in formula (1):

z^{(n)}_{u,v} = \sum_{k=1}^{C_{in}} \sum_{i=1}^{K_h} \sum_{j=1}^{K_w} K^{(n,k)}_{i,j} \cdot x^{(k)}_{u+i-1,\, v+j-1} + b_s    (1)

where C_{in} is the number of channels of X, K is the two-dimensional convolution kernel, K_h and K_w denote the height and width of the convolution kernel respectively, and b_s is the bias term. K can be viewed as a two-dimensional matrix, and K^{(n,k)}_{i,j} denotes the value in row i and column j of that matrix; X can be viewed as a three-dimensional matrix containing C_{in} two-dimensional matrices (i.e., the number of channels of X is C_{in}), and k denotes the k-th two-dimensional matrix in X.
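A brief PyTorch sketch of such a Cov2d layer with kernel size (5,1) and stride 1 (the single input and output channel here is an illustrative assumption; what follows from the text is only that the time dimension shrinks from 24 to 20 while the 30 sensor columns are preserved):

```python
import torch
import torch.nn as nn

# A (5,1) kernel convolves 5 consecutive time steps within each sensor column,
# so the time dimension shrinks by 4 (24 -> 20) and the 30 sensors are kept.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(5, 1), stride=1)

x = torch.randn(16, 1, 24, 30)   # batch_size x 1 x window x sensors
z = conv(x)                      # third feature map
print(z.shape)                   # torch.Size([16, 1, 20, 30])
```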
In step S32, the third feature map is sequentially normalized using a Batchnormal layer (which may also be referred to as a batch normalization module) and processed with a Relu activation function layer, thereby obtaining the third data.
Specifically, when the Batchnormal layer is used for normalization, the data are normalized by subtracting the mean and scaling to unit variance, so that the normalized data set is mapped around the origin; this alleviates the problem of vanishing gradients and thus speeds up training.
The Batchnormal layer normalizes the third feature map (a data set) using the min-max normalization idea, which is simple. Normalizing the data set Z (i.e., the third feature map) yields a data set Z′ whose data samples z′_i are calculated as shown in formula (2):

z′_i = (z_i − z_{min}) / (z_{max} − z_{min})    (2)

where z_{max} is the maximum value among all the data in the third feature map and z_{min} is the minimum value among all the data.
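A minimal sketch of the normalization in formula (2), computed over all values of the feature map as described above (plain tensor operations are used as an assumed implementation; the epsilon term is an added safeguard not mentioned in the text):

```python
import torch

def min_max_normalize(z: torch.Tensor) -> torch.Tensor:
    """Map all values of the third feature map into [0, 1] according to formula (2)."""
    z_min, z_max = z.min(), z.max()
    return (z - z_min) / (z_max - z_min + 1e-8)  # epsilon guards against division by zero
```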
The Relu activation function layer is used for the activation function processing because the Relu activation function is a simple nonlinear activation function and is efficient to compute. Its main benefits are two-fold: it overcomes the problem of vanishing gradients, and it accelerates training. The data obtained after the activation function processing are the third data. Specifically, the Relu activation function layer is calculated as shown in formulas (3) and (4):

R(x) = max(x, 0)    (3)

R(x) = x for x > 0, and R(x) = 0 for x ≤ 0    (4)

where x represents the input data, i.e., each data value z′_i obtained from the normalization processing of the Batchnormal layer. As can be seen from formula (4), when the input x is greater than 0, Relu is equivalent to a linear function with derivative 1; when x is less than or equal to 0, the output of Relu is 0 and the derivative is 0.
In step S33, the self-attention module layer is used to perform self-attention calculation processing on the third data, and the result obtained is the down-sampling sequence. As shown in figs. 3-4, the main workflow of the self-attention calculation processing (i.e., the self-attention module layer) is as follows:
In the first stage, the vectors Query, Key and Value are obtained from the third data, and then the similarity or correlation between the vector Query and each key Key_i in the vector Key is calculated. The most common methods for calculating this similarity or correlation include: the vector dot product of the two (formula 5), the vector cosine similarity of the two (formula 6), or evaluation by an additional neural network (formula 7):

dot product: Similarity(Query, Key_i) = Query · Key_i    (5)

cosine similarity: Similarity(Query, Key_i) = (Query · Key_i) / (||Query|| · ||Key_i||)    (6)

network: Similarity(Query, Key_i) = MLP(Query, Key_i)    (7)

The value range of the score produced in the first stage differs depending on the specific method used; the present invention adopts the formula score = Query · Key^T to calculate the score.

In the second stage, the scores from the first stage are numerically transformed by introducing a Softmax-like calculation. On the one hand this normalizes them, organizing the original scores into a probability distribution in which the weights of all elements sum to 1; on the other hand, the intrinsic mechanism of Softmax highlights the weights of the important elements. The following formula is generally used:

a_i = Softmax(score_i) = e^{score_i} / \sum_{j=1}^{C} e^{score_j}    (8)

where C is the number of output nodes, i.e., the number of classes; a probability distribution for classification can be obtained through the Softmax function. The result a_i of the second stage is the weight coefficient corresponding to Value_i in the vector Value, and a weighted sum then gives the Attention value (i.e., the down-sampling sequence):

Attention(Query, Source) = \sum_{i=1}^{L_x} a_i · Value_i    (9)

where Source represents the Key-Value relationship and L_x is the length of the input Value; as shown in FIG. 5, when the length of Value is 4, L_x = 4.
The self-attention mechanism is a variant of the attention mechanism that reduces reliance on external information and is better at capturing the internal correlations of data or features. For details of self-attention, reference may be made to "Attention Is All You Need".
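A hedged sketch of the self-attention step as described above, using the score = Query · Key^T formulation without additional scaling (the linear projections and their dimensions are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Self-attention over the time axis: score = Q·K^T, Softmax, weighted sum of V."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        score = torch.matmul(q, k.transpose(-2, -1))   # first stage: Query · Key^T
        weights = F.softmax(score, dim=-1)             # second stage: weights summing to 1
        return torch.matmul(weights, v)                # Attention value: weighted sum of Value
```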
In step S4, the down-sampled sequence is refined to obtain the second data. Specifically, the refining process comprises the following steps: first, the down-sampling sequence is input into a channel attention layer, and the structure of the data output by the channel attention layer is still batch_size × 8 × 32 × 30; the data output by the channel attention layer are then converted, through a reshape function (a function that can readjust the number of rows, columns and dimensions of a matrix), into second data whose size conforms to the input structure of the LSTM network, the data structure of the second data being batch_size × 8 × 960.
In step S5, the second data is input to the LSTM network for data processing, and a first feature map is obtained.
The values of the gates in the recurrent unit of the LSTM network lie between (0, 1) and indicate the proportion of information that can pass through the gate. The roles of the gates in the network are as follows: 1) the forget gate f_t controls how much information of the internal state c_{t-1} at the previous moment can be forgotten; 2) the input gate i_t controls how much information of the candidate state c′_t at the current moment can be saved; 3) the output gate o_t controls how much information of the internal state c_t at the current moment can be output to the external state h_t at the current moment.

Referring to fig. 6, the calculation process by which the LSTM computes each gate using the data at the current moment and the previous moment is as follows: (1) first, the input data x_t at the current moment (i.e., the second data) and the external state h_{t-1} at the previous moment are used to calculate the values of the 3 gates and the candidate state c′_t at the current moment; (2) then, the forget gate f_t and the input gate i_t are used to update the memory cell c_t at the current moment; (3) finally, the output gate o_t is used to control how much information of the internal state at the current moment can be transferred to the external state h_t.
The forget gate f_t, the input gate i_t and the output gate o_t are calculated as follows:

f_t = σ(W_f · [c_{t-1}, h_{t-1}, x_t] + b_f)    (10)

i_t = σ(W_i · [c_{t-1}, h_{t-1}, x_t] + b_i)    (11)

o_t = σ(W_o · [c_{t-1}, h_{t-1}, x_t] + b_o)    (12)

where W_f, W_i, W_o are weight matrices (W_f is the forget gate matrix, W_i is the input gate matrix, W_o is the output gate matrix); b_f, b_i, b_o are bias vectors; c_{t-1} is the memory cell at the previous moment (the internal state in the LSTM layer); and σ is the sigmoid activation function. The sigmoid layer outputs element values that are real numbers between 0 and 1, each being the weight with which the corresponding information is allowed to pass; for example, an output element value of 0 means "let no information pass", and an output element value of 1 means "let all information pass". The candidate state c′_t is calculated as follows:

c′_t = tanh(W_c · [c_{t-1}, h_{t-1}, x_t] + b_c)    (13)

where W_c is the weight matrix connecting each layer with the input data x_t (the second data), and b_c is a bias vector.

The internal state c_t and the external state h_t are updated as follows:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c′_t    (14)

h_t = o_t ⊙ tanh(c_t)    (15)

where ⊙ denotes multiplication of the corresponding elements of the matrices.
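A hedged sketch of this stage using PyTorch's built-in LSTM, which implements the gating equations (10)-(15) internally (the 3 layers and the batch_size × 8 × 960 input follow the text; the hidden size of 128 is taken from the shape batch_size × 8 × 128 quoted later and is otherwise an assumption):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=960, hidden_size=128, num_layers=3, batch_first=True)

second_data = torch.randn(16, 8, 960)         # batch_size x 8 x 960
first_feature_map, (h_n, c_n) = lstm(second_data)
print(first_feature_map.shape)                # torch.Size([16, 8, 128])
```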
In other words, the first feature map is obtained through the calculation processing of the above equations (10) to (15). In step S6, full-connection-layer processing is performed on the time dimension of the first feature map to obtain the second feature map. The purpose of step S6 is to obtain the weights of the feature map at different moments through the fully-connected-layer calculation.
The fully connected layer plays the role of a classifier in the whole deep learning neural network: by integrating the distributed abstract features obtained by the other layers, it maps the continuous feature space to the discrete sample-class expression space. Its core calculation is the product of a matrix and a vector, given by the following formula:
y=Wz+b (16)
wherein z is the input of the fully connected layer (i.e., the data in the first profile); w is a weight coefficient; b is a bias term; and y is data in the second characteristic diagram.
In step S7, the second feature map is multiplied by the first feature map to obtain a weighted fusion feature map.
In step S8, the weighted fusion feature map and the first feature map are added to obtain a time fusion feature map, that is, a weight sequence with a weight sum of 1 is obtained.
In step S9, the time fusion feature map is classified by using a full connection layer to obtain a classification result of the original activity data, thereby realizing identification and classification of human activities.
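A hedged sketch of steps S6-S9, i.e., the temporal weighting, fusion and classification described above (the softmax over the time axis and the final pooling over time are assumptions made to produce a runnable example, not details stated in the text):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalFusionClassifier(nn.Module):
    def __init__(self, hidden: int = 128, num_classes: int = 11):
        super().__init__()
        self.time_fc = nn.Linear(hidden, 1)                # S6: one weight per time step (second feature map)
        self.classifier = nn.Linear(hidden, num_classes)   # S9: final fully connected layer

    def forward(self, first_feature_map: torch.Tensor) -> torch.Tensor:
        # first_feature_map: (batch, time, hidden), the LSTM output
        weights = F.softmax(self.time_fc(first_feature_map), dim=1)  # weights over time summing to 1
        weighted = weights * first_feature_map                       # S7: weighted fusion feature map
        fused = (weighted + first_feature_map).sum(dim=1)            # S8: time fusion feature map, pooled over time
        return self.classifier(fused)                                # S9: class scores for the activity categories
```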
In conclusion, in the human activity recognition method, based on the application of model technologies such as Cov2d + Batchnormal + relu + self-attention in the deep learning neural network, the original data length is greatly shortened; based on the application of the reshape function, the data input can be converted into data with the size conforming to the input structure of the LSTM network, so that the subsequent calculation is facilitated; meanwhile, based on the application of the LSTM network technology and the full-connection layer processing technology, the weights of the feature maps at different times (namely the weighted fusion feature map and the time fusion feature map) can be obtained, so that the features useful for classification are enhanced, the features useless for classification are weakened, the accuracy of classification results is improved, and the accuracy of human activity recognition is improved.
The human activity recognition system comprises a preprocessing module, a first data processing module, a second data processing module, a third data processing module, a fourth data processing module, a fifth data processing module, a sixth data processing module and a classification module.
The preprocessing module obtains the original activity data and performs segmentation processing on the original activity data (for example, the original activity data can be segmented in a sliding window mode) to obtain first data in a data format meeting the requirements of the first data processing module.
The first data processing module is used for carrying out down-sampling processing on the first data and obtaining a down-sampling sequence. Specifically, the first data processing module is a CBRSA module. When 4 times of downsampling processing needs to be performed on the first data, the number of the first data processing modules is correspondingly 4.
As shown in figs. 2 to 3, the first data processing module includes a convolution processing unit (i.e., the Cov2d layer), a normalization processing unit (i.e., the Batchnormal layer), an activation function processing unit (i.e., the Relu activation function layer), and a self-attention calculation processing unit (i.e., the self-attention module layer). For the specific implementation and principle of the self-attention module layer, reference may be made to figs. 4 and 5.
The convolution processing unit is used for performing convolution processing on the first data to obtain a third feature map, the normalization processing unit is used for performing normalization processing on data in the third feature map, the activation function processing unit is used for performing activation function processing on the data after the normalization processing on the third feature map to obtain third data, and the self-attention calculation processing unit is used for performing self-attention calculation processing on the third data to obtain a down-sampling sequence.
And the second data processing module is used for carrying out thinning processing on the down-sampling sequence to obtain second data. In particular, the second data processing module is a channel attention module which converts the downsampled sequence into second data with a size conforming to the LSTM network input structure by using a reshape function (a function which can readjust the number of rows, columns, and dimensions of the matrix).
And the third data processing module is used for inputting the second data into the LSTM network for data processing to obtain a first characteristic diagram. In particular, the third data processing module is a computational model in the LSTM network layer.
And the fourth data processing module is used for carrying out full connection layer processing on the time dimension of the first characteristic diagram to obtain a second characteristic diagram. Specifically, the fourth data processing module is a calculation model in the fully connected layer.
The fifth data processing module is used for performing multiplication operation on the second feature map and the first feature map to obtain a weighted fusion feature map; the sixth data processing module is used for performing addition operation on the weighted fusion feature map and the first feature map to obtain a time fusion feature map; and the classification module is used for classifying the time fusion characteristic graph by adopting a full connection layer to obtain a classification result of the original human activity data.
In the human activity recognition system, the original activity data is subjected to segmentation processing and downsampling processing, so that the length of the original data is greatly shortened, and the subsequent processing efficiency of the human activity data is improved; meanwhile, in the human body activity recognition process, model training is carried out based on different weights of the features in the weighted fusion feature map and the time fusion feature map which are obtained by the first feature map and the second feature map, so that the features useful for classification can be enhanced, and the features not useful for classification can be weakened, thereby improving the accuracy of classification results and improving the accuracy of human body activity recognition.
In one embodiment, the identification system of the present invention includes 1 preprocessing module, 4 first data processing modules (i.e., 4 CBRSA modules), 1 second data processing module (i.e., 1 channel attention layer), 3 third data processing modules (i.e., 3 LSTM network layers), 1 fourth data processing module (i.e., 1 fully-connected layer), and 1 fifth data processing module, 1 sixth data processing module, and 1 classification module.
The operation of the recognition system of the present invention is described in detail below, taking the Skoda data set as an example. The Skoda data set contains 11 activity categories with clear differences between activity patterns; the amount of data is large, the activities occur under factory production conditions, the actions are standardized, and they are easy to classify. The specific categories are shown in Table 1 below.
TABLE 1 action categories in Skoda dataset
(Table 1 is provided as an image in the original publication and lists the 11 activity categories of the Skoda data set.)
The Skoda data set (original human activity data) is sequentially processed as follows to obtain a classification result, specifically referring to fig. 1:
first layer (pretreatment module): the input structure of the current data is batch × 1 × window length × feature dimension, the length of a sliding window is taken as 24, the dimension of the data set is taken as 30, and the data of 30 sensors is selected for training, so that the input structure of the first data obtained after the processing by the preprocessing module is batch _ size 1 × 24 × 30.
Second layer (first data processing module): the first data is passed sequentially through 4 CBRSA modules, each of which comprises Cov2d + Batchnormal + Relu + self-attention; that is, the first data with data structure batch_size × 1 × 24 × 30 is first input into Cov2d and then output sequentially through Batchnormal, Relu and self-attention, finally yielding the down-sampling sequence. The data structure of the output of the 4 successive CBRSA modules changes in turn to batch_size × 32 × 20 × 30, batch_size × 32 × 16 × 30, batch_size × 32 × 12 × 30 and batch_size × 32 × 8 × 30 (at this point, the data structure of the data in the down-sampling sequence is batch_size × 32 × 8 × 30). The convolution kernel size of the Cov2d layer in each CBRSA module is (5,1), so the height (time) dimension of the output data set is reduced by 4 each time the input passes through 1 CBRSA module, i.e., the time sequence of 24 is reduced by 4 per CBRSA module, and the input and output data structures of Batchnormal, Relu and self-attention are the same.
Third layer (second data processing module): the data in the down-sampling sequence obtained from the second layer is input into the Channel Attention Layer, and the data structure of the output remains unchanged, i.e., still batch_size × 32 × 8 × 30; the data output by the channel attention layer are then converted through a reshape function into second data whose size conforms to the input structure of the LSTM network, with data structure batch_size × 8 × 960.
Fourth layer (third data processing module): data processing is performed with 3 LSTM network layers, where the data structure of the input of each LSTM network layer is batch_size × 8 × 960 and the data structure of the output is batch_size × 8 × 128 (i.e., the first feature map); for the implementation of the LSTM network layer, reference may be made to "Learning to Forget: Continual Prediction with LSTM", Technical Report IDSIA-01-99, January 1999.
Fifth layer (fourth data processing module): Linear(128, 1); the weight of the data at each time point is calculated through a fully connected layer to obtain the second feature map.
Sixth layer (fifth data processing module): Sum(dim=1); the weights in the second feature map obtained at the fifth layer are multiplied with the first feature map output by the fourth layer to obtain the weighted fusion feature map.
Seventh layer (sixth data processing module): Linear(128, class_label); the weighted fusion feature map obtained at the sixth layer and the first feature map output by the fourth layer are summed to obtain the time fusion feature map.
Eighth layer (classification module): a fully connected layer is used for classification, i.e., the data in the time fusion feature map are input into a softmax layer; the softmax layer selects, according to the maximum value of the data along the second dimension of the input, the position corresponding to the maximum value as the category of the human activity data and outputs it. In this case the number of output categories is 11, i.e., the output is a vector of 11 values. For specific implementations, reference may be made to "Convolutional Networks and Applications in Vision", Yann LeCun, Koray Kavukcuoglu and Clément Farabet.
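A hedged sketch of the shape flow through the first four layers above (the channel counts, the omission of the self-attention and channel-attention sub-blocks, and the module composition are illustrative assumptions; only the quoted shapes are taken from the text):

```python
import torch
import torch.nn as nn

class CBRSA(nn.Module):
    """Cov2d((5,1)) + Batchnormal + Relu; the self-attention step is omitted here for brevity."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=(5, 1)),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.block(x)

x = torch.randn(16, 1, 24, 30)                          # first layer output: batch x 1 x 24 x 30
for blk in [CBRSA(1, 32), CBRSA(32, 32), CBRSA(32, 32), CBRSA(32, 32)]:
    x = blk(x)                                          # time axis shrinks: 20, 16, 12, 8
x = x.permute(0, 2, 1, 3).reshape(16, 8, 32 * 30)       # third layer: reshape to batch x 8 x 960
lstm = nn.LSTM(input_size=960, hidden_size=128, num_layers=3, batch_first=True)
first_feature_map, _ = lstm(x)                          # fourth layer
print(first_feature_map.shape)                          # torch.Size([16, 8, 128])
```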
The recognition effect of the human activity recognition system of the present invention was verified experimentally on the Skoda data set as follows. Taking the Skoda data set as an example and using the same data set division method, the human activity recognition method and system provided by the invention were compared with existing classical methods and with an existing deep learning model (see section 3.1, DeepConvLSTM, of "Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition"); the experiments verify that the accuracy of human activity recognition can be improved by at least 1%. The specific results are shown in Table 2 below.
TABLE 2 Experimental results Table
(Table 2 is provided as images in the original publication; it reports the recognition accuracy of the proposed method and of the comparison methods on the Skoda data set.)
The technical scheme of the invention adopts the self-attention mechanism from the natural language field and introduces a network model structure built with the self-attention mechanism to perform human activity recognition. The main idea of the self-attention model is to let the model, according to the characteristics of the current context, train the most important parts of the input with different weights; the self-attention model removes the difficulty that an RNN (Recurrent Neural Network) has in learning from a long input sequence, and attention-only models have been continuously developed in NLP (natural language processing). In deep learning, the self-attention model extracts time-dependent features: a 4D tensor is input into the network, and according to the different weights of the feature maps in the model, features useful for classification are enhanced and features useless for classification are weakened. The accuracy of human activity recognition is thereby improved.
The computer device of the present application comprises a memory, a processor, and a bidirectional-feature-extraction human activity recognition program that is stored on the memory and can run on the processor; when the bidirectional-feature-extraction human activity recognition program is executed by the processor, the steps of the above human activity recognition method are implemented.
The readable storage medium of the present application stores a bidirectional feature extraction human activity recognition program, which when executed by a processor implements the steps of the above-described human activity recognition method.

Claims (10)

1. A method for recognizing human body activity is characterized by comprising the following steps:
acquiring original human activity data;
segmenting the original human body activity data to obtain first data;
down-sampling the time dimension of the first data to obtain a down-sampling sequence;
thinning the down-sampling sequence to obtain second data;
inputting the second data into an LSTM network for data processing to obtain a first characteristic diagram;
carrying out full-connection layer processing on the time dimension of the first characteristic diagram to obtain a second characteristic diagram;
multiplying the second feature map and the first feature map to obtain a weighted fusion feature map;
performing addition operation on the weighted fusion characteristic diagram and the first characteristic diagram to obtain a time fusion characteristic diagram;
and classifying the time fusion characteristic graph by adopting a full connection layer to obtain a classification result of the original human activity data.
2. The method for recognizing human body activity according to claim 1, wherein down-sampling the time dimension of the first data to obtain a down-sampled sequence comprises the steps of:
performing convolution processing on the first data to obtain a third characteristic diagram;
sequentially carrying out normalization and activation function processing on the third characteristic diagram to obtain third data;
and performing self-attention calculation processing on the third data to obtain the down-sampling sequence.
3. The method for recognizing human body activity according to claim 2, wherein the convolution kernel size of the first data in the convolution processing is (5,1), and the step size is 1.
4. The method for identifying human activities of claim 1, wherein the down-sampling sequence is obtained by down-sampling the first data for 4 times in the time dimension.
5. The method for recognizing human body activity according to claim 1, wherein the original activity data is preprocessed by sliding window processing, the window size of the sliding window processing is 24, and the step size is 12.
6. The method for recognizing human body activity according to claim 1, wherein the LSTM network is 3 layers in data processing.
7. A human activity recognition system for implementing the human activity recognition method according to any one of claims 1 to 6, the human activity recognition system comprising:
the system comprises a preprocessing module, a data processing module and a data processing module, wherein the preprocessing module is used for acquiring original human activity data and segmenting the original human activity data to obtain first data;
the first data processing module is used for performing down-sampling on the time dimension of the first data to obtain a down-sampling sequence;
the second data processing module is used for carrying out thinning processing on the down-sampling sequence to obtain second data;
the third data processing module is used for inputting the second data into an LSTM network for data processing to obtain a first characteristic diagram;
the fourth data processing module is used for carrying out full connection layer processing on the time dimension of the first characteristic diagram to obtain a second characteristic diagram;
the fifth data processing module is used for performing multiplication operation on the second characteristic diagram and the first characteristic diagram to obtain a weighted fusion characteristic diagram;
the sixth data processing module is used for performing addition operation on the weighted fusion feature map and the first feature map to obtain a time fusion feature map;
and the classification module is used for classifying the time fusion characteristic graph by adopting a full connection layer to obtain a classification result of the original human activity data.
8. The identification system of claim 7, wherein the first data processing module comprises:
the convolution processing unit is used for carrying out convolution processing on the first data to obtain a third characteristic diagram;
the normalization processing unit is used for performing normalization processing on the data in the third feature map;
the activation function processing unit is used for performing activation function processing on the normalized data of the third characteristic diagram to obtain third data;
and the self-attention calculation processing unit is used for performing self-attention calculation processing on the third data to obtain the down-sampling sequence.
9. A computer device comprising a memory, a processor, and a bi-directional feature extracted human activity recognition program stored on the memory and executable on the processor, wherein:
the human activity recognition program for bidirectional feature extraction, when executed by the processor, implements the steps of the method for human activity recognition according to any one of claims 1 to 6.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a bidirectional feature-extracted human activity recognition program, which when executed by a processor implements the steps of the human activity recognition method according to any one of claims 1 to 6.
CN202210208205.3A 2022-03-04 2022-03-04 Human body activity recognition method, system, computer equipment and storage medium Pending CN114612713A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210208205.3A CN114612713A (en) 2022-03-04 2022-03-04 Human body activity recognition method, system, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210208205.3A CN114612713A (en) 2022-03-04 2022-03-04 Human body activity recognition method, system, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114612713A true CN114612713A (en) 2022-06-10

Family

ID=81861530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210208205.3A Pending CN114612713A (en) 2022-03-04 2022-03-04 Human body activity recognition method, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114612713A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117934778A (en) * 2024-02-04 2024-04-26 北京信息职业技术学院 Digital interaction simulation method and system based on virtual reality


Similar Documents

Publication Publication Date Title
Zeng et al. Convolutional neural networks for human activity recognition using mobile sensors
Mishra et al. Deep machine learning and neural networks: An overview
Subramanian et al. A metacognitive neuro-fuzzy inference system (McFIS) for sequential classification problems
Wang et al. Human activity recognition in a smart home environment with stacked denoising autoencoders
Yin et al. Sensor-based abnormal human-activity detection
Kulsoom et al. A review of machine learning-based human activity recognition for diverse applications
Che et al. Hybrid multimodal fusion with deep learning for rolling bearing fault diagnosis
Chen et al. A fuzzy deep neural network with sparse autoencoder for emotional intention understanding in human–robot interaction
Park et al. Prediction of individual thermal comfort based on ensemble transfer learning method using wearable and environmental sensors
Jiang et al. An eight-layer convolutional neural network with stochastic pooling, batch normalization and dropout for fingerspelling recognition of Chinese sign language
Liouane et al. An improved extreme learning machine model for the prediction of human scenarios in smart homes
Alani et al. Classifying imbalanced multi-modal sensor data for human activity recognition in a smart home using deep learning
CN112528804A (en) Electromyographic signal noise reduction and classification method based on generation countermeasure network
CN115238731A (en) Emotion identification method based on convolution recurrent neural network and multi-head self-attention
Khatiwada et al. Automated human activity recognition by colliding bodies optimization-based optimal feature selection with recurrent neural network
Mishra et al. Human activity recognition using deep neural network
CN116246102A (en) Image classification method and system based on self-encoder and decision tree
CN114612713A (en) Human body activity recognition method, system, computer equipment and storage medium
Wang et al. Perrnn: Personalized recurrent neural networks for acceleration-based human activity recognition
Yang et al. Facial expression recognition based on arousal-valence emotion model and deep learning method
Su et al. Nesterov accelerated gradient descent-based convolution neural network with dropout for facial expression recognition
Meena et al. Seq2Dense U-Net: Analysing Sequential Inertial Sensor data for Human Activity Recognition using Dense Segmentation Model
Geva Hierarchical-fuzzy clustering of temporal-patterns and its application for time-series prediction
Sarma et al. Learning and annotating activities for home automation using LSTM
Almeida et al. Embedding-level attention and multi-scale convolutional neural networks for behaviour modelling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination