CN111860188A - Human body posture recognition method based on time and channel double attention - Google Patents
- Publication number
- CN111860188A (application CN202010588253.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- attention
- channel
- convolution
- human body
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
Abstract
The invention discloses a human body posture recognition method based on time and channel double attention, comprising the following steps: collecting raw data of various human body actions with the built-in sensors of a mobile device and attaching an action attribute label to the data; applying sliding-window segmentation and normalization, then dividing the data into a training sample set and a test sample set; establishing a deep convolutional neural network model based on time and channel double attention; and training and tuning the model on the training and test samples to obtain a recognition result for the human body actions. Because channel attention and time-series attention are stacked, the method, after training on a large amount of coarse-grained training data, can accurately locate both the type and the occurrence time of a target action. This greatly reduces the effort of manually labeling training data, and the method is valuable in sports, interactive games, medical care, general monitoring systems and similar applications.
Description
Technical Field
The invention belongs to the field of intelligent monitoring of wearable equipment, and particularly relates to a human body posture identification method based on time and channel double attention.
Background
In recent years, the development of computer technology and the spread of intelligent technology have started a new round of global technological change, and technologies such as large-scale cloud computing, the Internet of Things, big data and artificial intelligence are developing rapidly. Among them, human body posture recognition is an important research direction in computer vision and related fields. It has a very wide range of applications, including health monitoring, motion detection, human-computer interaction, film and television production, and gaming and entertainment. Sensors worn on the body can collect motion trajectory data of human joint points to realize posture recognition, and the same data can drive 3D animation that simulates human motion for film and television production.
With the development of research on intelligent wearable devices, human body posture recognition based on wearable sensors has become an important research field. The technology determines a person's motion state by analyzing information that reflects human motion behavior, and it is applied in health monitoring, indoor positioning and navigation, analysis of users' social behavior, motion-sensing games and so on. However, most existing human body posture recognition systems suffer from low recognition accuracy and slow inference, so how to build a high-accuracy network model while maintaining inference speed is a problem that urgently needs to be solved.
At present, the most widespread application of human body posture recognition is intelligent monitoring. Intelligent monitoring differs from ordinary monitoring mainly in that human body posture recognition technology is embedded in the video server: an algorithm recognizes and judges the behavior of dynamic objects in the monitored scene, namely pedestrians and vehicles, extracts the key information, and sends an alarm to the user in time when abnormal behavior occurs. Similarly, human posture recognition in a fixed scene can be applied to home monitoring. For example, to guard against falls by elderly people living alone, an intelligent monitoring device that recognizes the falling posture can be installed at home so that a fall is detected and an emergency is responded to in time. With the continuous development of society and the continuous improvement in quality of life, video monitoring is widely used in many fields. As people's living space expands and public and private places develop with it, the probability of encountering emergencies keeps increasing, especially in public places, which are densely populated and difficult to monitor. Simple monitoring can no longer meet the needs of current social development: relying only on operators on duty makes it very difficult to predict human posture, and it also wastes social resources.
Therefore, an intelligent monitoring system that does not depend on individual operators is a necessary way to solve this problem. In addition, in social interaction, human body actions convey information beyond language; the meaning of these actions can be read through scientific and reasonable prediction, which can help people communicate better.
Deep learning has good prospects in pattern recognition, and model architectures represented by shallow convolutional neural networks currently occupy the mainstream position. Convolutional neural networks have received great attention in computer vision; they can process multidimensional data and, given a large amount of data, perform noticeably better than traditional methods. Compared with traditional machine learning methods such as logistic regression, decision trees and Markov models, shallow deep learning methods improve accuracy significantly, but their feature maps are not rich in information because they have few convolutional layers. In addition, ordinary convolution cannot accurately locate the occurrence time and the type of a target action within a long sequence of data. Therefore, how to accurately locate the time and category of the target action while preserving the model's recognition accuracy is a primary problem facing current researchers.
Disclosure of Invention
Purpose of the invention: in view of the above problems, the object of the present invention is to provide a human body posture recognition method based on time and channel double attention that not only improves model accuracy with a deeper convolutional neural network, but also accurately locates, on the channel axis, the type of sensor that contributes to the target action and, on the time axis, the time at which the target action occurs.
The technical scheme is as follows: the invention provides a human body posture recognition method based on time and channel double attention, which comprises the following steps:
Step1, acquiring human body posture action signal data for each activity type (such as lying down, standing up, walking, running and falling) through a motion sensor, and attaching the corresponding action attribute label to the action signal data;
Step2, preprocessing the collected action signal data and dividing the processed data into a training sample set and a test sample set; the processing comprises: applying a sliding window with a fixed step length to the data (a smaller step length yields more samples), then performing denoising, null removal and normalization on the resulting data signals, scaling them proportionally so that they fall within the (0, 1) interval;
Step3, feeding the processed data as input samples into a multi-attention deep convolutional neural network for training; after the learning rate, the optimizer and a fixed batch size are set, gradient descent is used to continuously reduce the loss value of the deep convolutional neural network model while updating each weight parameter, until the loss value is smaller than a preset value, which yields the trained model;
Step4, classifying and recognizing the human body posture data to be recognized by using the trained model.
Further, in Step1, the sensor down-sampling frequency is set to between 20 Hz and 40 Hz when the sensor data signals are collected.
Further, Step2 includes removing outliers and nulls from the data and rebalancing the number of samples in each activity category so that the data set follows a uniform distribution; 70% of the data is used as training samples and 30% as test samples.
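The preprocessing in Step2 can be sketched as follows. This is a minimal illustration only: the recording is synthetic, and the window length (128), the step length (64) and the function names are assumptions that do not come from the patent text; only the sliding window, the null removal, the scaling into the unit interval and the 70%/30% split follow the description above.

```python
import numpy as np

def sliding_windows(signal, window, step):
    """Cut a (T, C) recording into overlapping (window, C) segments."""
    starts = range(0, signal.shape[0] - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

def min_max_scale(x, eps=1e-8):
    """Scale each sensor channel into the unit interval over all windows."""
    lo = x.min(axis=(0, 1), keepdims=True)
    hi = x.max(axis=(0, 1), keepdims=True)
    return (x - lo) / (hi - lo + eps)

rng = np.random.default_rng(0)
recording = rng.normal(size=(1000, 3))                    # hypothetical tri-axial recording
recording = recording[~np.isnan(recording).any(axis=1)]   # drop rows containing nulls

# Assumed window and step sizes; a smaller step produces more windows.
windows = min_max_scale(sliding_windows(recording, window=128, step=64))

# Shuffle, then use 70% of the windows for training and 30% for testing.
order = rng.permutation(len(windows))
n_train = int(0.7 * len(windows))
train, test = windows[order[:n_train]], windows[order[n_train:]]
```

Halving the step to 32 would roughly double the number of windows, which is the effect the description refers to when it says a smaller sliding step yields more data.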
Further, Step3 specifically includes the following contents:
3.1, establishing a 6-layer convolutional attention neural network model. The whole model consists of three convolution blocks, each comprising two convolutional feature-extraction layers and one skip ("jump") convolution layer;
A: constructing the convolutional neural network from 6 convolutional layers:
The convolutional neural network is built from 6 convolutional layers, every two of which form one convolution block. A skip convolution layer is added alongside the two convolutional layers of each block so that the input data in $\mathbb{R}^{C \times H \times W}$ (C is the number of data channels, H the data height, W the data width) is brought to the same dimensions as the output data in $\mathbb{R}^{C' \times H \times W}$ (C' is the number of output channels), and the two can be linearly weighted. The mapping C → C' is determined by the channel dimension of the convolutional layers. After the 6 convolutional layers are built, the output data is sent to the attention network for linear weighting by the attention weights, and finally to a fully connected layer that computes the human action classification.
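The role of the skip convolution layer can be illustrated with a small NumPy sketch. It assumes that the skip layer is a 1×1 convolution that only changes the channel count from C to C'; the patent does not state the kernel size of this layer, and the tensors, sizes and function name below are hypothetical stand-ins.

```python
import numpy as np

def pointwise_conv(x, weight):
    """1x1 convolution: mixes channels only. x: (C, H, W), weight: (C_out, C)."""
    return np.einsum("oc,chw->ohw", weight, x)

rng = np.random.default_rng(0)
C, C_out, H, W = 3, 8, 32, 1                # assumed sizes for illustration
x = rng.normal(size=(C, H, W))              # block input in R^{C x H x W}

# Stand-in for the output of the block's two convolutional layers, R^{C' x H x W}.
main_path = rng.normal(size=(C_out, H, W))

# The skip path maps C -> C' so that both paths have matching dimensions.
skip = pointwise_conv(x, rng.normal(size=(C_out, C)))
block_out = main_path + skip
```

Without the channel projection, the element-wise sum of the two paths would fail because C and C' differ.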
B: establishing a channel attention network:
A channel attention network and a time-series attention network are added in sequence after the two convolutional layers of each of the model's 3 blocks. To determine how important each sensor feature of the convolution features is in the channel dimension, the channel attention network is applied to the channel dimension of the feature map;
The time-dimension information of the features is first compressed using average pooling and max pooling to generate an average time-series feature and a maximum time-series feature. The two features are then each passed through a multilayer perceptron, and the results are summed element-wise to obtain the final output feature, expressed as:
$M_C(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$
where σ denotes the sigmoid activation function, MLP denotes a standard multilayer perceptron, and AvgPool and MaxPool denote the average pooling and max pooling operations;
The features generated by the attention network are applied to the channel dimension of the sensor data to produce a weight for each channel, so that the sensor axis corresponding to the human posture action can be accurately located.
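The channel attention formula can be sketched in NumPy as below. The two-layer perceptron with a ReLU hidden layer, the reduction ratio `r`, the random weights and the tensor sizes are assumptions for illustration; the patent only specifies pooling over the time dimension, a shared MLP, element-wise summation and a sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W1, W2):
    """Compute M_C(F) for a feature map F of shape (C, H, W); returns (C, 1, 1)."""
    avg = F.mean(axis=(1, 2))        # average pooling over time/width -> (C,)
    mx = F.max(axis=(1, 2))          # max pooling over time/width -> (C,)

    def mlp(v):
        # Shared two-layer perceptron with an assumed ReLU hidden layer.
        return W2 @ np.maximum(W1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]

rng = np.random.default_rng(0)
C, H, W, r = 8, 32, 1, 2             # r is a hypothetical reduction ratio
F = rng.normal(size=(C, H, W))
W1 = rng.normal(size=(C // r, C))
W2 = rng.normal(size=(C, C // r))

M_C = channel_attention(F, W1, W2)
F_weighted = M_C * F                 # one weight per channel, broadcast over time
```

Each entry of `M_C` lies between 0 and 1 and scales one whole channel, which is what allows the weights to be read as the importance of individual sensor axes.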
C: establishing a time sequence attention network:
After the channel attention network has been added to each convolution block, a time-series attention network is established next. To determine the precise position of the target action in the sensor data on the time axis, time-series attention is applied to the time dimension of the feature map. The channel-dimension information is first compressed using average pooling and max pooling to generate an average channel feature and a maximum channel feature: the former is the channel-wise average-pooled feature along the time axis and the latter is the channel-wise max-pooled feature along the time axis, H and W are the height and width of these features, and each has a single channel. The two features are then combined and passed through a convolutional layer to obtain the final convolution feature, expressed as:
$M_T(F) = \sigma(\mathrm{conv}_{7 \times 1}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]))$
where σ denotes the sigmoid activation function, $\mathrm{conv}_{7 \times 1}$ denotes a convolutional layer with a 7 × 1 kernel, and AvgPool and MaxPool denote the average pooling and max pooling operations, respectively;
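A NumPy sketch of the time-series attention formula follows. It assumes that the bracket notation means stacking the two pooled maps along a channel axis, that the 7 × 1 convolution uses zero padding so the time length is preserved, and that the bias term is omitted; the kernel values and tensor sizes are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_attention(F, kernel):
    """Compute M_T(F). F: (C, H, W) with H the time axis; kernel: (2, 7)."""
    # Pool over the channel axis, then stack the two maps -> (2, H, W).
    pooled = np.stack([F.mean(axis=0), F.max(axis=0)])
    k = kernel.shape[1]
    pad = k // 2                                  # assumed "same" zero padding
    padded = np.pad(pooled, ((0, 0), (pad, pad), (0, 0)))
    out = np.empty(F.shape[1:])
    for t in range(F.shape[1]):
        # 7x1 convolution along time (cross-correlation, as in deep learning libraries).
        out[t] = np.einsum("ck,ckw->w", kernel, padded[:, t:t + k, :])
    return sigmoid(out)[None]                     # (1, H, W)

rng = np.random.default_rng(0)
F = rng.normal(size=(8, 32, 1))
kernel = rng.normal(size=(2, 7)) * 0.1            # hypothetical learned weights

M_T = temporal_attention(F, kernel)
F_weighted = M_T * F                              # one weight per time step, shared by all channels
```

Because `M_T` has a single channel, every sensor channel is scaled by the same weight at a given time step, so large values mark the time span in which the target action occurs.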
3.2, importing the training samples to adjust the parameters of the convolutional neural network model and obtain a model with high accuracy.
In the convolutional neural network model, the first-layer convolution kernel size is (6, 1) with stride (2, 1); the second-layer kernel size is (6, 1) with stride (2, 1); and the third-layer kernel size is (6, 1) with stride (2, 1). The convolutional layer padding is set to (1, 0), all activation functions are ReLU, and a BatchNorm layer is added after each layer to reduce the risk of overfitting. Finally, to obtain a more sharply differentiated classification result, the class is output through a Softmax layer.
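The effect of these layer settings on the time axis can be checked with the standard convolution output-length formula, floor((n + 2p - k) / s) + 1. In the sketch below the input window length of 128 samples is an assumption; the kernel size 6, stride 2 and padding 1 are the values stated above.

```python
def conv_out_len(n, kernel=6, stride=2, pad=1):
    """Output length along the time axis for one strided convolution."""
    return (n + 2 * pad - kernel) // stride + 1

lengths = []
n = 128                      # hypothetical input window length
for _ in range(3):           # the three layers listed above
    n = conv_out_len(n)
    lengths.append(n)

print(lengths)               # [63, 30, 14]
```

Each strided layer roughly halves the time length, so the window length chosen during preprocessing bounds how many such layers can be stacked.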
Beneficial effects: compared with the prior art, the invention achieves the following notable progress:
The raw data undergoes frequency resampling, so that the time dimension and the feature dimension can be considered together, and high-accuracy discrimination is achieved after training the deep convolutional neural network. Channel attention and time-series attention are applied in sequence to the features extracted by convolution, and the resulting features are linearly weighted with the original features to obtain richer sensor data features; the convolutional neural network is trained on them, and the trained network model is used for human body posture recognition. Because the attention mechanism is applied along the channel axis and the time axis of the extracted convolution features, the time and category of the target action within the features can be located effectively, so the method can recognize human actions from coarse-grained labels and reduces the effort of manual data labeling. The invention also uses a sliding-window technique to preprocess the data quickly without losing its action characteristics, which avoids the shortcomings of traditional data processing.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the present invention;
FIG. 3 is a waveform plot of a small batch of the raw triaxial acceleration data of the present invention;
FIG. 4 is a graph of the error as a function of the number of training epochs according to the present invention;
FIG. 5 is a visualization of the sensor training data channel dimensions in the present invention;
FIG. 6 is a visualization of the time dimension of sensor training data in the present invention;
FIG. 7 is a graph of a confusion matrix for a test data set of the present invention.
Detailed Description
The technical solution and effects of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a human body posture recognition method based on time and channel double attention, which comprises the following steps:
Step1, recruiting volunteers to wear motion sensors on different body parts (such as the wrist, chest and leg), recording their three-axis acceleration data during different actions (such as standing, sitting, going upstairs, going downstairs, jumping and walking), and attaching the corresponding action type label to the action signal data;
Step2, cleaning and denoising the acquired triaxial acceleration data, frequency-resampling the cleaned data, normalizing it, and then dividing it into a training set and a test set, where the frequency resampling and normalization are as follows: the time-series signal is down-sampled in frequency and arranged into a data signal map, and the resulting data signal map is normalized, that is, scaled to fall within the (0, 1) interval;
Step3, the processed data is a four-dimensional tensor containing data, feature and channel information. The processed data is then fed as input samples into the convolutional neural network for training; the batch size and learning rate are set, and the weight parameters are updated automatically by back propagation to obtain the optimal convolutional neural network model;
Step4, classifying and recognizing the human body posture data to be recognized by using the trained model.
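The frequency down-sampling in Step2 can be sketched as block averaging. The patent does not say how the resampling is performed, so the method here is a stand-in; the raw sampling rate of 100 Hz and the factor of 3 (giving roughly 33 Hz) are assumptions chosen for illustration.

```python
import numpy as np

def downsample(signal, factor):
    """Average each run of `factor` consecutive samples; signal: (T, C)."""
    usable = (signal.shape[0] // factor) * factor     # drop the incomplete tail
    return signal[:usable].reshape(-1, factor, signal.shape[1]).mean(axis=1)

raw_hz = 100                                          # hypothetical raw sampling rate
raw = np.random.default_rng(0).normal(size=(1000, 3)) # synthetic tri-axial recording

resampled = downsample(raw, factor=3)
effective_hz = raw_hz / 3                             # about 33.3 Hz
```

Averaging before decimating also smooths high-frequency noise, which fits the denoising goal of this step, although a dedicated low-pass filter would be the more rigorous choice.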
The human body posture identification method based on the deep convolutional neural network can identify six action postures of jumping, walking, going upstairs, going downstairs, standing and sitting.
FIG. 1 is a flow chart of the present invention: data is collected from the raw sensors and preprocessed, then fed into the convolutional neural network for model training, and the model obtained after training is applied to human posture data to realize human posture recognition.
FIG. 2 is a block diagram of the convolutional neural network model based on time and channel dual attention, which contains six convolutional layers and a final classification layer. The figure also shows the internal structure of the attention module: after data is input, the channel attention module generates the channel attention feature, which is linearly weighted with the original convolution feature; the time attention module then generates the time-series attention feature, which is linearly weighted with the channel attention feature to produce the final sensor data feature.
Specifically, the various types of human posture action signal data collected from the mobile sensor first undergo time-series frequency resampling and normalization. The processed data is fed into the convolutional neural network, where convolution produces the corresponding convolution features; these are then passed through the channel attention module and the time-series attention module in sequence, and the resulting attention features are linearly weighted with the original features to obtain the final attention features. The attention mechanism for each convolutional neural network is implemented as shown in the attention module of FIG. 2. In the experiments, the convolution kernel size was (6, 1), the convolution stride was (2, 1), the padding was set to (1, 0), there were 128 convolution kernels in total, the ReLU activation function was used, and a BatchNorm layer was added.
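The sequential weighting described for FIG. 2 can be written compactly as below. This is a hedged reading: it assumes that "linear weighting" means element-wise multiplication with broadcasting (written $\otimes$), and the symbols $F'$ and $F''$ are introduced here for illustration and do not appear in the patent.

```latex
\begin{aligned}
F'  &= M_C(F) \otimes F,   && \text{channel attention applied to the convolution feature } F \\
F'' &= M_T(F') \otimes F', && \text{time-series attention applied to the channel-weighted feature}
\end{aligned}
```

Under this reading, $M_C(F)$ has one weight per channel and $M_T(F')$ has one weight per time step, so $F''$ keeps the shape of $F$.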
B: channel attention:
To determine how important each sensor feature of the convolution features is in the channel dimension, channel attention is applied to the channel dimension of the feature map. The time-dimension information of the features is first compressed using average pooling and max pooling to generate an average time-series feature and a maximum time-series feature; the two features are then each passed through a multilayer perceptron, and the results are summed element-wise to obtain the final output feature, expressed as:
$M_C(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F)))$
where σ denotes the sigmoid activation function.
C: time-series attention:
To determine the precise position of the target action in the sensor data on the time axis, time-series attention is applied to the time dimension of the feature map. The channel-dimension information is first compressed using average pooling and max pooling to generate an average channel feature and a maximum channel feature; the two features are then combined and passed through a convolutional layer to obtain the final convolution feature, expressed as:
$M_T(F) = \sigma(\mathrm{conv}_{7 \times 1}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]))$
where σ denotes the sigmoid activation function and $\mathrm{conv}_{7 \times 1}$ denotes a convolutional layer with a 7 × 1 kernel.
Then, the training samples are imported to adjust the parameters of the convolutional neural network model and obtain a model with high accuracy.
During network training, a dynamic learning rate is used to keep oscillation of the training curve small: the initial learning rate is set to 0.001 and is reduced to 0.1 times its current value every 50 epochs.
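The learning-rate schedule can be written as a step decay. The function name and parameter names below are hypothetical; the initial rate 0.001, the factor 0.1 and the 50-epoch interval are the values given above.

```python
def learning_rate(epoch, base_lr=0.001, drop=0.1, every=50):
    """Step decay: multiply the rate by `drop` once every `every` epochs."""
    return base_lr * drop ** (epoch // every)

# Rates at the start of training and after each of the first two drops.
schedule = [learning_rate(e) for e in (0, 49, 50, 100)]
```

Epochs 0 to 49 use 0.001, epochs 50 to 99 use 0.0001, and so on, so that later epochs take smaller steps and the loss curve oscillates less.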
Compared with a traditional convolutional neural network, the method uses a deep neural network to extract richer sensor data features and applies channel attention and time-series attention to the sensor data in sequence to generate feature maps with attention features. Because channel attention and time-series attention are stacked, the method can accurately locate the type and occurrence time of the target action after training on a large amount of coarse-grained data, which greatly reduces the effort of manually labeling training data. Experimental comparison shows that the method is clearly superior to a traditional convolutional neural network in accuracy and has a better localization effect.
FIG. 3 is a waveform plot of a small batch of raw triaxial acceleration data from the sensor. The down-sampling frequency of the motion sensor is preferably set to about 33 Hz.
FIG. 4 is a graph of the error variation of the neural network over 500 epochs of training.
As the error graph shows, the model's accuracy increases steadily as the deep convolutional network and the attention modules are applied. When the deep convolutional network and the double attention module are used together, the model achieves its best accuracy and its generalization ability improves considerably.
FIG. 5 is a visualization of the sensor training data channel dimensions in the present invention.
Visualizing the channel dimension of the training data shows which channel is active in the sensor channel dimension when the target action occurs, and hence which sensor type contributes; this is significant for guiding human action recognition, health detection and similar applications.
FIG. 6 is a visualization of the time dimension of sensor training data in the present invention.
Visualizing the time dimension of the training data makes it possible to accurately locate the region where the target action occurs, and its category, within a long sequence of sensor data that contains background actions. With this semi-supervised attention mechanism, a more accurate human action classification result can be obtained by training on a large amount of coarsely labeled sample data; this relaxes the requirement for strictly labeled training data, reduces the effort of manual labeling, and makes human body posture recognition more convenient and efficient in practice.
FIG. 7 is a graph of a confusion matrix for a test data set of the present invention.
A confusion matrix is a technique for summarizing the performance of a classification algorithm. If the classes contain unequal numbers of samples, or the data set has more than two classes, classification accuracy alone can be misleading. Computing the confusion matrix gives a better understanding of how the classification model behaves and what types of errors it makes. In the figure, the horizontal axis is the predicted result, the vertical axis is the true label, and the main diagonal gives the number of samples whose predicted result matches the true label.
Analyzing the confusion matrix reveals the model's recognition accuracy for each action, which can guide modification of the network parameters. The final classification accuracy of the model is 98.87%, which meets the requirements of practical application.
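The confusion matrix and the accuracy derived from it can be computed as in the sketch below. The labels are a small made-up example with three classes, not the patent's test data; rows index the true label and columns the predicted label, matching the axis convention described for FIG. 7.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # count each (true, predicted) pair
    return cm

# Made-up labels for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()       # correct predictions lie on the main diagonal
per_class_recall = np.diag(cm) / cm.sum(axis=1)
```

The per-class recall exposes which actions are confused with each other, information that a single overall accuracy figure hides.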
Claims (4)
1. A human body posture recognition method based on time and channel double attention is characterized by comprising the following steps:
Step1, acquiring human body posture action signal data of each activity type through a mobile sensor, and attaching corresponding action attribute labels to the action signal data;
Step2, preprocessing the collected action signal data, and dividing the processed data into a training sample set and a test sample set; the preprocessing comprises: applying a fixed-step sliding window to the data to cut the original long sensor recordings into fixed-size segments, performing denoising, null-value removal and normalization on the segmented data signals, and scaling the data signals proportionally so that they fall within the interval (0, 1);
Step3, inputting the processed data as input samples into a multi-attention deep convolutional neural network for training, setting a learning rate, an optimizer and a fixed batch size, continuously reducing the loss value of the deep convolutional neural network model by gradient descent, and updating each weight parameter until the loss value is smaller than a preset value, to obtain a trained model;
and Step4, classifying and recognizing the human body posture data to be recognized by using the trained model.
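The preprocessing in Step2 of claim 1 can be sketched as below. This is only an illustration under stated assumptions: the function names, the window length of 20 samples and the step of 10 are hypothetical, null values are assumed to appear as NaN, per-channel min-max scaling is assumed as the normalization, and cleaning is applied before windowing for simplicity. The small `eps` keeps the scaled values in [0, 1) and avoids division by zero.

```python
import numpy as np

def clean_and_scale(signal, eps=1e-8):
    """Drop rows containing NaN, then min-max scale each channel."""
    signal = np.asarray(signal, dtype=np.float64)
    signal = signal[~np.isnan(signal).any(axis=1)]
    lo = signal.min(axis=0)
    hi = signal.max(axis=0)
    return (signal - lo) / (hi - lo + eps)

def sliding_windows(signal, window, step):
    """Cut a (T, C) recording into (N, window, C) fixed-size segments."""
    starts = range(0, signal.shape[0] - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

# Synthetic 3-channel recording with two null rows (illustrative only).
rng = np.random.default_rng(0)
raw = rng.normal(size=(102, 3))
raw[5, 1] = np.nan
raw[50, 0] = np.nan

scaled = clean_and_scale(raw)                           # (100, 3)
windows = sliding_windows(scaled, window=20, step=10)   # (9, 20, 3)
```

A step smaller than the window length gives overlapping segments; the claim only fixes that the step is constant, so the 50% overlap here is a choice made for the example.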
2. The human body posture recognition method based on time and channel double attention as claimed in claim 1, characterized in that: in Step1, the sensor down-sampling frequency is set to 20 Hz to 40 Hz when the sensor data signal is collected.
3. The human body posture recognition method based on time and channel double attention as claimed in claim 1, characterized in that: in Step2, the data processing includes removing outliers and null values from the data and rebalancing the number of samples in each activity category so that the data set follows a uniform distribution, and 70% and 30% of the data are used as training samples and test samples, respectively.
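One way to read the rebalancing and 70%/30% split of claim 3 is sketched below. The claim does not state how the class counts are equalized, so random undersampling to the size of the smallest class is an assumption made here, as are the function name and the toy data.

```python
import numpy as np

def balance_and_split(X, y, train_frac=0.7, seed=0):
    """Undersample every class to the smallest class size, then split per class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    per_class = counts.min()
    n_train = int(round(train_frac * per_class))
    train_idx, test_idx = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(y == c))[:per_class]
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    # Shuffle so that samples of one class are not stored contiguously.
    train_idx = rng.permutation(np.array(train_idx, dtype=np.int64))
    test_idx = rng.permutation(np.array(test_idx, dtype=np.int64))
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Toy data set with unequal class sizes 50 / 20 / 30 (illustrative only).
y = np.array([0] * 50 + [1] * 20 + [2] * 30)
X = np.arange(100).reshape(100, 1)

X_tr, y_tr, X_te, y_te = balance_and_split(X, y)
```

Splitting inside each class keeps both subsets uniformly distributed over the activity categories, which a single global shuffle would only achieve approximately.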
4. The method for recognizing human body posture based on time and channel double attention as claimed in claim 1, wherein Step3 specifically comprises the following steps:
3.1, establishing a 6-layer convolutional attention neural network model with three convolution blocks in the whole model, wherein each block comprises two convolutional feature-extraction layers and a skip convolutional layer;
a: constructing a convolutional neural network using 6 deep convolutional layers:
constructing a convolutional neural network using 6 convolutional layers, wherein every two convolutional layers form one convolution block, and a skip convolutional layer is added across the two convolutional layers of each block to match the input data dimension $\mathbb{R}^{C \times H \times W}$ to the output data dimension $\mathbb{R}^{C' \times H \times W}$ so that the two can be linearly weighted, wherein the mapping C → C' is determined by the number of channels in the convolutional layers, C is the number of input data channels, H is the data height, W is the data width, and C' is the number of output data channels; after the 6 convolutional layers are built, the output data are sent to the attention network for linear weighting by the attention weights, and then sent to a fully connected layer for classification of the human body actions;
b: establishing a channel attention network:
sequentially adding a channel attention network and a time sequence attention network after the two convolutional layers of each of the 3 blocks; the channel attention network is applied along the channel dimension of the feature map in order to determine the importance of the sensor features of the convolutional features in the channel dimension; the time-dimension information of the features is compressed using average pooling and maximum pooling to generate an average timing feature $F^{c}_{avg}$ and a maximum timing feature $F^{c}_{max}$, the two features are then each sent to a multilayer perceptron, and the final output feature is obtained by point-by-point summation, specifically represented as follows:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(\mathrm{MLP}(F^{c}_{avg}) + \mathrm{MLP}(F^{c}_{max})\big)$$

wherein F denotes the input feature map, $M_c(F)$ denotes the channel attention weights, σ denotes the sigmoid activation function, MLP denotes a standard multilayer perceptron, and AvgPool and MaxPool denote the average pooling and maximum pooling operations;
c: establishing a time sequence attention network:
after the channel attention network is added to each convolution block, a time sequence attention network is established in turn; time sequence attention is applied along the time dimension of the feature map in order to determine the accurate position of the target action of the sensor data on the time axis; the channel-dimension information is first compressed using average pooling and maximum pooling to generate an average channel feature $F^{t}_{avg}$ and a maximum channel feature $F^{t}_{max}$, wherein $F^{t}_{avg}$ represents the channel average-pooled feature along the time axis and $F^{t}_{max}$ represents the channel max-pooled feature along the time axis, each feature having 1 channel; the two features are linearly superposed and fed into a convolutional layer to obtain the final convolution feature, specifically represented as follows:

$$M_t(F) = \sigma\big(\mathrm{conv}^{7 \times 1}([\mathrm{AvgPool}(F);\, \mathrm{MaxPool}(F)])\big) = \sigma\big(\mathrm{conv}^{7 \times 1}([F^{t}_{avg};\, F^{t}_{max}])\big)$$

wherein F denotes the input feature map, $M_t(F)$ denotes the time sequence attention weights, σ denotes the sigmoid activation function, $\mathrm{conv}^{7 \times 1}$ denotes a convolutional layer with a convolution kernel size of 7 × 1, $[\cdot\,;\cdot]$ denotes the superposition of the two features, and AvgPool and MaxPool denote the average pooling and maximum pooling operations, respectively;
3.2, importing the training samples to adjust the parameters of the convolutional neural network model and train the model:
in the convolutional neural network model, the first-layer convolution kernel size is (6, 1) and the stride is (2, 1); the second-layer convolution kernel size is (6, 1) and the stride is (2, 1); the third-layer convolution kernel size is (6, 1) and the stride is (2, 1); the convolutional layer padding is set to (1, 0); all activation functions use ReLU, a BatchNorm layer is added after each layer to reduce overfitting, and the classification is finally output after a Softmax layer.
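The channel attention and time sequence attention of sub-steps b and c in claim 4 can be sketched in NumPy as below. This is a hedged illustration rather than the patented implementation: it follows the CBAM pattern listed under the non-patent citations, treats the "linear superposition" of the two pooled features as stacking them into a two-row map, omits biases, and uses hypothetical function names, a hypothetical reduction ratio, and random weights. The feature map is assumed to have shape (C, H, W) with H as the time axis.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    """F: (C, H, W). W1: (C//r, C) and W2: (C, C//r) form a shared MLP."""
    avg = F.mean(axis=(1, 2))   # (C,) average pooling over time
    mx = F.max(axis=(1, 2))     # (C,) maximum pooling over time

    def mlp(v):
        return W2 @ np.maximum(W1 @ v, 0.0)   # ReLU hidden layer

    weights = sigmoid(mlp(avg) + mlp(mx))     # point-by-point summation
    return F * weights[:, None, None]

def temporal_attention(F, kernel):
    """F: (C, H, W). kernel: (2, k) weights of a k x 1 convolution."""
    pooled = np.stack([F.mean(axis=0), F.max(axis=0)])   # (2, H, W)
    k = kernel.shape[1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (0, 0)))
    # Cross-correlation along time, summed over the two pooled maps.
    response = np.stack([
        np.einsum("ck,ckw->w", kernel, padded[:, t:t + k, :])
        for t in range(F.shape[1])
    ])                                                   # (H, W)
    return F * sigmoid(response)[None, :, :]

# Hypothetical feature map: 8 channels, 32 time steps, width 1.
rng = np.random.default_rng(0)
F = rng.normal(size=(8, 32, 1))
W1 = rng.normal(size=(2, 8))      # reduction ratio r = 4 (assumed)
W2 = rng.normal(size=(8, 2))
kernel = rng.normal(size=(2, 7))  # 7 x 1 kernel over the two pooled maps

out = temporal_attention(channel_attention(F, W1, W2), kernel)
```

Both modules only rescale the feature map, so the output keeps the input shape and can be passed on to the next convolution block or the fully connected layer unchanged.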
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010588253.0A CN111860188A (en) | 2020-06-24 | 2020-06-24 | Human body posture recognition method based on time and channel double attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111860188A true CN111860188A (en) | 2020-10-30 |
Family
ID=72989342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010588253.0A Pending CN111860188A (en) | 2020-06-24 | 2020-06-24 | Human body posture recognition method based on time and channel double attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111860188A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112468509A (en) * | 2020-12-09 | 2021-03-09 | 湖北松颢科技有限公司 | Deep learning technology-based automatic flow data detection method and device |
CN113283298A (en) * | 2021-04-26 | 2021-08-20 | 西安交通大学 | Real-time behavior identification method based on time attention mechanism and double-current network |
CN113298083A (en) * | 2021-02-25 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN114169374A (en) * | 2021-12-10 | 2022-03-11 | 湖南工商大学 | Cable-stayed bridge stay cable damage identification method and electronic equipment |
CN115017961A (en) * | 2022-08-05 | 2022-09-06 | 江苏江海润液设备有限公司 | Intelligent control method of lubricating equipment based on neural network data set augmentation |
CN116010816A (en) * | 2022-12-28 | 2023-04-25 | 南京大学 | LRF large-kernel attention convolution network activity identification method based on large receptive field |
CN117033979A (en) * | 2023-09-04 | 2023-11-10 | 中国人民解放军空军预警学院 | Space target identification method with same shape and micro-motion form as inclusion relation |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062170A (en) * | 2017-12-15 | 2018-05-22 | 南京师范大学 | Multi-class human posture recognition method based on convolutional neural networks and intelligent terminal |
CN108345846A (en) * | 2018-01-29 | 2018-07-31 | 华东师范大学 | A kind of Human bodys' response method and identifying system based on convolutional neural networks |
CN110991511A (en) * | 2019-11-26 | 2020-04-10 | 中原工学院 | Sunflower crop seed sorting method based on deep convolutional neural network |
Non-Patent Citations (1)
Title |
---|
Sanghyun Woo et al.: "CBAM: Convolutional Block Attention Module", The European Conference on Computer Vision (ECCV) 2018, 14 September 2018 (2018-09-14), pages 1-9 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112468509A (en) * | 2020-12-09 | 2021-03-09 | 湖北松颢科技有限公司 | Deep learning technology-based automatic flow data detection method and device |
CN113298083A (en) * | 2021-02-25 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN113283298A (en) * | 2021-04-26 | 2021-08-20 | 西安交通大学 | Real-time behavior identification method based on time attention mechanism and double-current network |
CN113283298B (en) * | 2021-04-26 | 2023-01-03 | 西安交通大学 | Real-time behavior identification method based on time attention mechanism and double-current network |
CN114169374A (en) * | 2021-12-10 | 2022-03-11 | 湖南工商大学 | Cable-stayed bridge stay cable damage identification method and electronic equipment |
CN114169374B (en) * | 2021-12-10 | 2024-02-20 | 湖南工商大学 | Cable-stayed bridge stay cable damage identification method and electronic equipment |
CN115017961A (en) * | 2022-08-05 | 2022-09-06 | 江苏江海润液设备有限公司 | Intelligent control method of lubricating equipment based on neural network data set augmentation |
CN116010816A (en) * | 2022-12-28 | 2023-04-25 | 南京大学 | LRF large-kernel attention convolution network activity identification method based on large receptive field |
CN116010816B (en) * | 2022-12-28 | 2023-09-08 | 南京大学 | LRF large-kernel attention convolution network activity identification method based on large receptive field |
US11989935B1 (en) | 2022-12-28 | 2024-05-21 | Nanjing University | Activity recognition method of LRF large-kernel attention convolution network based on large receptive field |
CN117033979A (en) * | 2023-09-04 | 2023-11-10 | 中国人民解放军空军预警学院 | Space target identification method with same shape and micro-motion form as inclusion relation |
CN117033979B (en) * | 2023-09-04 | 2024-06-04 | 中国人民解放军空军预警学院 | Space target identification method with same shape and micro-motion form as inclusion relation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||