CN110321833B - Human body behavior identification method based on convolutional neural network and cyclic neural network - Google Patents
Human body behavior identification method based on convolutional neural network and cyclic neural network
- Publication number: CN110321833B (application CN201910580116.XA)
- Authority: CN (China)
- Prior art keywords: neural network, time, convolutional neural, RNN, human body
- Prior art date: 2019-06-28
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2411: Pattern recognition; classification techniques relating to the classification model, based on the proximity to a decision surface, e.g. support vector machines
- G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks
- G06V40/20: Image or video recognition; recognition of human-related patterns; movements or behaviour, e.g. gesture recognition
Abstract
The invention discloses a human body behavior identification method based on a convolutional neural network and a recurrent neural network. A sensor tracks human body behavior and collects, over a time period, the 3-dimensional coordinate vectors of the human body joints together with an RGB video. A recurrent neural network (RNN) is then trained on the 3-dimensional joint coordinates to obtain a temporal feature vector, and a convolutional neural network (CNN) is trained on the RGB video to obtain a spatio-temporal feature vector. Finally, the temporal and spatio-temporal feature vectors are concatenated and normalized, the normalized vector is fed to a linear SVM classifier, and a verification data set is used to find the penalty parameter C of the linear support vector machine (SVM), yielding an integrated recognition model. The method alleviates overfitting to the action classes during model training and effectively improves the efficiency and accuracy of human behavior recognition.
Description
Technical Field
The invention relates to a human body behavior recognition method based on a convolutional neural network and a recurrent neural network, and belongs to the intersecting technical fields of behavior recognition, deep learning, and machine learning.
Background
Behavior recognition and classification in videos is an important research topic in computer vision, and has become a research hotspot because of its wide application in video tracking, motion analysis, virtual reality, and artificial-intelligence interaction.
The same action can look different under different illumination, viewing angles, backgrounds, and other conditions, and the same subject and action can differ markedly in features and posture across action scenes. Even in a constant scene the human body has large degrees of freedom, and repetitions of the same action differ greatly in direction, angle, and other respects; partial occlusion, individual differences, and similar problems add further spatial complexity to action recognition. Hand-designed features struggle to capture the essential characteristics of an object in scenes that change drastically, so a more general feature extraction method is needed to overcome the one-sidedness and blindness of manual feature engineering.
At present, many action recognition algorithms are shallow learning methods and therefore have certain limitations: with limited training samples, their capacity to represent complex functions is restricted, as is their generalization ability on complex classification problems.
Action recognition can be cast as a classification problem, and many classification methods have been adapted to it, notably logistic regression, decision tree models, naive Bayes classifiers, and support vector machines. Each of these methods has advantages and disadvantages in practical applications.
Television and video content increasingly uses 3D video, yet behavior recognition in 3D video remains immature: most human behavior recognition systems rely on manually annotated data that is then fed into a model for recognition. Such methods depend heavily on labeled data and run inefficiently, which does not meet the requirements of industrialization and commercialization.
Disclosure of Invention
The purpose of the invention is as follows: to overcome the defects of the prior art, the invention provides a human body behavior recognition method based on a convolutional neural network and a recurrent neural network.
The technical scheme is as follows: in order to achieve the purpose, the invention adopts the technical scheme that:
a human body behavior recognition method based on a convolutional neural network and a recurrent neural network: the user first waves a hand 5 times in succession at a fixed position; a Microsoft Kinect v2 sensor tracks the human body and collects, with a time step of 16 milliseconds, the 3-dimensional coordinate vectors of the user's 25 main joints over the period, together with the RGB video of the period. A gated recurrent unit (GRU) is used as the basic unit of the recurrent layers, and a bidirectional recurrent neural network is trained on the 3-dimensional joint-coordinate data set to obtain the temporal feature vector of the joint data. The RGB video is split into consecutive RGB frames at 16-millisecond steps, and a convolutional neural network is trained on this data set to obtain the spatio-temporal feature vector of the RGB video. Finally, the outputs of the recurrent and convolutional networks are concatenated and normalized, the normalized vector is fed to a linear SVM classifier, and a verification data set is used to find the parameter C of the linear support vector machine (SVM), yielding the integrated recognition model. The method specifically comprises the following steps:
step 1), the user waves a hand 5 times in succession at a fixed position; a sensor tracks the human body with the collection time step set to 16 milliseconds, and the 3-dimensional coordinates of the user's 25 main joints over the period are collected as $V = \{(x_1, y_1, z_1), (x_2, y_2, z_2), \dots, (x_{25}, y_{25}, z_{25})\}$; the x-axis is perpendicular to the body's vertical axis with its positive direction pointing forward, the y-axis is parallel to the body's vertical axis with its positive direction pointing toward the head, and the z-axis is perpendicular to the body's vertical axis with its positive direction pointing to the person's left; the RGB video of the scene over the same period is collected at the same time;
step 2), the 3-dimensional coordinates of the 25 joints from step 1) are used as the training set of a recurrent neural network (RNN); the RNN is trained on this set, with gated recurrent units (GRU) operating on the input coordinates and with batch normalization and dropout; the RNN comprises two bidirectional gated recurrent layers, a hidden fully connected layer, and a softmax output layer, uses the linear rectification function (ReLU) as activation, and uses a dropout mechanism to map motion features to the hand-waving action category, yielding the trained RNN;
step 3), the RGB video collected in step 1) is divided into consecutive RGB frames at a time step of 16 milliseconds, the frame resolution is adjusted, and the frames are used as the training set of a convolutional neural network (CNN); the CNN convolves the input RGB frame stream with multiple convolution kernels, and comprises convolution layers, pooling layers, fully connected layers, and a softmax output layer; the model is fine-tuned from parameters pre-trained on the Sports-1M data set to reduce overfitting and training time, yielding the trained CNN;
step 4), the output of the RNN trained in step 2) is taken as the temporal feature vector of the 3-dimensional joint-coordinate data, and the output of the CNN trained in step 3) as the spatio-temporal feature vector of the RGB video; the two vectors are concatenated, the concatenated vector is normalized and fed to a linear support vector machine (SVM) classifier, and a verification data set is used to find the penalty coefficient C of the linear SVM, yielding the integrated recognition model;
and 5) during recognition, acquiring 3-dimensional coordinates and RGB (red, green and blue) videos of 25 joints of the human body behavior to be recognized by adopting the method in the step 1, putting the acquired 3-dimensional coordinates of 25 joints of the human body into the trained Recurrent Neural Network (RNN) in the step 2) to obtain a time characteristic vector, putting the acquired RGB videos into the trained Convolutional Neural Network (CNN) in the step 3) to obtain a space-time characteristic vector, connecting the time characteristic vector and the space-time characteristic vector into a characteristic array, normalizing the characteristic array, introducing the characteristic array into a comprehensive recognition model, and recognizing the behavior of the human body by using the comprehensive recognition model.
Preferably: the sensor in step 1) is a Microsoft Kinect v2 sensor.
Preferably: the method for obtaining the trained recurrent neural network RNN in the step 2) comprises the following steps:
step 21), taking the 3-dimensional coordinates of the 25 joints of the human body in the step 1) as a training set of the recurrent neural network RNN, wherein the dimensionality of the training set of the recurrent neural network RNN is 16 x (25 x 3);
step 22), a recurrent neural network is trained on the RNN training set, first through two bidirectional gated recurrent layers, using gated recurrent units (GRU) as the basic units of the recurrent layers; the bidirectional recurrent neural network trains on the input data from t = 1 to t = T and from t = T to t = 1 respectively, where t is the time variable in the data set and T is the final time step;
step 23), scene features are classified with the GRU-based recurrent neural network RNN; the update gate controls how much state information from the previous time step is carried into the current state: $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$, where $h_{t-1}$ is the memory cell value at time t-1, $x_t$ is the data input at time t, $W_z$ is the weight matrix of the update gate, $z_t$ is the value of the update gate at time t, and $\sigma$ is the sigmoid activation function;
step 24), the reset gate controls how much information of the previous state is written into the candidate memory cell $\tilde{h}_t$ at time t: $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$, where $W_r$ is the weight matrix of the reset gate and $r_t$ is the value of the reset gate at time t;
step 25), the candidate memory cell value is computed as $\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])$, where $W$ is the weight matrix of the GRU unit at the current time step and tanh is the hyperbolic tangent function;
step 26), the memory cell state at the current time step is computed as $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$, where $\odot$ denotes the element-wise (Hadamard) product; the state update depends on the memory cell value $h_{t-1}$ at time t-1 and on the candidate value $\tilde{h}_t$, the two factors being weighted through the update and reset gates respectively;
step 27), the output of the recurrent neural network is finally obtained as $y_t = \sigma(W \cdot h_t)$ and passed to the fully connected layer with a softmax activation function, so that the output is interpreted as a probability; $y_t$ is the prediction probability of the recurrent neural network.
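To make steps 23)-27) concrete, a single GRU forward step can be written out in NumPy. This is a minimal sketch of the recurrence, not the patent's implementation; the 75-dimensional input (25 joints × 3 coordinates), the hidden size k, and the weight shapes are assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, W_r, W, W_y):
    """One GRU step following steps 23)-27).

    x_t    : (d,) input at time t, d = 75 for 25 joints x 3 coordinates
    h_prev : (k,) memory cell value h_{t-1}
    W_z, W_r, W : (k, k + d) update-gate, reset-gate, and candidate weights
    W_y    : (m, k) output projection onto m action classes
    """
    concat = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ concat)                                # step 23: update gate
    r_t = sigmoid(W_r @ concat)                                # step 24: reset gate
    h_cand = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))  # step 25: candidate cell
    h_t = (1.0 - z_t) * h_prev + z_t * h_cand                  # step 26: cell state update
    logits = W_y @ h_t                                         # step 27: output layer
    y_t = np.exp(logits - logits.max())
    return h_t, y_t / y_t.sum()                                # softmax probabilities
```

Running this step over the 16 frames of a sample in both temporal directions gives the bidirectional pass of step 22).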
Preferably: the method for obtaining the trained convolutional neural network CNN in the step 3) is as follows:
step 31), the RGB video collected in step 1) is divided into consecutive RGB frames at a time step of 16 milliseconds, the frame resolution is adjusted, and the RGB frame stream is used as the training set of the convolutional neural network CNN; the input to the CNN is of size c × t × h × w, where c is the number of channels, t is the time step, and h and w are the height and width of the RGB frames;
step 32), the convolutional neural network CNN receives the parallelized video-frame input; for a convolution window of length h, a convolution kernel slides over the input matrix $x_{1:n}$ and performs the convolution $c_i = f(w \cdot x_{i:i+h-1} + b)$, where $w \in \mathbb{R}^{h \times d}$ is the weight of the convolution kernel (w takes values in the real number domain and d is the dimensionality of $x_i$), $b \in \mathbb{R}$ is the offset, $f$ is the activation function, and $x_{i:i+h-1}$ is the frame-vector matrix covered by the convolution window; convolving over a frame stream of temporal length n yields the convolved feature vector $c = [c_1, c_2, \dots, c_{n-h+1}]$;
step 33), a maximum value is extracted from each feature vector; with m convolution kernels, the pooled feature vector $\hat{z} = [\hat{c}_1, \hat{c}_2, \dots, \hat{c}_m]$ is obtained, where $\hat{c} = \max(c)$ is the feature extracted by the convolutional neural network for one kernel and $\hat{c}_m$ is the feature of the m-th convolution kernel;
step 34), the classification result is output through the softmax function as $y = \operatorname{softmax}(W_{fc} \cdot (\hat{z} \circ r) + b_{fc})$, where y is the prediction probability of the convolutional neural network, $r$ is the regularization-term constraint (a dropout mask) on the down-sampled layer output, $\circ$ denotes element-wise multiplication, $W_{fc}$ is the weight matrix of the fully connected layer, and $b_{fc}$ is the offset; the convolutional neural network CNN is optimized with a stochastic gradient descent optimizer.
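The computation in steps 32)-34) amounts to a 1-D convolution along the frame axis, max-over-time pooling, and a softmax over a fully connected output. The sketch below illustrates this with ReLU as the activation f; the kernel count and shapes are arbitrary, and this is not the Sports-1M-pretrained network of the embodiment:

```python
import numpy as np

def conv_pool_softmax(frames, kernels, biases, W_fc, b_fc, keep_mask):
    """Steps 32)-34): temporal convolution, max-over-time pooling, softmax output.

    frames    : (n, d) frame-vector matrix x_{1:n}
    kernels   : (m, h, d) m convolution kernels with window length h
    biases    : (m,) per-kernel offsets
    W_fc, b_fc: (classes, m) and (classes,) fully connected layer
    keep_mask : (m,) dropout-style mask r on the pooled features
    """
    n, _ = frames.shape
    m, h, _ = kernels.shape
    c = np.empty((m, n - h + 1))
    for i in range(n - h + 1):                      # c_i = f(w . x_{i:i+h-1} + b)
        window = frames[i:i + h]
        c[:, i] = np.maximum(0.0, np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + biases)
    z_hat = c.max(axis=1)                           # max-over-time pooling, one value per kernel
    logits = W_fc @ (z_hat * keep_mask) + b_fc      # element-wise mask, then FC layer
    p = np.exp(logits - logits.max())
    return p / p.sum()                              # softmax class probabilities
```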
Preferably: the method for obtaining the comprehensive identification model in the step 4) comprises the following steps:
step 41), the RNN trained in step 2) extracts from the 3-dimensional human coordinate data an output taken as the temporal feature vector, and the CNN trained in step 3) extracts from the RGB video an output taken as the spatio-temporal feature vector; the two vectors are concatenated into a feature array, which is normalized;
and 42), taking the normalized feature vector groups as input, taking the specific action or behavior mark corresponding to each normalized feature vector group as output, submitting the output to a linear Support Vector Machine (SVM) for training, wherein an optimization model of the SVM is as the following formula:
yi(ωTxi+b0)≥1-ξi
s.t.ξi≥0
i=1,2,...,N
where ω represents the input vector, C represents a penalty factor, ξiIndicating the classification loss for the ith sample point,
yirepresenting each sample corresponds toAction flag, b0Representing the intercept, N representing the total number of feature vectors input into the SVM;
and 43), finding the optimal value of the penalty coefficient C through the training and verifying set to obtain a comprehensive identification model.
Preferably: in step 33), when the convolutional neural network CNN is optimized by the random gradient descent optimizer, the initial learning rate is 0.0001, and when no training progress is observed, the learning rate is reduced by half.
Preferably: the normalization in step 41) is L2 normalization, $x'_i = x_i / \sqrt{\sum_{j} x_j^2}$, where $x_i$ denotes an element of the feature array and $x'_i$ the corresponding element of the normalized feature vector.
Preferably: step 31) the resolution is adjusted from 1920 x 1080 pixels to 320 x 240 pixels.
Compared with the prior art, the invention has the following beneficial effects:
the invention uses a Microsoft Kinect v2 sensor to collect 3-dimensional coordinates and RGB video of human joints, uses a convolution neural network and a circulation neural network to obtain time characteristics and space-time characteristics of human behavior data in the video, and effectively combines the time characteristics and the space-time characteristics to finally obtain a model which can adapt to complex environment and actions, and can quickly and effectively obtain the identification of human actions in the video by inputting corresponding data of a section of video in the future, thereby having good accuracy and stability, in particular:
(1) The invention uses a two-stream RNN/CNN neural network algorithm, which effectively accounts for the influence of action continuity on action identification and can recognize and predict actions within a short time.
(2) The invention considers scene information and action information jointly, using the scene information as a label to match action sequences in the action database, so that human actions are identified more accurately.
(3) The invention provides an effective and practical system architecture, configured with a user-interface module and an operation-recording module, which improves the stability of human behavior recognition and facilitates specific industrial applications.
Drawings
FIG. 1 is a flow chart of 3D video motion recognition based on a convolutional neural network and a recurrent neural network.
FIG. 2 is a schematic diagram of a GRU-RNN-based neural network.
Fig. 3 is a schematic diagram of three-dimensional data sampling points of a human joint.
Detailed Description
The present invention is further illustrated below in conjunction with the accompanying drawings and specific embodiments. It is to be understood that these examples are intended only to illustrate the invention, not to limit its scope, which is defined by the claims appended to the present application; modifications of various equivalent forms made by those skilled in the art after reading the invention fall within that scope.
A human behavior recognition method based on convolutional neural network and cyclic neural network, as shown in fig. 1-3, comprises the following steps:
step 1), the user waves a hand 5 times in succession at a fixed position, and a Microsoft Kinect v2 sensor tracks the human body; the joints sampled by the Kinect v2 sensor are: 1-spine base, 2-spine middle, 3-neck, 4-head, 5-left shoulder, 6-left elbow, 7-left wrist, 8-left hand, 9-right shoulder, 10-right elbow, 11-right wrist, 12-right hand, 13-left hip, 14-left knee, 15-left ankle, 16-left foot, 17-right hip, 18-right knee, 19-right ankle, 20-right foot, 21-spine shoulder, 22-left hand tip, 23-left thumb, 24-right hand tip, 25-right thumb. The collection time step is set to 16 milliseconds, and the 3-dimensional coordinates of the user's 25 main joints over the period are collected as $V = \{(x_1, y_1, z_1), (x_2, y_2, z_2), \dots, (x_{25}, y_{25}, z_{25})\}$; the x-axis is perpendicular to the body's vertical axis with its positive direction pointing forward, the y-axis is parallel to the body's vertical axis with its positive direction pointing toward the head, and the z-axis is perpendicular to the body's vertical axis with its positive direction pointing to the person's left; the RGB video of the scene over the same period is collected at the same time;
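As a concrete picture of the data layout, each hand-wave sample collected this way becomes a 16 × 75 array (16 time steps, 25 joints × 3 coordinates), matching the training-set dimensionality quoted in step 21) below. A minimal sketch, assuming the sensor SDK delivers one 25 × 3 joint array per 16 ms step:

```python
import numpy as np

def build_sample(joint_frames):
    """Flatten a sequence of Kinect skeleton frames into one RNN training sample.

    joint_frames : list of 16 arrays of shape (25, 3), one per 16 ms time step
    returns      : (16, 75) array matching the 16 x (25 x 3) layout
    """
    assert len(joint_frames) == 16, "one sample spans 16 time steps"
    return np.stack([f.reshape(75) for f in joint_frames])
```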
step 2), the 3-dimensional coordinates of the 25 joints from step 1) are used as the training set of a recurrent neural network (RNN); the RNN is trained on this set, with gated recurrent units (GRU) operating on the input coordinates and with batch normalization and dropout; the RNN comprises two bidirectional gated recurrent layers, a hidden fully connected layer, and a softmax output layer, uses the linear rectification function (ReLU) as activation, and uses a dropout mechanism to map motion features to the hand-waving action category, yielding the trained recurrent neural network (RNN);
step 21), taking the 3-dimensional coordinates of 25 joints of the human body in the step 1) as a training set of a Recurrent Neural Network (RNN), wherein the dimensionality of the training set of the Recurrent Neural Network (RNN) is 16 (time step) x (25 x 3) (3-dimensional joint coordinates);
step 22), a recurrent neural network is trained on the RNN training set, first through two bidirectional gated recurrent layers, using gated recurrent units (GRU) as the basic units of the recurrent layers; the bidirectional recurrent neural network trains on the input data from t = 1 to t = T and from t = T to t = 1 respectively, where t is the time variable in the data set and T is the final time step;
step 23), scene features are classified with the GRU-based recurrent neural network RNN; the update gate controls how much state information from the previous time step is carried into the current state: $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$, where $h_{t-1}$ is the memory cell value at time t-1, $x_t$ is the data input at time t, $W_z$ is the weight matrix of the update gate, $z_t$ is the value of the update gate at time t, and $\sigma$ is the sigmoid activation function;
step 24), the reset gate controls how much information of the previous state is written into the candidate memory cell $\tilde{h}_t$ at time t: $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$, where $W_r$ is the weight matrix of the reset gate and $r_t$ is the value of the reset gate at time t;
step 25), the candidate memory cell value is computed as $\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])$, where $W$ is the weight matrix of the GRU unit at the current time step and tanh is the hyperbolic tangent function;
step 26), the memory cell state at the current time step is computed as $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$, where $\odot$ denotes the element-wise (Hadamard) product; the state update depends on the memory cell value $h_{t-1}$ at time t-1 and on the candidate value $\tilde{h}_t$, the two factors being weighted through the update and reset gates respectively;
step 27), the output of the recurrent neural network is finally obtained as $y_t = \sigma(W \cdot h_t)$ and passed to the fully connected layer with a softmax activation function, so that the output is interpreted as a probability; $y_t$ is the prediction probability of the recurrent neural network.
Step 3), the RGB video collected in step 1) is divided into consecutive RGB frames at a time step of 16 milliseconds, the resolution of the frames is adjusted from 1920 × 1080 pixels to 320 × 240 pixels, and the frames are used as the training set of the convolutional neural network (CNN); the CNN convolves the input RGB frame stream with multiple convolution kernels, and comprises convolution layers, pooling layers, fully connected layers, and a softmax output layer; the model is fine-tuned from parameters pre-trained on the Sports-1M data set to reduce overfitting and training time, yielding the trained CNN;
the method for obtaining the trained convolutional neural network CNN in the step 3) is as follows:
step 31), the RGB video collected in step 1) is divided into consecutive RGB frames at a time step of 16 milliseconds, the frame resolution is adjusted, and the RGB frame stream is used as the training set of the convolutional neural network CNN; the input to the CNN is of size c × t × h × w, where c is the number of channels, t is the time step, and h and w are the height and width of the RGB frames;
step 32), the convolutional neural network CNN receives the parallelized video-frame input; for a convolution window of length h, a convolution kernel slides over the input matrix $x_{1:n}$ and performs the convolution $c_i = f(w \cdot x_{i:i+h-1} + b)$, where $w \in \mathbb{R}^{h \times d}$ is the weight of the convolution kernel (w takes values in the real number domain and d is the dimensionality of $x_i$), $b \in \mathbb{R}$ is the offset, $f$ is the activation function, and $x_{i:i+h-1}$ is the frame-vector matrix covered by the convolution window; convolving over a frame stream of temporal length n yields the convolved feature vector $c = [c_1, c_2, \dots, c_{n-h+1}]$;
step 33), a maximum value is extracted from each feature vector; with m convolution kernels, the pooled feature vector $\hat{z} = [\hat{c}_1, \hat{c}_2, \dots, \hat{c}_m]$ is obtained, where $\hat{c} = \max(c)$ is the feature extracted by the convolutional neural network for one kernel and $\hat{c}_m$ is the feature of the m-th convolution kernel;
step 34), the classification result is output through the softmax function as $y = \operatorname{softmax}(W_{fc} \cdot (\hat{z} \circ r) + b_{fc})$, where y is the prediction probability of the convolutional neural network, $r$ is the regularization-term constraint (a dropout mask) on the down-sampled layer output, $\circ$ denotes element-wise multiplication, $W_{fc}$ is the weight matrix of the fully connected layer, and $b_{fc}$ is the offset; the convolutional neural network CNN is optimized with a stochastic gradient descent optimizer.
Step 4), the output of the RNN trained in step 2) is taken as the temporal feature vector of the 3-dimensional joint-coordinate data, and the output of the CNN trained in step 3) as the spatio-temporal feature vector of the RGB video; the two vectors are concatenated into a feature array, which is normalized and fed to a linear support vector machine (SVM) classifier; a verification data set is used to find the penalty coefficient C of the linear SVM, where the penalty coefficient expresses the tolerance for errors, yielding the integrated recognition model;
the method for obtaining the comprehensive identification model in the step 4) comprises the following steps:
step 41), the RNN trained in step 2) extracts from the 3-dimensional human coordinate data an output taken as the temporal feature vector (600 dimensions), and the CNN trained in step 3) extracts from the RGB video an output taken as the spatio-temporal feature vector (4096 dimensions); the two vectors are concatenated into a feature array (4696 dimensions), which is normalized;
normalized formula:
wherein x isiDenotes the element, x 'in the feature array'iRepresenting elements in the normalized feature vector.
step 42), the normalized feature vector groups are used as input, and the specific action or behavior label corresponding to each normalized feature vector group as output, to train a linear support vector machine SVM, whose optimization model is:

$$\min_{\omega,\, b_0,\, \xi}\ \frac{1}{2}\lVert \omega \rVert^2 + C \sum_{i=1}^{N} \xi_i$$

$$\text{s.t.}\quad y_i(\omega^T x_i + b_0) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, 2, \dots, N$$

where $\omega$ is the SVM weight vector, $x_i$ the i-th input feature vector, $C$ the penalty coefficient, $\xi_i$ the classification loss of the i-th sample point, $y_i$ the action label corresponding to each sample, $b_0$ the intercept, and $N$ the total number of feature vectors input into the SVM;
and 43) finding the optimal value of the penalty coefficient C through the training and verification set to obtain a comprehensive recognition model, and finally verifying the motion category in the human body by using the model.
And 5), at recognition time, the 3-dimensional coordinates of the 25 joints and the RGB video of the behavior to be recognized are collected as in step 1); the joint coordinates are fed to the RNN trained in step 2) to obtain the temporal feature vector, and the RGB video to the CNN trained in step 3) to obtain the spatio-temporal feature vector; the two vectors are concatenated into a feature array, normalized, and passed to the integrated recognition model, which recognizes the human behavior.
Simulation:
a user continuously waves hands 5 times at a fixed position, uses a Microsoft Kinect V2 sensor to track human body behaviors, sets a time step length at the collection time to be 16 milliseconds, and collects a 3-dimensional coordinate vector set V { (x) of 25 joints which are main for the user in the time period1,y1,z1),(x2,y2,z2),...,(x25,y25,z25) }; each joint has 25 3D coordinates. This motion sample dimension is processed as 16 (time step) × (25 × 3) (3-dimensional joint coordinates) and is taken as a training set of the recurrent neural network.
The recurrent neural network is trained on multiple GPUs with the learning rate set to 0.001 and the decay rate to 0.9. The single-layer model is trained from scratch with mini-batches of 1000 sequences; the two-layer model uses mini-batches of 650 sequences. All RNNs use 300 neurons per unidirectional layer (doubled for the bidirectional layers) and dropout with a 75% keep probability; a fully connected layer finally produces the temporal feature vector (600 dimensions) of the human 3D joint-coordinate data.
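For reference, the RNN configuration described in this paragraph can be expressed with the Keras API. This is a sketch under stated assumptions: the number of action classes is hypothetical, and the quoted decay rate of 0.9 is read here as the rho of an RMSprop optimizer, a detail the text does not specify:

```python
import tensorflow as tf

NUM_CLASSES = 10  # hypothetical number of action categories

def build_skeleton_rnn():
    """Two bidirectional GRU layers plus a fully connected softmax head (cf. step 2))."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(16, 75)),                    # 16 steps x (25 joints x 3)
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(300, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(300)),  # 600-d over both directions
        tf.keras.layers.Dropout(0.25),                            # 75% keep probability
        tf.keras.layers.Dense(300, activation="relu"),            # hidden fully connected layer
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```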
The RGB video is then imported for convolutional neural network training. Frames are extracted from the RGB video with a 3D-CNN model in Caffe and cropped from 1920 × 1080 pixels to 320 × 240 pixels at a time step of 16 milliseconds; we refer to the CNN input as having size c × t × h × w, where c is the number of channels, t is the time step, and h and w are the height and width of the RGB frames. The network takes a video clip as input, and the data are labeled with the hand-waving motion tag. The input RGB frames are then resized to 128 × 171 pixel resolution, changing the input size to 3 × 16 × 128 × 171, and the network is trained with a stochastic gradient descent optimizer at an initial learning rate of 0.0001; whenever no training progress is observed, the learning rate is halved. The spatio-temporal feature vector (4096 dimensions) of the RGB video is finally obtained after the fully connected layers.
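The CNN stage can be sketched in Keras as well. The layer stack below is a placeholder (the embodiment fine-tunes a Sports-1M-pretrained 3D-CNN in Caffe rather than training from scratch), but the 16 × 128 × 171 × 3 input, the SGD learning rate of 0.0001, and the halve-on-plateau schedule follow this paragraph; the plateau patience is an assumption:

```python
import tensorflow as tf

def build_rgb_cnn(num_classes=10):  # class count is hypothetical
    """Placeholder 3D-CNN over 16-frame RGB clips (channels-last: 16 x 128 x 171 x 3)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(16, 128, 171, 3)),
        tf.keras.layers.Conv3D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling3D((1, 2, 2)),
        tf.keras.layers.Conv3D(128, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling3D(2),
        tf.keras.layers.GlobalAveragePooling3D(),
        tf.keras.layers.Dense(4096, activation="relu"),  # the 4096-d spatio-temporal feature
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_rgb_cnn()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# halve the learning rate when no training progress is observed
halve_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="loss", factor=0.5, patience=3)
# model.fit(clips, labels, callbacks=[halve_lr])  # clips: (N, 16, 128, 171, 3)
```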
For feature fusion, we concatenate the temporal feature vector extracted by the RNN with the spatio-temporal feature vector extracted by the CNN into a feature array (4696 dimensions) and apply L2 normalization. Finally, we evaluate the fused RNN/CNN features using training, validation, and test splits: the training and validation splits are used to find the optimal value of the parameter C of the linear SVM model, and the integrated model is used to verify the accuracy of motion recognition.
The method can solve the problem of overfitting of the model to action classification in the model training process, and can effectively improve the human behavior recognition efficiency and accuracy.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention, and such modifications and adaptations are intended to be within the scope of the invention.
Claims (5)
1. A human behavior recognition method based on a convolutional neural network and a cyclic neural network is characterized by comprising the following steps:
step 1), the user waves a hand 5 times in succession at a fixed position; a sensor tracks the human body with the collection time step set to 16 milliseconds, and the 3-dimensional coordinates of the user's 25 main joints over the period are collected as $V = \{(x_1, y_1, z_1), (x_2, y_2, z_2), \dots, (x_{25}, y_{25}, z_{25})\}$; the x-axis is perpendicular to the body's vertical axis with its positive direction pointing forward, the y-axis is parallel to the body's vertical axis with its positive direction pointing toward the head, and the z-axis is perpendicular to the body's vertical axis with its positive direction pointing to the person's left; the RGB video of the scene over the same period is collected at the same time;
step 2), the 3-dimensional coordinates of the 25 joints from step 1) are used as the training set of a recurrent neural network (RNN); the RNN is trained on this set, with gated recurrent units (GRU) operating on the input coordinates and with batch normalization and dropout; the RNN comprises two bidirectional gated recurrent layers, a hidden fully connected layer, and a softmax output layer, uses the linear rectification function (ReLU) as activation, and uses a dropout mechanism to map motion features to the hand-waving action category, yielding the trained RNN;
the method for obtaining the trained recurrent neural network RNN is as follows:
step 21), taking the 3-dimensional coordinates of the 25 joints of the human body in the step 1) as a training set of a Recurrent Neural Network (RNN), wherein the dimensionality of the training set of the Recurrent Neural Network (RNN) is 16 x (25 x 3);
step 22), a recurrent neural network is trained on the RNN training set, first through two bidirectional gated recurrent layers, using gated recurrent units (GRU) as the basic units of the recurrent layers; the bidirectional recurrent neural network trains on the input data from t = 1 to t = T and from t = T to t = 1 respectively, where t is the time variable in the data set and T is the final time step;
step 23), scene features are classified with the GRU-based recurrent neural network RNN; the update gate controls how much state information from the previous time step is carried into the current state: $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$, where $h_{t-1}$ is the memory cell value at time t-1, $x_t$ is the data input at time t, $W_z$ is the weight matrix of the update gate, $z_t$ is the value of the update gate at time t, and $\sigma$ is the sigmoid activation function;
step 24), the reset gate controls how much information of the previous state is written into the candidate memory cell $\tilde{h}_t$ at time t: $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$, where $W_r$ is the weight matrix of the reset gate and $r_t$ is the value of the reset gate at time t;
step 25), the candidate memory cell value is computed as $\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])$, where $W$ is the weight matrix of the GRU unit at the current time step and tanh is the hyperbolic tangent function;
step 26), the memory cell state at the current time step is computed as $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$, where $\odot$ denotes the element-wise (Hadamard) product; the state update depends on the memory cell value $h_{t-1}$ at time t-1 and on the candidate value $\tilde{h}_t$, the two factors being weighted through the update and reset gates respectively;
step 27), the output of the recurrent neural network is finally obtained as $y_t = \sigma(W \cdot h_t)$ and passed to the fully connected layer with a softmax activation function, so that the output is interpreted as a probability; $y_t$ is the prediction probability of the recurrent neural network;
step 3), the RGB video collected in step 1) is divided into consecutive RGB frames at a time step of 16 milliseconds, the frame resolution is adjusted, and the frames are used as the training set of a convolutional neural network (CNN); the CNN convolves the input RGB frame stream with multiple convolution kernels, and comprises convolution layers, pooling layers, fully connected layers, and a softmax output layer; the model is fine-tuned from parameters pre-trained on the Sports-1M data set to reduce overfitting and training time, yielding the trained CNN;
the method for obtaining the trained convolutional neural network CNN is as follows:
step 31), the RGB video collected in step 1) is divided into consecutive RGB frames at a time step of 16 milliseconds, the frame resolution is adjusted, and the RGB frame stream is used as the training set of the convolutional neural network CNN; the input to the CNN is of size c × t × h × w, where c is the number of channels, t is the time step, and h and w are the height and width of the RGB frames;
step 32), the convolutional neural network CNN receives the parallelized video-frame input; for a convolution window of length h, a convolution kernel slides over the input matrix $x_{1:n}$ and performs the convolution $c_i = f(w \cdot x_{i:i+h-1} + b)$, where $w \in \mathbb{R}^{h \times d}$ is the weight of the convolution kernel (w takes values in the real number domain and d is the dimensionality of $x_i$), $b \in \mathbb{R}$ is the offset, $f$ is the activation function, and $x_{i:i+h-1}$ is the frame-vector matrix covered by the convolution window; convolving over a frame stream of temporal length n yields the convolved feature vector $c = [c_1, c_2, \dots, c_{n-h+1}]$;
step 33), a maximum value is extracted from each feature vector; with m convolution kernels, the pooled feature vector $\hat{z} = [\hat{c}_1, \hat{c}_2, \dots, \hat{c}_m]$ is obtained, where $\hat{c} = \max(c)$ is the feature extracted by the convolutional neural network for one kernel and $\hat{c}_m$ is the feature of the m-th convolution kernel;
step 34), the classification result is output through the softmax function as $y = \operatorname{softmax}(W_{fc} \cdot (\hat{z} \circ r) + b_{fc})$, where y is the prediction probability of the convolutional neural network, $r$ is the regularization-term constraint (a dropout mask) on the down-sampled layer output, $\circ$ denotes element-wise multiplication, $W_{fc}$ is the weight matrix of the fully connected layer, and $b_{fc}$ is the offset; the convolutional neural network CNN is optimized with a stochastic gradient descent optimizer;
step 4), the output of the RNN trained in step 2) is taken as the temporal feature vector of the 3-dimensional joint-coordinate data, and the output of the CNN trained in step 3) as the spatio-temporal feature vector of the RGB video; the two vectors are concatenated, the concatenated vector is normalized and fed to a linear support vector machine (SVM) classifier, and a verification data set is used to find the penalty coefficient C of the linear SVM, yielding the integrated recognition model;
the method for obtaining the comprehensive identification model comprises the following steps:
step 41), the RNN trained in step 2) extracts from the 3-dimensional human coordinate data an output taken as the temporal feature vector, and the CNN trained in step 3) extracts from the RGB video an output taken as the spatio-temporal feature vector; the two vectors are concatenated into a feature array, which is normalized;
and 42) taking the normalized feature vector groups as input, taking the specific action or behavior mark corresponding to each normalized feature vector group as output, submitting the output to a linear Support Vector Machine (SVM) for training, wherein the optimization model is as the following formula:
yi(ωTxi+b0)≥1-ξi
s.t.ξi≥0
i=1,2,...,N
where ω represents the vector of inputs, C represents a penalty coefficient, ξiRepresents the classification loss for the ith sample point, yiRepresenting the action signature corresponding to each sample, b0Representing the intercept, N representing the total number of feature vectors input into the SVM;
step 43), finding the optimal value of the penalty coefficient C through a training and verification set to obtain a comprehensive identification model;
and 5) during recognition, acquiring 3-dimensional coordinates and RGB (red, green and blue) videos of 25 joints of the human body behavior to be recognized by adopting the method in the step 1, putting the acquired 3-dimensional coordinates of the 25 joints of the human body into the Recurrent Neural Network (RNN) trained in the step 2) to obtain a time characteristic vector, putting the acquired RGB videos into the Convolutional Neural Network (CNN) trained in the step 3) to obtain a space-time characteristic vector, connecting the time characteristic vector and the space-time characteristic vector into a characteristic array, normalizing the characteristic array, introducing the characteristic array into a comprehensive recognition model, and recognizing the human behavior by utilizing the comprehensive recognition model.
2. The human behavior recognition method based on the convolutional neural network and the cyclic neural network as claimed in claim 1, wherein: the sensor in step 1) is a Microsoft Kinect v2 sensor.
3. The human behavior recognition method based on the convolutional neural network and the cyclic neural network as claimed in claim 2, wherein: when the convolutional neural network CNN is optimized by the stochastic gradient descent optimizer in step 33), the initial learning rate is 0.0001, and the learning rate is halved whenever no training progress is observed.
5. The human behavior recognition method based on the convolutional neural network and the cyclic neural network as claimed in claim 4, wherein: step 31) the resolution is adjusted from 1920 x 1080 pixels to 320 x 240 pixels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910580116.XA CN110321833B (en) | 2019-06-28 | 2019-06-28 | Human body behavior identification method based on convolutional neural network and cyclic neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110321833A CN110321833A (en) | 2019-10-11 |
CN110321833B true CN110321833B (en) | 2022-05-20 |
Family
ID=68121381
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910580116.XA Active CN110321833B (en) | 2019-06-28 | 2019-06-28 | Human body behavior identification method based on convolutional neural network and cyclic neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110321833B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110880172A (en) * | 2019-11-12 | 2020-03-13 | 中山大学 | Video face tampering detection method and system based on cyclic convolution neural network |
CN111079928B (en) * | 2019-12-14 | 2023-07-07 | 大连大学 | Method for predicting human body movement by using circulating neural network based on countermeasure learning |
CN111597881B (en) * | 2020-04-03 | 2022-04-05 | 浙江工业大学 | Human body complex behavior identification method based on data separation multi-scale feature combination |
CN111459283A (en) * | 2020-04-07 | 2020-07-28 | 电子科技大学 | Man-machine interaction implementation method integrating artificial intelligence and Web3D |
CN111681321B (en) * | 2020-06-05 | 2023-07-04 | 大连大学 | Method for synthesizing three-dimensional human motion by using cyclic neural network based on layered learning |
CN111914638B (en) * | 2020-06-29 | 2022-08-12 | 南京邮电大学 | Character action recognition method based on improved long-term recursive deep convolution model |
CN111860269B (en) * | 2020-07-13 | 2024-04-16 | 南京航空航天大学 | Multi-feature fusion series RNN structure and pedestrian prediction method |
CN112232489A (en) * | 2020-10-26 | 2021-01-15 | 南京明德产业互联网研究院有限公司 | Method and device for gating cycle network and method and device for link prediction |
CN112488014B (en) * | 2020-12-04 | 2022-06-10 | 重庆邮电大学 | Video prediction method based on gated cyclic unit |
CN112906509A (en) * | 2021-01-28 | 2021-06-04 | 浙江省隧道工程集团有限公司 | Method and system for identifying operation state of water delivery tunnel excavator |
CN113568819B (en) * | 2021-01-31 | 2024-04-16 | 腾讯科技(深圳)有限公司 | Abnormal data detection method, device, computer readable medium and electronic equipment |
CN112906383B (en) * | 2021-02-05 | 2022-04-19 | 成都信息工程大学 | Integrated adaptive water army identification method based on incremental learning |
CN113111756B (en) * | 2021-04-02 | 2024-05-03 | 浙江工业大学 | Human body fall recognition method based on human body skeleton key points and long-short-term memory artificial neural network |
CN113378638B (en) * | 2021-05-11 | 2023-12-22 | 大连海事大学 | Method for identifying abnormal behavior of turbine operator based on human body joint point detection and D-GRU network |
CN114091596A (en) * | 2021-11-15 | 2022-02-25 | 长安大学 | Problem behavior recognition system and method for barrier population |
CN114399841A (en) * | 2022-01-25 | 2022-04-26 | 台州学院 | Human behavior recognition method under man-machine cooperation assembly scene |
CN116911955B (en) * | 2023-09-12 | 2024-01-05 | 深圳须弥云图空间科技有限公司 | Training method and device for target recommendation model |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107943967A (en) * | 2017-11-28 | 2018-04-20 | 华南理工大学 | Algorithm of documents categorization based on multi-angle convolutional neural networks and Recognition with Recurrent Neural Network |
CN108280406A (en) * | 2017-12-30 | 2018-07-13 | 广州海昇计算机科技有限公司 | A kind of Activity recognition method, system and device based on segmentation double-stream digestion |
CN109117701A (en) * | 2018-06-05 | 2019-01-01 | 东南大学 | Pedestrian's intension recognizing method based on picture scroll product |
AU2018101512A4 (en) * | 2018-10-11 | 2018-11-15 | Dong, Xun Miss | A comprehensive stock trend predicting method based on neural networks |
Non-Patent Citations (4)
Title |
---|
Combination of CNN-GRU Model to Recognize Characters of a License Plate number without Segmentation;Bhargavi Suvarnam等;《2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS)》;20190606;第317-322页 * |
Research and Implementation of Behavior Recognition Methods Based on Posture and Skeleton Information; Ma Jing; China Master's Theses Full-text Database, Information Science and Technology; 2019-01-15 (No. 12); pp. 9-12, 18-28, 35 *
Research on a Text Sentiment Analysis Method Based on Attention Mechanism and BGRU Network; Yin Liangliang et al.; Wireless Internet Technology; 2019-05-31 (No. 9); pp. 27-28 *
A Spatio-temporal Two-stream Human Action Recognition Model Based on Video Deep Learning; Yang Tianming et al.; Journal of Computer Applications; 2018-03-10 (No. 3); pp. 895-899 *
Also Published As
Publication number | Publication date |
---|---|
CN110321833A (en) | 2019-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110321833B (en) | Human body behavior identification method based on convolutional neural network and cyclic neural network | |
Zhang et al. | Dynamic hand gesture recognition based on short-term sampling neural networks | |
Reddy et al. | Spontaneous facial micro-expression recognition using 3D spatiotemporal convolutional neural networks | |
CN110096950B (en) | Multi-feature fusion behavior identification method based on key frame | |
CN106682598B (en) | Multi-pose face feature point detection method based on cascade regression | |
CN112784763B (en) | Expression recognition method and system based on local and overall feature adaptive fusion | |
Cui | Applying gradient descent in convolutional neural networks | |
CN108932500A (en) | A kind of dynamic gesture identification method and system based on deep neural network | |
US9317785B1 (en) | Method and system for determining ethnicity category of facial images based on multi-level primary and auxiliary classifiers | |
Littlewort et al. | Dynamics of facial expression extracted automatically from video | |
CN102930302B (en) | Based on the incrementally Human bodys' response method of online sequential extreme learning machine | |
CN110188637A (en) | A kind of Activity recognition technical method based on deep learning | |
CN112528928B (en) | Commodity identification method based on self-attention depth network | |
Caroppo et al. | Comparison between deep learning models and traditional machine learning approaches for facial expression recognition in ageing adults | |
CN108182397B (en) | Multi-pose multi-scale human face verification method | |
CN110575663B (en) | Physical education auxiliary training method based on artificial intelligence | |
CN107767416B (en) | Method for identifying pedestrian orientation in low-resolution image | |
CN110084211B (en) | Action recognition method | |
Zhang et al. | BoMW: Bag of manifold words for one-shot learning gesture recognition from kinect | |
CN109063626A (en) | Dynamic human face recognition methods and device | |
Lu et al. | Automatic lip reading using convolution neural network and bidirectional long short-term memory | |
CN113255602A (en) | Dynamic gesture recognition method based on multi-modal data | |
CN107220597B (en) | Key frame selection method based on local features and bag-of-words model human body action recognition process | |
Zheng et al. | Attention assessment based on multi‐view classroom behaviour recognition | |
Mohana et al. | Emotion recognition from facial expression using hybrid CNN–LSTM network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |