CN115690906A - Human body action recognition method based on self-attention mechanism and Bi-GRU - Google Patents

Human body action recognition method based on self-attention mechanism and Bi-GRU

Info

Publication number
CN115690906A
Authority
CN
China
Prior art keywords
data
action
human body
gru
encoder
Prior art date
Legal status
Pending
Application number
CN202211304941.5A
Other languages
Chinese (zh)
Inventor
路永乐
修蔚然
韩亮
杨杰
孙旗
罗毅
彭慧
刘宇
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202211304941.5A
Publication of CN115690906A
Legal status: Pending


Abstract

The invention claims a human body action recognition method based on a self-attention mechanism and Bi-GRU, comprising the following steps. S1: record inertial sensor data of human body actions, and intercept the data and the corresponding action category labels through a sliding window. S2: input the data into an Encoder for encoding, extract the temporal correlation features among the input data through a multi-head self-attention layer, and splice these features with the original input data. S3: input the output data of the Encoder into a Bi-GRU for further time-series feature extraction. S4: input the output features of the Bi-GRU into the fully connected layer to obtain an output vector. S5: train the model on sample data, then input inertial sensor data with unknown classification labels into the trained model to obtain the human body action category. The invention solves the problems that effective time-series features are difficult to extract and recognition accuracy is low in existing human body action recognition.

Description

Human body action recognition method based on self-attention mechanism and Bi-GRU
Technical Field
The invention belongs to the field of human body action recognition, and particularly relates to a human body action recognition method based on a self-attention mechanism and Bi-GRU.
Background
Human action recognition refers to classifying motion into predefined human action classes based on data obtained from sensors. It plays a very important role in fields such as health monitoring systems, telemedicine and motion detection. Human action recognition based on inertial sensors has advantages including freedom from scene limitations and strong anti-interference capability, and is therefore well suited to daily sports and military applications.
The advent of deep learning has brought breakthrough progress to machine learning and opened a new direction for human action recognition. Deep learning automatically learns deep features from raw data, solving the problem that feature extraction in traditional machine learning depends on researchers' prior knowledge, which leads to poor generalization.
Techniques based on convolutional neural networks and recurrent neural networks are currently the most widely used deep-learning approaches to human action recognition. Convolutional neural networks extract spatial features, while recurrent neural networks extract temporal features. The following problems remain: 1. For tasks with strong temporal correlation, such as human action recognition, the spatial features extracted by a convolutional network are not effective enough, so the accuracy of complex action recognition is low. 2. A convolutional network has excessive computational complexity and too many parameters. 3. A recurrent neural network struggles to extract temporal features between data separated by long time intervals, so recognition accuracy is not high enough. A new feature extraction and recognition method is therefore needed to improve recognition accuracy and reduce algorithm complexity.
The invention is essentially different from patent CN114639169A: the data source of the present invention is inertial sensing, whereas CN114639169A uses WiFi, and the present invention does not use complex convolution algorithms.
The invention extracts global temporal correlation features through the self-attention mechanism and, to ensure that the Bi-GRU can still extract the local time-series features of the original data, splices the output of the self-attention mechanism with the original input data. The Bi-GRU then extracts the local time-series features, achieving complete extraction of the time-domain features. At the same time, the self-attention mechanism combined with the Bi-GRU has a simple structure and a low parameter count, avoiding the large parameter count and complex structure of a convolutional network.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing a human body action recognition method based on a self-attention mechanism and Bi-GRU. The technical scheme of the invention is as follows:
A human body action recognition method based on a self-attention mechanism and Bi-GRU comprises the following steps:
S1: record inertial sensor data of human body actions, and intercept the data and the corresponding action category labels through a sliding window;
S2: construct an Encoder-Decoder model comprising an Encoder and a Decoder; input the data into the Encoder for encoding, extract the temporal correlation features among the input data through the multi-head self-attention layer in the Encoder, and then splice these features with the original input data;
S3: decode with the Decoder, which comprises a bidirectional gated recurrent unit (Bi-GRU), a fully connected layer and a Softmax layer; input the Encoder's output into the Bi-GRU for further time-series feature extraction; the fully connected layer integrates the features into a vector, and the Softmax layer converts the output of the fully connected layer into a probability distribution;
S4: input the Bi-GRU's output features into the fully connected layer to obtain an output vector whose dimensionality is the total number of classification labels; the N-th value of the vector is the likelihood that the action corresponding to the input inertial sensor data is the N-th action;
S5: train the model on sample data, then input inertial sensor data with unknown classification labels into the trained model to obtain the human body action category.
Further, S1 specifically comprises:
using inertial sensors positioned on the torso, record time-series inertial data of human body actions, set sliding windows of a certain length, and intercept the data of corresponding length together with the human body action category corresponding to each sliding window.
Further, the multi-head self-attention layer in step S2 comprises three fully connected layers: query, key and value. The input data pass through these three layers to obtain the Q, K and V matrices, from which the Attention-Score matrix is computed. To ensure that the Bi-GRU can learn the time-domain features of the original data, the Attention-Score matrix is spliced with the original data in the last dimension to obtain the output of the Encoder.
Further, the Attention-Score matrix is computed as follows:
$$\mathrm{Attention\text{-}Score} = \mathrm{Softmax}\!\left(\frac{QK^{T}}{\sqrt{\mathrm{Head\_size}}}\right)V$$
where Head_size represents the dimension of each head of the multi-head attention, and Softmax represents the Softmax function, computed for each row of the matrix. The Softmax formula is as follows:
$$\mathrm{Softmax}(y_a) = \frac{e^{y_a}}{\sum_{b=1}^{w} e^{y_b}}$$
where y_a is the value in the a-th column of a row of the Attention-Score matrix, y_b is the value in the b-th column of that row, and w is the number of columns of the matrix.
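A minimal PyTorch sketch of the Encoder described above, assuming illustrative layer sizes: it computes Q, K and V with three fully connected layers, applies the scaled row-wise Softmax per head, and splices the attention output with the original input in the last dimension:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionEncoder(nn.Module):
    """Multi-head self-attention Encoder whose output is spliced with
    the original input, per the description above."""

    def __init__(self, in_dim, num_heads=4, head_size=16):
        super().__init__()
        self.num_heads = num_heads
        self.head_size = head_size
        d = num_heads * head_size
        # The three fully connected layers: query, key and value.
        self.query = nn.Linear(in_dim, d)
        self.key = nn.Linear(in_dim, d)
        self.value = nn.Linear(in_dim, d)

    def forward(self, x):  # x: (batch, T, in_dim)
        B, T, _ = x.shape

        def split(t):  # (B, T, heads*head_size) -> (B, heads, T, head_size)
            return t.view(B, T, self.num_heads, self.head_size).transpose(1, 2)

        Q, K, V = split(self.query(x)), split(self.key(x)), split(self.value(x))
        # Row-wise Softmax of Q K^T / sqrt(Head_size), then weight V.
        scores = F.softmax(Q @ K.transpose(-2, -1) / self.head_size ** 0.5, dim=-1)
        attn = (scores @ V).transpose(1, 2).reshape(B, T, -1)  # heads concatenated by columns
        # Splice the attention output with the original input in the last dimension.
        return torch.cat([attn, x], dim=-1)  # (batch, T, heads*head_size + in_dim)
```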
Further, the fully connected layer is followed by a Softmax layer; using the Softmax formula, the Softmax layer computes, from the vector output by the fully connected layer, the probability Q(i|x) that the sensor time-series data x currently input into the Encoder-Decoder model is classified as label i. The Softmax formula is as follows:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{N} e^{z_c}}$$
where z_i is the output of the i-th neuron of the last fully connected layer for input sequence x, and z_c is the output of the c-th neuron of that layer; the N-th value of the resulting vector is the probability that the action corresponding to the inertial sensor data in the input sliding window is the N-th action, with Softmax(z_i) = Q(i|x).
The action i corresponding to the maximum Q(i|x) is selected as the human body action recognition result; that is, if Softmax(z_i) is the maximum of the Softmax results, the recognition result for input data x is the i-th label action.
Further, the loss function adopts a balanced cross entropy function:
$$\mathrm{Loss} = -\frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{N} \alpha_i\, P(x_{ji}) \log Q(x_{ji}) + \frac{\lambda}{2m}\sum_{k=1}^{m} \theta_k^{2}$$
where the first half of the right-hand side is the balanced cross-entropy loss function; α_i is the loss weight of the i-th action; n is the number of samples in one training batch; N is the number of action types; P is the probability distribution of the true label converted to one-hot encoding; and Q treats the vector output by the model as an action probability distribution. P(x_ji) is the probability of the i-th action in the true label corresponding to the j-th input sequence x, and Q(x_ji) is the probability of the i-th action in the model output for the j-th input sequence x. Assigning different loss weights addresses the problem of imbalanced sample sizes in the dataset. The second half is an L2 regularization term, where λ is the regularization coefficient, θ is the set of learnable parameters in the algorithm, and m is the number of learnable parameters.
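A sketch of this loss in PyTorch, under the assumption that the model outputs class probabilities Q and the labels arrive one-hot encoded as P; the regularization coefficient is an illustrative value:

```python
import torch

def balanced_cross_entropy(probs, onehot, alpha, params, lam=1e-4):
    """Balanced cross-entropy with an L2 penalty.

    probs:  (n, N) model outputs Q for a batch of n windows
    onehot: (n, N) one-hot true labels P
    alpha:  (N,) per-action loss weights alpha_i
    params: learnable parameters theta of the model
    """
    params = list(params)
    n = probs.shape[0]
    # First half: weighted cross entropy, averaged over the batch.
    ce = -(alpha * onehot * torch.log(probs + 1e-12)).sum() / n
    # Second half: L2 regularization over all m learnable parameters.
    m = sum(p.numel() for p in params)
    l2 = lam / (2 * m) * sum((p ** 2).sum() for p in params)
    return ce + l2
```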
The invention has the following advantages and beneficial effects:
the Encoder-Decoder model is a neural network model with simple network structure and light weight. Different from the common human body action recognition method based on deep cyclic neural network learning, the method firstly extracts the global time correlation characteristics among data regardless of time intervals through the self-attention mechanism coding in the Encoder, and solves the defect that the cyclic neural network is difficult to extract the time correlation characteristics among data with longer time intervals. And secondly, splicing the Attention-Score matrix and the original data in the last dimension to obtain the output of the Encoder in order to ensure that the Bi-GRU can learn the time domain characteristics of the original data. And the Encoder outputs the time sequence characteristics of the data extracted by a gating circulation unit of the Decoder, so that the human body action identification precision is improved. The invention can efficiently process the inertial sensor data and can automatically learn complete and effective time sequence characteristics from the sensor data. Meanwhile, the invention only uses the recurrent neural network and does not use the convolutional neural network, and has simpler structure and lower parameter number, thereby having simpler calculation and less consumption of computer resources. The Attention-Score matrix and the original data are spliced in the last dimension, so that the integrity of time domain characteristics of the original data is guaranteed, and the identification precision is improved. The invention can provide a new visual field and new thinking for human body action recognition, and is beneficial to the development of human body action recognition.
Drawings
Fig. 1 is a schematic structural diagram of an Encoder-Decoder model according to a preferred embodiment of the present invention.
Fig. 2 is a flowchart of a method implementation according to an embodiment of the present invention.
Fig. 3 is a flow chart of the Encoder-Decoder model generation process.
Fig. 4 is a diagram illustrating the Attention-Score matrix computation according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
the invention firstly provides a human body action recognition method based on a self-attention mechanism and Bi-GRU (Bi-GRU), which comprises the steps of collecting and processing data, inputting the data into a model and obtaining a human body action recognition result as shown in figure 2. The specific steps of model training are shown in fig. 3, and include the following steps one, two and three:
the method comprises the following steps: recording time sequence data of inertial sensors about human body actions by using the inertial sensors positioned on the trunk, setting sliding windows with certain lengths, and intercepting the data with corresponding lengths and the human body action category corresponding to each sliding window;
step two: constructing an Encoder-Decoder model;
as shown in FIG. 1, the Encoder-Decoder model includes an Encoder and a Decoder. Wherein Encoder: comprises a Multi-Head-Self-orientation layer; a Decoder: the system comprises a bidirectional gating cycle unit network, a full connection layer and a Softmax layer;
in the present invention, the Multi-Head-Self-orientation layer comprises three fully-connected layers: query, key, value. The input data respectively obtain Q, K and V matrixes through the three full-connection layers. And then obtaining an Attention-Score matrix through further calculation, and splicing the Attention-Score matrix and the original data in the last dimension to obtain the output of the Encoder in order to ensure that the Bi-GRU can learn the time domain characteristics of the original data. The Decoder comprises a bidirectional gating circulation unit and is used for further extracting time sequence characteristics; a full connection layer for integrating the output of the bidirectional gating circulation unit and outputting a vector, wherein the vector dimension is the total number of the classification labels; and a Softmax layer, wherein the output of the full connection layer obtains a vector through a Softmax function, and the dimension of the vector is the total number of the classification tags. And the vector N dimension value is the probability that the action corresponding to the inertial sensor data in the input sliding window is the N action.
The first module of the Encoder-Decoder model is the Encoder, which comprises a Multi-Head Self-Attention layer containing three fully connected layers: query, key and value. The input data pass through these three layers to obtain the Q, K and V matrices, from which the Attention-Score matrix is computed as follows:
$$\mathrm{Attention\text{-}Score} = \mathrm{Softmax}\!\left(\frac{QK^{T}}{\sqrt{\mathrm{Head\_size}}}\right)V$$
where Head_size represents the dimension of each head of the multi-head attention, and Softmax represents the Softmax function, computed for each row of the matrix. The Softmax formula is as follows:
$$\mathrm{Softmax}(y_a) = \frac{e^{y_a}}{\sum_{b=1}^{w} e^{y_b}}$$
where y_a is the value in the a-th column of a row of the Attention-Score matrix, y_b is the value in the b-th column of that row, and w is the number of columns of the matrix.
Fig. 4 shows the Attention-Score matrix computation process for an input time series of length 2, with 1×3 data at each time step, 4 heads, and a head dimension of 3. The Attention-Score matrices computed for each head (S_0, etc., in Fig. 4) are concatenated by columns to obtain the output of the Multi-Head Self-Attention layer (S in Fig. 4).
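Using the AttentionEncoder sketch from earlier with the dimensions of this example (time length 2, 1×3 data per step, 4 heads of dimension 3), the shapes can be checked directly; the batch dimension is an implementation convenience:

```python
import torch

enc = AttentionEncoder(in_dim=3, num_heads=4, head_size=3)
x = torch.randn(1, 2, 3)   # batch of 1, T = 2 time steps, 1x3 data per step
out = enc(x)
print(out.shape)           # torch.Size([1, 2, 15]): 4 heads x 3 dims = 12, plus the original 3
```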
To ensure that the Bi-GRU can learn the time-domain features of the original data, the Attention-Score matrix is spliced with the original data in the last dimension to obtain the output of the Encoder. The second module is the Decoder, which comprises: a bidirectional gated recurrent unit, extracting high-dimensional time-series features from the data; a fully connected layer, integrating the high-dimensional time-series features obtained by the bidirectional gated recurrent unit into a vector, recorded as (z_1, z_2, ..., z_N), whose dimensionality is the number of action categories; and a Softmax function, computing from the output of the fully connected layer a vector whose dimensionality is the total number of classification labels, the N-th value of which is the probability that the action corresponding to the inertial sensor data in the input sliding window is the N-th action. The Softmax formula is as follows:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{N} e^{z_c}}$$
where z_i is the output of the i-th neuron of the last fully connected layer for input sequence x, and z_c is the output of the c-th neuron of that layer; the N-th value is the probability that the action corresponding to the inertial sensor data in the input sliding window is the N-th action, with Softmax(z_i) = Q(i|x).
If Softmax(z_i) is the maximum of the Softmax results, the action recognition result for input data x is the i-th label action, where N is the number of action types.
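A matching PyTorch sketch of the Decoder, assuming an illustrative hidden size and that the fully connected layer reads the Bi-GRU features of the last time step (the patent does not fix this detail):

```python
import torch
import torch.nn as nn

class BiGRUDecoder(nn.Module):
    """Bi-GRU followed by a fully connected layer and Softmax,
    per the Decoder structure described above."""

    def __init__(self, in_dim, num_classes, hidden=64):
        super().__init__()
        self.bigru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # 2x for both directions

    def forward(self, x):                     # x: (batch, T, in_dim) from the Encoder
        out, _ = self.bigru(x)                # (batch, T, 2*hidden)
        logits = self.fc(out[:, -1, :])       # integrate features into one vector
        return torch.softmax(logits, dim=-1)  # action-probability vector (z_1..z_N)
```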
Step three: train the Encoder-Decoder model on the sensor time-series samples intercepted in step one and their corresponding human body action category labels, stopping training when the loss function value falls below a set threshold.
In order to enable the neural network to learn more discriminative features, the closeness of the Encoder-Decoder model's actual output to the expected output is judged through a loss function. The invention adopts the following balanced cross-entropy loss function:
$$\mathrm{Loss} = -\frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{N} \alpha_i\, P(x_{ji}) \log Q(x_{ji}) + \frac{\lambda}{2m}\sum_{k=1}^{m} \theta_k^{2}$$
where the first half of the right-hand side is the balanced cross-entropy loss function; α_i is the loss weight of the i-th action; n is the number of samples in one training batch; N is the number of action types; P is the probability distribution of the true label converted to one-hot encoding; and Q treats the vector output by the model as an action probability distribution. P(x_ji) is the probability of the i-th action in the true label corresponding to the j-th input sequence x, and Q(x_ji) is the probability of the i-th action in the model output for the j-th input sequence x. Assigning different loss weights addresses the problem of imbalanced sample sizes in the dataset. The second half is an L2 regularization term, which helps mitigate overfitting; λ is the regularization coefficient, θ is the set of learnable parameters in the algorithm (weights and biases), and m is the number of learnable parameters.
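A minimal training loop for step three might look as follows; the optimizer, learning rate and stopping threshold are illustrative assumptions, not values fixed by the patent:

```python
import torch

def train(model, loader, loss_fn, threshold=0.05, lr=1e-3, max_epochs=200):
    """Train until the average loss falls below the set threshold."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for windows, onehot_labels in loader:      # samples from the sliding-window step
            probs = model(windows)
            loss = loss_fn(probs, onehot_labels)   # e.g. the balanced cross entropy above
            opt.zero_grad()
            loss.backward()
            opt.step()
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < threshold:   # stop once the loss is low enough
            break
    return model
```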
Step four: recognize and classify human body actions using the trained Encoder-Decoder model.
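Chaining the Encoder and Decoder sketches gives an end-to-end sketch of step four; the window length, channel count and number of action classes are illustrative:

```python
import torch

model = torch.nn.Sequential(
    AttentionEncoder(in_dim=6, num_heads=4, head_size=16),   # 6 inertial channels
    BiGRUDecoder(in_dim=4 * 16 + 6, num_classes=5),          # encoder output: heads*head_size + input
)
model.eval()
with torch.no_grad():
    window = torch.randn(1, 128, 6)          # one window of unlabeled sensor data
    probs = model(window)                    # (1, 5) action-probability vector
    action = probs.argmax(dim=-1).item()     # the action i with maximum Q(i|x)
print(f"predicted action class: {action}")
```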
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, a computer-readable medium does not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus comprising that element.
The above examples are to be construed as merely illustrative and not limiting of the present disclosure. After reading this description, a person skilled in the art can make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope of the invention as defined by the claims.

Claims (6)

1. A human body action recognition method based on a self-attention mechanism and Bi-GRU, characterized by comprising the following steps:
S1: record inertial sensor data of human body actions, and intercept the data and the corresponding action category labels through a sliding window;
S2: construct an Encoder-Decoder model comprising an Encoder and a Decoder; input the data into the Encoder for encoding, extract the temporal correlation features among the input data through the multi-head self-attention layer in the Encoder, and then splice these features with the original input data;
S3: decode with the Decoder, which comprises a bidirectional gated recurrent unit (Bi-GRU), a fully connected layer and a Softmax layer; input the Encoder's output into the Bi-GRU for further time-series feature extraction; the fully connected layer integrates the features into a vector, and the Softmax layer converts the output of the fully connected layer into a probability distribution;
S4: input the Bi-GRU's output features into the fully connected layer to obtain an output vector whose dimensionality is the total number of classification labels; the N-th value of the vector is the likelihood that the action corresponding to the input inertial sensor data is the N-th action;
S5: train the model on sample data, then input inertial sensor data with unknown classification labels into the trained model to obtain the human body action category.
2. The human body action recognition method based on the self-attention mechanism and Bi-GRU as claimed in claim 1, characterized in that S1 specifically comprises:
using inertial sensors positioned on the torso, record time-series inertial data of human body actions, set sliding windows of a certain length, and intercept the data of corresponding length together with the human body action category corresponding to each sliding window.
3. The human body action recognition method based on the self-attention mechanism and Bi-GRU as claimed in claim 1, characterized in that the multi-head self-attention layer in step S2 comprises three fully connected layers: query, key and value; the input data pass through these three layers to obtain the Q, K and V matrices, from which the Attention-Score matrix is computed; to ensure that the Bi-GRU can learn the time-domain features of the original data, the Attention-Score matrix is spliced with the original data in the last dimension to obtain the output of the Encoder.
4. The human body action recognition method based on the self-attention mechanism and Bi-GRU as claimed in claim 3, characterized in that the Attention-Score matrix is calculated by the following formula:
$$\mathrm{Attention\text{-}Score} = \mathrm{Softmax}\!\left(\frac{QK^{T}}{\sqrt{\mathrm{Head\_size}}}\right)V$$
where Head_size represents the dimension of each head of the multi-head attention, and Softmax represents the Softmax function, computed for each row of the matrix; the Softmax formula is as follows:
$$\mathrm{Softmax}(y_a) = \frac{e^{y_a}}{\sum_{b=1}^{w} e^{y_b}}$$
where y_a is the value in the a-th column of a row of the Attention-Score matrix, y_b is the value in the b-th column of that row, and w is the number of columns of the matrix.
5. The human body action recognition method based on the self-attention mechanism and Bi-GRU, characterized in that the fully connected layer is followed by a Softmax layer; using the Softmax formula, the Softmax layer computes, from the vector output by the fully connected layer, the probability Q(i|x) that the sensor time-series data x currently input into the Encoder-Decoder model is classified as label i; the Softmax formula is as follows:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{N} e^{z_c}}$$
where z_i is the output of the i-th neuron of the last fully connected layer for input sequence x, z_c is the output of the c-th neuron of that layer, and N is the number of action types; the N-th value is the probability that the action corresponding to the inertial sensor data in the input sliding window is the N-th action, with Softmax(z_i) = Q(i|x);
the action i corresponding to the maximum Q(i|x) is selected as the human body action recognition result; that is, if Softmax(z_i) is the maximum of the Softmax results, the recognition result for input data x is the i-th label action.
6. The human body action recognition method based on the self-attention mechanism and Bi-GRU as claimed in claim 5, characterized in that the loss function is a balanced cross-entropy function:
$$\mathrm{Loss} = -\frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{N} \alpha_i\, P(x_{ji}) \log Q(x_{ji}) + \frac{\lambda}{2m}\sum_{k=1}^{m} \theta_k^{2}$$
where the first half of the right-hand side is the balanced cross-entropy loss function; α_i is the loss weight of the i-th action; n is the number of samples in one training batch; N is the number of action types; P is the probability distribution of the true label converted to one-hot encoding; and Q treats the vector output by the model as an action probability distribution; P(x_ji) is the probability of the i-th action in the true label corresponding to the j-th input sequence x, and Q(x_ji) is the probability of the i-th action in the model output for the j-th input sequence x; assigning different loss weights addresses the problem of imbalanced sample sizes in the dataset; the second half is an L2 regularization term, where λ is the regularization coefficient, θ is the set of learnable parameters in the algorithm, and m is the number of learnable parameters.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211304941.5A 2022-10-24 2022-10-24 Human body action recognition method based on self-attention mechanism and Bi-GRU


Publications (1)

Publication Number Publication Date
CN115690906A 2023-02-03

Family ID: 85099719



Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665110A (en) * 2023-07-25 2023-08-29 上海蜜度信息技术有限公司 Video action recognition method and device
CN116665110B (en) * 2023-07-25 2023-11-10 上海蜜度信息技术有限公司 Video action recognition method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination