CN112766177A - Behavior identification method based on feature mapping and multi-layer time interaction attention - Google Patents
Behavior identification method based on feature mapping and multi-layer time interaction attention
- Publication number: CN112766177A (application number CN202110086627.3A)
- Authority
- CN
- China
- Prior art keywords
- video
- matrix
- feature
- generating
- attention
- Prior art date: 2021-01-22
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V40/23—Recognition of whole body movements, e.g. for sport training (G06V40/20—Movements or behaviour, e.g. gesture recognition)
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (G06F18/24—Classification techniques)
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/047—Probabilistic or stochastic networks (G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/084—Backpropagation, e.g. using gradient descent (G06N3/08—Learning methods)
Abstract
The invention discloses a behavior recognition method based on feature mapping and multi-layer time interaction attention. It addresses the prior art's insufficient modeling of temporal dynamics and its neglect of the interdependencies between different frames, both of which weaken behavior recognition. The method comprises the following steps: (1) generating a training set; (2) acquiring depth feature maps; (3) constructing a feature mapping matrix; (4) generating a time interaction attention matrix; (5) generating a time interaction attention weighted feature matrix; (6) generating a multi-layer time interaction attention weighted feature matrix; (7) acquiring the feature vector of each video; (8) performing behavior recognition on the video. Because the invention constructs a feature mapping matrix and introduces multi-layer time interaction attention, it improves the accuracy of behavior recognition in video.
Description
Technical Field
The invention belongs to the technical field of video processing, and further relates to a behavior identification method based on feature mapping and multi-layer time interaction attention in the technical field of computer vision. The method can be used for human behavior recognition in videos.
Background
Video-based human behavior recognition plays an important role in computer vision and has broad application prospects; it is currently applied in autonomous driving, human-computer interaction, video surveillance, and other fields. The goal of human behavior recognition is to judge the category of the human behavior in a video; in essence, it is a classification problem. In recent years, with the development of deep learning, behavior recognition methods based on deep learning have been widely studied.
South China University of Technology discloses a human behavior recognition method in the patent document "Human behavior recognition method based on time attention mechanism and LSTM" (application No. CN201910271178.2, publication No. CN110135249A). The method comprises the following steps: 1. acquiring video data from an RGB monocular vision sensor; 2. extracting 2D skeleton joint point data; 3. extracting joint structure features of the joint points; 4. constructing an LSTM (long short-term memory) network; 5. adding a time attention mechanism to the LSTM network; 6. performing human behavior recognition with a softmax classifier. The time attention mechanism of this method explores the importance of each frame in isolation and assigns large weights to the features of important frames. Its drawback is that it ignores the interdependencies between different frames of the video, so part of the global information is lost and behavior recognition errors result.
Limin Wang et al. disclose a behavior recognition method in the article "Temporal segment networks for action recognition in videos" (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 2740-2755). The method comprises the following steps: 1. uniformly dividing a video into 7 segments; 2. randomly sampling one RGB frame from each segment to obtain 7 RGB frames; 3. inputting each RGB frame into a convolutional neural network to obtain a per-frame classification score; 4. combining a segment consensus function and a prediction function with the classification scores of the 7 RGB frames to obtain the behavior recognition result of the video. The drawback of this method is that, for a longer video, sampling only 7 RGB frames loses information in the video; the more complete temporal dynamics cannot be modeled, so the behavior recognition accuracy is lower.
Disclosure of Invention
The invention aims to provide a behavior recognition method based on feature mapping and multi-layer time interaction attention that addresses the above defects of the prior art, namely the poor behavior recognition capability caused by insufficient modeling of temporal dynamics and by ignoring the interdependencies between different frames.
To achieve this purpose, the idea of the invention is to construct a feature mapping matrix that embeds the temporal and spatial information of the video; to obtain time interaction attention by exploring the mutual influence between different frames of the video; and to mine the complex temporal dynamics of the video with multi-layer time interaction attention.
In order to achieve the purpose, the method comprises the following specific steps:
(1) generating a training set:
(1a) selecting RGB videos containing N behavior categories in a video data set to form a sample set, wherein each category contains at least 100 videos, each video has a determined behavior category, and N is greater than 50;
(1b) preprocessing each video in the sample set to obtain RGB images corresponding to the video, and forming the RGB images of all preprocessed videos into a training set;
(2) generating a depth feature map:
sequentially inputting each frame of RGB image of each video in the training set into an Inception-v2 network, which outputs a depth feature map $X_k$ of size 7 × 7 × 1024 for each frame, wherein $k$ denotes the sequence number of the sampled image within the video, $k = 1, 2, \dots, 60$;
(3) Constructing a feature mapping matrix:
(3a) encoding each depth feature map into a 1024-dimensional low-dimensional vector $f_k$, $k = 1, 2, \dots, 60$, using a spatial vectorization function;
(3b) arranging the low-dimensional vectors corresponding to the 60 sampled frames of each video in temporal order to obtain a two-dimensional feature mapping matrix $M = (f_1^T, f_2^T, \dots, f_{60}^T)$, whose $k$-th column is formed from $f_k$, wherein $T$ denotes the transpose operation;
(4) generating a temporal interaction attention matrix:
(4a) generating the correlation matrix $B$ of $M$ using the formula $B = M^T M$, wherein the value in row $i$, column $j$ of the matrix represents the degree of correlation between the two low-dimensional vectors corresponding to the $i$-th and $j$-th sampled images of the video;
(4b) normalizing the correlation matrix B to obtain a time interaction attention matrix A with the size of 60 multiplied by 60;
(5) generating a time interaction attention weighted feature matrix:
using the formula $\tilde{M} = \gamma M A + M$ to generate the time interaction attention weighted feature matrix $\tilde{M}$, wherein $\gamma$ denotes a scale parameter initialized to 0 to balance the two terms $MA$ and $M$;
(6) generating a multi-layer time interaction attention weighted feature matrix:
(6a) generating the correlation matrix $\tilde{B} = \tilde{M}^T \tilde{M}$ of $\tilde{M}$, and normalizing $\tilde{B}$ to obtain a multi-layer time interaction attention matrix $\tilde{A}$ of size 60 × 60;
(6b) generating the multi-layer time interaction attention weighted feature matrix using the formula $\hat{M} = \hat{\gamma} \tilde{M} \tilde{A} + \tilde{M}$, wherein $\hat{\gamma}$ denotes a scale parameter initialized to 0 to balance the two terms $\tilde{M}\tilde{A}$ and $\tilde{M}$;
(7) acquiring a feature vector of a video:
inputting the multi-layer time interaction attention weighted feature matrix of each video into a fully connected layer, and outputting the feature vector of the video;
(8) performing behavior recognition on the video:
(8a) inputting the feature vector of each video into a softmax classifier, and iteratively updating the parameters $\gamma$ and $\hat{\gamma}$, the parameters of the fully connected layer, and the parameters of the softmax classifier by the back-propagation gradient descent method until the cross-entropy loss function converges, thereby obtaining the trained parameters;
(8b) sampling 60 frames of RGB images at equal intervals from each video to be recognized, scaling each frame to 256 × 340, then center-cropping to obtain 60 RGB frames of size 224 × 224, inputting each frame into the Inception-v2 network, and outputting the depth feature maps of the video to be recognized;
(8c) processing the depth feature maps of each video to be recognized with the same method as steps (3) to (7) to obtain the video's feature vector, inputting each feature vector into the trained softmax classifier, and outputting the behavior recognition result of each video.
Compared with the prior art, the invention has the following advantages:
First, the invention constructs a feature mapping matrix that contains the temporal information of the 60 sampled images of a video and the spatial information of each sampled image. This overcomes the prior-art problem that sampling only 7 RGB frames loses information in the video and prevents modeling of the more complete temporal dynamics, so the invention preserves temporal-order information more fully and obtains more expressive features.
Second, the invention proposes a time interaction attention matrix, obtained by computing the degree of correlation between the low-dimensional features of different sampled images in the feature mapping matrix. This overcomes the prior-art neglect of the interdependencies between different frames of a video and the resulting loss of part of the global information, so the proposed technique can fully explore global information and improve the accuracy of behavior recognition.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The specific steps of the present invention will be further described with reference to fig. 1.
Step 1, generating a training set.
RGB videos containing N behavior categories are selected from a video data set to form a sample set, where each category contains at least 100 videos, each video has a determined behavior category, and N is greater than 50. Each video in the sample set is preprocessed to obtain its corresponding RGB images, and the RGB images of all preprocessed videos form the training set. The preprocessing samples 60 RGB frames at equal intervals from each video in the sample set, scales each frame to 256 × 340, and then crops it to obtain 60 RGB frames of size 224 × 224 for the video.
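For illustration only (not part of the patent text), the preprocessing can be sketched in Python as follows; the helper name and the use of a center crop during training are assumptions:

```python
# Sketch of the step-1 preprocessing: sample 60 frames at equal
# intervals, resize to 256 x 340, then crop to 224 x 224.
import numpy as np

def preprocess_video(frames, num_samples=60, size=(340, 256), crop=224):
    """frames: list of PIL.Image RGB frames of one video."""
    idx = np.linspace(0, len(frames) - 1, num_samples).round().astype(int)
    out = []
    for i in idx:
        img = frames[i].resize(size)                  # PIL size is (width, height)
        w, h = img.size
        left, top = (w - crop) // 2, (h - crop) // 2
        out.append(img.crop((left, top, left + crop, top + crop)))
    return out                                        # 60 RGB frames, 224 x 224
```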
Step 2, acquiring the depth feature maps.
Each frame of RGB image of each video in the training set is sequentially input into an Inception-v2 network, which outputs a depth feature map $X_k$ of size 7 × 7 × 1024 for each frame, where $k$ denotes the sequence number of the sampled image within the video, $k = 1, 2, \dots, 60$.
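A minimal sketch of the per-frame feature extraction follows. torchvision does not ship Inception-v2 (BN-Inception), so GoogLeNet, whose last inception block also yields 7 × 7 × 1024 maps for 224 × 224 inputs, is used here as a stand-in; this substitution is an assumption, not the patent's exact backbone:

```python
import torch
import torchvision
from torchvision.models.feature_extraction import create_feature_extractor

# GoogLeNet stand-in for Inception-v2: node 'inception5b' outputs (N, 1024, 7, 7).
cnn = torchvision.models.googlenet(weights="DEFAULT").eval()
extractor = create_feature_extractor(cnn, return_nodes={"inception5b": "feat"})

@torch.no_grad()
def depth_feature_maps(frames):       # frames: tensor of shape (60, 3, 224, 224)
    return extractor(frames)["feat"]  # stacked X_k maps: (60, 1024, 7, 7)
```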
Step 3, constructing the feature mapping matrix.
Because of the high dimensionality of the feature maps, jointly analyzing the information of the densely sampled images of a video is challenging; mapping each feature map to a low-dimensional vector reduces the amount of computation and facilitates the joint analysis of the densely sampled images. Taking the $k$-th sampled image of the $r$-th video as an example, its depth feature map is encoded into a 1024-dimensional low-dimensional vector as follows:
$$f_{r,k} = V(X_{r,k}) = \sum_{i=1}^{H} \sum_{j=1}^{W} X_{r,k,ij}$$

wherein $f_{r,k}$ denotes the low-dimensional vector corresponding to the $k$-th sampled image of the $r$-th video, $V(\cdot)$ denotes the spatial vectorization function, $X_{r,k}$ denotes the depth feature map corresponding to the $k$-th sampled image of the $r$-th video, $X_{r,k,ij}$ denotes the element of $X_{r,k}$ in row $i$, column $j$, $\sum$ denotes the summation operation, and $H$ and $W$ denote the total number of rows and columns of $X_{r,k}$, respectively.
The low-dimensional vectors corresponding to the 60 sampled frames of each video are arranged in temporal order to obtain the two-dimensional feature mapping matrix $M = (f_1^T, f_2^T, \dots, f_{60}^T)$, where $f_k$ denotes the low-dimensional vector of the $k$-th sampled image, $k = 1, 2, \dots, 60$, and $T$ denotes the transpose operation.
The number of columns of $M$ equals the total number of sampled images per video, and the number of rows equals the dimension of the low-dimensional vectors.
The feature mapping matrix contains the time information of the video and the spatial information of each sampling image, so that the method can perform joint analysis on the densely sampled images in the video.
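Continuing the sketch above, steps 3a and 3b reduce to two tensor operations; the spatial sum follows the vectorization formula given earlier:

```python
def feature_mapping_matrix(maps):     # maps: (60, 1024, 7, 7)
    f = maps.sum(dim=(2, 3))          # spatial vectorization: one 1024-d vector per frame
    return f.t()                      # M: (1024, 60), column k holds f_k
```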
Step 4, generating the time interaction attention matrix.
The correlation matrix of $M$ is generated as $B = M^T M$; the value in row $i$, column $j$ of $B$ expresses the degree of correlation between the two low-dimensional vectors corresponding to the $i$-th and $j$-th sampled images of the video, and $B$ is normalized to obtain the time interaction attention matrix $A$ of size 60 × 60.
Taking the $i$-th and $j$-th sampled frames as an example, the element $A_{ij}$ in row $i$, column $j$ of the time interaction attention matrix $A$ is computed from the degree of correlation between the two frames:

$$A_{ij} = \frac{\exp(M_i^T M_j)}{\sum_{j=1}^{60} \exp(M_i^T M_j)}$$

wherein $A_{ij}$ measures the degree of correlation between the $i$-th and $j$-th sampled frames, and $M_i$ and $M_j$ are the column vectors formed by the $i$-th and $j$-th columns of the feature mapping matrix $M$; their physical meaning is the transposed low-dimensional vectors of the $i$-th and $j$-th sampled images of the video. The more similar the low-dimensional vectors of the two frames, the larger $A_{ij}$, i.e., the stronger the correlation between the two frames.
All elements of the time interaction attention matrix $A$ are computed in the same way; the $i$-th row of $A$ expresses the degree of correlation between the $i$-th sampled frame and all sampled frames of the video. The time interaction attention matrix therefore models the correlations between video frames, which helps to explore the global information in the video more fully.
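A sketch of step 4, assuming a row-wise softmax as the normalization, consistent with the formula above:

```python
import torch.nn.functional as F

def temporal_attention(M):            # M: (1024, 60)
    B = M.t() @ M                     # correlation matrix, B[i, j] = M_i^T M_j
    return F.softmax(B, dim=1)        # A: (60, 60), each row sums to 1
```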
Step 5, generating the time interaction attention weighted feature matrix.
The time interaction attention weighted feature matrix is generated using the formula $\tilde{M} = \gamma M A + M$, where $\gamma$ denotes a scale parameter initialized to 0 that balances the two terms $MA$ and $M$.
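The corresponding one-line sketch of step 5, with gamma as the zero-initialized learnable scalar from the formula:

```python
def attention_weighted(M, A, gamma):
    # M_tilde = gamma * M A + M: the residual form keeps the original features
    return gamma * (M @ A) + M
```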
Step 6, generating the multi-layer time interaction attention weighted feature matrix.
The correlation matrix $\tilde{B} = \tilde{M}^T \tilde{M}$ of $\tilde{M}$ is generated first; $\tilde{B}$ is then normalized to obtain the multi-layer time interaction attention matrix $\tilde{A}$ of size 60 × 60. The multi-layer time interaction attention weighted feature matrix is generated using the formula $\hat{M} = \hat{\gamma} \tilde{M} \tilde{A} + \tilde{M}$, where $\hat{\gamma}$ denotes a scale parameter initialized to 0 that balances the two terms $\tilde{M}\tilde{A}$ and $\tilde{M}$.
Multi-layer time interaction attention applies time interaction attention once more to the time interaction attention weighted feature matrix, exploring richer temporal dynamics.
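Steps 4 to 6 can be wrapped into one module that reuses temporal_attention and attention_weighted above; writing the two layers as a loop with one learnable scale each (gamma and gamma-hat) is a sketch of the patent's two-layer scheme:

```python
import torch
import torch.nn as nn

class MultiLayerTemporalAttention(nn.Module):
    def __init__(self, num_layers=2):
        super().__init__()
        # one zero-initialized scale per layer: gamma, gamma_hat
        self.gammas = nn.Parameter(torch.zeros(num_layers))

    def forward(self, M):                     # M: (1024, 60)
        for g in self.gammas:
            M = attention_weighted(M, temporal_attention(M), g)
        return M                              # multi-layer weighted matrix, (1024, 60)
```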
Step 7, acquiring the feature vector of the video.
The multi-layer time interaction attention weighted feature matrix of each video is input into a fully connected layer with 1024 output neurons to obtain the feature vector of the video.
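A sketch of step 7 together with the classifier of step 8; flattening the 1024 × 60 matrix before the fully connected layer is an assumption, since the patent does not state how the matrix enters the layer:

```python
class VideoHead(nn.Module):
    def __init__(self, num_classes, dim=1024, frames=60):
        super().__init__()
        self.fc = nn.Linear(dim * frames, dim)   # 1024 output neurons (step 7)
        self.cls = nn.Linear(dim, num_classes)   # softmax classifier (step 8)

    def forward(self, M_hat):                    # M_hat: (1024, 60)
        v = self.fc(M_hat.flatten())             # feature vector of the video, (1024,)
        return self.cls(v)                       # logits; softmax applied in the loss
```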
Step 8, performing behavior recognition on the video.
The feature vector of each video is input into a softmax classifier, and the parameters $\gamma$ and $\hat{\gamma}$, the parameters of the fully connected layer, and the parameters of the softmax classifier are iteratively updated by the back-propagation gradient descent method until the cross-entropy loss function converges.
60 frames of RGB images are sampled at equal intervals from each video to be recognized; each frame is scaled to 256 × 340 and then center-cropped to obtain 60 RGB frames of size 224 × 224; each frame is input into the Inception-v2 network, which outputs the depth feature maps of the video to be recognized.
The depth feature maps of each video to be recognized are processed with the same method as steps 3 to 7 to obtain the video's feature vector; each feature vector is input into the trained softmax classifier, which outputs the behavior recognition result of each video.
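Finally, a minimal training-step sketch for step 8a, continuing the modules above; the class count (51, an HMDB51-style set satisfying N > 50) and the SGD hyper-parameters are assumptions:

```python
import torch
import torch.nn.functional as F

attn = MultiLayerTemporalAttention()
head = VideoHead(num_classes=51)
opt = torch.optim.SGD(list(attn.parameters()) + list(head.parameters()),
                      lr=1e-3, momentum=0.9)

def train_step(M, label):                 # M: (1024, 60); label: scalar long tensor
    logits = head(attn(M))
    loss = F.cross_entropy(logits.unsqueeze(0), label.unsqueeze(0))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```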
Claims (4)
1. A behavior identification method based on feature mapping and multi-layer time interaction attention, characterized in that a feature mapping matrix containing the temporal information of the video and the spatial information of each sampled image is constructed, and time interaction attention is proposed, the time interaction attention matrix being obtained by computing the degree of correlation between the low-dimensional vectors of different sampled images in the feature mapping matrix; the method comprises the following steps:
(1) generating a training set:
(1a) selecting RGB videos containing N behavior categories in a video data set to form a sample set, wherein each category contains at least 100 videos, each video has a determined behavior category, and N is greater than 50;
(1b) preprocessing each video in the sample set to obtain RGB images corresponding to the video, and forming the RGB images of all preprocessed videos into a training set;
(2) generating a depth feature map:
sequentially inputting each frame of RGB image of each video in the training set into an Inception-v2 network, which outputs a depth feature map $X_k$ of size 7 × 7 × 1024 for each frame, wherein $k$ denotes the sequence number of the sampled image within the video, $k = 1, 2, \dots, 60$;
(3) Constructing a feature mapping matrix:
(3a) encoding each depth feature map into a 1024-dimensional low-dimensional vector $f_k$, $k = 1, 2, \dots, 60$, using a spatial vectorization function;
(3b) arranging the low-dimensional vectors corresponding to the 60 sampled frames of each video in temporal order to obtain a two-dimensional feature mapping matrix $M = (f_1^T, f_2^T, \dots, f_{60}^T)$, whose $k$-th column is formed from $f_k$, wherein $T$ denotes the transpose operation;
(4) generating a temporal interaction attention matrix:
(4a) generating the correlation matrix $B$ of $M$ using the formula $B = M^T M$, wherein the value in row $i$, column $j$ of the matrix represents the degree of correlation between the two low-dimensional vectors corresponding to the $i$-th and $j$-th sampled images of the video;
(4b) normalizing the correlation matrix B to obtain a time interaction attention matrix A with the size of 60 multiplied by 60;
(5) generating a time interaction attention weighted feature matrix:
using the formula $\tilde{M} = \gamma M A + M$ to generate the time interaction attention weighted feature matrix $\tilde{M}$, wherein $\gamma$ denotes a scale parameter initialized to 0 to balance the two terms $MA$ and $M$;
(6) generating a multi-layer time interaction attention weighted feature matrix:
(6a) generating the correlation matrix $\tilde{B} = \tilde{M}^T \tilde{M}$ of $\tilde{M}$, and normalizing $\tilde{B}$ to obtain a multi-layer time interaction attention matrix $\tilde{A}$ of size 60 × 60;
(6b) generating the multi-layer time interaction attention weighted feature matrix using the formula $\hat{M} = \hat{\gamma} \tilde{M} \tilde{A} + \tilde{M}$, wherein $\hat{\gamma}$ denotes a scale parameter initialized to 0 to balance the two terms $\tilde{M}\tilde{A}$ and $\tilde{M}$;
(7) acquiring a feature vector of a video:
inputting the multi-layer time interaction attention weighted feature matrix of each video into a fully connected layer, and outputting the feature vector of the video;
(8) performing behavior recognition on the video:
(8a) inputting the feature vector of each video into a softmax classifier, and iteratively updating the parameters $\gamma$ and $\hat{\gamma}$, the parameters of the fully connected layer, and the parameters of the softmax classifier by the back-propagation gradient descent method until the cross-entropy loss function converges, thereby obtaining the trained parameters;
(8b) sampling 60 frames of RGB images at equal intervals from each video to be recognized, scaling each frame to 256 × 340, then center-cropping to obtain 60 RGB frames of size 224 × 224, inputting each frame into the Inception-v2 network, and outputting the depth feature maps of the video to be recognized;
(8c) processing the depth feature maps of each video to be recognized with the same method as steps (3) to (7) to obtain the video's feature vector, inputting each feature vector into the trained softmax classifier, and outputting the behavior recognition result of each video.
2. The method according to claim 1, wherein the preprocessing of each video in the sample set in step (1a) comprises sampling 60 frames of RGB images at equal intervals for each video in the sample set, scaling the RGB images to 256 × 340, and cropping to obtain 60 frames of RGB images with a size of 224 × 224 for the video.
3. The method for behavior recognition based on feature mapping and multi-layer temporal interaction attention of claim 1, wherein the spatial vectorization function in step (3a) is as follows:
$$f_{r,k} = V(X_{r,k}) = \sum_{i=1}^{H} \sum_{j=1}^{W} X_{r,k,ij}$$

wherein $f_{r,k}$ denotes the low-dimensional vector corresponding to the $k$-th sampled frame of the $r$-th video, $V(\cdot)$ denotes the spatial vectorization function, $X_{r,k}$ denotes the depth feature map corresponding to the $k$-th sampled frame of the $r$-th video, $X_{r,k,ij}$ denotes the element of $X_{r,k}$ in row $i$, column $j$, $\sum$ denotes the summation operation, and $H$ and $W$ denote the total number of rows and columns of $X_{r,k}$, respectively.
4. The method for behavior recognition based on feature mapping and multi-layer temporal interaction attention of claim 1, wherein the number of output neurons of the fully-connected layer in step (7) is set to 1024.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110086627.3A CN112766177B (en) | 2021-01-22 | 2021-01-22 | Behavior identification method based on feature mapping and multi-layer time interaction attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110086627.3A CN112766177B (en) | 2021-01-22 | 2021-01-22 | Behavior identification method based on feature mapping and multi-layer time interaction attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112766177A true CN112766177A (en) | 2021-05-07 |
CN112766177B CN112766177B (en) | 2022-12-02 |
Family
ID=75702700
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110086627.3A Active CN112766177B (en) | 2021-01-22 | 2021-01-22 | Behavior identification method based on feature mapping and multi-layer time interaction attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112766177B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109389055A (en) * | 2018-09-21 | 2019-02-26 | 西安电子科技大学 | Video classification methods based on mixing convolution sum attention mechanism |
CN110059662A (en) * | 2019-04-26 | 2019-07-26 | 山东大学 | A kind of deep video Activity recognition method and system |
EP3625727A1 (en) * | 2017-11-14 | 2020-03-25 | Google LLC | Weakly-supervised action localization by sparse temporal pooling network |
US20200175281A1 (en) * | 2018-11-30 | 2020-06-04 | International Business Machines Corporation | Relation attention module for temporal action localization |
CN111325099A (en) * | 2020-01-21 | 2020-06-23 | 南京邮电大学 | Sign language identification method and system based on double-current space-time diagram convolutional neural network |
CN111627052A (en) * | 2020-04-30 | 2020-09-04 | 沈阳工程学院 | Action identification method based on double-flow space-time attention mechanism |
- 2021-01-22: application CN202110086627.3A filed in China; granted as patent CN112766177B (status: Active)
Non-Patent Citations (3)
Title |
---|
MING TONG et al., "A new framework of action recognition with discriminative parts, spatio-temporal and causal interaction descriptors", Elsevier * |
LIU Tianliang et al., "Human action recognition fusing spatial-temporal dual-stream networks and visual attention", Journal of Electronics & Information Technology * |
XIE Huaiqi et al., "Video human action recognition based on a channel attention mechanism", Electronic Technology & Software Engineering * |
Also Published As
Publication number | Publication date |
---|---|
CN112766177B (en) | 2022-12-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110119703B (en) | Human body action recognition method fusing attention mechanism and spatio-temporal graph convolutional neural network in security scene | |
CN108537742B (en) | Remote sensing image panchromatic sharpening method based on generation countermeasure network | |
CN113469094A (en) | Multi-mode remote sensing data depth fusion-based earth surface coverage classification method | |
CN110555446A (en) | Remote sensing image scene classification method based on multi-scale depth feature fusion and transfer learning | |
CN112926396A (en) | Action identification method based on double-current convolution attention | |
CN112070078B (en) | Deep learning-based land utilization classification method and system | |
CN110059587A (en) | Human bodys' response method based on space-time attention | |
CN114782694B (en) | Unsupervised anomaly detection method, system, device and storage medium | |
CN113935249B (en) | Upper-layer ocean thermal structure inversion method based on compression and excitation network | |
CN114565594A (en) | Image anomaly detection method based on soft mask contrast loss | |
CN113780249A (en) | Expression recognition model processing method, device, equipment, medium and program product | |
CN117690178B (en) | Face image recognition method and system based on computer vision | |
CN114170657A (en) | Facial emotion recognition method integrating attention mechanism and high-order feature representation | |
CN114937173A (en) | Hyperspectral image rapid classification method based on dynamic graph convolution network | |
CN107038410A (en) | A kind of weed images recognition methods that network is stacked based on depth | |
CN110782503B (en) | Face image synthesis method and device based on two-branch depth correlation network | |
CN115757919A (en) | Symmetric deep network and dynamic multi-interaction human resource post recommendation method | |
CN111008570B (en) | Video understanding method based on compression-excitation pseudo-three-dimensional network | |
CN116883364A (en) | Apple leaf disease identification method based on CNN and Transformer | |
CN115032602A (en) | Radar target identification method based on multi-scale convolution capsule network | |
CN114359785A (en) | Lip language identification method and device based on adaptive matrix feature fusion network and electronic equipment | |
Mahadevan et al. | Automatic recognition of Rice Plant leaf diseases detection using deep neural network with improved threshold neural network | |
CN112766177B (en) | Behavior identification method based on feature mapping and multi-layer time interaction attention | |
CN115879623A (en) | Agricultural drought level prediction method and device, electronic equipment and storage medium | |
CN116257786A (en) | Asynchronous time sequence classification method based on multi-element time sequence diagram structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |