CN112766177A - Behavior identification method based on feature mapping and multi-layer time interaction attention - Google Patents

Behavior identification method based on feature mapping and multi-layer time interaction attention

Info

Publication number
CN112766177A
Authority
CN
China
Prior art keywords
video
matrix
feature
generating
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110086627.3A
Other languages
Chinese (zh)
Other versions
CN112766177B (en)
Inventor
同鸣
金磊
董秋宇
边放
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110086627.3A priority Critical patent/CN112766177B/en
Publication of CN112766177A publication Critical patent/CN112766177A/en
Application granted granted Critical
Publication of CN112766177B publication Critical patent/CN112766177B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior recognition method based on feature mapping and multi-layer time interaction attention, which addresses the shortcoming of the prior art that temporal dynamic information is modeled insufficiently and the interdependence between different frames is ignored, limiting behavior recognition capability. The implementation steps are: (1) generating a training set; (2) acquiring depth feature maps; (3) constructing a feature mapping matrix; (4) generating a temporal interaction attention matrix; (5) generating a temporal-interaction-attention-weighted feature matrix; (6) generating a multi-layer temporal-interaction-attention-weighted feature matrix; (7) acquiring the feature vector of each video; (8) performing behavior recognition on the video. Because the invention constructs a feature mapping matrix and proposes multi-layer temporal interaction attention, it improves the accuracy of behavior recognition in videos.

Description

Behavior identification method based on feature mapping and multi-layer time interaction attention
Technical Field
The invention belongs to the technical field of video processing, and further relates to a behavior identification method based on feature mapping and multi-layer time interaction attention in the technical field of computer vision. The method can be used for human behavior recognition in videos.
Background
The human behavior recognition task based on the video plays an important role in the field of computer vision, has a wide application prospect, and is applied to the fields of unmanned driving, man-machine interaction, video monitoring and the like at present. The aim of human behavior recognition is to judge the category of human behavior in a video, and the essence is a classification problem. In recent years, with the development of deep learning, behavior recognition methods based on deep learning have been widely studied.
South China University of Technology discloses a human behavior recognition method in its patent document "Human behavior recognition method based on time attention mechanism and LSTM" (application No. CN201910271178.2, publication No. CN110135249A). The method comprises the following implementation steps: 1. acquiring video data from an RGB monocular vision sensor; 2. extracting 2D skeleton joint point data; 3. extracting joint structure features of the joint points; 4. constructing an LSTM long short-term memory network; 5. adding a time attention mechanism to the LSTM network; 6. performing human behavior recognition with a softmax classifier. The time attention mechanism proposed by this method explores the importance of each frame in the video separately and assigns large weights to the features of important frames. Its shortcoming is that the interdependence between different frames in the video is ignored, so part of the global information is lost, causing behavior recognition errors.
Limin Wang et al. disclose a behavior recognition method in the published article "Temporal segment networks for action recognition in videos" (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 2740-2755). The method comprises the following implementation steps: 1. uniformly dividing the video into 7 segments; 2. randomly sampling one RGB frame in each segment to obtain 7 RGB frames; 3. inputting each RGB frame into a convolutional neural network to obtain a classification score for that frame; 4. combining a segment consensus function and a prediction function with the classification scores of the 7 RGB frames to obtain the behavior recognition result of the video. The shortcoming of this method is that, for a longer video, only 7 RGB frames are sampled, so information in the video is lost and more complete temporal dynamic information cannot be modeled, which lowers behavior recognition accuracy.
Disclosure of Invention
The invention aims to provide a behavior recognition method based on feature mapping and multi-layer temporal interaction attention that addresses the above defects of the prior art, namely the poor behavior recognition capability caused by insufficient modeling of temporal dynamic information and by ignoring the interdependence between different frames.
To achieve this purpose, the idea of the invention is: construct a feature mapping matrix that embeds the temporal and spatial information of the video; obtain temporal interaction attention by exploring the mutual influence among different frames in the video; and mine the complex temporal dynamic information in the video with multi-layer temporal interaction attention.
In order to achieve the purpose, the method comprises the following specific steps:
(1) generating a training set:
(1a) selecting RGB videos containing N behavior categories in a video data set to form a sample set, wherein each category contains at least 100 videos, each video has a determined behavior category, and N is greater than 50;
(1b) preprocessing each video in the sample set to obtain RGB images corresponding to the video, and forming the RGB images of all preprocessed videos into a training set;
(2) generating a depth feature map:
sequentially inputting each frame of RGB image of each video in the training set into an Inception-v2 network, and sequentially outputting a depth feature map X_k of size 7×7×1024 for each frame of each video, where k denotes the index of the sampled image within the video, k = 1, 2, ..., 60;
(3) Constructing a feature mapping matrix:
(3a) encoding each depth feature map into a 1024-dimensional low-dimensional vector f_k using a spatial vectorization function, k = 1, 2, ..., 60;
(3b) arranging the low-dimensional vectors corresponding to the 60 sampled frames of each video in temporal order of the frames to obtain a two-dimensional feature mapping matrix M = [f_1^T, f_2^T, ..., f_60^T], where T denotes the transpose operation;
(4) generating a temporal interaction attention matrix:
(4a) using the formula B = M^T M, generating the correlation matrix B of M, where the value in row i, column j of B represents the degree of correlation between the two low-dimensional vectors corresponding to the i-th and j-th sampled images of the video;
(4b) normalizing the correlation matrix B to obtain a temporal interaction attention matrix A of size 60×60;
(5) generating a time interaction attention weighted feature matrix:
using the formula \tilde{M} = \gamma M A + M, generating the time interaction attention weighted feature matrix \tilde{M}, where \gamma denotes a scale parameter initialized to 0 that balances the two terms MA and M;
(6) generating a multi-layer time interaction attention weighted feature matrix:
(6a) using the formula \tilde{B} = \tilde{M}^T \tilde{M}, generating the correlation matrix \tilde{B} of \tilde{M}, and normalizing \tilde{B} to obtain a multi-layer time interaction attention matrix \tilde{A} of size 60×60;
(6b) using the formula \hat{M} = \tilde{\gamma} \tilde{M} \tilde{A} + \tilde{M}, generating the multi-layer time interaction attention weighted feature matrix \hat{M}, where \tilde{\gamma} denotes a scale parameter initialized to 0 that balances the two terms \tilde{M} \tilde{A} and \tilde{M};
(7) acquiring a feature vector of a video:
inputting the multi-layer time interactive attention weighted feature matrix of each video into a full-connection layer, and outputting the feature vector of the video;
(8) performing behavior recognition on the video:
(8a) inputting the feature vector of each video into a softmax classifier, and iteratively updating the parameters \gamma and \tilde{\gamma}, the parameters of the fully connected layer, and the parameters of the softmax classifier by back-propagation gradient descent until the cross-entropy loss function converges, obtaining the trained parameters;
(8b) sampling 60 frames of RGB images at equal intervals from each video to be identified, scaling each frame to 256 × 340, then center-cropping to obtain 60 RGB frames of size 224 × 224, inputting each RGB frame into the Inception-v2 network, and outputting the depth feature maps of the video to be identified;
(8c) processing the depth feature maps of each video to be recognized with the same procedure as steps (3) to (7) to obtain the feature vector of the video, inputting each feature vector into the trained softmax classifier, and outputting the behavior recognition result of each video.
Compared with the prior art, the invention has the following advantages:
First, the invention constructs a feature mapping matrix that contains the temporal information of the 60 sampled images of a video and the spatial information of each sampled image. This overcomes the problem of the prior art that sampling only 7 RGB frames loses information in the video and cannot model more complete temporal dynamic information, so the invention retains temporal information more fully and obtains more expressive features.
Second, the invention proposes a temporal interaction attention matrix, obtained by calculating the degree of correlation between the low-dimensional features of different sampled images in the feature mapping matrix. This overcomes the problem of the prior art that ignoring the interdependence between different frames in the video loses part of the global information, so the proposed technique can fully explore the global information and improve the accuracy of behavior recognition.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The specific steps of the present invention will be further described with reference to fig. 1.
Step 1, generating a training set.
Selecting RGB videos containing N behavior categories in a video data set to form a sample set, wherein each category contains at least 100 videos, each video has a determined behavior category, and N is greater than 50. Preprocessing each video in the sample set to obtain an RGB image corresponding to the video, and forming the RGB images of all preprocessed videos into a training set. The preprocessing is to sample 60 frames of RGB images at equal intervals for each video in the sample set, scale the size of each frame of RGB image to 256 × 340, and then crop the RGB images to obtain 60 frames of RGB images with the size of 224 × 224 for the video.
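For concreteness, the preprocessing described above can be sketched in Python roughly as follows. This is a minimal illustration rather than the patent's implementation: it assumes OpenCV for decoding, and the function name preprocess_video is purely illustrative.

```python
import cv2
import numpy as np

def preprocess_video(video_path, num_frames=60, resize_hw=(256, 340), crop_hw=(224, 224)):
    """Sample num_frames RGB frames at equal intervals, resize to 256x340, center-crop to 224x224."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Equal-interval sampling indices over the whole video.
    indices = np.linspace(0, total - 1, num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)          # BGR -> RGB
        frame = cv2.resize(frame, (resize_hw[1], resize_hw[0]))  # cv2.resize takes (width, height)
        h, w = resize_hw
        ch, cw = crop_hw
        top, left = (h - ch) // 2, (w - cw) // 2
        frames.append(frame[top:top + ch, left:left + cw])       # center crop to 224x224
    cap.release()
    return np.stack(frames)  # shape (60, 224, 224, 3)
```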
Step 2, acquiring a depth feature map.
Sequentially inputting each frame of RGB image of each video in the training set into the Inception-v2 network, and sequentially outputting a depth feature map X_k of size 7×7×1024 for each frame of each video, where k denotes the index of the sampled image within the video, k = 1, 2, ..., 60.
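A sketch of this step, under the assumption that a pretrained backbone returning a 7×7 map with 1024 channels per frame is available (the variable backbone stands in for the Inception-v2 network; loading it is outside this sketch):

```python
import torch

@torch.no_grad()
def extract_depth_feature_maps(frames, backbone, device="cpu"):
    """frames: (60, 224, 224, 3) uint8 array -> depth feature maps X_k of shape (60, 7, 7, 1024)."""
    x = torch.as_tensor(frames, dtype=torch.float32, device=device) / 255.0
    x = x.permute(0, 3, 1, 2)          # (60, 3, 224, 224), NCHW layout
    feat = backbone(x)                 # assumed to return (60, 1024, 7, 7) convolutional features
    return feat.permute(0, 2, 3, 1)    # (60, 7, 7, 1024)
```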
Step 3, constructing a feature mapping matrix.
Because of the high dimensionality of the feature maps, jointly analysing the information of densely sampled images in a video is challenging; mapping each feature map into a low-dimensional vector reduces the amount of computation and facilitates this joint analysis. Taking the k-th sampled image of the r-th video as an example, the depth feature map of a sampled image is encoded into a 1024-dimensional low-dimensional vector as follows:
f_{r,k} = V(X_{r,k}) = \sum_{i=1}^{H} \sum_{j=1}^{W} X_{r,k,ij}
where f_{r,k} denotes the low-dimensional vector corresponding to the k-th sampled image of the r-th video, V(·) denotes the spatial vectorization function, X_{r,k} denotes the depth feature map corresponding to the k-th sampled image of the r-th video, X_{r,k,ij} denotes the element in row i and column j of X_{r,k}, Σ denotes the summation operation, and H and W denote the total number of rows and the total number of columns of X_{r,k}, respectively.
The low-dimensional vectors corresponding to the 60 sampled frames of each video are arranged in temporal order of the frames to obtain a two-dimensional feature mapping matrix M = [f_1^T, f_2^T, ..., f_60^T], where f_k denotes the low-dimensional vector of the k-th sampled image, k = 1, 2, ..., 60, and T denotes the transpose operation.
The number of columns of the matrix M is equal to the total number of sampled images corresponding to each video, and the number of rows is equal to the dimension of the low-dimensional vector.
The feature mapping matrix contains the time information of the video and the spatial information of each sampling image, so that the method can perform joint analysis on the densely sampled images in the video.
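A minimal NumPy sketch of the spatial vectorization and of assembling the feature mapping matrix M. The pooling is written as a plain spatial sum, matching the summation in the formula above; an average over the 7×7 positions would work the same way up to a constant factor.

```python
import numpy as np

def feature_mapping_matrix(feature_maps):
    """feature_maps: (60, 7, 7, 1024) depth feature maps X_k -> feature mapping matrix M of shape (1024, 60)."""
    # Spatial vectorization: sum each 7x7x1024 map over its rows and columns,
    # giving one 1024-dimensional low-dimensional vector f_k per sampled frame.
    f = feature_maps.sum(axis=(1, 2))   # (60, 1024)
    # Arrange the 60 vectors in temporal order as the columns of M.
    M = f.T                             # (1024, 60): rows = vector dimension, columns = sampled frames
    return M
```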
Step 4, generating a temporal interaction attention matrix.
Using the formula B = M^T M, the correlation matrix B of M is generated; the value in row i, column j of B expresses the degree of correlation between the two low-dimensional vectors corresponding to the i-th and j-th sampled images of the video. B is then normalized to obtain a temporal interaction attention matrix A of size 60×60.
Taking the i-th and j-th sampled frames as an example, the element A_{ij} in row i, column j of the temporal interaction attention matrix A is computed from the degree of correlation between the two frames by row-wise normalization of the correlation matrix:

A_{ij} = \frac{\exp(M_i^T M_j)}{\sum_{n=1}^{60} \exp(M_i^T M_n)}

where A_{ij} measures the degree of correlation between the i-th and j-th sampled frames, and M_i and M_j are the column vectors formed by the i-th and j-th columns of the feature mapping matrix M, i.e., the transposes of the low-dimensional vectors of the i-th and j-th sampled images of the video. The more similar the low-dimensional vectors of the two frames are, the larger A_{ij} is and the stronger the correlation between the two frames.
All elements of the temporal interaction attention matrix A are calculated in the same way; the i-th row of A represents the degree of correlation between the i-th sampled frame and all sampled frames of the video. The temporal interaction attention matrix therefore models the correlation between video frames and helps to explore the global information in the video more fully.
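The computation of the temporal interaction attention matrix can be sketched as follows, assuming the normalization of B is the row-wise softmax written above:

```python
import numpy as np

def temporal_interaction_attention(M):
    """M: (1024, 60) feature mapping matrix -> temporal interaction attention matrix A of shape (60, 60)."""
    B = M.T @ M                                 # correlation matrix, B[i, j] = M_i^T M_j
    B = B - B.max(axis=1, keepdims=True)        # numerical stability before exponentiation
    expB = np.exp(B)
    A = expB / expB.sum(axis=1, keepdims=True)  # row-wise normalization: each row sums to 1
    return A
```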
Step 5, generating a temporal-interaction-attention-weighted feature matrix.
Using the formula \tilde{M} = \gamma M A + M, the temporal-interaction-attention-weighted feature matrix \tilde{M} is generated, where \gamma denotes a scale parameter initialized to 0 that balances the two terms MA and M.
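In PyTorch, steps 4 and 5 together can be expressed as a small module with the learnable scale parameter γ initialized to 0. This is an illustrative sketch, with the row-wise softmax normalization again taken as an assumption:

```python
import torch
import torch.nn as nn

class TemporalInteractionAttention(nn.Module):
    """Applies M_tilde = gamma * M @ A + M with a learnable scale gamma initialized to 0."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))   # starts at 0, so the block initially passes M through

    def forward(self, M):
        # M: (1024, 60) feature mapping matrix of one video.
        B = M.t() @ M                               # (60, 60) correlation matrix
        A = torch.softmax(B, dim=1)                 # row-wise normalization (assumed softmax)
        return self.gamma * (M @ A) + M             # temporal-interaction-attention-weighted feature matrix
```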
Step 6, generating a multi-layer temporal-interaction-attention-weighted feature matrix.
Using the formula \tilde{B} = \tilde{M}^T \tilde{M}, the correlation matrix \tilde{B} of \tilde{M} is generated, and \tilde{B} is normalized to obtain a multi-layer temporal interaction attention matrix \tilde{A} of size 60×60. The formula \hat{M} = \tilde{\gamma} \tilde{M} \tilde{A} + \tilde{M} is then used to generate the multi-layer temporal-interaction-attention-weighted feature matrix \hat{M}, where \tilde{\gamma} denotes a scale parameter initialized to 0 that balances the two terms \tilde{M} \tilde{A} and \tilde{M}.
Multi-layer temporal interaction attention applies temporal interaction attention once more to the temporal-interaction-attention-weighted feature matrix, so that richer temporal dynamics are explored.
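A self-contained sketch of the two-layer scheme, with separate scale parameters for the two passes (and with the softmax normalization of each correlation matrix again assumed):

```python
import torch
import torch.nn as nn

class MultiLayerTemporalInteractionAttention(nn.Module):
    """Applies temporal interaction attention twice, each pass with its own scale parameter."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))        # gamma for the first pass
        self.gamma_tilde = nn.Parameter(torch.zeros(1))  # gamma_tilde for the second pass

    @staticmethod
    def _attend(M, scale):
        A = torch.softmax(M.t() @ M, dim=1)              # correlation matrix, row-normalized (assumed softmax)
        return scale * (M @ A) + M

    def forward(self, M):
        M_tilde = self._attend(M, self.gamma)            # M_tilde = gamma * M A + M
        M_hat = self._attend(M_tilde, self.gamma_tilde)  # M_hat = gamma_tilde * M_tilde A_tilde + M_tilde
        return M_hat
```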
Step 7, acquiring the feature vector of a video.
The multi-layer temporal-interaction-attention-weighted feature matrix of each video is input into a fully connected layer with 1024 output neurons to obtain the feature vector of the video.
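A sketch of this step; how the 1024×60 matrix is fed into the fully connected layer is not detailed here, so flattening it is an illustrative assumption of this sketch:

```python
import torch
import torch.nn as nn

class VideoFeatureHead(nn.Module):
    """Maps the (1024, 60) multi-layer attention-weighted feature matrix to a 1024-dimensional video feature vector."""
    def __init__(self, dim=1024, num_frames=60):
        super().__init__()
        self.fc = nn.Linear(dim * num_frames, 1024)   # fully connected layer with 1024 output neurons

    def forward(self, M_hat):
        return self.fc(M_hat.flatten())               # (1024,) feature vector of the video
```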
Step 8, performing behavior recognition on the video.
The feature vector of each video is input into a softmax classifier, and the parameters \gamma and \tilde{\gamma}, the parameters of the fully connected layer, and the parameters of the softmax classifier are updated iteratively by back-propagation gradient descent until the cross-entropy loss function converges.
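A minimal training-loop sketch under the assumptions above. Here model is assumed to map the feature mapping matrix of one video to class logits (attention layers, fully connected layer, and classifier bundled together), and loader is assumed to yield (M, label) tensor pairs; all names are illustrative:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=50, lr=1e-3, device="cpu"):
    """Train the model with cross-entropy loss and back-propagation gradient descent."""
    criterion = nn.CrossEntropyLoss()                       # cross-entropy loss over behavior categories
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # gradient descent on all learnable parameters
    model.to(device).train()
    for epoch in range(epochs):
        total_loss = 0.0
        for M, label in loader:
            M, label = M.to(device), label.to(device)
            logits = model(M)                               # softmax is folded into CrossEntropyLoss
            loss = criterion(logits.unsqueeze(0), label.view(1))
            optimizer.zero_grad()
            loss.backward()                                 # updates gamma, gamma_tilde, FC and classifier weights
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch}: loss {total_loss / max(len(loader), 1):.4f}")
```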
60 frames of RGB images are sampled at equal intervals from each video to be identified, each frame is scaled to 256 × 340 and then center-cropped to obtain 60 RGB frames of size 224 × 224, each RGB frame is input into the Inception-v2 network, and the depth feature maps of the video to be identified are output.
The depth feature maps of each video to be recognized are processed with the same procedure as steps 3 to 7 to obtain the feature vector of the video to be recognized; each feature vector is input into the trained softmax classifier, and the behavior recognition result of each video is output.

Claims (4)

1. A behavior identification method based on feature mapping and multi-layer time interactive attention, characterized in that a feature mapping matrix containing the temporal information of a video and the spatial information of each sampled image is constructed, temporal interaction attention is proposed, and a temporal interaction attention matrix is obtained by calculating the degree of correlation between the low-dimensional vectors of different sampled images in the feature mapping matrix; the method specifically comprises the following steps:
(1) generating a training set:
(1a) selecting RGB videos containing N behavior categories in a video data set to form a sample set, wherein each category contains at least 100 videos, each video has a determined behavior category, and N is greater than 50;
(1b) preprocessing each video in the sample set to obtain RGB images corresponding to the video, and forming the RGB images of all preprocessed videos into a training set;
(2) generating a depth feature map:
sequentially inputting each frame of RGB image of each video in the training set into an Inception-v2 network, and sequentially outputting a depth feature map X_k of size 7×7×1024 for each frame of each video, where k denotes the index of the sampled image within the video, k = 1, 2, ..., 60;
(3) Constructing a feature mapping matrix:
(3a) encoding each depth feature map into a 1024-dimensional low-dimensional vector f_k using a spatial vectorization function, k = 1, 2, ..., 60;
(3b) arranging the low-dimensional vectors corresponding to the 60 sampled frames of each video in temporal order of the frames to obtain a two-dimensional feature mapping matrix M = [f_1^T, f_2^T, ..., f_60^T], where T denotes the transpose operation;
(4) generating a temporal interaction attention matrix:
(4a) using the formula B = M^T M, generating the correlation matrix B of M, where the value in row i, column j of B represents the degree of correlation between the two low-dimensional vectors corresponding to the i-th and j-th sampled images of the video;
(4b) normalizing the correlation matrix B to obtain a temporal interaction attention matrix A of size 60×60;
(5) generating a time interaction attention weighted feature matrix:
using the formula \tilde{M} = \gamma M A + M, generating the time interaction attention weighted feature matrix \tilde{M}, where \gamma denotes a scale parameter initialized to 0 that balances the two terms MA and M;
(6) generating a multi-layer time interaction attention weighted feature matrix:
(6a) using the formula \tilde{B} = \tilde{M}^T \tilde{M}, generating the correlation matrix \tilde{B} of \tilde{M}, and normalizing \tilde{B} to obtain a multi-layer time interaction attention matrix \tilde{A} of size 60×60;
(6b) using the formula \hat{M} = \tilde{\gamma} \tilde{M} \tilde{A} + \tilde{M}, generating the multi-layer time interaction attention weighted feature matrix \hat{M}, where \tilde{\gamma} denotes a scale parameter initialized to 0 that balances the two terms \tilde{M} \tilde{A} and \tilde{M};
(7) acquiring a feature vector of a video:
inputting the multi-layer time interactive attention weighted feature matrix of each video into a full-connection layer, and outputting the feature vector of the video;
(8) performing behavior recognition on the video:
(8a) inputting the feature vector of each video into a softmax classifier, and iteratively updating the parameters \gamma and \tilde{\gamma}, the parameters of the fully connected layer, and the parameters of the softmax classifier by back-propagation gradient descent until the cross-entropy loss function converges, obtaining the trained parameters;
(8b) sampling 60 frames of RGB images at equal intervals from each video to be identified, scaling each frame to 256 × 340, then center-cropping to obtain 60 RGB frames of size 224 × 224, inputting each RGB frame into the Inception-v2 network, and outputting the depth feature maps of the video to be identified;
(8c) processing the depth feature maps of each video to be recognized with the same procedure as steps (3) to (7) to obtain the feature vector of the video, inputting each feature vector into the trained softmax classifier, and outputting the behavior recognition result of each video.
2. The method according to claim 1, wherein the preprocessing of each video in the sample set in step (1b) comprises sampling 60 frames of RGB images at equal intervals for each video in the sample set, scaling the RGB images to 256 × 340, and cropping to obtain 60 frames of RGB images with a size of 224 × 224 for the video.
3. The method for behavior recognition based on feature mapping and multi-layer temporal interaction attention of claim 1, wherein the spatial vectorization function in step (3a) is as follows:
f_{r,k} = V(X_{r,k}) = \sum_{i=1}^{H} \sum_{j=1}^{W} X_{r,k,ij}
where f_{r,k} denotes the low-dimensional vector corresponding to the k-th sampled frame of the r-th video, V(·) denotes the spatial vectorization function, X_{r,k} denotes the depth feature map corresponding to the k-th sampled frame of the r-th video, X_{r,k,ij} denotes the element in row i and column j of X_{r,k}, Σ denotes the summation operation, and H and W denote the total number of rows and the total number of columns of X_{r,k}, respectively.
4. The method for behavior recognition based on feature mapping and multi-layer temporal interaction attention of claim 1, wherein the number of output neurons of the fully-connected layer in step (7) is set to 1024.
CN202110086627.3A 2021-01-22 2021-01-22 Behavior identification method based on feature mapping and multi-layer time interaction attention Active CN112766177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110086627.3A CN112766177B (en) 2021-01-22 2021-01-22 Behavior identification method based on feature mapping and multi-layer time interaction attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110086627.3A CN112766177B (en) 2021-01-22 2021-01-22 Behavior identification method based on feature mapping and multi-layer time interaction attention

Publications (2)

Publication Number Publication Date
CN112766177A true CN112766177A (en) 2021-05-07
CN112766177B CN112766177B (en) 2022-12-02

Family

ID=75702700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110086627.3A Active CN112766177B (en) 2021-01-22 2021-01-22 Behavior identification method based on feature mapping and multi-layer time interaction attention

Country Status (1)

Country Link
CN (1) CN112766177B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389055A (en) * 2018-09-21 2019-02-26 西安电子科技大学 Video classification methods based on mixing convolution sum attention mechanism
CN110059662A (en) * 2019-04-26 2019-07-26 山东大学 A kind of deep video Activity recognition method and system
EP3625727A1 (en) * 2017-11-14 2020-03-25 Google LLC Weakly-supervised action localization by sparse temporal pooling network
US20200175281A1 (en) * 2018-11-30 2020-06-04 International Business Machines Corporation Relation attention module for temporal action localization
CN111325099A (en) * 2020-01-21 2020-06-23 南京邮电大学 Sign language identification method and system based on double-current space-time diagram convolutional neural network
CN111627052A (en) * 2020-04-30 2020-09-04 沈阳工程学院 Action identification method based on double-flow space-time attention mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3625727A1 (en) * 2017-11-14 2020-03-25 Google LLC Weakly-supervised action localization by sparse temporal pooling network
CN109389055A (en) * 2018-09-21 2019-02-26 西安电子科技大学 Video classification methods based on mixing convolution sum attention mechanism
US20200175281A1 (en) * 2018-11-30 2020-06-04 International Business Machines Corporation Relation attention module for temporal action localization
CN110059662A (en) * 2019-04-26 2019-07-26 山东大学 A kind of deep video Activity recognition method and system
CN111325099A (en) * 2020-01-21 2020-06-23 南京邮电大学 Sign language identification method and system based on double-current space-time diagram convolutional neural network
CN111627052A (en) * 2020-04-30 2020-09-04 沈阳工程学院 Action identification method based on double-flow space-time attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MING TONG et al.: "A new framework of action recognition with discriminative parts, spatio-temporal and causal interaction descriptors", Elsevier *
LIU Tianliang et al.: "Human Action Recognition Fusing Spatial-Temporal Dual-Network Streams and Visual Attention", Journal of Electronics & Information Technology *
XIE Huaiqi et al.: "Video Human Behavior Recognition Based on Channel Attention Mechanism", Electronic Technology & Software Engineering *

Also Published As

Publication number Publication date
CN112766177B (en) 2022-12-02

Similar Documents

Publication Publication Date Title
CN110119703B (en) Human body action recognition method fusing attention mechanism and spatio-temporal graph convolutional neural network in security scene
CN108537742B (en) Remote sensing image panchromatic sharpening method based on generation countermeasure network
CN113469094A (en) Multi-mode remote sensing data depth fusion-based earth surface coverage classification method
CN110555446A (en) Remote sensing image scene classification method based on multi-scale depth feature fusion and transfer learning
CN112926396A (en) Action identification method based on double-current convolution attention
CN112070078B (en) Deep learning-based land utilization classification method and system
CN110059587A (en) Human bodys' response method based on space-time attention
CN114782694B (en) Unsupervised anomaly detection method, system, device and storage medium
CN113935249B (en) Upper-layer ocean thermal structure inversion method based on compression and excitation network
CN114565594A (en) Image anomaly detection method based on soft mask contrast loss
CN113780249A (en) Expression recognition model processing method, device, equipment, medium and program product
CN117690178B (en) Face image recognition method and system based on computer vision
CN114170657A (en) Facial emotion recognition method integrating attention mechanism and high-order feature representation
CN114937173A (en) Hyperspectral image rapid classification method based on dynamic graph convolution network
CN107038410A (en) A kind of weed images recognition methods that network is stacked based on depth
CN110782503B (en) Face image synthesis method and device based on two-branch depth correlation network
CN115757919A (en) Symmetric deep network and dynamic multi-interaction human resource post recommendation method
CN111008570B (en) Video understanding method based on compression-excitation pseudo-three-dimensional network
CN116883364A (en) Apple leaf disease identification method based on CNN and Transformer
CN115032602A (en) Radar target identification method based on multi-scale convolution capsule network
CN114359785A (en) Lip language identification method and device based on adaptive matrix feature fusion network and electronic equipment
Mahadevan et al. Automatic recognition of Rice Plant leaf diseases detection using deep neural network with improved threshold neural network
CN112766177B (en) Behavior identification method based on feature mapping and multi-layer time interaction attention
CN115879623A (en) Agricultural drought level prediction method and device, electronic equipment and storage medium
CN116257786A (en) Asynchronous time sequence classification method based on multi-element time sequence diagram structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant