CN110059587A - Human action recognition method based on spatio-temporal attention - Google Patents

Human action recognition method based on spatio-temporal attention Download PDF

Info

Publication number
CN110059587A
CN110059587A (application CN201910250775.7A)
Authority
CN
China
Prior art keywords
picture
space
attention
long short-term memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910250775.7A
Other languages
Chinese (zh)
Inventor
田智强
产文颂
郑帅
杜少毅
兰旭光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201910250775.7A priority Critical patent/CN110059587A/en
Publication of CN110059587A publication Critical patent/CN110059587A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human action recognition method based on spatio-temporal attention. The method extracts picture features with a convolutional neural network, so that feature vectors rather than raw pictures serve as the input of a long short-term memory network, which is more advantageous; the long short-term memory network better preserves and processes the temporal information in the video; and a spatio-temporal attention mechanism makes the model focus on spatially important regions and temporally important frames, thereby improving both the efficiency and the accuracy of recognition.

Description

Human action recognition method based on spatio-temporal attention
Technical field
The invention belongs to the fields of computer vision, video classification, deep learning and intelligent robotics, and in particular relates to a human action recognition method based on spatio-temporal attention.
Background technique
Deep learning has developed rapidly in recent years, producing a large number of research results and playing an important role in more and more fields.
Computer vision has very broad application prospects: images are acquired with image capture devices and then analyzed by a computer to obtain the required information, much like the work done by the eyes and brains of humans and many other creatures. With the development of machine learning and deep learning, this field has made many major breakthroughs in recent years, while more and more problems and demands remain to be solved.
With the booming of the Internet and mobile terminals over many years, a large number of videos are collected and uploaded every day, and classifying and recognizing these videos is of great research significance. On the other hand, video is a carrier of information, and obtaining the information in it has important value in many respects. However, because the number of videos is huge, completing these tasks manually is impractical, and it is natural to use computers instead.
All kinds of robots play an increasingly important role in today's society, and the demand from society and the market keeps growing. Under such circumstances, it is necessary to make robots more intelligent. Recognizing human actions is one form of robot intelligence; a robot equipped with human action recognition can better perform many relatively complex behaviors such as human-computer interaction and human-robot collaboration.
Since AlexNet came out, convolutional neural networks have received extensive attention and application. The convolutional neural network is one of the most representative algorithms in deep learning; it is a feed-forward neural network model containing convolution computations and is widely used in computer vision. Representative network structures include VGG, GoogLeNet, ResNet and so on.
The long short-term memory network is a kind of recurrent neural network and one of the representative algorithms of deep learning. Compared with convolutional neural networks, it is generally better at processing sequential information, for example in machine translation, sentiment analysis and so on.
Many action recognition algorithms already exist, but many of them do not perform satisfactorily, mainly for the following reasons: the spatial information of an ordinary picture is relatively easy to process, whereas a video contains temporal information in addition to spatial information, which is hard to handle, and the correlation between frames is hard to capture; video files are often large, so the hardware requirements for processing video are often high, which imposes hardware limitations; and much of the information in a video is of no value and does not need attention, so extracting key points and key frames is highly desirable, yet this is itself a difficult problem to solve.
Summary of the invention
The purpose of the present invention is to overcome the above shortcomings and to provide a human action recognition method based on spatio-temporal attention, which aims to solve the problems of processing temporal information in video recognition and of attending to the key information in a video.
In order to achieve the above object, the present invention comprises the following steps:
Step 1: the input video is split into picture frames, and the required number of pictures is uniformly extracted;
Step 2: feature extraction is performed on the extracted pictures with a trained convolutional neural network to obtain the corresponding feature vectors;
Step 3: with the extracted feature vectors as input, the spatial attention weight corresponding to each picture is calculated using a feed-forward perceptron;
Step 4: the picture feature vectors are weighted with the spatial attention weights to obtain weighted feature vectors;
Step 5: the weighted feature vectors are input into a long short-term memory network, and the output class probability vectors are calculated by forward propagation in the long short-term memory network;
the corresponding temporal attention weight is calculated from the feature vector of each picture and the corresponding hidden-layer output of the long short-term memory network;
Step 6: the class probability vectors of the pictures are weighted and summed with the temporal attention weights to obtain a single class probability vector;
Step 7: the model is trained with a number of labeled video data; back-propagation is used during training, and the model parameters are continually updated while the loss is still large, until the loss converges to a small value;
the class corresponding to the maximum value in the class probability vector is taken as the final class to be output, and the parameters at this point are saved as the model parameters;
Step 8: the saved model and the model parameters are combined to constitute the human action recognition model.
In step 2, the convolutional neural network is a VGG19 convolutional neural network trained on ImageNet for picture classification; a picture is used as the input of the network, and the feature vector before the fully connected layers is taken.
In step 3, the spatial attention weight is calculated by the following formula:
where e_t is an intermediate calculation result, l_{t,i} is the spatial attention weight of the i-th region of the t-th picture, the weight matrices are parameters obtained during training, X_t is the feature vector corresponding to the t-th picture, h_{t-1} is the hidden-layer output corresponding to the (t-1)-th picture, K² is the number of regions each picture is divided into, and b is the bias.
In step 5, a two-layer long short-term memory network is used as the main network, and its calculation formula is:
where Y_t is the input to the long short-term memory network at the t-th time step, and x_{t,i} is the feature of the i-th region of the feature vector of the t-th picture.
In step 5, the outputs of the long short-term memory network are weighted by the temporal attention weights to obtain a single class probability vector; the calculation formula is:
where o is the output class vector and tanh is the activation function;
the class corresponding to the maximum probability value in the class probability vector is chosen as the predicted output result;
The temporal attention weight is calculated by the following formula:
β_t = ReLU(W_out(W_X·X_t + W_h·h_{t-1} + b))
where β_t is the temporal attention weight of the t-th picture, ReLU is the linear rectification activation function, W_out, W_X and W_h are weight parameters obtained during training, X_t is the feature vector corresponding to the t-th picture, h_{t-1} is the hidden-layer output corresponding to the (t-1)-th picture, and b is the bias.
The long short-term memory network initializes its hidden layer by the following formula:
where f_{init,h} is a feed-forward perceptron.
The cell state c_0 input to the hidden layer of the long short-term memory network at the first time step is initialized by the following formula:
where f_{init,c} is a feed-forward perceptron.
In step 7, a loss function is used during training, and the parameters are adjusted when the loss is back-propagated; the loss function is calculated by the following formula:
where C is the total number of classes, y_i is the true label, ŷ_i is the predicted probability of belonging to the i-th class, T is the total number of input pictures, λ_1 is the spatial attention penalty coefficient, and λ_2 is the temporal attention penalty coefficient.
Compared with the prior art, the present invention extracts picture features with a convolutional neural network, so that feature vectors rather than raw pictures serve as the input of the long short-term memory network, which is more advantageous; the long short-term memory network better preserves and processes the temporal information in the video; the spatio-temporal attention mechanism makes the model focus on spatially important regions and temporally important frames, improving both the efficiency and the accuracy of recognition; and the video pre-processing stage reduces the subsequent amount of computation and relieves the computational pressure on the hardware.
Description of the drawings
Fig. 1 is a flow chart of the invention;
Fig. 2 is the model structure of the invention.
Specific embodiment
The present invention will be further described with reference to the accompanying drawings.
Referring to Fig. 1, the present invention comprises the following steps:
Step 101: a camera is used to acquire video data, or video data is uploaded directly, as the video input.
Step 102: the original input video data is pre-processed and the video is split into frames; in order to reduce the subsequent amount of computation, a certain number of pictures is uniformly extracted, and these pictures are kept in their original temporal order.
Step 103: feature extraction is performed on each picture using the convolutional neural network VGG19 trained on ImageNet, and the corresponding feature vector is obtained; for the convenience of subsequent computation, the feature map is stretched from a two-dimensional array into one-dimensional region vectors, the feature vector of the t-th picture being X_t = {x_{t,1}, x_{t,2}, …, x_{t,i}, …}.
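By way of illustration only, the following sketch in Python (PyTorch, assuming torchvision's pretrained VGG19) shows how per-region feature vectors x_{t,i} could be obtained from one frame; the exact layer from which the patent takes the pre-fully-connected features is not specified, so using the last convolutional feature map is an assumption:

import torch
import torchvision.models as models
import torchvision.transforms as T

# Minimal sketch, not the patent's exact pipeline: take the convolutional
# part of an ImageNet-pretrained VGG19 and flatten its spatial grid into
# K*K region vectors x_{t,i}.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_region_features(frame_pil):
    """Return a (K*K, D) tensor of region features for one frame."""
    x = preprocess(frame_pil).unsqueeze(0)          # (1, 3, 224, 224)
    with torch.no_grad():
        fmap = vgg(x)                               # (1, 512, 7, 7) for VGG19
    d, k = fmap.shape[1], fmap.shape[2]
    # (1, 512, 7, 7) -> (49, 512): one 512-dim vector per spatial region
    return fmap.squeeze(0).permute(1, 2, 0).reshape(k * k, d)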
Step 104: the importance of the different parts of each picture is not the same: some parts are important and helpful for recognition, while others are useless. Spatial attention weights are therefore introduced to indicate the importance of each part of the picture, with larger values representing higher importance.
The spatial attention weight is calculated by the following formula:
where e_t is an intermediate calculation result, l_{t,i} is the spatial attention weight of the i-th region of the t-th picture, the weight matrices are parameters obtained during training, X_t is the feature vector corresponding to the t-th picture, h_{t-1} is the hidden-layer output corresponding to the (t-1)-th picture, K² is the number of regions each picture is divided into, and b is the bias;
Step 105: once the spatial attention of a picture is calculated, spatial attention weighting is performed immediately, and the feature vector corresponding to the picture is weighted; the weighting formula is:
where Y_t is the input to the long short-term memory network at the t-th time step, and x_{t,i} is the feature of the i-th region of the feature vector of the t-th picture.
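Because the spatial-attention formula itself is given only as an image, the following PyTorch-style sketch is a hedged reconstruction of steps 104-105 under a common assumption: a small feed-forward perceptron scores each of the K² regions from x_{t,i} and h_{t-1}, a softmax over regions yields l_{t,i}, and Y_t is the attention-weighted sum of the region features. The class name SpatialAttention and the layer shapes are illustrative, not taken from the patent:

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Hedged sketch of steps 104-105: score the K*K regions of frame t with a
    small perceptron conditioned on the previous hidden state h_{t-1}, softmax
    over regions to get l_{t,i}, then weight-and-sum the region features."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.w_x = nn.Linear(feat_dim, feat_dim, bias=False)
        self.w_h = nn.Linear(hidden_dim, feat_dim, bias=True)
        self.score = nn.Linear(feat_dim, 1, bias=False)

    def forward(self, x_t, h_prev):
        # x_t: (K*K, D) region features, h_prev: (H,) previous hidden state
        e_t = self.score(torch.tanh(self.w_x(x_t) + self.w_h(h_prev)))  # (K*K, 1)
        l_t = torch.softmax(e_t.squeeze(-1), dim=0)                     # (K*K,) weights l_{t,i}
        y_t = (l_t.unsqueeze(-1) * x_t).sum(dim=0)                      # (D,) weighted input Y_t
        return y_t, l_t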
Step 106: the weighted feature vectors are then input into the long short-term memory network, which performs forward propagation; the hidden layer corresponding to each picture outputs h_t, and this hidden-layer output serves two purposes: one is to be passed as output to the next step, and the other is to calculate the spatial attention and temporal attention weights.
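A minimal sketch of step 106, assuming the two-layer long short-term memory network of claim 4 and using LSTM cells so that h_t is available at every frame for the attention computations; the dimensions (512-d input, 256-d hidden state) are illustrative assumptions:

import torch.nn as nn

# Two stacked LSTM cells processing the weighted inputs Y_t one frame at a time.
lstm1 = nn.LSTMCell(input_size=512, hidden_size=256)
lstm2 = nn.LSTMCell(input_size=256, hidden_size=256)

def lstm_step(y_t, state):
    """Advance one time step; return h_t and the updated two-layer state."""
    (h1, c1), (h2, c2) = state
    h1, c1 = lstm1(y_t.unsqueeze(0), (h1, c1))
    h2, c2 = lstm2(h1, (h2, c2))
    return h2.squeeze(0), ((h1, c1), (h2, c2))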
Step 107: since different frames in a video differ in importance, some frames being important and others less so, the importance of the different frames needs to be distinguished; a temporal attention mechanism is therefore introduced, where the magnitude of the weight represents the degree of importance. The temporal attention is calculated by the following formula:
β_t = ReLU(W_out(W_X·X_t + W_h·h_{t-1} + b))
where β_t is the temporal attention weight of the t-th picture, ReLU is the linear rectification activation function, W_out, W_X and W_h are weight parameters obtained during training, X_t is the feature vector corresponding to the t-th picture, h_{t-1} is the hidden-layer output corresponding to the (t-1)-th picture, and b is the bias.
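The stated temporal-attention formula can be implemented directly; in the sketch below, pooling the region features of frame t into a single vector X_t by averaging is an assumption, since the text does not state how X_t is formed from the regions:

import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Implements beta_t = ReLU(W_out(W_X X_t + W_h h_{t-1} + b))."""
    def __init__(self, feat_dim, hidden_dim, proj_dim=128):
        super().__init__()
        self.w_x = nn.Linear(feat_dim, proj_dim, bias=False)
        self.w_h = nn.Linear(hidden_dim, proj_dim, bias=True)   # carries the bias b
        self.w_out = nn.Linear(proj_dim, 1, bias=False)

    def forward(self, x_t, h_prev):
        # x_t: (K*K, D) region features, h_prev: (H,) previous hidden state
        x_pooled = x_t.mean(dim=0)                               # assumed pooling into X_t
        beta_t = torch.relu(self.w_out(self.w_x(x_pooled) + self.w_h(h_prev)))
        return beta_t.squeeze(-1)                                # scalar weight for frame t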
Step 108: after the temporal attention weights are calculated, the class vector corresponding to each picture is weighted and summed to obtain a single class vector, which is input into the softmax function to obtain the final class probability vector; the calculation formula is:
where o is the output class vector, tanh is the activation function, ŷ_i is the predicted probability of belonging to the i-th class, and C is the total number of classes.
Step 109: after the class probability vector is obtained, the class with the maximum probability ŷ_i is taken as the final class, and the result is output.
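A short sketch of steps 108-109; the exact placement of the tanh activation relative to the weighted sum is assumed from the description:

import torch

def classify_video(class_vectors, betas):
    """Weight each frame's class vector o_t by its temporal attention beta_t,
    sum, squash with tanh, softmax into a class probability vector, and take
    the argmax as the predicted class."""
    # class_vectors: (T, C) per-frame class vectors, betas: (T,) attention weights
    o = torch.tanh((betas.unsqueeze(-1) * class_vectors).sum(dim=0))  # (C,)
    probs = torch.softmax(o, dim=0)                                   # class probability vector
    return probs, int(torch.argmax(probs))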
In the above steps, the hidden layer of the long short-term memory network needs to be initialized by the following formula:
where f_{init,h} is a feed-forward perceptron;
The cell state of the long short-term memory network also needs to be initialized, by the following formula:
where f_{init,c} is a feed-forward perceptron.
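A hedged sketch of this initialisation: f_{init,h} and f_{init,c} are modelled as small feed-forward perceptrons, and feeding them the mean of the sampled frame features is an assumption, since the initialisation formulas are given only as images:

import torch
import torch.nn as nn

feat_dim, hidden_dim = 512, 256  # illustrative dimensions
f_init_h = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh())
f_init_c = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh())

def init_lstm_state(all_region_features):
    # all_region_features: (T, K*K, D) features of all sampled frames
    mean_feat = all_region_features.mean(dim=(0, 1))   # (D,)
    h0 = f_init_h(mean_feat)                           # initial hidden state
    c0 = f_init_c(mean_feat)                           # initial cell state c_0
    return h0, c0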
In addition, like most deep learning algorithms, building the model first requires training on a large amount of labeled video data; back-propagation is used during training, and the model parameters can be adjusted according to the loss, so a loss function needs to be constructed; it is calculated by the following formula:
where C is the total number of classes, y_i is the true label, ŷ_i is the predicted probability of belonging to the i-th class, T is the total number of input pictures, λ_1 is the spatial attention penalty coefficient, and λ_2 is the temporal attention penalty coefficient.
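Since the loss formula is likewise given only as an image, the sketch below combines cross-entropy with λ_1 and λ_2 penalty terms on the spatial and temporal attention weights; the concrete form of the penalties is an assumption (a common choice that encourages attention to spread across regions and frames), not the patent's exact expression:

import torch

def total_loss(probs, target, spatial_w, temporal_w, lam1=1.0, lam2=1.0):
    """Cross-entropy on the final class probability vector plus assumed
    lambda1/lambda2 regularisers on the spatial and temporal attention."""
    # probs: (C,) predicted probabilities, target: true class index
    # spatial_w: (T, K*K) spatial attention weights, temporal_w: (T,)
    ce = -torch.log(probs[target] + 1e-8)
    spatial_pen = ((1.0 - spatial_w.sum(dim=0)) ** 2).sum()   # cover regions over time
    temporal_pen = (temporal_w ** 2).sum()
    return ce + lam1 * spatial_pen + lam2 * temporal_pen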
Referring to Fig. 2, which depicts the specific structure of the model of the present invention, it includes the following parts:
Part 201: the input video data; the video is split into frames, and a part of the frames is uniformly extracted.
Part 202: the VGG19 network trained on ImageNet, used for feature extraction of the pictures.
Part 203: the spatial attention weighting part, which applies spatial attention weighting to the feature vectors of the pictures.
Part 204: the long short-term memory network (LSTM), which is the main network of the model.
Part 205: the temporal attention weighting part, which performs a weighted summation of the outputs of the long short-term memory network.
Part 206: the softmax function; the preceding output is input into the softmax function to obtain the class probability vector, and the class corresponding to the maximum probability is chosen as the final class.
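For completeness, a forward pass wiring the sketches above together in the order of parts 201-206; all module names, dimensions, and the linear classifier that produces the per-frame class vectors are illustrative assumptions rather than the patent's exact design:

import torch

def forward_video(frames, spatial_attn, temporal_attn, lstm_step, classifier, init_state):
    """Illustrative end-to-end pass: frames -> region features -> spatial
    attention -> LSTM -> temporal attention -> class probability vector.
    classifier is assumed to be e.g. nn.Linear(256, num_classes)."""
    features = torch.stack([extract_region_features(f) for f in frames])  # (T, K*K, D)
    h0, c0 = init_state(features)
    state = ((h0.unsqueeze(0), c0.unsqueeze(0)), (h0.unsqueeze(0), c0.unsqueeze(0)))
    h_prev = h0
    class_vecs, betas = [], []
    for x_t in features:
        beta_t = temporal_attn(x_t, h_prev)            # temporal weight of frame t
        y_t, _ = spatial_attn(x_t, h_prev)             # spatially weighted input Y_t
        h_t, state = lstm_step(y_t, state)
        class_vecs.append(classifier(h_t))             # per-frame class vector o_t
        betas.append(beta_t)
        h_prev = h_t
    return classify_video(torch.stack(class_vecs), torch.stack(betas))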

Claims (8)

1. A human action recognition method based on spatio-temporal attention, characterized by comprising the following steps:
Step 1: splitting the input video into picture frames, and uniformly extracting the required number of pictures;
Step 2: performing feature extraction on the extracted pictures with a trained convolutional neural network to obtain the corresponding feature vectors;
Step 3: taking the extracted feature vectors as input, calculating the spatial attention weight corresponding to each picture using a feed-forward perceptron;
Step 4: weighting the picture feature vectors with the spatial attention weights to obtain weighted feature vectors;
Step 5: inputting the weighted feature vectors into a long short-term memory network, and calculating the output class probability vectors by forward propagation in the long short-term memory network;
calculating the corresponding temporal attention weight from the feature vector of each picture and the corresponding hidden-layer output of the long short-term memory network;
Step 6: weighting and summing the class probability vectors of the pictures with the temporal attention weights to obtain a single class probability vector;
Step 7: training the model with a number of labeled video data; using back-propagation during training, and continually updating the model parameters while the loss is still large, until the loss converges to a small value;
taking the class corresponding to the maximum value in the class probability vector as the final class to be output, and saving the parameters at this point as the model parameters;
Step 8: combining the saved model and the model parameters to constitute the human action recognition model.
2. The human action recognition method based on spatio-temporal attention according to claim 1, characterized in that in step 2, the convolutional neural network is a VGG19 convolutional neural network trained on ImageNet for picture classification; a picture is used as the input of the network, and the feature vector before the fully connected layers is taken.
3. The human action recognition method based on spatio-temporal attention according to claim 1, characterized in that in step 3, the spatial attention weight is calculated by the following formula:
where e_t is an intermediate calculation result, l_{t,i} is the spatial attention weight of the i-th region of the t-th picture, the weight matrices are parameters obtained during training, X_t is the feature vector corresponding to the t-th picture, h_{t-1} is the hidden-layer output corresponding to the (t-1)-th picture, K² is the number of regions each picture is divided into, and b is the bias.
4. The human action recognition method based on spatio-temporal attention according to claim 1, characterized in that in step 5, a two-layer long short-term memory network is used as the main network, and its calculation formula is:
where Y_t is the input to the long short-term memory network at the t-th time step, and x_{t,i} is the feature of the i-th region of the feature vector of the t-th picture.
5. The human action recognition method based on spatio-temporal attention according to claim 1, characterized in that in step 5, the outputs of the long short-term memory network are weighted by the temporal attention weights to obtain a single class probability vector, the calculation formula being:
where o is the output class vector and tanh is the activation function;
the class corresponding to the maximum probability value in the class probability vector is chosen as the predicted output result;
the temporal attention weight is calculated by the following formula:
β_t = ReLU(W_out(W_X·X_t + W_h·h_{t-1} + b))
where β_t is the temporal attention weight of the t-th picture, ReLU is the linear rectification activation function, W_out, W_X and W_h are weight parameters obtained during training, X_t is the feature vector corresponding to the t-th picture, h_{t-1} is the hidden-layer output corresponding to the (t-1)-th picture, and b is the bias.
6. The human action recognition method based on spatio-temporal attention according to claim 1, characterized in that the long short-term memory network initializes its hidden layer by the following formula:
where f_{init,h} is a feed-forward perceptron.
7. The human action recognition method based on spatio-temporal attention according to claim 1, characterized in that the cell state c_0 input to the hidden layer of the long short-term memory network at the first time step is initialized by the following formula:
where f_{init,c} is a feed-forward perceptron.
8. The human action recognition method based on spatio-temporal attention according to claim 1, characterized in that in step 7, a loss function is used during training and the parameters are adjusted when the loss is back-propagated, the loss function being calculated by the following formula:
where C is the total number of classes, y_i is the true label, ŷ_i is the predicted probability of belonging to the i-th class, T is the total number of input pictures, λ_1 is the spatial attention penalty coefficient, and λ_2 is the temporal attention penalty coefficient.
CN201910250775.7A 2019-03-29 2019-03-29 Human action recognition method based on spatio-temporal attention Pending CN110059587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910250775.7A CN110059587A (en) 2019-03-29 2019-03-29 Human action recognition method based on spatio-temporal attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910250775.7A CN110059587A (en) 2019-03-29 2019-03-29 Human action recognition method based on spatio-temporal attention

Publications (1)

Publication Number Publication Date
CN110059587A true CN110059587A (en) 2019-07-26

Family

ID=67317918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910250775.7A Pending CN110059587A (en) 2019-03-29 2019-03-29 Human action recognition method based on spatio-temporal attention

Country Status (1)

Country Link
CN (1) CN110059587A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826447A (en) * 2019-10-29 2020-02-21 北京工商大学 Restaurant kitchen staff behavior identification method based on attention mechanism
CN111083477A (en) * 2019-12-11 2020-04-28 北京航空航天大学 HEVC (high efficiency video coding) optimization algorithm based on visual saliency
CN111191739A (en) * 2020-01-09 2020-05-22 电子科技大学 Wall surface defect detection method based on attention mechanism
CN111210907A (en) * 2020-01-14 2020-05-29 西北工业大学 Pain intensity estimation method based on space-time attention mechanism
CN111242101A (en) * 2020-03-08 2020-06-05 电子科技大学 Behavior identification method based on spatiotemporal context association
CN111402928A (en) * 2020-03-04 2020-07-10 华南理工大学 Attention-based speech emotion state evaluation method, device, medium and equipment
CN111401149A (en) * 2020-02-27 2020-07-10 西北工业大学 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
CN111738218A (en) * 2020-07-27 2020-10-02 成都睿沿科技有限公司 Human body abnormal behavior recognition system and method
CN112329867A (en) * 2020-11-10 2021-02-05 宁波大学 MRI image classification method based on task-driven hierarchical attention network
CN112752102A (en) * 2019-10-31 2021-05-04 北京大学 Video code rate distribution method based on visual saliency
CN113408349A (en) * 2021-05-17 2021-09-17 浙江大华技术股份有限公司 Training method of motion evaluation model, motion evaluation method and related equipment
CN114299436A (en) * 2021-12-30 2022-04-08 东北农业大学 Group-breeding pig fighting behavior identification method integrating space-time double-attention mechanism

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066973A (en) * 2017-04-17 2017-08-18 杭州电子科技大学 A kind of video content description method of utilization spatio-temporal attention model
WO2017155660A1 (en) * 2016-03-11 2017-09-14 Qualcomm Incorporated Action localization in sequential data with attention proposals from a recurrent network
CN108600701A (en) * 2018-05-02 2018-09-28 广州飞宇智能科技有限公司 A kind of monitoring system and method judging video behavior based on deep learning
CN108776796A (en) * 2018-06-26 2018-11-09 内江师范学院 A kind of action identification method based on global spatio-temporal attention model
CN108846332A (en) * 2018-05-30 2018-11-20 西南交通大学 A kind of railway drivers Activity recognition method based on CLSTA
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017155660A1 (en) * 2016-03-11 2017-09-14 Qualcomm Incorporated Action localization in sequential data with attention proposals from a recurrent network
CN107066973A (en) * 2017-04-17 2017-08-18 杭州电子科技大学 A kind of video content description method of utilization spatio-temporal attention model
CN108600701A (en) * 2018-05-02 2018-09-28 广州飞宇智能科技有限公司 A kind of monitoring system and method judging video behavior based on deep learning
CN108846332A (en) * 2018-05-30 2018-11-20 西南交通大学 A kind of railway drivers Activity recognition method based on CLSTA
CN108776796A (en) * 2018-06-26 2018-11-09 内江师范学院 A kind of action identification method based on global spatio-temporal attention model
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WENSONG CHAN et al.: "Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection", IEEE Transactions on Image Processing *
YANG HAODONG et al.: "Bi-direction hierarchical LSTM with spatial-temporal attention for action recognition", Journal of Intelligent & Fuzzy Systems *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826447A (en) * 2019-10-29 2020-02-21 北京工商大学 Restaurant kitchen staff behavior identification method based on attention mechanism
CN112752102B (en) * 2019-10-31 2022-12-30 北京大学 Video code rate distribution method based on visual saliency
CN112752102A (en) * 2019-10-31 2021-05-04 北京大学 Video code rate distribution method based on visual saliency
CN111083477B (en) * 2019-12-11 2020-11-10 北京航空航天大学 HEVC (high efficiency video coding) optimization algorithm based on visual saliency
CN111083477A (en) * 2019-12-11 2020-04-28 北京航空航天大学 HEVC (high efficiency video coding) optimization algorithm based on visual saliency
CN111191739A (en) * 2020-01-09 2020-05-22 电子科技大学 Wall surface defect detection method based on attention mechanism
CN111210907A (en) * 2020-01-14 2020-05-29 西北工业大学 Pain intensity estimation method based on space-time attention mechanism
CN111401149A (en) * 2020-02-27 2020-07-10 西北工业大学 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
CN111401149B (en) * 2020-02-27 2022-05-13 西北工业大学 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
CN111402928A (en) * 2020-03-04 2020-07-10 华南理工大学 Attention-based speech emotion state evaluation method, device, medium and equipment
CN111242101A (en) * 2020-03-08 2020-06-05 电子科技大学 Behavior identification method based on spatiotemporal context association
CN111738218A (en) * 2020-07-27 2020-10-02 成都睿沿科技有限公司 Human body abnormal behavior recognition system and method
CN112329867A (en) * 2020-11-10 2021-02-05 宁波大学 MRI image classification method based on task-driven hierarchical attention network
CN113408349A (en) * 2021-05-17 2021-09-17 浙江大华技术股份有限公司 Training method of motion evaluation model, motion evaluation method and related equipment
CN114299436A (en) * 2021-12-30 2022-04-08 东北农业大学 Group-breeding pig fighting behavior identification method integrating space-time double-attention mechanism

Similar Documents

Publication Publication Date Title
CN110059587A (en) Human action recognition method based on spatio-temporal attention
CN111091045B (en) Sign language identification method based on space-time attention mechanism
CN112052886B (en) Intelligent human body action posture estimation method and device based on convolutional neural network
CN109472194B (en) Motor imagery electroencephalogram signal feature identification method based on CBLSTM algorithm model
CN110111366A (en) A kind of end-to-end light stream estimation method based on multistage loss amount
CN112307995B (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
CN111814611B (en) Multi-scale face age estimation method and system embedded with high-order information
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN111242844A (en) Image processing method, image processing apparatus, server, and storage medium
CN111476133A (en) Unmanned driving-oriented foreground and background codec network target extraction method
CN114581502A (en) Monocular image-based three-dimensional human body model joint reconstruction method, electronic device and storage medium
CN112861718A (en) Lightweight feature fusion crowd counting method and system
CN110188791B (en) Visual emotion label distribution prediction method based on automatic estimation
Zhang et al. FCHP: Exploring the discriminative feature and feature correlation of feature maps for hierarchical DNN pruning and compression
CN114170657A (en) Facial emotion recognition method integrating attention mechanism and high-order feature representation
Zhao et al. Human action recognition based on improved fusion attention CNN and RNN
CN116884067B (en) Micro-expression recognition method based on improved implicit semantic data enhancement
CN111160327B (en) Expression recognition method based on lightweight convolutional neural network
CN116543289B (en) Image description method based on encoder-decoder and Bi-LSTM attention model
CN117611428A (en) Fashion character image style conversion method
CN112528077A (en) Video face retrieval method and system based on video embedding
CN116543021A (en) Siamese network video single-target tracking method based on feature fusion
CN115965905A (en) Crowd counting method and system based on multi-scale fusion convolutional network
Zhang From artificial neural networks to deep learning: A research survey
He Exploring style transfer algorithms in Animation: Enhancing visual

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190726