CN110059587A - Human behavior recognition method based on spatio-temporal attention - Google Patents
Human behavior recognition method based on spatio-temporal attention
- Publication number
- CN110059587A CN110059587A CN201910250775.7A CN201910250775A CN110059587A CN 110059587 A CN110059587 A CN 110059587A CN 201910250775 A CN201910250775 A CN 201910250775A CN 110059587 A CN110059587 A CN 110059587A
- Authority
- CN
- China
- Prior art keywords
- picture
- space
- attention
- long short-term memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a human behavior recognition method based on spatio-temporal attention. The invention extracts picture features with a convolutional neural network, so that feature vectors rather than raw pictures serve as the input of a long short-term memory (LSTM) network, which is more advantageous; the LSTM network better retains and processes the temporal information in the video; and a spatio-temporal attention mechanism lets the model focus on spatially important points and temporally important sequences, improving both the efficiency and the accuracy of recognition.
Description
Technical field
The invention belongs to the fields of computer vision, video classification, deep learning, and intelligent robotics, and in particular relates to a human behavior recognition method based on spatio-temporal attention.
Background art
Deep learning has developed rapidly in recent years, achieving a great many research results and playing an important role in an increasing number of fields.
Computer vision has very broad application prospects: an image capture device acquires images, and a computer then analyses them to obtain the required information, much as the eyes and brain work for humans and many other creatures. Combined with machine learning and deep learning, the field has made many major breakthroughs in recent years, while ever more problems and demands remain to be solved.
With the flourishing of the internet and mobile devices over many years, a large volume of video is captured and uploaded every day, and classifying and recognizing these videos is of great research interest. Video is, moreover, a carrier of information, and extracting that information is valuable in many respects. Because of the sheer quantity of video, performing these tasks manually is impractical, and it is natural to use computers instead.
Robots of all kinds play an increasingly important role in today's society, and the demand from society and the market keeps growing. Under these circumstances, making robots more intelligent is necessary. Recognizing human behavior is one form of such intelligence: a robot equipped with human behavior recognition can better carry out complex tasks such as human-robot interaction and human-robot collaboration.
Since AlexNet appeared, convolutional neural networks have received wide attention and application. As one of the most representative algorithms in deep learning, a convolutional neural network is a feedforward neural network model built on convolution operations and is widely applied in computer vision; representative architectures include VGG, GoogLeNet and ResNet.
The long short-term memory (LSTM) network is a recurrent neural network and one of the representative algorithms of deep learning. Compared with convolutional neural networks, it is generally better at processing sequential information, for example in machine translation and sentiment analysis.
Many behavior recognition algorithms already exist, but many perform unsatisfactorily, mainly for the following reasons: the spatial information of an ordinary picture is relatively easy to process, but a video contains temporal information in addition to spatial information, and this part, together with the correlations between frames, is difficult to handle; video files are usually large, so the hardware requirements for processing video are high and hardware becomes a limitation; and much of the information in a video is of no value and needs no attention, so extracting key points and key frames is highly desirable, yet this is itself a hard problem to solve.
Summary of the invention
The purpose of the present invention is to overcome the above shortcomings and to provide a human behavior recognition method based on spatio-temporal attention, intended to solve the problems of processing temporal information in video recognition and of attending to the key information in a video.
To achieve the above object, the present invention comprises the following steps:
Step 1: split the input video into picture frames and uniformly extract the required number of pictures;
Step 2: perform feature extraction on the extracted pictures using a pre-trained convolutional neural network to obtain the corresponding feature vectors;
Step 3: taking the extracted feature vectors as input, compute the spatial attention weights corresponding to each picture using a feedforward perceptron;
Step 4: weight the picture feature vectors with the spatial attention weights to obtain weighted feature vectors;
Step 5: input the weighted feature vectors into a long short-term memory (LSTM) network, and compute the output class probability vectors by forward propagation through the LSTM network; the corresponding temporal attention weights are calculated from the feature vector of each picture and the output of the corresponding LSTM hidden layer;
Step 6: weight and sum the class probability vectors of all the pictures with the temporal attention weights to obtain a single class probability vector;
Step 7: train the model with a number of labeled video data; backpropagation is used during training, and while the loss is large the model parameters are updated continuously until the loss converges to a small value, at which point the model is saved; the class corresponding to the maximum value in the class probability vector is taken as the final classification output, and the parameters are saved as the model parameters;
Step 8: combine the saved model and the model parameters to constitute the human behavior recognition model.
In step 2, the convolutional neural network is a VGG19 convolutional neural network trained on ImageNet for picture classification; the picture is taken as the input of the network, and the feature vector is taken before the fully connected layers.
In step 3, the spatial attention weights are calculated as:
et = WXXt + Whht−1 + b
lt,i = exp(et,i) / Σj=1..K² exp(et,j)
wherein et is an intermediate result whose i-th component is et,i, lt,i is the spatial attention weight of the i-th region of the t-th picture, WX and Wh are weight parameters obtained in training, Xt is the feature vector of the t-th picture, ht−1 is the output of the hidden layer for the (t−1)-th picture, K² is the number of regions each picture is divided into, and b is a bias.
In step 5, the LSTM network uses a two-layer LSTM as the master network, and its input is calculated as:
Yt = Σi=1..K² lt,i xt,i
wherein Yt is the input of the LSTM network at the t-th time step, and xt,i is the feature of the i-th region of the feature vector of the t-th picture.
In step 5, the outputs of the LSTM network are weighted by the temporal attention weights to obtain a single class probability vector, calculated as:
o = tanh(Σt=1..T βt ot)
wherein o is the output categorization vector and tanh is the activation function;
the class corresponding to the maximum probability value in the class probability vector is chosen as the predicted output;
the temporal attention weights are calculated as:
βt = ReLU(Wout(WXXt + Whht−1 + b))
wherein βt is the temporal attention weight of the t-th picture, ReLU is the linear activation function, Wout, WX and Wh are weight parameters obtained in training, Xt is the feature vector of the t-th picture, ht−1 is the output of the hidden layer for the (t−1)-th picture, and b is a bias.
The LSTM network initializes the hidden layer as:
h0 = finit,h((1/T) Σt=1..T Xt)
wherein finit,h is a feedforward perceptron.
The cell state c0 input to the first time step of the LSTM network is initialized as:
c0 = finit,c((1/T) Σt=1..T Xt)
wherein finit,c is a feedforward perceptron.
In step 7, a loss function is used during training, and the parameters are adjusted when the loss is backpropagated; the loss function is calculated as:
L = −Σi=1..C yi log p̂i + λ1 Σi=1..K² (1 − Σt=1..T lt,i)² + λ2 Σt=1..T (1 − βt)²
wherein C is the total number of classes, yi is the true label, p̂i is the probability of belonging to the i-th class, T is the total number of input pictures, λ1 is the spatial attention penalty coefficient, and λ2 is the temporal attention penalty coefficient.
Compared with the prior art, the present invention extracts picture features with a convolutional neural network, so that feature vectors rather than raw pictures serve as the input of the LSTM network, which is more advantageous; the LSTM network better retains and processes the temporal information in the video; the spatio-temporal attention mechanism lets the model focus on spatially important points and temporally important sequences, improving the efficiency and accuracy of recognition; and the video pre-processing stage reduces the subsequent amount of computation, easing the computational pressure on the hardware.
Brief description of the drawings
Fig. 1 is the flow chart of the invention;
Fig. 2 is the model structure of the invention.
Specific embodiments
The present invention will be further described below with reference to the accompanying drawings.
Referring to Fig. 1, the present invention comprises the following steps:
Step 101: use a camera to obtain video data, or directly upload video data, as the video input.
Step 102: pre-process the original input video data by splitting the video into frames; to reduce the subsequent amount of computation, uniformly extract a certain number of pictures and keep them arranged in their original temporal order.
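As an illustration of step 102, a minimal NumPy sketch of uniform frame sampling; the function name and the rounding of evenly spaced indices are assumptions for the example, not taken from the patent:

```python
import numpy as np

def uniform_frame_indices(total_frames, num_samples):
    # Evenly spaced frame indices, preserving the original temporal order.
    return np.linspace(0, total_frames - 1, num_samples).round().astype(int)
```

Sampling in index order keeps the first and last frames and spaces the rest evenly, so the extracted pictures retain the video's temporal structure.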
Step 103: perform feature extraction on every picture using the VGG19 convolutional neural network trained on ImageNet, obtaining the corresponding feature vector; to facilitate subsequent computation, the feature vector is stretched from a two-dimensional map into a one-dimensional vector, and the feature vector of the t-th picture is Xt = {xt,1, xt,2, …, xt,i, …}.
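The stretching in step 103 can be sketched as follows; the (K, K, D) layout assumed for the VGG19 feature map is an illustrative convention, not stated in the patent:

```python
import numpy as np

def to_region_vectors(feature_map):
    # Flatten a (K, K, D) convolutional feature map into K*K region vectors x_{t,i}.
    k1, k2, d = feature_map.shape
    return feature_map.reshape(k1 * k2, d)
```

Each row of the result is one region feature xt,i, so the whole picture becomes a (K², D) matrix Xt.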
Step 104: the parts of a picture are not all equally important; some parts are important and help recognition, while other parts are useless. Spatial attention weights are therefore introduced to indicate the importance of each part of the picture, the magnitude of the value representing the degree of importance.
The spatial attention weights are calculated as:
et = WXXt + Whht−1 + b
lt,i = exp(et,i) / Σj=1..K² exp(et,j)
wherein et is an intermediate result whose i-th component is et,i, lt,i is the spatial attention weight of the i-th region of the t-th picture, WX and Wh are weight parameters obtained in training, Xt is the feature vector of the t-th picture, ht−1 is the output of the hidden layer for the (t−1)-th picture, K² is the number of regions each picture is divided into, and b is a bias;
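A sketch of the spatial attention computation of step 104, assuming Xt is stored as a (K², D) matrix of region features and the scores are normalized by a softmax; the shapes and parameter names are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def spatial_attention(X_t, h_prev, w_x, W_h, b):
    # e_t = W_X X_t + W_h h_{t-1} + b : one unnormalised score per region
    e_t = X_t @ w_x + h_prev @ W_h + b
    # l_{t,i} = exp(e_{t,i}) / sum_j exp(e_{t,j})
    return softmax(e_t)
```

The output is a probability distribution over the K² regions: non-negative weights that sum to one.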
Step 105: once the spatial attention of each picture has been calculated, spatial attention weighting is applied immediately; the picture's corresponding feature vector is weighted according to:
Yt = Σi=1..K² lt,i xt,i
wherein Yt is the input of the LSTM network at the t-th time step, and xt,i is the feature of the i-th region of the feature vector of the t-th picture.
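The weighting of step 105 is then a single weighted sum over regions; in NumPy form (names illustrative):

```python
import numpy as np

def weight_regions(X_t, l_t):
    # Y_t = sum_i l_{t,i} x_{t,i}: the spatial-attention-weighted LSTM input
    return (l_t[:, None] * X_t).sum(axis=0)
```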
Step 106: the weighted feature vectors are then input into the LSTM network and propagated forward through it. Each picture's corresponding hidden layer produces an output ht, which serves two purposes: it is passed to the next step as output, and it is used to calculate the spatial and temporal attention weights.
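The forward propagation of step 106 can be illustrated by one standard LSTM cell update; this is a generic textbook cell, not the patent's exact two-layer configuration, and the gate stacking order in W, U, b is an assumption:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(y_t, h_prev, c_prev, W, U, b):
    # One time step on the weighted input Y_t; the four gates are stacked in W, U, b.
    d = h_prev.size
    z = W @ y_t + U @ h_prev + b        # (4d,) pre-activations
    i = sigmoid(z[:d])                  # input gate
    f = sigmoid(z[d:2 * d])             # forget gate
    o = sigmoid(z[2 * d:3 * d])         # output gate
    g = np.tanh(z[3 * d:])              # candidate cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)              # h_t is the hidden output used by both attentions
    return h_t, c_t
```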
Step 107: different frames of a video differ in importance; some frames are important while others are less so, and these differences between frames must be distinguished. This introduces the temporal attention mechanism, in which the magnitude of the weight represents the degree of importance. The temporal attention is calculated as:
βt = ReLU(Wout(WXXt + Whht−1 + b))
wherein βt is the temporal attention weight of the t-th picture, ReLU is the linear activation function, Wout, WX and Wh are weight parameters obtained in training, Xt is the feature vector of the t-th picture, ht−1 is the output of the hidden layer for the (t−1)-th picture, and b is a bias.
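A sketch of step 107's temporal attention score for one frame, taking Xt here as the flattened frame feature; the shapes are illustrative assumptions:

```python
import numpy as np

def temporal_attention(X_t, h_prev, W_X, W_h, w_out, b):
    # beta_t = ReLU(W_out (W_X X_t + W_h h_{t-1} + b))
    hidden = W_X @ X_t + W_h @ h_prev + b
    return max(0.0, float(w_out @ hidden))
```

The ReLU guarantees a non-negative weight, so an unimportant frame can be suppressed entirely by a zero score.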
Step 108: after the temporal attention weights have been calculated, the categorization vector corresponding to each picture is weighted and summed to obtain a single categorization vector, which is input to the softmax function to obtain the final class probability vector:
o = tanh(Σt=1..T βt ot)
p̂ = softmax(o)
wherein o is the output categorization vector, tanh is the activation function, p̂i is the probability of belonging to the i-th class, and C is the total number of classes.
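The fusion of step 108 can be sketched as follows, assuming each frame has already produced a per-frame categorization vector ot (names illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_frame_outputs(o_frames, betas):
    # o = tanh(sum_t beta_t o_t); softmax then gives the class probability vector
    o = np.tanh(sum(b * o_t for b, o_t in zip(betas, o_frames)))
    return softmax(o)
```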
Step 109: after the class probability vector is obtained, the class corresponding to the maximum probability p̂i is taken as the final classification and output as the result.
In the above steps, the hidden layer of the LSTM network needs to be initialized as:
h0 = finit,h((1/T) Σt=1..T Xt)
wherein finit,h is a feedforward perceptron;
the cell layer of the LSTM network also needs to be initialized as:
c0 = finit,c((1/T) Σt=1..T Xt)
wherein finit,c is a feedforward perceptron.
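The initialization above can be sketched as small perceptrons over the mean frame feature; the single-layer form and the tanh nonlinearity are assumptions for illustration:

```python
import numpy as np

def init_lstm_state(X, W_h0, b_h0, W_c0, b_c0):
    x_mean = X.mean(axis=0)             # (1/T) sum_t X_t over the T sampled frames
    h0 = np.tanh(W_h0 @ x_mean + b_h0)  # f_init,h
    c0 = np.tanh(W_c0 @ x_mean + b_c0)  # f_init,c
    return h0, c0
```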
In addition, as with most deep learning algorithms, building the model first requires training on a large amount of labeled video data. Backpropagation is used here, during which the model parameters are adjusted according to the loss, so a loss function needs to be constructed:
L = −Σi=1..C yi log p̂i + λ1 Σi=1..K² (1 − Σt=1..T lt,i)² + λ2 Σt=1..T (1 − βt)²
wherein C is the total number of classes, yi is the true label, p̂i is the probability of belonging to the i-th class, T is the total number of input pictures, λ1 is the spatial attention penalty coefficient, and λ2 is the temporal attention penalty coefficient.
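A sketch of the loss, writing the attention penalties in the doubly stochastic form suggested by the coefficients λ1 and λ2; this penalty form is an assumption, with l of shape (T, K²) and beta of shape (T,):

```python
import numpy as np

def total_loss(y, p_hat, l, beta, lam1, lam2):
    ce = -float(np.sum(y * np.log(p_hat)))                         # cross-entropy
    spatial_pen = lam1 * float(np.sum((1.0 - l.sum(axis=0)) ** 2)) # spatial attention penalty
    temporal_pen = lam2 * float(np.sum((1.0 - beta) ** 2))         # temporal attention penalty
    return ce + spatial_pen + temporal_pen
```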
Referring to Fig. 2, which depicts the specific structure of the model of the present invention, comprising the following parts:
Step 201 is the input video data: the video is split into frames and a portion of the frames is uniformly extracted.
Step 202 is the VGG19 network trained on ImageNet, used for feature extraction from the pictures.
Step 203 is the spatial attention weighting part, which applies spatial attention weighting to the feature vectors of the pictures.
Step 204 is the long short-term memory network (LSTM), the master network of the model.
Step 205 is the temporal attention weighting part, which computes the weighted sum of the outputs of the LSTM network.
Step 206 is the softmax function: the preceding output is input to the softmax function to obtain the class probability vector, and the class corresponding to the maximum probability value is chosen as the final classification.
Claims (8)
1. A human behavior recognition method based on spatio-temporal attention, characterized by comprising the following steps:
Step 1: split the input video into picture frames and uniformly extract the required number of pictures;
Step 2: perform feature extraction on the extracted pictures using a pre-trained convolutional neural network to obtain the corresponding feature vectors;
Step 3: taking the extracted feature vectors as input, compute the spatial attention weights corresponding to each picture using a feedforward perceptron;
Step 4: weight the picture feature vectors with the spatial attention weights to obtain weighted feature vectors;
Step 5: input the weighted feature vectors into a long short-term memory (LSTM) network, and compute the output class probability vectors by forward propagation through the LSTM network; the corresponding temporal attention weights are calculated from the feature vector of each picture and the output of the corresponding LSTM hidden layer;
Step 6: weight and sum the class probability vectors of all the pictures with the temporal attention weights to obtain a single class probability vector;
Step 7: train the model with a number of labeled video data; backpropagation is used during training, and while the loss is large the model parameters are updated continuously until the loss converges to a small value, at which point the model is saved; the class corresponding to the maximum value in the class probability vector is taken as the final classification output, and the parameters are saved as the model parameters;
Step 8: combine the saved model and the model parameters to constitute the human behavior recognition model.
2. The human behavior recognition method based on spatio-temporal attention according to claim 1, characterized in that in step 2, the convolutional neural network is a VGG19 convolutional neural network trained on ImageNet for picture classification; the picture is taken as the input of the network, and the feature vector is taken before the fully connected layers.
3. The human behavior recognition method based on spatio-temporal attention according to claim 1, characterized in that in step 3, the spatial attention weights are calculated as:
et = WXXt + Whht−1 + b
lt,i = exp(et,i) / Σj=1..K² exp(et,j)
wherein et is an intermediate result whose i-th component is et,i, lt,i is the spatial attention weight of the i-th region of the t-th picture, WX and Wh are weight parameters obtained in training, Xt is the feature vector of the t-th picture, ht−1 is the output of the hidden layer for the (t−1)-th picture, K² is the number of regions each picture is divided into, and b is a bias.
4. The human behavior recognition method based on spatio-temporal attention according to claim 1, characterized in that in step 5, the LSTM network uses a two-layer LSTM as the master network, and its input is calculated as:
Yt = Σi=1..K² lt,i xt,i
wherein Yt is the input of the LSTM network at the t-th time step, and xt,i is the feature of the i-th region of the feature vector of the t-th picture.
5. The human behavior recognition method based on spatio-temporal attention according to claim 1, characterized in that in step 5, the outputs of the LSTM network are weighted by the temporal attention weights to obtain a single class probability vector, calculated as:
o = tanh(Σt=1..T βt ot)
wherein o is the output categorization vector and tanh is the activation function;
the class corresponding to the maximum probability value in the class probability vector is chosen as the predicted output;
the temporal attention weights are calculated as:
βt = ReLU(Wout(WXXt + Whht−1 + b))
wherein βt is the temporal attention weight of the t-th picture, ReLU is the linear activation function, Wout, WX and Wh are weight parameters obtained in training, Xt is the feature vector of the t-th picture, ht−1 is the output of the hidden layer for the (t−1)-th picture, and b is a bias.
6. The human behavior recognition method based on spatio-temporal attention according to claim 1, characterized in that the LSTM network initializes the hidden layer as:
h0 = finit,h((1/T) Σt=1..T Xt)
wherein finit,h is a feedforward perceptron.
7. The human behavior recognition method based on spatio-temporal attention according to claim 1, characterized in that the cell state c0 input to the first time step of the LSTM network is initialized as:
c0 = finit,c((1/T) Σt=1..T Xt)
wherein finit,c is a feedforward perceptron.
8. The human behavior recognition method based on spatio-temporal attention according to claim 1, characterized in that in step 7, a loss function is used during training and the parameters are adjusted when the loss is backpropagated; the loss function is calculated as:
L = −Σi=1..C yi log p̂i + λ1 Σi=1..K² (1 − Σt=1..T lt,i)² + λ2 Σt=1..T (1 − βt)²
wherein C is the total number of classes, yi is the true label, p̂i is the probability of belonging to the i-th class, T is the total number of input pictures, λ1 is the spatial attention penalty coefficient, and λ2 is the temporal attention penalty coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910250775.7A CN110059587A (en) | 2019-03-29 | 2019-03-29 | Human behavior recognition method based on spatio-temporal attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910250775.7A CN110059587A (en) | 2019-03-29 | 2019-03-29 | Human behavior recognition method based on spatio-temporal attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110059587A true CN110059587A (en) | 2019-07-26 |
Family
ID=67317918
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910250775.7A Pending CN110059587A (en) | 2019-03-29 | 2019-03-29 | Human behavior recognition method based on spatio-temporal attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110059587A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110826447A (en) * | 2019-10-29 | 2020-02-21 | 北京工商大学 | Restaurant kitchen staff behavior identification method based on attention mechanism |
CN111083477A (en) * | 2019-12-11 | 2020-04-28 | 北京航空航天大学 | HEVC (high efficiency video coding) optimization algorithm based on visual saliency |
CN111191739A (en) * | 2020-01-09 | 2020-05-22 | 电子科技大学 | Wall surface defect detection method based on attention mechanism |
CN111210907A (en) * | 2020-01-14 | 2020-05-29 | 西北工业大学 | Pain intensity estimation method based on space-time attention mechanism |
CN111242101A (en) * | 2020-03-08 | 2020-06-05 | 电子科技大学 | Behavior identification method based on spatiotemporal context association |
CN111402928A (en) * | 2020-03-04 | 2020-07-10 | 华南理工大学 | Attention-based speech emotion state evaluation method, device, medium and equipment |
CN111401149A (en) * | 2020-02-27 | 2020-07-10 | 西北工业大学 | Lightweight video behavior identification method based on long-short-term time domain modeling algorithm |
CN111738218A (en) * | 2020-07-27 | 2020-10-02 | 成都睿沿科技有限公司 | Human body abnormal behavior recognition system and method |
CN112329867A (en) * | 2020-11-10 | 2021-02-05 | 宁波大学 | MRI image classification method based on task-driven hierarchical attention network |
CN112752102A (en) * | 2019-10-31 | 2021-05-04 | 北京大学 | Video code rate distribution method based on visual saliency |
CN113408349A (en) * | 2021-05-17 | 2021-09-17 | 浙江大华技术股份有限公司 | Training method of motion evaluation model, motion evaluation method and related equipment |
CN114299436A (en) * | 2021-12-30 | 2022-04-08 | 东北农业大学 | Group-breeding pig fighting behavior identification method integrating space-time double-attention mechanism |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066973A (en) * | 2017-04-17 | 2017-08-18 | 杭州电子科技大学 | A kind of video content description method of utilization spatio-temporal attention model |
WO2017155660A1 (en) * | 2016-03-11 | 2017-09-14 | Qualcomm Incorporated | Action localization in sequential data with attention proposals from a recurrent network |
CN108600701A (en) * | 2018-05-02 | 2018-09-28 | 广州飞宇智能科技有限公司 | A kind of monitoring system and method judging video behavior based on deep learning |
CN108776796A (en) * | 2018-06-26 | 2018-11-09 | 内江师范学院 | A kind of action identification method based on global spatio-temporal attention model |
CN108846332A (en) * | 2018-05-30 | 2018-11-20 | 西南交通大学 | A kind of railway drivers Activity recognition method based on CLSTA |
CN109101896A (en) * | 2018-07-19 | 2018-12-28 | 电子科技大学 | A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism |
- 2019-03-29: CN application CN201910250775.7A filed; published as CN110059587A (en), status active, Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017155660A1 (en) * | 2016-03-11 | 2017-09-14 | Qualcomm Incorporated | Action localization in sequential data with attention proposals from a recurrent network |
CN107066973A (en) * | 2017-04-17 | 2017-08-18 | 杭州电子科技大学 | A kind of video content description method of utilization spatio-temporal attention model |
CN108600701A (en) * | 2018-05-02 | 2018-09-28 | 广州飞宇智能科技有限公司 | A kind of monitoring system and method judging video behavior based on deep learning |
CN108846332A (en) * | 2018-05-30 | 2018-11-20 | 西南交通大学 | A kind of railway drivers Activity recognition method based on CLSTA |
CN108776796A (en) * | 2018-06-26 | 2018-11-09 | 内江师范学院 | A kind of action identification method based on global spatio-temporal attention model |
CN109101896A (en) * | 2018-07-19 | 2018-12-28 | 电子科技大学 | A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism |
Non-Patent Citations (2)
Title |
---|
WENSONG CHAN 等: ""Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection"", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 * |
YANG HAODONG 等: ""Bi-direction hierarchical LSTM with spatial-temporal attention for action recognition"", 《JOURNAL OF INTELLIGENT & FUZZY SYSTEMS》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110826447A (en) * | 2019-10-29 | 2020-02-21 | 北京工商大学 | Restaurant kitchen staff behavior identification method based on attention mechanism |
CN112752102B (en) * | 2019-10-31 | 2022-12-30 | 北京大学 | Video code rate distribution method based on visual saliency |
CN112752102A (en) * | 2019-10-31 | 2021-05-04 | 北京大学 | Video code rate distribution method based on visual saliency |
CN111083477B (en) * | 2019-12-11 | 2020-11-10 | 北京航空航天大学 | HEVC (high efficiency video coding) optimization algorithm based on visual saliency |
CN111083477A (en) * | 2019-12-11 | 2020-04-28 | 北京航空航天大学 | HEVC (high efficiency video coding) optimization algorithm based on visual saliency |
CN111191739A (en) * | 2020-01-09 | 2020-05-22 | 电子科技大学 | Wall surface defect detection method based on attention mechanism |
CN111210907A (en) * | 2020-01-14 | 2020-05-29 | 西北工业大学 | Pain intensity estimation method based on space-time attention mechanism |
CN111401149A (en) * | 2020-02-27 | 2020-07-10 | 西北工业大学 | Lightweight video behavior identification method based on long-short-term time domain modeling algorithm |
CN111401149B (en) * | 2020-02-27 | 2022-05-13 | 西北工业大学 | Lightweight video behavior identification method based on long-short-term time domain modeling algorithm |
CN111402928A (en) * | 2020-03-04 | 2020-07-10 | 华南理工大学 | Attention-based speech emotion state evaluation method, device, medium and equipment |
CN111242101A (en) * | 2020-03-08 | 2020-06-05 | 电子科技大学 | Behavior identification method based on spatiotemporal context association |
CN111738218A (en) * | 2020-07-27 | 2020-10-02 | 成都睿沿科技有限公司 | Human body abnormal behavior recognition system and method |
CN112329867A (en) * | 2020-11-10 | 2021-02-05 | 宁波大学 | MRI image classification method based on task-driven hierarchical attention network |
CN113408349A (en) * | 2021-05-17 | 2021-09-17 | 浙江大华技术股份有限公司 | Training method of motion evaluation model, motion evaluation method and related equipment |
CN114299436A (en) * | 2021-12-30 | 2022-04-08 | 东北农业大学 | Group-breeding pig fighting behavior identification method integrating space-time double-attention mechanism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110059587A (en) | Human behavior recognition method based on spatio-temporal attention | |
CN111091045B (en) | Sign language identification method based on space-time attention mechanism | |
CN112052886B (en) | Intelligent human body action posture estimation method and device based on convolutional neural network | |
CN109472194B (en) | Motor imagery electroencephalogram signal feature identification method based on CBLSTM algorithm model | |
CN110111366A (en) | A kind of end-to-end light stream estimation method based on multistage loss amount | |
CN112307995B (en) | Semi-supervised pedestrian re-identification method based on feature decoupling learning | |
CN111814611B (en) | Multi-scale face age estimation method and system embedded with high-order information | |
CN111696101A (en) | Light-weight solanaceae disease identification method based on SE-Inception | |
CN111242844A (en) | Image processing method, image processing apparatus, server, and storage medium | |
CN111476133A (en) | Unmanned driving-oriented foreground and background codec network target extraction method | |
CN114581502A (en) | Monocular image-based three-dimensional human body model joint reconstruction method, electronic device and storage medium | |
CN112861718A (en) | Lightweight feature fusion crowd counting method and system | |
CN110188791B (en) | Visual emotion label distribution prediction method based on automatic estimation | |
Zhang et al. | FCHP: Exploring the discriminative feature and feature correlation of feature maps for hierarchical DNN pruning and compression | |
CN114170657A (en) | Facial emotion recognition method integrating attention mechanism and high-order feature representation | |
Zhao et al. | Human action recognition based on improved fusion attention CNN and RNN | |
CN116884067B (en) | Micro-expression recognition method based on improved implicit semantic data enhancement | |
CN111160327B (en) | Expression recognition method based on lightweight convolutional neural network | |
CN116543289B (en) | Image description method based on encoder-decoder and Bi-LSTM attention model | |
CN117611428A (en) | Fashion character image style conversion method | |
CN112528077A (en) | Video face retrieval method and system based on video embedding | |
CN116543021A (en) | Siamese network video single-target tracking method based on feature fusion | |
CN115965905A (en) | Crowd counting method and system based on multi-scale fusion convolutional network | |
Zhang | From artificial neural networks to deep learning: A research survey | |
He | Exploring style transfer algorithms in Animation: Enhancing visual |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190726 ||