CN111401149A - Lightweight video behavior identification method based on long-short-term time domain modeling algorithm - Google Patents

Lightweight video behavior identification method based on long-short-term time domain modeling algorithm

Info

Publication number
CN111401149A
Authority
CN
China
Prior art keywords
term
short
video
long
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010124065.2A
Other languages
Chinese (zh)
Other versions
CN111401149B (en)
Inventor
王琦 (Wang Qi)
李学龙 (Li Xuelong)
白思开 (Bai Sikai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010124065.2A priority Critical patent/CN111401149B/en
Publication of CN111401149A publication Critical patent/CN111401149A/en
Application granted granted Critical
Publication of CN111401149B publication Critical patent/CN111401149B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a lightweight video behavior recognition method based on a long- and short-term temporal modeling algorithm. A short-term feature interchange module is constructed by partial channel interchange, and a long-term feature fusion module is constructed by graph convolution, so that short-term and long-term temporal features of the video are extracted effectively. By inserting the two modules at different positions of a two-dimensional deep residual network, temporal features at different stages are extracted, effectively addressing the inaccurate results and high computational resource consumption of current video behavior recognition techniques.

Description

Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
Technical Field
The invention belongs to the technical field of computer vision and video classification, and particularly relates to a lightweight video behavior recognition method based on a long- and short-term temporal modeling algorithm, which can be applied to intelligent surveillance, crowd analysis, human-computer interaction, and the like.
Background
With the advent of short-video applications such as Douyin (TikTok) and Kuaishou, as well as various live-streaming platforms, a large amount of new video is generated and shared on the Internet almost every moment. To cope with this information explosion, analyzing and understanding video in its various application scenarios becomes increasingly important. Video behavior recognition means recognizing and judging the behavior and actions of people in a video. It has wide application in real life, but owing to factors such as high resource consumption and insufficient extraction of temporal information, it remains a very challenging task in the field of video analysis.
Video behavior recognition can classify the current behavior in a video and predict the actions about to be taken, so it is applied in many fields, including intelligent surveillance systems and gesture recognition. By detecting the behavior of people in a surveillance system and analyzing and judging it against certain rules, abnormal behavior can be alarmed in time. And by recognizing gestures and postures, video behavior recognition can also be applied to crowd analysis and human-computer interaction.
Currently, most behavior recognition techniques fall into two categories: methods based on a two-stream structure and methods based on three-dimensional convolutional neural networks. Two-stream methods feed the video frames and the dense optical flow between frames into the two branches of the structure separately, and finally fuse the results of the two branches to obtain the final result. The disadvantages of this approach are: 1) the optical flow features of the video must be extracted additionally, which consumes considerable time and memory; 2) because the two-stream structure is still essentially based on two-dimensional convolutional neural networks, it cannot effectively capture the complex temporal information in the video, so the recognition accuracy is low. Methods based on three-dimensional convolutional neural networks extract the temporal and spatial features of the video simultaneously with three-dimensional convolutions. Their main drawbacks are: 1) compared with two-dimensional convolutional neural networks, the number of parameters increases severalfold; 2) pre-training the model is computationally expensive, the model is hard to train, and it is prone to overfitting; 3) a single layer of the model can only capture short-term temporal information, so the long-term temporal information in the video cannot be extracted effectively.
Therefore, existing video behavior recognition techniques generally suffer from high computational resource consumption and insufficient temporal feature extraction, and a video behavior recognition method that is accurate, consumes few computational resources, and extracts temporal features effectively is needed.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a lightweight video behavior recognition method based on a long- and short-term temporal modeling algorithm. Based on a two-dimensional deep residual network and graph convolution, the short-term and long-term temporal features of the video are extracted effectively. Compared with two-stream and three-dimensional convolutional neural network algorithms, the method, without additionally training a graph model or extracting optical flow features, effectively addresses the inaccurate results and high computational resource consumption of current video behavior recognition techniques.
A lightweight video behavior identification method based on a long-short-term time domain modeling algorithm is characterized by comprising the following steps:
Step 1: extract an 8-frame video clip from each video in the video data set by uniform sampling, and apply multi-scale cropping so that all clips have the same size; all cropped video clips, together with the labels of the videos they belong to, form a new video clip data set, which is divided into a training data set and a test data set at a ratio of 4:1;
Step 2: construct a long- and short-term temporal behavior recognition network model comprising a spatial feature extraction module, a short-term feature interchange module, a long-term feature fusion module, and a behavior prediction module; the spatial feature extraction module is a 50-layer ResNet containing 16 Bottleneck modules, 4 of which contain down-sampling layers, where the first convolutional layer and the successive Bottleneck modules of the ResNet extract spatial features of the input video clip at different stages, and the last layer of the ResNet outputs each frame's scores over all categories; a short-term feature interchange module is inserted before each Bottleneck module, exchanging the features on part of each frame's channels with the features on the corresponding channels of the two adjacent frames, and superimposing the exchanged features on the original features to obtain short-term temporal features at different stages; a long-term feature fusion module is added before each of the last two Bottleneck modules containing down-sampling layers, placed before the inserted short-term feature interchange module, taking the features extracted from the input feature map as the nodes of a fully connected graph, fusing the information on the nodes by graph convolution, and mapping the fused long-term temporal features back to the same structure as the input feature map; the behavior prediction module averages the category scores of all frames obtained by the feature extraction module, category by category, to obtain the average score of the video clip for each category, and takes the category with the highest score as the final behavior recognition result of the video clip;
Step 3: input the training data set obtained in step 1 into the network model constructed in step 2 for training; the loss function of the network is the mean square error loss, and the network is optimized by stochastic gradient descent with a batch size of 16; the initial learning rate is 0.01 and is divided by 10 every 10 epochs, for 30 epochs in total; the trained network is the final behavior recognition network model;
Step 4: input the videos in the test data set into the long- and short-term temporal behavior recognition network model trained in step 3 to obtain the behavior recognition result of each video in the test set.
The invention has the beneficial effects that: because the short-term and long-term temporal modules are constructed with partial feature interchange and graph convolution, and the two modules are inserted at multiple positions of a deep residual network (ResNet-50), temporal features at different stages can be extracted effectively, yielding higher behavior recognition accuracy; meanwhile, no graph model needs to be trained additionally and no optical flow features need to be extracted, so the computational cost is low.
Drawings
FIG. 1 is a schematic diagram of a long-term and short-term time-domain behavior recognition network model of the present invention;
FIG. 2 is a schematic diagram of a short term feature interchange module;
FIG. 3 is a schematic diagram of a long term feature fusion module.
Detailed Description
The present invention will be further described with reference to the following drawings and examples, which include, but are not limited to, the following examples.
The invention provides a lightweight video behavior identification method based on a long-term and short-term time domain modeling algorithm, which comprises the following implementation steps:
1. video pre-processing
First, an 8-frame video clip is extracted from each video by uniform sampling; multi-scale cropping (e.g., center cropping) is then applied so that each frame is resized to 224×224, converting each video into a video clip of size 8×3×224×224. All video clips form a new video clip data set, and the label of each video in the original data set serves as the label of the corresponding clip in the new data set. Finally, the new video clip data set is divided into a training data set and a test data set at a ratio of 4:1.
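For illustration, a minimal sketch of this preprocessing in Python with OpenCV follows; the function name sample_clip and the short-side-resize-then-center-crop strategy are assumptions, since the text specifies only uniform sampling and multi-scale cropping (e.g., center cropping) to 224×224.

```python
import cv2
import numpy as np

def sample_clip(video_path, num_frames=8, size=224):
    """Uniformly sample num_frames frames and center-crop each to size x size."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Uniform sampling: take the middle frame of each of num_frames equal segments.
    indices = [int((i + 0.5) * total / num_frames) for i in range(num_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        scale = size / min(h, w)                      # resize the short side to `size`
        frame = cv2.resize(frame, (round(w * scale), round(h * scale)))
        h, w = frame.shape[:2]
        top, left = (h - size) // 2, (w - size) // 2  # center crop
        frames.append(frame[top:top + size, left:left + size])
    cap.release()
    clip = np.stack(frames)                           # (8, 224, 224, 3)
    return clip.transpose(0, 3, 1, 2)                 # (8, 3, 224, 224), as in the text
```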
2. Constructing long-short-term time domain behavior recognition network model
To extract the various useful features of a video clip, the invention uses a spatial feature extraction module to extract spatial features of each frame at different stages; a short-term feature interchange module that performs partial channel interchange on the extracted features along the time dimension to obtain short-term temporal features; a long-term feature fusion module that propagates and fuses the extracted features over a long temporal range to obtain long-term temporal features; and finally a behavior prediction module that makes the final judgment on the behavior category of the video clip. Accordingly, a long- and short-term temporal behavior recognition network model is constructed, comprising the spatial feature extraction module, the short-term feature interchange module, the long-term feature fusion module, and the behavior prediction module.
(1) Spatial feature extraction module
The spatial feature extraction module is a 50-layer ResNet containing 16 Bottleneck modules, 4 of which contain down-sampling layers. The successive Bottleneck modules of the ResNet extract spatial features of the input video clip at different stages, and the last layer of the network outputs each frame's scores over all categories.
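As a concrete illustration of how a purely two-dimensional backbone yields per-frame category scores, the sketch below folds the time dimension into the batch dimension with torchvision's stock ResNet-50; the 174-category output (the number of classes in Something-Something V1) is an assumption for concreteness, and the sketch omits the inserted temporal modules.

```python
import torch
import torchvision

backbone = torchvision.models.resnet50(num_classes=174)  # 2D ResNet-50 as per-frame classifier

clip = torch.randn(1, 8, 3, 224, 224)     # (batch, frames, channels, height, width)
b, t = clip.shape[:2]
frames = clip.view(b * t, 3, 224, 224)    # treat every frame as an independent sample
scores = backbone(frames).view(b, t, -1)  # (batch, 8, 174): each frame's scores over all categories
```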
(2) Short term feature interchange module
A short-term feature interchange module is inserted before each Bottleneck module. As shown in FIG. 2, the module exchanges each frame's features with those of its two adjacent frames along the time dimension. Since each frame's features consist of many channels, only part of the channels are interchanged to reduce computation: the features on the first 1/8 of the channels are exchanged with the previous frame, the features on the next 1/8 of the channels are exchanged with the next frame, and the remaining 6/8 of the channels stay unchanged. To prevent the interchange from damaging each frame's original spatial features, the residual idea is adopted: the interchanged features are superimposed on the original input features, so that short-term temporal features are obtained while the original spatial features are preserved. The whole process can be formulated as:
F_2^s = Stm(F_1, F_2, F_3) + F_2    (1)
where Stm(·,·,·) denotes the short-term feature interchange operation: frame I_2 obtains short-term temporal features by exchanging part of its channels with its two adjacent frames I_1 and I_3, and the original feature F_2 is then added, yielding the feature F_2^s processed by the short-term feature interchange module; F_1 denotes the original feature of the previous frame I_1, F_2 the original feature of the current frame I_2, and F_3 the original feature of the next frame I_3. The whole process neither introduces additional parameters nor consumes many computational resources.
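A minimal sketch of the partial channel interchange follows; the tensor layout (batch, frames, channels, H, W) and zero padding at the clip boundaries are assumptions not fixed by the text.

```python
import torch

def short_term_interchange(x: torch.Tensor) -> torch.Tensor:
    """x: (batch, frames, channels, H, W). The first 1/8 of the channels is
    taken from the previous frame, the next 1/8 from the next frame, and the
    remaining 6/8 is kept; the input is then added back as a residual so the
    original spatial features are preserved, as in Eq. (1)."""
    fold = x.shape[2] // 8
    s = torch.zeros_like(x)
    s[:, 1:, :fold] = x[:, :-1, :fold]                  # exchange with the previous frame
    s[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # exchange with the next frame
    s[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # 6/8 of the channels unchanged
    return s + x                                        # F^s = Stm(...) + F

feats = short_term_interchange(torch.randn(2, 8, 256, 56, 56))  # shape is preserved
```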
(3) Long-term feature fusion module
A long-term feature fusion module is added before each of the last two Bottleneck modules that contain a down-sampling layer, and is placed before the inserted short-term feature interchange module.
The long-term feature fusion module takes features extracted from the input feature map as the nodes of a fully connected graph. First, the input feature map F ∈ R^(C×T×H×W) is flattened into a new feature map F' ∈ R^(C×L), where L = T×H×W, C denotes the number of channels, T the number of frames in the video clip, H the height of each frame's features, and W their width. Then n features f_1, f_2, ..., f_n are extracted from the feature map F' by a one-dimensional convolution operation, where f_k denotes the k-th extracted feature, k = 1, ..., n.
Then a single-layer fully connected graph is constructed with the extracted features f_1, f_2, ..., f_n as its nodes, and the information on the nodes is propagated and fused over the long-term temporal range by graph convolution. The graph convolution operation is:
Y = A_l V W_l    (2)
where V is the node matrix of the fully connected graph, formed from the extracted features f_1, f_2, ..., f_n; A_l and W_l denote the adjacency matrix and the weight matrix of the long-term feature fusion module, respectively; and Y is the long-term temporal feature obtained by propagating and fusing information over the long-term range. In the graph convolution, the adjacency matrix A_l first learns the weights of the edges between nodes and propagates information, and the weight matrix W_l then updates the node states. To prevent optimization difficulty and degradation, identity mappings are added both before the node-update step (i.e., before right-multiplying by the weight matrix W_l) and after the whole graph convolution operation. The graph convolution is thus optimized as:
Y = (V + A_l V) W_l + V    (3)
Finally, the long-term temporal feature Y obtained by the graph convolution is converted by a deconvolution operation into a feature map with the same structure as the module's input feature map F, so that the features processed by the long-term feature fusion module fit the feature structure of the spatial feature extraction module; this is the inverse of the process that converted the module's input feature map into the nodes of the fully connected graph.
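The sketch below is one possible PyTorch reading of this module: a strided one-dimensional convolution extracts the n node features, learned matrices A_l and W_l implement Eq. (3), and a transposed convolution performs the inverse mapping. The node count, the stride choice, and the learnable-adjacency parameterization are assumptions; the text fixes only the overall pipeline.

```python
import torch
import torch.nn as nn

class LongTermFusion(nn.Module):
    """Sketch of the long-term feature fusion module (length must be divisible by num_nodes)."""
    def __init__(self, channels: int, length: int, num_nodes: int):
        super().__init__()
        stride = length // num_nodes
        self.extract = nn.Conv1d(channels, channels, kernel_size=stride, stride=stride)
        self.recover = nn.ConvTranspose1d(channels, channels, kernel_size=stride, stride=stride)
        self.adj = nn.Parameter(torch.zeros(num_nodes, num_nodes))  # A_l: learned edge weights
        self.weight = nn.Linear(channels, channels, bias=False)     # W_l: node-state update

    def forward(self, f: torch.Tensor) -> torch.Tensor:  # f: (B, C, T, H, W)
        b, c, t, h, w = f.shape
        f_flat = f.reshape(b, c, t * h * w)               # F' in R^(C x L), L = T*H*W
        v = self.extract(f_flat).transpose(1, 2)          # node features V: (B, n, C)
        y = self.weight(v + self.adj @ v) + v             # Y = (V + A_l V) W_l + V, Eq. (3)
        y = self.recover(y.transpose(1, 2))               # inverse mapping back to (B, C, L)
        return y.reshape(b, c, t, h, w)                   # same structure as the input F

out = LongTermFusion(channels=1024, length=8 * 14 * 14, num_nodes=32)(torch.randn(2, 1024, 8, 14, 14))
```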
(4) Behavior prediction module
The behavior prediction module averages, category by category, the scores of all frames of the video clip obtained by the spatial feature extraction module; the category with the highest average score is taken as the final behavior recognition result of the video clip, and also as the recognition result of the original, unpreprocessed video.
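The averaging itself is a one-liner; a sketch assuming per-frame scores of shape (batch, frames, categories):

```python
import torch

frame_scores = torch.randn(1, 8, 174)   # per-frame category scores (placeholder values)
clip_scores = frame_scores.mean(dim=1)  # average each category over all 8 frames
prediction = clip_scores.argmax(dim=1)  # the highest-scoring category is the result
```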
3. Network model training
Set the network training parameters: the loss function of the network is the mean square error loss; the training method is stochastic gradient descent; the batch size is 16; the initial learning rate is 0.01 and is divided by 10 every 10 epochs; 30 epochs are trained in total. Then the constructed long- and short-term temporal behavior recognition network model is trained with the training data set obtained in step 1; the trained network is the final behavior recognition network model.
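A sketch of this training schedule in PyTorch follows. The momentum value, the one-hot targets for the mean square error loss, and the synthetic stand-in for the data loader are assumptions; the batch size, learning rate, decay schedule, and epoch count come from the text.

```python
import torch
import torchvision

num_classes = 174
model = torchvision.models.resnet50(num_classes=num_classes)  # placeholder for the full network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # lr /= 10 every 10 epochs
criterion = torch.nn.MSELoss()

# Stand-in for a real DataLoader that yields batches of 16 clips and labels.
train_loader = [(torch.randn(16, 8, 3, 224, 224), torch.randint(0, num_classes, (16,)))]

for epoch in range(30):
    for clips, labels in train_loader:
        b, t = clips.shape[:2]
        scores = model(clips.view(b * t, 3, 224, 224)).view(b, t, -1).mean(dim=1)
        target = torch.nn.functional.one_hot(labels, num_classes).float()
        loss = criterion(scores, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```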
4. Input the videos in the test data set into the trained long- and short-term temporal behavior recognition network model to obtain the behavior recognition result of each video in the test set. Likewise, any video input into the network yields its corresponding behavior recognition result.
To verify the effectiveness of the method of the invention, experiments were conducted on a computer equipped with an NVIDIA GTX 1080 GPU, running the Ubuntu 16.04 operating system with OpenCV 3.2.0, CUDA 9.2.148, cuDNN 7.3.1, and the PyTorch 1.0.0 deep learning framework. The data used in the experiments is the Something-Something V1 data set proposed by Goyal et al. in "Raghav Goyal, Samira Ebrahimi Kahou, Vincent Michalski, et al. The 'something something' video database for learning and evaluating visual common sense. IEEE International Conference on Computer Vision, 2017, pp. 5842-5850". The method of the invention is compared with the TSN algorithm, the Multi-Scale TRN algorithm, the ECO algorithm proposed by Zolfaghari et al. in "Mohammadreza Zolfaghari, Kamaljeet Singh, and Thomas Brox. ECO: Efficient convolutional network for online video understanding. European Conference on Computer Vision, 2018, pp. 695-712", and the I3D algorithm proposed by Carreira et al. in "Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? A new model and the kinetics dataset. IEEE Conference on Computer Vision and Pattern Recognition, 2017". Table 1 lists the number of input frames, the computational cost, and the recognition accuracy of each method; the method of the invention achieves a good balance between recognition accuracy and computational resource consumption.
TABLE 1
Method             Input frames    Computation (FLOPs)    Accuracy (%)
TSN                8               16G                    19.5
Multi-Scale TRN    8               16G                    34.4
ECO                8               32G                    39.6
I3D                32×2 clips      153G×2                 41.6
The invention      8               33G                    40.6

Claims (1)

1. A lightweight video behavior recognition method based on a long- and short-term temporal modeling algorithm, characterized by comprising the following steps:
Step 1: extract an 8-frame video clip from each video in the video data set by uniform sampling, and apply multi-scale cropping so that all clips have the same size; all cropped video clips, together with the labels of the videos they belong to, form a new video clip data set, which is divided into a training data set and a test data set at a ratio of 4:1;
Step 2: construct a long- and short-term temporal behavior recognition network model comprising a spatial feature extraction module, a short-term feature interchange module, a long-term feature fusion module, and a behavior prediction module; the spatial feature extraction module is a 50-layer ResNet containing 16 Bottleneck modules, 4 of which contain down-sampling layers, where the first convolutional layer and the successive Bottleneck modules of the ResNet extract spatial features of the input video clip at different stages, and the last layer of the ResNet outputs each frame's scores over all categories; a short-term feature interchange module is inserted before each Bottleneck module, exchanging the features on part of each frame's channels with the features on the corresponding channels of the two adjacent frames, and superimposing the exchanged features on the original features to obtain short-term temporal features at different stages; a long-term feature fusion module is added before each of the last two Bottleneck modules containing down-sampling layers, placed before the inserted short-term feature interchange module, taking the features extracted from the input feature map as the nodes of a fully connected graph, fusing the information on the nodes by graph convolution, and mapping the fused long-term temporal features back to the same structure as the input feature map; the behavior prediction module averages the category scores of all frames obtained by the feature extraction module, category by category, to obtain the average score of the video clip for each category, and takes the category with the highest score as the final behavior recognition result of the video clip;
Step 3: input the training data set obtained in step 1 into the network model constructed in step 2 for training; the loss function of the network is the mean square error loss, and the network is optimized by stochastic gradient descent with a batch size of 16; the initial learning rate is 0.01 and is divided by 10 every 10 epochs, for 30 epochs in total; the trained network is the final behavior recognition network model;
Step 4: input the videos in the test data set into the long- and short-term temporal behavior recognition network model trained in step 3 to obtain the behavior recognition result of each video in the test set.
CN202010124065.2A 2020-02-27 2020-02-27 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm Active CN111401149B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010124065.2A CN111401149B (en) 2020-02-27 2020-02-27 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010124065.2A CN111401149B (en) 2020-02-27 2020-02-27 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm

Publications (2)

Publication Number Publication Date
CN111401149A true CN111401149A (en) 2020-07-10
CN111401149B CN111401149B (en) 2022-05-13

Family

ID=71432113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010124065.2A Active CN111401149B (en) 2020-02-27 2020-02-27 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm

Country Status (1)

Country Link
CN (1) CN111401149B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464831A (en) * 2020-12-01 2021-03-09 马上消费金融股份有限公司 Video classification method, training method of video classification model and related equipment
CN112712695A (en) * 2020-12-30 2021-04-27 桂林电子科技大学 Traffic flow prediction method, device and storage medium
CN113239766A (en) * 2021-04-30 2021-08-10 复旦大学 Behavior recognition method based on deep neural network and intelligent alarm device
WO2022228325A1 (en) * 2021-04-27 2022-11-03 中兴通讯股份有限公司 Behavior detection method, electronic device, and computer readable storage medium


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845351A (en) * 2016-05-13 2017-06-13 苏州大学 It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
US20180268222A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Action recognition system for action recognition in unlabeled videos with domain adversarial learning and knowledge distillation
CN107506712A (en) * 2017-08-15 2017-12-22 成都考拉悠然科技有限公司 Method for distinguishing is known in a kind of human behavior based on 3D depth convolutional networks
CN108764009A (en) * 2018-03-21 2018-11-06 苏州大学 The Video Events recognition methods of memory network in short-term is grown based on depth residual error
US20190294881A1 (en) * 2018-03-22 2019-09-26 Viisights Solutions Ltd. Behavior recognition
CN109214285A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Detection method is fallen down based on depth convolutional neural networks and shot and long term memory network
CN109359519A (en) * 2018-09-04 2019-02-19 杭州电子科技大学 A kind of video anomaly detection method based on deep learning
CN109753897A (en) * 2018-12-21 2019-05-14 西北工业大学 Based on memory unit reinforcing-time-series dynamics study Activity recognition method
CN109753906A (en) * 2018-12-25 2019-05-14 西北工业大学 Public place anomaly detection method based on domain migration
CN109919031A (en) * 2019-01-31 2019-06-21 厦门大学 A kind of Human bodys' response method based on deep neural network
CN109886358A (en) * 2019-03-21 2019-06-14 上海理工大学 Human bodys' response method based on multi-space information fusion convolutional neural networks
CN110059587A (en) * 2019-03-29 2019-07-26 西安交通大学 Human bodys' response method based on space-time attention
CN110188653A (en) * 2019-05-27 2019-08-30 东南大学 Activity recognition method based on local feature polymerization coding and shot and long term memory network
CN110378208A (en) * 2019-06-11 2019-10-25 杭州电子科技大学 A kind of Activity recognition method based on depth residual error network

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
ALEXANDER KOZLOV et al.: "Lightweight Network Architecture for Real-Time Action Recognition", arXiv:1905.08711v1 *
RUNHAO ZENG et al.: "Graph Convolutional Networks for Temporal Action Localization", 2019 IEEE/CVF International Conference on Computer Vision (ICCV) *
TOBY PERRETT et al.: "DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition", arXiv:1904.08634v1 *
XI OUYANG et al.: "A 3D-CNN and LSTM Based Multi-Task Learning Architecture for Action Recognition" *
ZHANG Ying et al.: "Human behavior recognition method based on 3D convolutional neural networks", Software Guide (软件导刊) *
LI Qinghui et al.: "Action recognition combining ordered optical flow graphs and two-stream convolutional networks", Acta Optica Sinica (光学学报) *
WANG Ping et al.: "Action recognition with spatio-temporal two-channel convolutional neural networks based on video segmentation", Journal of Computer Applications (计算机应用) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464831A (en) * 2020-12-01 2021-03-09 马上消费金融股份有限公司 Video classification method, training method of video classification model and related equipment
CN112464831B (en) * 2020-12-01 2021-07-30 马上消费金融股份有限公司 Video classification method, training method of video classification model and related equipment
CN112712695A (en) * 2020-12-30 2021-04-27 桂林电子科技大学 Traffic flow prediction method, device and storage medium
CN112712695B (en) * 2020-12-30 2021-11-26 桂林电子科技大学 Traffic flow prediction method, device and storage medium
WO2022228325A1 (en) * 2021-04-27 2022-11-03 中兴通讯股份有限公司 Behavior detection method, electronic device, and computer readable storage medium
CN113239766A (en) * 2021-04-30 2021-08-10 复旦大学 Behavior recognition method based on deep neural network and intelligent alarm device

Also Published As

Publication number Publication date
CN111401149B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN111401149B (en) Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
CN110334705B (en) Language identification method of scene text image combining global and local information
CN109272500B (en) Fabric classification method based on adaptive convolutional neural network
CN107506722A A facial emotion recognition method based on deep sparse convolutional neural networks
CN106203395A Facial attribute recognition method based on multi-task deep learning
CN109214263A A face recognition method based on feature reuse
CN112085072B (en) Cross-modal retrieval method of sketch retrieval three-dimensional model based on space-time characteristic information
KR102593835B1 (en) Face recognition technology based on heuristic Gaussian cloud transformation
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111666852A (en) Micro-expression double-flow network identification method based on convolutional neural network
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN115659966A (en) Rumor detection method and system based on dynamic heteromorphic graph and multi-level attention
CN113869285B (en) Crowd density estimation device, method and storage medium
CN104598898A (en) Aerially photographed image quick recognizing system and aerially photographed image quick recognizing method based on multi-task topology learning
CN111914600A (en) Group emotion recognition method based on space attention model
CN113688856A (en) Pedestrian re-identification method based on multi-view feature fusion
CN105354591A (en) High-order category-related prior knowledge based three-dimensional outdoor scene semantic segmentation system
CN111159411B (en) Knowledge graph fused text position analysis method, system and storage medium
CN117576038A (en) Fabric flaw detection method and system based on YOLOv8 network
CN113343760A (en) Human behavior recognition method based on multi-scale characteristic neural network
CN112560668A (en) Human behavior identification method based on scene prior knowledge
Chen et al. Intelligent teaching evaluation system integrating facial expression and behavior recognition in teaching video
CN111783891B (en) Customized object detection method
CN114663953A (en) Facial expression recognition method based on facial key points and deep neural network
CN114663910A (en) Multi-mode learning state analysis system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant