CN111401149B - Lightweight video behavior identification method based on long-short-term time domain modeling algorithm - Google Patents

Lightweight video behavior identification method based on long-short-term time domain modeling algorithm

Info

Publication number
CN111401149B
Authority
CN
China
Prior art keywords
term
short
video
long
module
Prior art date
Legal status
Active
Application number
CN202010124065.2A
Other languages
Chinese (zh)
Other versions
CN111401149A (en
Inventor
王琦
李学龙
白思开
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010124065.2A priority Critical patent/CN111401149B/en
Publication of CN111401149A publication Critical patent/CN111401149A/en
Application granted granted Critical
Publication of CN111401149B publication Critical patent/CN111401149B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a lightweight video behavior recognition method based on a long-short-term time domain modeling algorithm. A short-term feature interchange module is constructed using partial channel interchange, and a long-term feature fusion module is constructed using graph convolution, realizing effective extraction of short-term and long-term temporal features of the video respectively. By inserting the two modules at different positions of a two-dimensional deep residual network, temporal features at different stages are extracted, thereby effectively addressing the inaccurate results and high computational resource consumption of current video behavior recognition technology.

Description

Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
Technical Field
The invention belongs to the technical field of computer vision and video classification, and particularly relates to a lightweight video behavior identification method based on a long-short-term time domain modeling algorithm, which can be applied to intelligent surveillance, crowd analysis, human-computer interaction and the like.
Background
With the advent of short-video apps such as Douyin and Kuaishou and of various live-streaming platforms, a large amount of new video is generated and shared on the Internet almost every moment. To cope with this information explosion, analyzing and understanding video information in all kinds of scenarios becomes increasingly important. Video behavior recognition means recognizing and judging the behaviors and actions of people in a video; it has wide application in real life, but owing to factors such as high resource consumption and insufficient extraction of temporal information, it remains a very challenging task in the field of video analysis.
Video behavior recognition technology can classify the current behavior in a video and predict the actions about to be taken, so it is applied in many fields, including intelligent surveillance systems and gesture recognition. By detecting the behavior of people in a surveillance system and analyzing and judging it according to certain rules, abnormal behavior can be alarmed in time. By recognizing gestures and postures, video behavior recognition technology can also be applied to crowd analysis and human-computer interaction.
Currently, most behavior recognition techniques fall into two categories: methods based on a two-stream structure and methods based on three-dimensional convolutional neural networks. Two-stream methods feed the video frames and the dense optical flow between frames into the two branches of the two-stream structure respectively, and finally fuse the results of the two branches to obtain the final result. The disadvantages of this approach are: 1) the optical flow features of the video must be extracted additionally, which costs much time and memory; 2) because the two-stream structure is still essentially based on a two-dimensional convolutional neural network, it cannot effectively capture the complex temporal information in the video, so the recognition accuracy is low. Methods based on three-dimensional convolutional neural networks use three-dimensional convolution to extract temporal and spatial features of the video simultaneously; their main defects are: 1) compared with a two-dimensional convolutional neural network, the number of parameters increases exponentially; 2) the computational cost of model pre-training is high, the model is difficult to train, and overfitting occurs easily; 3) a single layer of the model can only capture short-term temporal information and cannot effectively extract the long-term temporal information in the video.
Therefore, existing video behavior recognition technology generally suffers from high computational resource consumption and insufficient temporal feature extraction, and a video behavior recognition method that is accurate, consumes few computational resources and effectively extracts temporal features is needed.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a lightweight video behavior recognition method based on a long-short-term time domain modeling algorithm. Based on a two-dimensional deep residual network and graph convolution, the short-term and long-term temporal features of the video are effectively extracted. Compared with two-stream algorithms and three-dimensional convolutional neural network algorithms, the proposed method effectively solves the problems of inaccurate recognition results and high computational resource consumption in current video behavior recognition technology, without additionally training a graph model or extracting optical flow features.
A lightweight video behavior identification method based on a long-short-term time domain modeling algorithm is characterized by comprising the following steps:
Step 1: extracting an 8-frame video clip from each video of the video data set by uniform sampling, performing multi-scale cropping on the extracted video clips so that they have the same size, forming a new video clip data set from all the cropped video clips together with the labels of the videos to which they belong, and dividing it into a training data set and a test data set at a ratio of 4:1;
Step 2: constructing a long-short-term time domain behavior recognition network model comprising a spatial feature extraction module, a short-term feature interchange module, a long-term feature fusion module and a behavior prediction module; the spatial feature extraction module consists of a 50-layer ResNet containing 16 Bottleneck modules, 4 of which contain down-sampling layers; the first convolution layer and the different Bottleneck modules of the ResNet extract spatial features of the input video clip at different stages, and the last layer of the ResNet outputs the score of each frame with respect to all categories; a short-term feature interchange module is inserted before each Bottleneck module, which exchanges the features on the first 1/8 of the channels of each frame with the previous frame, exchanges the features on the adjacent 1/8 of the channels with the next frame, keeps the features on the remaining 6/8 of the channels unchanged, and superimposes the interchanged features on the original features before the interchange to obtain short-term temporal features at different stages; a long-term feature fusion module is added before each of the last two Bottleneck modules containing down-sampling layers and is placed before the inserted short-term feature interchange module; it takes the features extracted from the input feature map as nodes of a fully connected graph, fuses the information on the nodes by graph convolution, and through mapping keeps the fused long-term temporal features in the same structure as the input feature map; the behavior prediction module averages, by category, the category scores of all frames obtained by the feature extraction module to obtain the average score of the video clip for each category, and takes the category with the highest score as the final behavior recognition result of the video clip;
Step 3: inputting the training data set obtained in step 1 into the network model constructed in step 2 for training, setting the loss function of the network as the mean square error loss function and optimizing the network by stochastic gradient descent, wherein the batch size is 16, the initial learning rate is 0.01, the learning rate is reduced by a factor of 10 every 10 epochs and training runs for 30 epochs in total, and the trained network is the final behavior recognition network model;
Step 4: inputting the videos in the test data set into the long-short-term time domain behavior recognition network model trained in step 3 to obtain the behavior recognition result of each video in the test set.
The invention has the beneficial effects that: because partial feature interchange and graph convolution are used to construct the short-term and long-term temporal modules, and the two modules are inserted at multiple positions of a deep residual network (ResNet-50), temporal features at different stages can be effectively extracted, yielding higher behavior recognition accuracy; meanwhile, no graph model needs to be additionally trained and no optical flow features need to be extracted, so the amount of computation is small.
Drawings
FIG. 1 is a schematic diagram of a long-term and short-term time-domain behavior recognition network model of the present invention;
FIG. 2 is a schematic diagram of a short term feature interchange module;
FIG. 3 is a schematic diagram of a long term feature fusion module.
Detailed Description
The present invention is further described below with reference to the drawings and an embodiment; the invention includes, but is not limited to, the following embodiment.
The invention provides a lightweight video behavior identification method based on a long-term and short-term time domain modeling algorithm, which comprises the following implementation steps:
1. video pre-processing
Each video in the data set is first sampled uniformly into an 8-frame video clip. The clip is then subjected to multi-scale cropping (such as center cropping), each frame is resized to 224 × 224, and each video is thereby converted into a video clip of size 8 × 3 × 224 × 224. All the video clips form a new video clip data set, and the label of each video in the original data set serves as the label of the corresponding clip in the new data set. Finally, the new video clip data set is divided into a training data set and a test data set at a ratio of 4:1.
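For illustration, the preprocessing described above can be sketched as follows (Python with NumPy and OpenCV). The exact decoding pipeline and multi-scale cropping policy are not fixed by the description; uniform index sampling and a single center crop are used here as assumptions, and the function name is illustrative.

```python
# Minimal preprocessing sketch, assuming `frames` is a list of decoded H x W x 3
# uint8 arrays for one video; the sampling and cropping choices are assumptions.
import numpy as np
import cv2


def sample_clip(frames, num_frames=8, size=224):
    """Uniformly sample num_frames frames, resize the shorter side and center-crop."""
    idx = np.linspace(0, len(frames) - 1, num_frames).astype(int)  # uniform sampling
    clip = []
    for i in idx:
        f = frames[i]
        h, w = f.shape[:2]
        scale = size / min(h, w)                                   # shorter side -> size
        f = cv2.resize(f, (int(round(w * scale)), int(round(h * scale))))
        h, w = f.shape[:2]
        top, left = (h - size) // 2, (w - size) // 2               # center crop
        clip.append(f[top:top + size, left:left + size])
    # (T, H, W, C) -> (T, C, H, W): an 8 x 3 x 224 x 224 clip as described above
    return np.stack(clip).transpose(0, 3, 1, 2).astype(np.float32) / 255.0
```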
2. Constructing long-short-term time domain behavior recognition network model
In order to extract various useful features from a video clip, the invention uses a spatial feature extraction module to extract spatial features of each frame of the clip at different stages, a short-term feature interchange module to apply partial channel interchange to the extracted features along the temporal dimension to obtain short-term temporal features, a long-term feature fusion module to propagate and fuse the extracted features over a long-term temporal range to obtain long-term temporal features, and finally a behavior prediction module to make the final judgment on the behavior category of the video clip. A long-short-term time domain behavior recognition network model is therefore constructed, comprising a spatial feature extraction module, a short-term feature interchange module, a long-term feature fusion module and a behavior prediction module.
(1) Spatial feature extraction module
The spatial feature extraction module consists of a 50-layer ResNet containing 16 Bottleneck modules, 4 of which contain down-sampling layers. The different Bottleneck modules of the ResNet extract spatial features of the input video clip at different stages, and the last layer of the ResNet outputs the score of each frame with respect to all categories.
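As an illustration of this backbone, the sketch below uses the torchvision ResNet-50 (whose four stages contain 3 + 4 + 6 + 3 = 16 Bottleneck blocks, the first block of each stage carrying a down-sampling branch) as the two-dimensional spatial feature extractor and replaces the final fully connected layer so that it outputs per-frame scores for all categories. The use of torchvision and the helper name are assumptions for illustration, not part of the patent text.

```python
# Minimal backbone sketch; torchvision's ResNet-50 stands in for the 50-layer
# ResNet described above, which is an assumption about the concrete implementation.
import torch.nn as nn
import torchvision.models as models


def build_backbone(num_classes):
    net = models.resnet50()  # 2D ResNet-50, randomly initialised here
    blocks = [m for m in net.modules() if isinstance(m, models.resnet.Bottleneck)]
    downsampling = [b for b in blocks if b.downsample is not None]
    assert len(blocks) == 16 and len(downsampling) == 4   # 16 Bottlenecks, 4 with down-sampling
    net.fc = nn.Linear(net.fc.in_features, num_classes)   # per-frame scores for all categories
    return net
```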
(2) Short term feature interchange module
A short-term feature interchange module is inserted before each Bottleneck module. As shown in FIG. 2, the short-term feature interchange module exchanges the features of each frame with the features of its two adjacent frames along the temporal dimension. Since the features of each frame consist of multiple channels, only part of the channels are interchanged to keep the computation low: the features on the first 1/8 of the channels are exchanged with the previous frame, the features on the adjacent 1/8 of the channels are exchanged with the next frame, and the remaining 6/8 of the channels are kept unchanged. To prevent the interchange from damaging the original spatial features of each frame, the invention adopts the residual idea and superimposes the interchanged features on the original input features, so that short-term temporal features are obtained while the original spatial features are retained. The whole process can be formulated as:
F_2^s = Stm(F_1, F_2, F_3) + F_2    (1)
where Stm(·) denotes the short-term feature interchange operation. Frame I_2 exchanges part of its channels with its two adjacent frames I_1 and I_3 to obtain short-term temporal features, and the original features F_2 are then added to obtain the feature F_2^s processed by the short-term feature interchange module; F_1 denotes the original features of the previous frame I_1, F_2 the original features of the current frame I_2, and F_3 the original features of the next frame I_3. The whole process neither introduces additional parameters nor consumes many computing resources.
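The interchange can be sketched as follows, following Eq. (1) literally; the sketch assumes the per-frame features of one clip are stacked along the batch dimension with shape (N·T, C, H, W) and T = 8 frames, and the class and parameter names are illustrative.

```python
# Minimal sketch of the short-term feature interchange module (Eq. (1)), assuming
# input of shape (N*T, C, H, W); one such module would sit before each Bottleneck.
import torch
import torch.nn as nn


class ShortTermInterchange(nn.Module):
    def __init__(self, num_frames=8, fold_div=8):
        super().__init__()
        self.t = num_frames
        self.fold_div = fold_div

    def forward(self, x):
        nt, c, h, w = x.shape
        n = nt // self.t
        x = x.view(n, self.t, c, h, w)
        fold = c // self.fold_div
        stm = torch.zeros_like(x)
        stm[:, 1:, :fold] = x[:, :-1, :fold]                  # first 1/8: from previous frame
        stm[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # adjacent 1/8: from next frame
        stm[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining 6/8 unchanged
        out = stm + x                                         # superpose with original features
        return out.view(nt, c, h, w)
```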
(3) Long-term feature fusion module
A long-term feature fusion module is added before each of the last two Bottleneck modules that contain down-sampling layers, and it is placed before the inserted short-term feature interchange module.
The long-term feature fusion module takes the features extracted from the input feature map as the nodes of a fully connected graph. First, the input feature map F ∈ R^(C×T×H×W) is flattened into a new feature map F′ ∈ R^(C×L), where L = T × H × W, C denotes the number of channels, T the number of frames of the video clip, and H and W the height and width of the features of each frame. Then a set of features f_1, f_2, ..., f_n is extracted from the feature map F′ by a one-dimensional convolution operation, where f_k denotes the k-th feature extracted by the one-dimensional convolution, k = 1, ..., n, and n denotes the number of extracted features.
Then a single-layer fully connected graph is constructed, with the extracted features f_1, f_2, ..., f_n as its nodes. The information on the nodes is then propagated and fused over the long-term temporal range by graph convolution. The graph convolution operation is as follows:
Y = A_l V W_l    (2)
where V denotes the nodes of the fully connected graph, composed of the extracted features f_1, f_2, ..., f_n; A_l and W_l denote the adjacency matrix and the weight matrix of the long-term feature fusion module, respectively; and Y is the long-term temporal feature obtained by propagation and fusion over the long-term temporal range. In the graph convolution, the adjacency matrix A_l first learns the weights of the edges between nodes and performs information propagation, and the weight matrix W_l then updates the node states. To prevent optimization difficulty and degradation, identity mappings are added before the node-state update step (i.e., before right-multiplication by the weight matrix W_l) and after the entire graph convolution operation, respectively. The graph convolution operation is thus optimized as:
Y = (V + A_l V) W_l + V    (3)
Finally, the long-term temporal feature Y obtained by the graph convolution operation is converted, by a deconvolution operation, into a feature map with the same structure as the module's input feature map F, so that the features processed by the long-term feature fusion module fit the feature structure of the spatial feature extraction module; this is the inverse of the process that converts the module's input feature map into the nodes of the fully connected graph.
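The fusion module can be sketched as below: a one-dimensional convolution extracts node features from the flattened map, a learnable adjacency matrix A_l and weight matrix W_l implement Eq. (3), and a one-dimensional transposed convolution maps the result back to the shape of the input feature map. The number of nodes and the kernel/stride choice (which requires L to be divisible by the number of nodes) are assumptions not specified by the description.

```python
# Minimal sketch of the long-term feature fusion module, Y = (V + A_l V) W_l + V.
# `num_nodes` and the Conv1d/ConvTranspose1d configuration are illustrative choices.
import torch
import torch.nn as nn


class LongTermFusion(nn.Module):
    def __init__(self, channels, length, num_nodes):
        super().__init__()
        assert length % num_nodes == 0          # assumption: L divisible by node count
        k = length // num_nodes
        self.to_nodes = nn.Conv1d(channels, channels, kernel_size=k, stride=k)
        self.from_nodes = nn.ConvTranspose1d(channels, channels, kernel_size=k, stride=k)
        self.adj = nn.Parameter(torch.eye(num_nodes))             # A_l, learnable adjacency
        self.weight = nn.Linear(channels, channels, bias=False)   # W_l, node-state update

    def forward(self, x):                        # x: (N, C, T, H, W)
        n_batch, c, t, h, w = x.shape
        f = x.view(n_batch, c, t * h * w)        # flatten to F' in R^(C x L)
        v = self.to_nodes(f).transpose(1, 2)     # nodes V: (N, n, C)
        y = self.weight(v + self.adj @ v) + v    # Eq. (3) with both identity mappings
        y = self.from_nodes(y.transpose(1, 2))   # deconvolve back to (N, C, L)
        return y.view(n_batch, c, t, h, w)       # same structure as the input feature map
```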
(4) Behavior prediction module
The behavior prediction module averages, by category, the scores of all frames of the video clip with respect to all categories obtained by the spatial feature extraction module, and the category with the highest score is taken as the final behavior recognition result of the video clip, which also serves as the behavior recognition result of the original, unpreprocessed video.
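A minimal sketch of this averaging step, assuming the backbone returns the per-frame class scores of one clip as a (T, num_classes) tensor:

```python
# Average the per-frame class scores and take the highest-scoring category.
import torch


def predict(frame_scores: torch.Tensor) -> int:
    clip_scores = frame_scores.mean(dim=0)   # average score of each category over all frames
    return int(clip_scores.argmax())         # category index of the recognition result
```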
3. Network model training
The network training parameters are set as follows: the loss function of the network is the mean square error loss function, the network is trained with stochastic gradient descent, the batch size is 16, the initial learning rate is 0.01, the learning rate is reduced by a factor of 10 every 10 epochs, and training runs for 30 epochs in total. The long-short-term time domain behavior recognition network model constructed above is then trained with the training data set obtained in step 1; the trained network is the final behavior recognition network model.
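The training configuration above can be sketched in PyTorch as follows; the model, the dataset object and the one-hot encoding of labels for the mean square error loss are illustrative assumptions.

```python
# Minimal training sketch: SGD, batch size 16, lr 0.01 divided by 10 every 10 epochs,
# 30 epochs, mean square error loss; `model` and `train_set` are assumed to exist.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def train(model, train_set, num_classes, device="cuda"):
    loader = DataLoader(train_set, batch_size=16, shuffle=True)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
    model.to(device).train()
    for epoch in range(30):
        for clips, labels in loader:
            clips, labels = clips.to(device), labels.to(device)
            target = nn.functional.one_hot(labels, num_classes).float()  # MSE needs dense targets
            loss = criterion(model(clips), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```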
4. The videos in the test data set are input into the trained long-short-term time domain behavior recognition network model to obtain the behavior recognition result of each video in the test set. Likewise, if any other video is input into the network, its behavior recognition result can be obtained.
To verify the effectiveness of the method of the invention, simulation experiments were carried out on an Intel i7-6800K CPU and an NVIDIA GeForce GTX 1080 GPU under the Ubuntu 16.04 operating system, with OpenCV 3.2.0, CUDA 9.2.148, cuDNN 7.3.1 and the PyTorch 1.0.0 deep learning framework. The data used in the experiments is the Something-Something V1 dataset proposed by Goyal et al. in the reference "Raghav Goyal, Samira Ebrahimi Kahou, Vincent Michalski, et al., The 'something something' video database for learning and evaluating visual common sense, in IEEE International Conference on Computer Vision, 2017, pp. 5842-5850". The method proposed by the invention was then compared with the TSN method of Wang et al. ("Limin Wang, Yuanjun Xiong, Zhe Wang, et al., Temporal segment networks: Towards good practices for deep action recognition, in European Conference on Computer Vision, 2016, pp. 20-36"), the Multi-Scale TRN algorithm of Zhou et al. ("Bolei Zhou, Alex Andonian, Aude Oliva, and Antonio Torralba, Temporal relational reasoning in videos, in European Conference on Computer Vision, 2018, pp. 803-818"), the ECO algorithm of Zolfaghari et al. and the I3D algorithm. The required number of input frames, the computational cost in FLOPs and the recognition accuracy of the different methods were compared; the resulting data are shown in Table 1. It can be seen that, with a similar number of input frames and a similar amount of computation, the method of the invention achieves better results, and that with far fewer input frames and far less computation than the best-performing I3D algorithm, it achieves a comparable accuracy. This demonstrates that the method balances accuracy and efficiency: it requires fewer computing resources while maintaining high accuracy, and is therefore more practical.
TABLE 1
Method            Input frames   FLOPs     Accuracy (%)
TSN               8              16G       19.5
Multi-Scale TRN   8              16G       34.4
ECO               8              32G       39.6
I3D               32×2 clips     153G×2    41.6
The invention     8              33G       40.6

Claims (1)

1. A lightweight video behavior identification method based on a long-short-term time domain modeling algorithm is characterized by comprising the following steps:
Step 1: extracting an 8-frame video clip from each video of the video data set by uniform sampling, performing multi-scale cropping on the extracted video clips so that they have the same size, forming a new video clip data set from all the cropped video clips together with the labels of the videos to which they belong, and dividing it into a training data set and a test data set at a ratio of 4:1;
Step 2: constructing a long-short-term time domain behavior recognition network model comprising a spatial feature extraction module, a short-term feature interchange module, a long-term feature fusion module and a behavior prediction module; the spatial feature extraction module consists of a 50-layer ResNet containing 16 Bottleneck modules, 4 of which contain down-sampling layers; the first convolution layer and the different Bottleneck modules of the ResNet extract spatial features of the input video clip at different stages, and the last layer of the ResNet outputs the score of each frame with respect to all categories; a short-term feature interchange module is inserted before each Bottleneck module, which exchanges the features on the first 1/8 of the channels of each frame with the previous frame, exchanges the features on the adjacent 1/8 of the channels with the next frame, keeps the features on the remaining 6/8 of the channels unchanged, and superimposes the interchanged features on the original features before the interchange to obtain short-term temporal features at different stages; a long-term feature fusion module is added before each of the last two Bottleneck modules containing down-sampling layers and is placed before the inserted short-term feature interchange module; it takes the features extracted from the input feature map as nodes of a fully connected graph, fuses the information on the nodes by graph convolution, and through mapping keeps the fused long-term temporal features in the same structure as the input feature map; the behavior prediction module averages, by category, the category scores of all frames obtained by the feature extraction module to obtain the average score of the video clip for each category, and takes the category with the highest score as the final behavior recognition result of the video clip;
Step 3: inputting the training data set obtained in step 1 into the network model constructed in step 2 for training, setting the loss function of the network as the mean square error loss function and optimizing the network by stochastic gradient descent, wherein the batch size is 16, the initial learning rate is 0.01, the learning rate is reduced by a factor of 10 every 10 epochs and training runs for 30 epochs in total, and the trained network is the final behavior recognition network model;
Step 4: inputting the videos in the test data set into the long-short-term time domain behavior recognition network model trained in step 3 to obtain the behavior recognition result of each video in the test set.
CN202010124065.2A 2020-02-27 2020-02-27 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm Active CN111401149B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010124065.2A CN111401149B (en) 2020-02-27 2020-02-27 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010124065.2A CN111401149B (en) 2020-02-27 2020-02-27 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm

Publications (2)

Publication Number Publication Date
CN111401149A CN111401149A (en) 2020-07-10
CN111401149B true CN111401149B (en) 2022-05-13

Family

ID=71432113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010124065.2A Active CN111401149B (en) 2020-02-27 2020-02-27 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm

Country Status (1)

Country Link
CN (1) CN111401149B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464831B (en) * 2020-12-01 2021-07-30 马上消费金融股份有限公司 Video classification method, training method of video classification model and related equipment
CN112712695B (en) * 2020-12-30 2021-11-26 桂林电子科技大学 Traffic flow prediction method, device and storage medium
CN115346143A (en) * 2021-04-27 2022-11-15 中兴通讯股份有限公司 Behavior detection method, electronic device, and computer-readable medium
CN113239766A (en) * 2021-04-30 2021-08-10 复旦大学 Behavior recognition method based on deep neural network and intelligent alarm device

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845351A (en) * 2016-05-13 2017-06-13 苏州大学 It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
CN107506712A (en) * 2017-08-15 2017-12-22 成都考拉悠然科技有限公司 Method for distinguishing is known in a kind of human behavior based on 3D depth convolutional networks
CN108764009A (en) * 2018-03-21 2018-11-06 苏州大学 The Video Events recognition methods of memory network in short-term is grown based on depth residual error
CN109214285A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Detection method is fallen down based on depth convolutional neural networks and shot and long term memory network
CN109359519A (en) * 2018-09-04 2019-02-19 杭州电子科技大学 A kind of video anomaly detection method based on deep learning
CN109753897A (en) * 2018-12-21 2019-05-14 西北工业大学 Based on memory unit reinforcing-time-series dynamics study Activity recognition method
CN109753906A (en) * 2018-12-25 2019-05-14 西北工业大学 Public place anomaly detection method based on domain migration
CN109886358A (en) * 2019-03-21 2019-06-14 上海理工大学 Human bodys' response method based on multi-space information fusion convolutional neural networks
CN109919031A (en) * 2019-01-31 2019-06-21 厦门大学 A kind of Human bodys' response method based on deep neural network
CN110059587A (en) * 2019-03-29 2019-07-26 西安交通大学 Human bodys' response method based on space-time attention
CN110188653A (en) * 2019-05-27 2019-08-30 东南大学 Activity recognition method based on local feature polymerization coding and shot and long term memory network
CN110378208A (en) * 2019-06-11 2019-10-25 杭州电子科技大学 A kind of Activity recognition method based on depth residual error network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10706336B2 (en) * 2017-03-17 2020-07-07 Nec Corporation Recognition in unlabeled videos with domain adversarial learning and knowledge distillation
US10614310B2 (en) * 2018-03-22 2020-04-07 Viisights Solutions Ltd. Behavior recognition

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845351A (en) * 2016-05-13 2017-06-13 苏州大学 It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
CN107506712A (en) * 2017-08-15 2017-12-22 成都考拉悠然科技有限公司 Method for distinguishing is known in a kind of human behavior based on 3D depth convolutional networks
CN108764009A (en) * 2018-03-21 2018-11-06 苏州大学 The Video Events recognition methods of memory network in short-term is grown based on depth residual error
CN109214285A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Detection method is fallen down based on depth convolutional neural networks and shot and long term memory network
CN109359519A (en) * 2018-09-04 2019-02-19 杭州电子科技大学 A kind of video anomaly detection method based on deep learning
CN109753897A (en) * 2018-12-21 2019-05-14 西北工业大学 Based on memory unit reinforcing-time-series dynamics study Activity recognition method
CN109753906A (en) * 2018-12-25 2019-05-14 西北工业大学 Public place anomaly detection method based on domain migration
CN109919031A (en) * 2019-01-31 2019-06-21 厦门大学 A kind of Human bodys' response method based on deep neural network
CN109886358A (en) * 2019-03-21 2019-06-14 上海理工大学 Human bodys' response method based on multi-space information fusion convolutional neural networks
CN110059587A (en) * 2019-03-29 2019-07-26 西安交通大学 Human bodys' response method based on space-time attention
CN110188653A (en) * 2019-05-27 2019-08-30 东南大学 Activity recognition method based on local feature polymerization coding and shot and long term memory network
CN110378208A (en) * 2019-06-11 2019-10-25 杭州电子科技大学 A kind of Activity recognition method based on depth residual error network

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
A 3D-CNN and LSTM Based Multi-Task Learning Architecture for Action Recognition; XI OUYANG et al.; 《Digital Object Identifier》; 20191231; vol. 7; 40757-40770 *
DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition; Toby Perrett et al.; 《arXiv:1904.08634v1》; 20190418; 1-10 *
Graph Convolutional Networks for Temporal Action Localization; Runhao Zeng et al.; 《2019 IEEE/CVF International Conference on Computer Vision (ICCV)》; 20191231; 7093-7102 *
Lightweight Network Architecture for Real-Time Action Recognition; Alexander Kozlov et al.; 《arXiv:1905.08711v1》; 20190521; 1-8 *
Human behavior recognition method based on 3D convolutional neural networks; Zhang Ying et al.; 《Software Guide》; 20171130; vol. 16, no. 11; 9-11 *
Behavior recognition with a spatio-temporal two-channel convolutional neural network based on video segmentation; Wang Ping et al.; 《Journal of Computer Applications》; 20190710; vol. 39, no. 7; 2081-2086 *
Behavior recognition combining ordered optical flow images and two-stream convolutional networks; Li Qinghui et al.; 《Acta Optica Sinica》; 20180630; vol. 38, no. 6; 1-7 *

Also Published As

Publication number Publication date
CN111401149A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401149B (en) Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN110084151B (en) Video abnormal behavior discrimination method based on non-local network deep learning
CN112989977B (en) Audio-visual event positioning method and device based on cross-modal attention mechanism
CN109272500B (en) Fabric classification method based on adaptive convolutional neural network
CN110516536A (en) A kind of Weakly supervised video behavior detection method for activating figure complementary based on timing classification
CN112085072B (en) Cross-modal retrieval method of sketch retrieval three-dimensional model based on space-time characteristic information
CN108256482A (en) A kind of face age estimation method that Distributed learning is carried out based on convolutional neural networks
CN110378233B (en) Double-branch anomaly detection method based on crowd behavior prior knowledge
KR102593835B1 (en) Face recognition technology based on heuristic Gaussian cloud transformation
CN113762138A (en) Method and device for identifying forged face picture, computer equipment and storage medium
CN111666852A (en) Micro-expression double-flow network identification method based on convolutional neural network
CN106780639A (en) Hash coding method based on the sparse insertion of significant characteristics and extreme learning machine
CN111275694B (en) Attention mechanism guided progressive human body division analysis system and method
CN113343760A (en) Human behavior recognition method based on multi-scale characteristic neural network
CN116721458A (en) Cross-modal time sequence contrast learning-based self-supervision action recognition method
CN113869285B (en) Crowd density estimation device, method and storage medium
CN113378962B (en) Garment attribute identification method and system based on graph attention network
CN109241315B (en) Rapid face retrieval method based on deep learning
Rijal et al. Integrating Information Gain methods for Feature Selection in Distance Education Sentiment Analysis during Covid-19.
CN110287970B (en) Weak supervision object positioning method based on CAM and covering
CN110705638A (en) Credit rating prediction classification method using deep network learning fuzzy information feature technology
CN110599460A (en) Underground pipe network detection and evaluation cloud system based on hybrid convolutional neural network
CN111160077A (en) Large-scale dynamic face clustering method
CN115240271A (en) Video behavior identification method and system based on space-time modeling

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant