CN109753897B - Behavior recognition method based on memory cell reinforcement-time sequence dynamic learning - Google Patents

Behavior recognition method based on memory cell reinforcement-time sequence dynamic learning Download PDF

Info

Publication number
CN109753897B
Authority
CN
China
Prior art keywords
video
memory unit
neural network
time sequence
recurrent neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811569882.8A
Other languages
Chinese (zh)
Other versions
CN109753897A (en
Inventor
袁媛
王琦
王栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201811569882.8A priority Critical patent/CN109753897B/en
Publication of CN109753897A publication Critical patent/CN109753897A/en
Application granted granted Critical
Publication of CN109753897B publication Critical patent/CN109753897B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention discloses a behavior recognition method based on memory cell reinforcement-time sequence dynamic learning, which addresses the poor practicability of existing behavior recognition methods. In the technical scheme, a recurrent neural network fused with a memory unit models the time sequence structure information of a long-term video sequence; a discretized memory unit read-write controller module classifies each video frame as either a relevant frame or a noise frame, writes the information of relevant frames into the memory unit, and ignores noise frame information. The method can therefore filter out the large amount of noise information in un-clipped video. By combining the recurrent neural network with the memory unit, it links time sequence structures across large time spans and, through data-driven self-training, models the long-term time sequence structure patterns of complex person behaviors. It thereby addresses the complex motion patterns and frequent background changes of long, un-clipped videos described in the background art, improves the robustness of person behavior recognition, and achieves an average recognition accuracy of 94.8%.

Description

Behavior recognition method based on memory cell reinforcement-time sequence dynamic learning
Technical Field
The present invention relates to a behavior recognition method, and more particularly, to a behavior recognition method based on memory cell reinforcement-time sequence dynamic learning.
Background
The document "L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. In Proceedings of the European Conference on Computer Vision, pp. 20-36, 2016" discloses a person behavior recognition method based on a two-stream convolutional neural network and a temporal segment network. The method uses two independent convolutional neural networks to solve the behavior recognition task: the spatial-stream network extracts the appearance features of the target from the video frames, the temporal-stream network extracts the motion features of the target from the corresponding optical flow field data, and the behavior recognition result is obtained by fusing the outputs of the two networks. The method further proposes a temporal segment network to model the long-term time sequence structure information of the video sequence; through a sparse temporal sampling strategy and sequence-level supervised learning, the whole neural network is trained efficiently and effectively, and good results are obtained on large-scale public data sets. However, the method models the time sequence within the video only coarsely, so the network often ignores the temporal correlation of the features during learning. When the video sequence is long and un-clipped, irrelevant noise information is blended into the final recognition result, the accuracy of person behavior recognition decreases, and the added noise also makes training of the whole neural network difficult.
Disclosure of Invention
In order to overcome the poor practicability of existing behavior recognition methods, the invention provides a behavior recognition method based on memory unit reinforcement-time sequence dynamic learning. The method uses a recurrent neural network fused with a memory unit to model the time sequence structure information of a long-term video sequence; a discretized memory unit read-write controller module classifies each video frame as either a relevant frame or a noise frame, writes the information of relevant frames into the memory unit, and ignores noise frame information, so that a large amount of noise in un-clipped video can be filtered out and the accuracy of subsequent behavior recognition improves. In addition, the recurrent neural network fused with the memory unit can link time sequence structures across large time spans and, through data-driven self-training, model the long-term time sequence structure patterns of complex person behaviors. This addresses the complex motion patterns and frequent background changes of long, un-clipped videos, improves the robustness of person behavior recognition, and achieves average recognition accuracies of 94.8% and 71.8% on the UCF101 and HMDB51 data sets, respectively.
The technical scheme adopted by the invention for solving the technical problems is as follows: a behavior identification method based on memory cell reinforcement-time sequence dynamic learning, characterized by comprising the following steps:
Step one, computing the optical flow of a video frame I_a, wherein the optical flow of each pixel is represented by a two-dimensional vector (Δx, Δy) and stored as an optical flow map I_m. Extracting the respective high-dimensional semantic features by using two independent convolutional neural networks:
x_a = CNN_a(I_a; w_a)    (1)
x_m = CNN_m(I_m; w_m)    (2)
wherein CNN_a and CNN_m respectively denote the appearance convolutional neural network and the motion convolutional neural network, which extract the high-dimensional features of the video frame I_a and the optical flow map I_m; x_a and x_m are 2048-dimensional vectors respectively representing the appearance and motion features extracted by the convolutional neural networks; w_a and w_m denote the internal trainable parameters of the two convolutional neural networks; and x is used to denote the high-dimensional feature extracted by a convolutional neural network.
Step two, initializing the memory unit M to be empty, denoted M_0. Assume that at the t-th video frame the memory unit M_t is not empty and contains N_t > 0 elements, denoted m_1, m_2, ..., m_{N_t}. The memory read operation at the corresponding time is then:
mh_t = (1/N_t) · Σ_{i=1..N_t} m_i    (3)
wherein the read-out mh_t represents the historical information of the video before time t.
Step three, extracting the short-time context features of the video content by using the segmented recurrent neural network. Taking the high-dimensional semantic feature x calculated in step one as input, the feature corresponding to the t-th video frame is denoted x_t. With the hidden states h_0, c_0 of the long short-term memory recurrent neural network (LSTM) initialized to zero, the short-time context feature at time t is calculated as follows:
(h̃_t, c̃_t) = LSTM(x_t, h_{t-1}, c_{t-1})    (4)
wherein LSTM(·) denotes the long short-term memory recurrent neural network, and h_{t-1}, c_{t-1} denote the hidden state of the recurrent neural network at the previous moment. The output h̃_t serves as the short-time context feature of the video content for subsequent calculation.
Step four, for each video frame, the high-dimensional semantic feature x_t, the memory unit history information mh_t and the short-time context feature h̃_t obtained in steps one, two and three are input into the memory unit controller, which computes a binary memory unit write command s_t ∈ {0, 1}, specifically as follows:
q_t = v^T · tanh(W_f·x_t + W_c·h̃_t + W_m·mh_t) + b_s    (5)
a_t = σ(q_t)    (6)
s_t = τ(a_t)    (7)
τ(a_t) = 1 if a_t > 0.5, and τ(a_t) = 0 otherwise    (8)
wherein v^T is a learnable row-vector parameter, W_f, W_c and W_m are learnable weight parameters, and b_s is a bias parameter. The sigmoid function σ(·) normalizes the linearly weighted result q_t to between 0 and 1, i.e. a_t ∈ (0, 1). Feeding a_t into the threshold-limited binarization function τ(·) yields the binary memory unit write command s_t.
Step five, updating the memory unit and the segmented recurrent neural network based on the binary memory unit write command s_t. For each video frame, the update strategy of the memory unit M_t is as follows:
M_t = M_{t-1} ∪ {m̃_t} if s_t = 1, and M_t = M_{t-1} if s_t = 0, where m̃_t = W_w·x_t    (9)
wherein W_w is a learnable weight matrix that converts the high-dimensional semantic feature x_t into the memory unit element m̃_t, and m̃_t is written into the memory unit M_{t-1} to form the new memory unit M_t. In addition, the hidden state h_t, c_t of the segmented recurrent neural network is updated as follows:
(h_t, c_t) = (0, 0) if s_t = 1, and (h_t, c_t) = (h̃_t, c̃_t) if s_t = 0    (10)
wherein h̃_t, c̃_t are the results calculated by equation (4).
Step six, performing behavior classification using the memory unit. Assuming the total video length is T, the memory unit at the end of processing the entire video is M_T, which contains N_T elements; the feature representation f of the entire video is then:
f = (1/N_T) · Σ_{i=1..N_T} m_i    (11)
wherein f is a D-dimensional vector representing the behavior-category information in the video. This feature is input into a fully connected classification layer to obtain the behavior classification score y, as follows:
y=softmax(W·f) (12)
wherein W ∈ R^{C×D}, and C represents the total number of recognizable behavior categories. The computed y gives the classification score of the system for each category; a higher score means the behavior more probably belongs to that category. Let y_a and y_m respectively denote the scores obtained by the appearance and motion neural networks; the final score y_f is then:
y_f = y_a + y_m    (13)
wherein y_f represents the final person behavior recognition result.
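Read together, steps one to six amount to a per-frame loop. The following compact PyTorch sketch runs one stream of that loop on stub features; the random stand-in features, the hidden and memory sizes, the tanh scoring of the controller, the 0.5 threshold, the mean aggregation used for the memory read and for the video-level feature, and the reset of the recurrent state after a write are all assumptions of this illustration rather than requirements of the invention.

import torch
import torch.nn as nn

T, feat_dim, hid, mem_dim, C = 16, 2048, 512, 512, 101
lstm = nn.LSTMCell(feat_dim, hid)                         # segmented recurrent network core
W_w = nn.Linear(feat_dim, mem_dim, bias=False)            # maps x_t to a memory element
score = nn.Sequential(nn.Linear(feat_dim + hid + mem_dim, 256), nn.Tanh(), nn.Linear(256, 1))
classifier = nn.Linear(mem_dim, C)                        # W in equation (12)

memory = []                                               # M_0 is empty
h = torch.zeros(1, hid)
c = torch.zeros(1, hid)
for t in range(T):
    x_t = torch.randn(1, feat_dim)                        # stand-in for the CNN feature of frame t
    mh_t = torch.stack(memory).mean(0) if memory else torch.zeros(1, mem_dim)   # read, eq. (3)
    h_tilde, c_tilde = lstm(x_t, (h, c))                  # short-time context, eq. (4)
    a_t = torch.sigmoid(score(torch.cat([x_t, h_tilde, mh_t], dim=1)))          # eqs. (5)-(6)
    s_t = (a_t > 0.5).float()                             # binary write command, eqs. (7)-(8)
    if s_t.item() == 1.0:
        memory.append(W_w(x_t))                           # write the relevant frame, eq. (9)
        h, c = torch.zeros_like(h), torch.zeros_like(c)   # assumed segment reset, eq. (10)
    else:
        h, c = h_tilde, c_tilde                           # noise frame: nothing is written

f = torch.stack(memory).mean(0) if memory else torch.zeros(1, mem_dim)           # eq. (11)
y = torch.softmax(classifier(f), dim=-1)                  # class scores, eq. (12)

For the full two-stream system, the same loop is run once on appearance features and once on motion features, and the two score vectors are summed as in equation (13).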
The invention has the beneficial effects that: the method uses a recurrent neural network fused with a memory unit to model the time sequence structure information of a long-term video sequence; a discretized memory unit read-write controller module classifies each video frame as either a relevant frame or a noise frame, writes the information of relevant frames into the memory unit, and ignores noise frame information, so that a large amount of noise in un-clipped video can be filtered out and the accuracy of subsequent behavior recognition improves. In addition, the recurrent neural network fused with the memory unit can link time sequence structures across large time spans and, through data-driven self-training, model the long-term time sequence structure patterns of complex person behaviors. This addresses the complex motion patterns and frequent background changes of long, un-clipped videos, improves the robustness of person behavior recognition, and achieves average recognition accuracies of 94.8% and 71.8% on the UCF101 and HMDB51 data sets, respectively.
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
Drawings
FIG. 1 is a flowchart of the behavior recognition method based on memory cell reinforcement-time sequence dynamic learning according to the present invention.
Detailed Description
Refer to FIG. 1. The behavior identification method based on memory cell reinforcement-time sequence dynamic learning specifically comprises the following steps:
Step one, extracting high-dimensional appearance and motion features containing semantic information. First, the optical flow of a video frame I_a is computed, wherein the optical flow of each pixel is represented by a two-dimensional vector (Δx, Δy) and stored as an optical flow map I_m. Then, the respective high-dimensional semantic features are extracted with two independent convolutional neural networks:
x_a = CNN_a(I_a; w_a)    (1)
x_m = CNN_m(I_m; w_m)    (2)
wherein CNN_a and CNN_m respectively denote the appearance convolutional neural network and the motion convolutional neural network, which extract the high-dimensional features of the video frame I_a and the optical flow map I_m; x_a and x_m are 2048-dimensional vectors respectively representing the appearance and motion features extracted by the convolutional neural networks; and w_a and w_m denote the internal trainable parameters of the two convolutional neural networks. Because the subsequent operations of the appearance network and the motion network are identical, for simplicity of notation the high-dimensional feature extracted by a convolutional neural network is denoted x.
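As an illustration of step one, the following minimal PyTorch sketch builds the two independent streams. The ResNet-50 backbone (whose pooled output is 2048-dimensional), the 224×224 input size and the 2-channel optical-flow input are assumptions of this sketch rather than choices prescribed by the invention.

import torch
import torch.nn as nn
from torchvision import models

def build_backbone(in_channels: int) -> nn.Module:
    # ResNet-50's pooled feature is 2048-dimensional, matching x_a and x_m in equations (1)-(2)
    net = models.resnet50(weights=None)
    if in_channels != 3:
        # the motion stream takes a 2-channel optical-flow map (Δx, Δy) instead of RGB
        net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
    net.fc = nn.Identity()                # drop the classifier head, keep the feature vector
    return net

cnn_a = build_backbone(3)                 # appearance network CNN_a
cnn_m = build_backbone(2)                 # motion network CNN_m

I_a = torch.randn(1, 3, 224, 224)         # one RGB video frame
I_m = torch.randn(1, 2, 224, 224)         # its optical-flow map
x_a = cnn_a(I_a)                          # equation (1), shape (1, 2048)
x_m = cnn_m(I_m)                          # equation (2), shape (1, 2048)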
Step two, initializing the memory unit M to be empty, denoted M_0. Assume that at the t-th video frame the memory unit M_t is not empty and contains N_t > 0 elements, denoted m_1, m_2, ..., m_{N_t}. The memory read operation at the corresponding time is then:
mh_t = (1/N_t) · Σ_{i=1..N_t} m_i    (3)
wherein the read-out mh_t represents the historical information of the video before time t; this historical information influences the analysis and understanding of the video content at the current moment.
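One possible realization of the memory unit of step two is sketched below; keeping the memory as a growing list of element vectors follows the text, while the mean aggregation in the read operation is an assumption of this sketch.

import torch

class MemoryUnit:
    def __init__(self, dim: int):
        self.dim = dim
        self.elements = []                          # M_0 is initialized to be empty

    def read(self) -> torch.Tensor:
        # mh_t: aggregated history of the video before time t (assumed mean read, eq. (3))
        if not self.elements:
            return torch.zeros(self.dim)
        return torch.stack(self.elements).mean(dim=0)

    def write(self, m: torch.Tensor) -> None:
        # append one element m_i; invoked by the write command of step five
        self.elements.append(m)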
Step three, extracting the short-time context features of the video content using the segmented recurrent neural network. Taking the high-dimensional semantic feature x calculated in step one as input, the feature corresponding to the t-th video frame is denoted x_t. First, the hidden states h_0, c_0 of the long short-term memory recurrent neural network (LSTM) are initialized to zero; the short-time context feature at time t is then calculated as follows:
(h̃_t, c̃_t) = LSTM(x_t, h_{t-1}, c_{t-1})    (4)
wherein LSTM(·) denotes the long short-term memory recurrent neural network, and h_{t-1}, c_{t-1} denote the hidden state of the recurrent neural network at the previous moment. The output h̃_t serves as the short-time context feature of the video content for subsequent calculation.
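Step three can be realized with a standard LSTM cell, as sketched below; the hidden size of 512 is an assumed hyper-parameter, not a value fixed by the invention.

import torch
import torch.nn as nn

feat_dim, hidden_dim = 2048, 512
lstm_cell = nn.LSTMCell(feat_dim, hidden_dim)       # core of the segmented recurrent network

h_prev = torch.zeros(1, hidden_dim)                 # h_0 initialized to zero
c_prev = torch.zeros(1, hidden_dim)                 # c_0 initialized to zero
x_t = torch.randn(1, feat_dim)                      # high-dimensional feature of frame t

h_tilde, c_tilde = lstm_cell(x_t, (h_prev, c_prev)) # equation (4): short-time context feature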
Step four, the discretized memory unit write controller. For each video frame, the high-dimensional semantic feature x_t, the memory unit history information mh_t and the short-time context feature h̃_t calculated in steps one, two and three are input into the memory unit controller, which computes a binary memory unit write command s_t ∈ {0, 1}, specifically as follows:
q_t = v^T · tanh(W_f·x_t + W_c·h̃_t + W_m·mh_t) + b_s    (5)
a_t = σ(q_t)    (6)
s_t = τ(a_t)    (7)
τ(a_t) = 1 if a_t > 0.5, and τ(a_t) = 0 otherwise    (8)
wherein v^T is a learnable row-vector parameter, W_f, W_c and W_m are learnable weight parameters, and b_s is a bias parameter. The sigmoid function σ(·) normalizes the linearly weighted result q_t to between 0 and 1, i.e. a_t ∈ (0, 1). Then, a_t is fed into the threshold-limited binarization function τ(·) to obtain the binary memory unit write command s_t.
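The discretized write controller of step four can be sketched as follows. The learnable parameters v, W_f, W_c, W_m and b_s correspond to those named above, while the tanh nonlinearity in the score, the projection size of 256 and the 0.5 threshold inside τ(·) are assumptions of this sketch.

import torch
import torch.nn as nn

class WriteController(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, mem_dim=512, proj_dim=256):
        super().__init__()
        self.W_f = nn.Linear(feat_dim, proj_dim, bias=False)    # acts on x_t
        self.W_c = nn.Linear(hidden_dim, proj_dim, bias=False)  # acts on the context feature
        self.W_m = nn.Linear(mem_dim, proj_dim, bias=False)     # acts on mh_t
        self.v = nn.Linear(proj_dim, 1, bias=True)              # v^T(.) + b_s

    def forward(self, x_t, h_tilde, mh_t):
        q_t = self.v(torch.tanh(self.W_f(x_t) + self.W_c(h_tilde) + self.W_m(mh_t)))  # eq. (5)
        a_t = torch.sigmoid(q_t)                                 # eq. (6): a_t in (0, 1)
        s_t = (a_t > 0.5).float()                                # eqs. (7)-(8): binary write command
        return s_t, a_t

Note that the hard threshold is not differentiable, so training the discretized controller end to end requires a suitable strategy (for example a straight-through estimator or a reinforcement-style reward), which this sketch does not cover.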
Step five, updating the memory unit and the segmented recurrent neural network based on the binary memory unit write command s_t. For each video frame, the update strategy of the memory unit M_t is as follows:
M_t = M_{t-1} ∪ {m̃_t} if s_t = 1, and M_t = M_{t-1} if s_t = 0, where m̃_t = W_w·x_t    (9)
wherein W_w is a learnable weight matrix that converts the high-dimensional semantic feature x_t into the memory unit element m̃_t, and m̃_t is written into the memory unit M_{t-1} to form the new memory unit M_t. In addition, the hidden state h_t, c_t of the segmented recurrent neural network is updated as follows:
(h_t, c_t) = (0, 0) if s_t = 1, and (h_t, c_t) = (h̃_t, c̃_t) if s_t = 0    (10)
wherein h̃_t, c̃_t are the results calculated by equation (4).
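The update rules of step five can then be sketched as follows. Writing W_w·x_t into the memory when s_t = 1 follows the text; resetting the recurrent state after a write, so that a new short-time segment begins, matches the reading of equation (10) given above but remains an assumption of this sketch.

import torch
import torch.nn as nn

mem_dim, feat_dim = 512, 2048
W_w = nn.Linear(feat_dim, mem_dim, bias=False)       # maps x_t to a memory element, eq. (9)

def update_step(memory, x_t, s_t, h_tilde, c_tilde):
    if s_t.item() == 1.0:
        memory.write(W_w(x_t).squeeze(0))            # relevant frame: append m̃_t to M_{t-1}
        h_t = torch.zeros_like(h_tilde)              # assumed segment reset, eq. (10)
        c_t = torch.zeros_like(c_tilde)
    else:
        h_t, c_t = h_tilde, c_tilde                  # noise frame: nothing written, context carries on
    return h_t, c_t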
Step six, performing behavior classification using the memory unit. Assuming the total video length is T, the memory unit at the end of processing the entire video is M_T, which contains N_T elements; the feature representation f of the entire video is then:
f = (1/N_T) · Σ_{i=1..N_T} m_i    (11)
wherein f is a D-dimensional vector representing the behavior-category information in the video. This feature is then input into a fully connected classification layer to obtain the behavior classification score y, as follows:
y=softmax(W·f) (12)
wherein W ∈ R^{C×D}, and C represents the total number of recognizable behavior categories. The computed y gives the classification score of the system for each category; a higher score means the behavior more probably belongs to that category. Let y_a and y_m respectively denote the scores obtained by the appearance and motion neural networks; the final score y_f is then:
y_f = y_a + y_m    (13)
wherein y_f represents the final person behavior recognition result.
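Step six can be sketched as below; the mean over the N_T memory elements as the video-level feature f and the class count C = 101 (the UCF101 setting used in the experiments) are assumptions of this sketch, while the fully connected softmax classifier and the score fusion follow equations (12)-(13).

import torch
import torch.nn as nn

D, C = 512, 101                                      # memory element size and number of classes
classifier = nn.Linear(D, C)                         # the weight W in equation (12)

def classify(memory) -> torch.Tensor:
    f = torch.stack(memory.elements).mean(dim=0)     # eq. (11): video-level feature (assumed mean)
    return torch.softmax(classifier(f), dim=-1)      # eq. (12): behavior classification scores y

# Equation (13): fuse the appearance-stream and motion-stream scores
# y_f = classify(memory_appearance) + classify(memory_motion)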
The effects of the present invention are further illustrated by the following simulation experiments.
1. Simulation conditions.
The simulation is carried out on a machine with an Intel Xeon E5-2697A 2.6 GHz CPU, an NVIDIA K80 graphics card, 16 GB of memory and the CentOS 7 operating system, using the PyTorch software framework.
The data used in the simulation come from two public test data sets, UCF101 and HMDB51, in which camera motion varies significantly and the backgrounds are complex. The experimental data comprise 13320 and 6766 video segments respectively, which are divided into 101 and 51 behavior categories. Most of the video data in the HMDB51 data set is un-clipped and contains much noise.
2. Simulation contents.
To demonstrate the effectiveness of the invention, the simulation experiment compares the memory cell reinforcement and time sequence dynamic learning method proposed by the invention against existing methods. Specifically, the comparison algorithms are the highly accurate two-stream temporal segment network (TSN) and the Lattice long short-term memory network (Lattice-LSTM) of L. Sun et al. in the document "L. Sun, K. Jia, K. Chen, D. Yeung, B. Shi and S. Savarese. Lattice Long Short-Term Memory for Human Action Recognition, In Proceedings of the IEEE International Conference on Computer Vision, pp. 2166-2175, 2017". The three algorithms use the same parameter settings, and their average AUC values are computed on the UCF101/HMDB51 data sets. The comparative results are shown in Table 1.
TABLE 1
Method TSN Lattice-LSTM OUR
AUC(UCF101) 93.6% 94.0% 94.8%
AUC(HMDB51) 66.2% 68.5% 71.8%
As can be seen from Table 1, the recognition accuracy of the present invention is significantly higher than that of the existing behavior recognition methods. Specifically, the accuracy of the TSN algorithm is lower than that of Lattice-LSTM and OUR because TSN does not consider the time sequence change pattern of the video content, whereas both Lattice-LSTM and OUR use a recurrent neural network to model the time sequence change pattern of the video; this supports the effectiveness of the time sequence dynamic learning method based on a recurrent neural network proposed by the invention. In addition, on the HMDB51 data set, the OUR algorithm is clearly superior to Lattice-LSTM, because the memory unit proposed by the invention effectively strengthens the ability of the recurrent neural network to handle long-term and un-clipped video. To further examine the effectiveness of the memory unit in strengthening the recurrent neural network, the simulation experiment also compares the algorithm of the present invention with various recurrent neural networks (LSTM, ALSTM and VideoLSTM) on the UCF101 data set; the results are shown in Table 2.
TABLE 2
Method LSTM ALSTM VideoLSTM Ours
AUC 88.3% 77.0% 89.2% 91.03%
As can be seen from Table 2, the proposed method achieves higher accuracy than the various recurrent neural network variants, because the memory unit reinforcement method effectively extracts the informative content of the video and can therefore model its time sequence change pattern. In contrast, plain recurrent neural network methods are susceptible to noise, which in turn reduces their accuracy. The effectiveness of the present invention is thus verified by the above simulation experiments.

Claims (1)

1. A behavior recognition method based on memory cell reinforcement-time sequence dynamic learning is characterized by comprising the following steps:
step one, computing the optical flow of a video frame I_a, wherein the optical flow of each pixel is represented by a two-dimensional vector (Δx, Δy) and stored as an optical flow map I_m; extracting the respective high-dimensional semantic features by using two independent convolutional neural networks:
x_a = CNN_a(I_a; w_a)    (1)
x_m = CNN_m(I_m; w_m)    (2)
wherein CNN_a and CNN_m respectively denote the appearance convolutional neural network and the motion convolutional neural network, which extract the high-dimensional features of the video frame I_a and the optical flow map I_m; x_a and x_m are 2048-dimensional vectors respectively representing the appearance and motion features extracted by the convolutional neural networks; w_a and w_m denote the internal trainable parameters of the two convolutional neural networks; and the high-dimensional feature extracted by a convolutional neural network is denoted x;
step two, initializing the memory unit M to be empty, denoted M_0; assuming that at the t-th video frame the memory unit M_t is not empty and contains N_t > 0 elements, denoted m_1, m_2, ..., m_{N_t}, the memory read operation at the corresponding time is as follows:
mh_t = (1/N_t) · Σ_{i=1..N_t} m_i    (3)
wherein the read-out mh_t represents the historical information of the video before time t;
step three, extracting the short-time context features of the video content by using a segmented recurrent neural network; taking the high-dimensional semantic feature x calculated in step one as input, the feature corresponding to the t-th video frame is denoted x_t; with the hidden states h_0, c_0 of the long short-term memory recurrent neural network (LSTM) initialized to zero, the short-time context feature at time t is calculated as follows:
(h̃_t, c̃_t) = LSTM(x_t, h_{t-1}, c_{t-1})    (4)
wherein LSTM(·) denotes the long short-term memory recurrent neural network, and h_{t-1}, c_{t-1} denote the hidden state of the recurrent neural network at the previous moment; the output h̃_t serves as the short-time context feature of the video content for subsequent calculation;
step four, for each video frame, inputting the high-dimensional semantic feature x_t, the memory unit history information mh_t and the short-time context feature h̃_t into the memory unit controller, and calculating a binary memory unit write command s_t ∈ {0, 1}, specifically as follows:
q_t = v^T · tanh(W_f·x_t + W_c·h̃_t + W_m·mh_t) + b_s    (5)
a_t = σ(q_t)    (6)
s_t = τ(a_t)    (7)
τ(a_t) = 1 if a_t > 0.5, and τ(a_t) = 0 otherwise    (8)
wherein v^T is a learnable row-vector parameter, W_f, W_c and W_m are learnable weight parameters, and b_s is a bias parameter; the sigmoid function σ(·) normalizes the linearly weighted result q_t to between 0 and 1, i.e. a_t ∈ (0, 1); a_t is fed into the threshold-limited binarization function τ(·) to obtain the binary memory unit write command s_t;
step five, updating the memory unit and the segmented recurrent neural network based on the binary memory unit write command s_t; for each video frame, the update strategy of the memory unit M_t is as follows:
M_t = M_{t-1} ∪ {m̃_t} if s_t = 1, and M_t = M_{t-1} if s_t = 0, where m̃_t = W_w·x_t    (9)
wherein W_w is a learnable weight matrix that converts the high-dimensional semantic feature x_t into the memory unit element m̃_t, and m̃_t is written into the memory unit M_{t-1} to form the new memory unit M_t; in addition, the hidden state h_t, c_t of the segmented recurrent neural network is updated as follows:
(h_t, c_t) = (0, 0) if s_t = 1, and (h_t, c_t) = (h̃_t, c̃_t) if s_t = 0    (10)
wherein h̃_t, c̃_t are the results calculated by formula (4);
step six, performing behavior classification using the memory unit; assuming the total video length is T, the memory unit at the end of processing the entire video is M_T, which contains N_T elements; the feature representation f of the entire video is then:
f = (1/N_T) · Σ_{i=1..N_T} m_i    (11)
wherein f is a D-dimensional vector representing the behavior-category information in the video; the feature is input into a fully connected classification layer to obtain the behavior classification score y, as follows:
y=softmax(W·f) (12)
wherein W ∈ R^{C×D}, and C represents the total number of recognizable behavior categories; the computed y gives the classification score of the system for each category, and a higher score means the behavior more probably belongs to that category; let y_a and y_m respectively denote the scores obtained by the appearance and motion neural networks; the final score y_f is then:
y_f = y_a + y_m    (13)
wherein y_f represents the final person behavior recognition result.
CN201811569882.8A 2018-12-21 2018-12-21 Behavior recognition method based on memory cell reinforcement-time sequence dynamic learning Active CN109753897B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811569882.8A CN109753897B (en) 2018-12-21 2018-12-21 Behavior recognition method based on memory cell reinforcement-time sequence dynamic learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811569882.8A CN109753897B (en) 2018-12-21 2018-12-21 Behavior recognition method based on memory cell reinforcement-time sequence dynamic learning

Publications (2)

Publication Number Publication Date
CN109753897A CN109753897A (en) 2019-05-14
CN109753897B true CN109753897B (en) 2022-05-27

Family

ID=66403877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811569882.8A Active CN109753897B (en) 2018-12-21 2018-12-21 Behavior recognition method based on memory cell reinforcement-time sequence dynamic learning

Country Status (1)

Country Link
CN (1) CN109753897B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135345A (en) * 2019-05-15 2019-08-16 武汉纵横智慧城市股份有限公司 Activity recognition method, apparatus, equipment and storage medium based on deep learning
CN110348567B (en) * 2019-07-15 2022-10-25 北京大学深圳研究生院 Memory network method based on automatic addressing and recursive information integration
CN110852273B (en) * 2019-11-12 2023-05-16 重庆大学 Behavior recognition method based on reinforcement learning attention mechanism
CN111401149B (en) * 2020-02-27 2022-05-13 西北工业大学 Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
CN111639548A (en) * 2020-05-11 2020-09-08 华南理工大学 Door-based video context multi-modal perceptual feature optimization method
CN112926453B (en) * 2021-02-26 2022-08-05 电子科技大学 Examination room cheating behavior analysis method based on motion feature enhancement and long-term time sequence modeling
CN112633260B (en) * 2021-03-08 2021-06-22 北京世纪好未来教育科技有限公司 Video motion classification method and device, readable storage medium and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407889A (en) * 2016-08-26 2017-02-15 上海交通大学 Video human body interaction motion identification method based on optical flow graph depth learning model
CN106845351A (en) * 2016-05-13 2017-06-13 苏州大学 It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
CN106934352A (en) * 2017-02-28 2017-07-07 华南理工大学 A kind of video presentation method based on two-way fractal net work and LSTM
CN107330362A (en) * 2017-05-25 2017-11-07 北京大学 A kind of video classification methods based on space-time notice
CN108681712A (en) * 2018-05-17 2018-10-19 北京工业大学 A kind of Basketball Match Context event recognition methods of fusion domain knowledge and multistage depth characteristic
CN108805080A (en) * 2018-06-12 2018-11-13 上海交通大学 Multi-level depth Recursive Networks group behavior recognition methods based on context

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10242266B2 (en) * 2016-03-02 2019-03-26 Mitsubishi Electric Research Laboratories, Inc. Method and system for detecting actions in videos

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845351A (en) * 2016-05-13 2017-06-13 苏州大学 It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term
CN106407889A (en) * 2016-08-26 2017-02-15 上海交通大学 Video human body interaction motion identification method based on optical flow graph depth learning model
CN106934352A (en) * 2017-02-28 2017-07-07 华南理工大学 A kind of video presentation method based on two-way fractal net work and LSTM
CN107330362A (en) * 2017-05-25 2017-11-07 北京大学 A kind of video classification methods based on space-time notice
CN108681712A (en) * 2018-05-17 2018-10-19 北京工业大学 A kind of Basketball Match Context event recognition methods of fusion domain knowledge and multistage depth characteristic
CN108805080A (en) * 2018-06-12 2018-11-13 上海交通大学 Multi-level depth Recursive Networks group behavior recognition methods based on context

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Bidirectional Multirate Reconstruction for Temporal Modeling in Videos; Linchao Zhu; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017-11-09; pp. 1339-1348 *
Lattice Long Short-Term Memory for Human Action Recognition; Lin Sun et al.; 2017 IEEE International Conference on Computer Vision; 2017-12-25; pp. 2166-2175 *
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition; Limin Wang et al.; arXiv; 2016-08-02; pp. 1-16 *
Human behavior recognition fusing dual spatio-temporal network streams and an attention mechanism; 谯庆伟; China Master's Theses Full-text Database, Information Science and Technology; 2018-02-15; Vol. 2018, No. 2; pp. I138-2110 *

Also Published As

Publication number Publication date
CN109753897A (en) 2019-05-14

Similar Documents

Publication Publication Date Title
CN109753897B (en) Behavior recognition method based on memory cell reinforcement-time sequence dynamic learning
US20210012198A1 (en) Method for training deep neural network and apparatus
CN112560432B (en) Text emotion analysis method based on graph attention network
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
CN110046671A (en) A kind of file classification method based on capsule network
CN112507898A (en) Multi-modal dynamic gesture recognition method based on lightweight 3D residual error network and TCN
CN112464865A (en) Facial expression recognition method based on pixel and geometric mixed features
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN111476315A (en) Image multi-label identification method based on statistical correlation and graph convolution technology
CN111291556A (en) Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
CN109508686B (en) Human behavior recognition method based on hierarchical feature subspace learning
CN114821271B (en) Model training method, image description generation device and storage medium
CN111881731A (en) Behavior recognition method, system, device and medium based on human skeleton
CN103065158A (en) Action identification method of independent subspace analysis (ISA) model based on relative gradient
CN112561064A (en) Knowledge base completion method based on OWKBC model
CN114444600A (en) Small sample image classification method based on memory enhanced prototype network
CN113868448A (en) Fine-grained scene level sketch-based image retrieval method and system
CN114299362A (en) Small sample image classification method based on k-means clustering
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN112183464A (en) Video pedestrian identification method based on deep neural network and graph convolution network
CN116258990A (en) Cross-modal affinity-based small sample reference video target segmentation method
Saqib et al. Intelligent dynamic gesture recognition using CNN empowered by edit distance
Yan Computational Methods for Deep Learning: Theory, Algorithms, and Implementations
CN112668543B (en) Isolated word sign language recognition method based on hand model perception
CN111985333A (en) Behavior detection method based on graph structure information interaction enhancement and electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant