CN110175580B - Video behavior recognition method based on a temporal causal convolutional network

Video behavior recognition method based on a temporal causal convolutional network

Info

Publication number
CN110175580B
CN110175580B
Authority
CN
China
Prior art keywords
convolution
time
causal
behavior
space
Prior art date
Legal status
Active
Application number
CN201910459028.4A
Other languages
Chinese (zh)
Other versions
CN110175580A (en)
Inventor
Yu-Gang Jiang (姜育刚)
Changmao Cheng (程昌茂)
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University
Priority to CN201910459028.4A
Publication of CN110175580A
Application granted
Publication of CN110175580B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of computer image analysis, and specifically relates to a video behavior recognition method based on a temporal causal convolutional network. The method uses a temporal causal three-dimensional convolutional neural network to extract spatio-temporal semantic feature representations from multiple video segments and obtain predicted behavior categories; it models the frame sequence up to the current moment and extracts high-level spatio-temporal semantic features for behavior localization and progress prediction. A fusion mechanism combining spatial convolution with temporal convolution and a causal spatio-temporal attention mechanism are designed. The method offers high accuracy, high computational efficiency, and real-time operation; it is suitable for online real-time video behavior detection and analysis tasks, and can also be used for tasks such as offline video behavior recognition and abnormal-event monitoring.

Description

Video behavior recognition method based on a temporal causal convolutional network
Technical Field
The invention belongs to the technical field of computer image analysis, and specifically relates to a video behavior recognition method based on a temporal causal convolutional network.
Background
Video behavior detection and recognition is a classic task in computer vision and a fundamental problem in the video-understanding sub-field; it has been studied for many years. Because video data are difficult to label and analyze, and spatio-temporal feature modeling is hard, video behavior recognition technology has developed slowly. Driven by advances in deep learning, learning high-level spatio-temporal semantic features with neural networks has become mainstream. However, because video data are voluminous and common deep network models are computationally expensive, practical video behavior recognition systems remain scarce, and the task still lacks a truly robust solution.
The system of the invention mainly targets the video behavior recognition task on online video streams. Conventional recognition frameworks face three main challenges. First, videos vary in length, and videos captured in open environments exhibit pain points such as relative motion, irrelevant shots, and unfixed scale; traditional recognition methods can only enumerate common cases and assumptions heuristically. Second, video data consume substantial resources and common deep models are large, making end-to-end training and optimization difficult and time-consuming. Third, the optimization target is single-purpose, so models can only be trained for a classification task on trimmed short videos.
In recent years, there have also been related research efforts attempting to solve such problems.
Reference [1] proposes initializing a 3D convolutional network with pre-trained 2D network parameters and training a lighter-weight network structure on a large-scale video dataset. However, this method can only process short videos, and the practicality of the model is very low.
Reference [2] proposes learning global video features with a self-attention module to capture long-term spatio-temporal dependencies. However, this method can only process offline video and cannot be applied to real-time video streams; it is also computationally expensive and slow to train.
Disclosure of Invention
The invention aims to remedy the defects of the prior art by providing a video behavior recognition method based on a temporal causal convolutional network.
Because 3D convolutional neural networks have large parameter counts and high computational cost, and lack the capability to process long videos, the invention designs a video behavior recognition algorithm based on a temporal causal convolutional network that factorizes 3D convolution into temporal convolution and spatial convolution. The temporal convolution enforces the causal constraint; feature changes along the time dimension are modeled by combining short-term temporal convolution with a long-term self-attention mechanism, and these temporal modules are placed sparsely in the network. To better suit online streaming video, the invention adopts a historical-feature caching mechanism that stores the historical features needed by future frames, reducing computation so that the system runs faster and more efficiently and achieves real-time performance.
The invention provides a video behavior recognition method based on a temporal causal convolutional network, which uses a temporal causal three-dimensional convolutional neural network to extract spatio-temporal semantic feature representations from multiple video segments and obtain predicted behavior categories, models the frame sequence up to the current moment, and extracts high-level spatio-temporal semantic features for behavior localization and progress prediction.
The video behavior recognition method based on a temporal causal convolutional network provided by the invention comprises the following specific steps:
Step 1: read the video stream data and decode it online to obtain a frame sequence I = {I₀, I₁, …}, each element of the sequence being a tensor representation of the frame picture data;
Step 2: at each moment t (t = 1, 2, …), send the current frame Iₜ₋₁ of the video stream into the pre-trained temporal causal three-dimensional convolutional neural network and extract a spatio-temporal feature representation;
Step 3: send the extracted spatio-temporal feature representation into a behavior classifier to obtain the behavior category, and obtain the current progress of the behavior through a regression network;
Step 4: cache part of the hidden-layer features of the convolutional network at time t, set t = t + 1, and return to Step 2.
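For illustration only, the following PyTorch-style sketch shows how Steps 1-4 could be wired together as an online loop. The names run_online, backbone, classifier, and progress_head, and the convention that the backbone accepts and returns a feature cache, are assumptions of this sketch rather than identifiers from the patent.

```python
import torch

def run_online(frame_iterator, backbone, classifier, progress_head):
    """Hypothetical online loop for Steps 1-4; not the patent's actual code."""
    cache = {}  # Step 4: hidden-layer features carried across time steps
    with torch.no_grad():  # pure inference, no gradients needed
        for t, frame in enumerate(frame_iterator):  # Step 1: decoded frame stream
            x = frame.unsqueeze(0)  # 1 x C x H x W tensor for the current frame
            feat, cache = backbone(x, cache)  # Step 2: spatio-temporal features
            logits = classifier(feat)  # Step 3: behavior category scores
            label = int(logits.argmax(dim=1))  # class with maximum probability
            progress = float(torch.sigmoid(progress_head(feat)))  # 0 = start, 1 = end
            yield t, label, progress
```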
In Step 2 of the invention, the temporal causal three-dimensional convolutional neural network comprises spatial convolution layers, a temporal-causal-convolution and spatial-convolution fusion module, and a causal-attention and spatial-convolution fusion module. The spatial convolution layers are the main building blocks of the network and extract the spatial semantic features of the current frame; the latter two modules are placed alternately and sparsely in the network to capture short-term and long-term historical information. The network is pre-trained on a large-scale labeled video behavior detection dataset.
The temporal-causal-convolution and spatial-convolution fusion module, shown in Fig. 2, comprises a temporal causal convolution with kernel 3 × 1 × 1 and a spatial convolution module with kernel 1 × 3 × 3. The input feature map X passes through two parallel paths, the temporal causal convolution with kernel 3 × 1 × 1 and the spatial convolution with kernel 1 × 3 × 3, yielding two feature maps; the elements of the two feature maps are added to obtain the fused output feature map Y.
The temporal causal convolution has kernel size 1 along the height and width dimensions of the frame image and kernel size 3 along the time dimension, so the convolution at each time point fuses the features of that time point with those of its two preceding time points; this structure mines short-term historical motion information. The spatial convolution operates at each spatial position by fusing the features of that position with those of its 8 neighboring points; this structure learns spatial semantic information of the frame image.
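As a minimal, non-authoritative sketch of the Fig. 2 module under the kernel sizes stated above (3 × 1 × 1 temporal, 1 × 3 × 3 spatial), one possible PyTorch implementation follows; the class name and the N × C × T × H × W tensor layout are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalTemporalSpatialFusion(nn.Module):
    """Sketch of the Fig. 2 block: a causal 3x1x1 temporal convolution and a
    1x3x3 spatial convolution run in parallel, outputs added element-wise."""

    def __init__(self, channels: int):
        super().__init__()
        # Temporal path: kernel 3 in time, 1 in height/width; causality is
        # enforced below by left-padding the time axis instead of centering.
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1))
        # Spatial path: kernel 1 in time, 3x3 in space.
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: N x C x T x H x W
        # Pad two steps on the left of the time axis so the output at time t
        # fuses only t, t-1, and t-2 (the causal constraint).
        xt = F.pad(x, (0, 0, 0, 0, 2, 0))
        return self.temporal(xt) + self.spatial(x)  # element-wise fusion
```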
In the invention, the structure of the causal-attention and spatial-convolution fusion module is shown in Fig. 3. The module comprises three convolution layers with 1 × 1 × 1 kernels; the input feature map X undergoes the three 1 × 1 × 1 convolutions and a shape adjustment to obtain the values V, keys K, and queries Q. Each query point of Q retrieves the relevance of all key-position features in K at times no later than that of the query point, yielding relevance under the causal constraint; this is typically implemented with a masked SoftMax function. The features at each position of V are then combined through the relevance weights to obtain the final feature expression of each query point. A subsequent convolution with a 1 × 1 × 1 kernel produces the output feature map of the causal-attention path, which captures long-term historical semantic information. This feature map is added to the feature map obtained through the spatial-convolution path, giving the final output feature map Y.
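The following PyTorch sketch illustrates one way such a block could be realized. The patent specifies the three 1 × 1 × 1 convolutions, the masked SoftMax, the output 1 × 1 × 1 convolution, and the additive fusion with a spatial path; the channel-compression ratio and the class name here are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class CausalAttentionSpatialFusion(nn.Module):
    """Sketch of the Fig. 3 block: Q, K, V from 1x1x1 convolutions, causally
    masked attention over time, a 1x1x1 output convolution, and element-wise
    fusion with a parallel spatial-convolution path."""

    def __init__(self, channels: int, inner: int = None):
        super().__init__()
        inner = inner or channels // 2  # channel compression (assumed ratio)
        self.q = nn.Conv3d(channels, inner, 1)
        self.k = nn.Conv3d(channels, inner, 1)
        self.v = nn.Conv3d(channels, inner, 1)
        self.out = nn.Conv3d(inner, channels, 1)
        self.spatial = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: N x C x T x H x W
        n, _, t, h, w = x.shape
        # Shape adjustment: every (time, position) pair becomes one token.
        q = self.q(x).flatten(2).transpose(1, 2)  # N x THW x C'
        k = self.k(x).flatten(2).transpose(1, 2)
        v = self.v(x).flatten(2).transpose(1, 2)
        scores = q @ k.transpose(1, 2)            # N x THW x THW relevance
        # Causal mask: a query at time t attends only to keys no later than t.
        times = torch.arange(t, device=x.device).repeat_interleave(h * w)
        future = times[None, :, None] < times[None, None, :]
        attn = scores.masked_fill(future, float('-inf')).softmax(dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(n, -1, t, h, w)
        return self.out(y) + self.spatial(x)      # fuse with spatial path
```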
In Step 3 of the invention, the behavior classifier is a linear classification layer covering the common behavior classes and a no-behavior class. The extracted features are mapped into the classification space to obtain the probability of each class; the class probabilities are sorted in descending order, and the behavior class corresponding to the maximum probability is returned as the final behavior category.
In Step 3 of the invention, the regression network comprises a linear layer and a Sigmoid function. When a predetermined behavior class is predicted to be occurring, the current progress of the behavior obtained through the regression network is a predicted value between 0 and 1 produced by the linear layer and the Sigmoid function; 0 represents the start of the action and 1 represents its end. If no predetermined behavior class is predicted, the progress regression network returns 0, i.e., no progress.
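A minimal sketch of the two heads of Step 3, assuming the no-behavior class occupies index 0 of the classifier output (the patent does not fix this convention):

```python
import torch
import torch.nn as nn

class RecognitionHeads(nn.Module):
    """Sketch of Step 3: linear behavior classifier plus progress regressor."""

    def __init__(self, feat_dim: int, num_behaviors: int):
        super().__init__()
        # +1 output for the no-behavior class (assumed to be index 0).
        self.classifier = nn.Linear(feat_dim, num_behaviors + 1)
        self.progress = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, feat: torch.Tensor):  # feat: N x feat_dim
        logits = self.classifier(feat)
        label = logits.argmax(dim=1)               # class of maximum probability
        progress = self.progress(feat).squeeze(1)  # in (0, 1): 0 = start, 1 = end
        # Per the patent, progress is 0 when no predetermined behavior is predicted.
        return label, progress * (label != 0).float()
```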
In Step 4 of the invention, part of the hidden-layer features of the convolutional network at time t are cached. For each temporal causal convolution module, the input feature map must cache the features of the current and the previous time point; for each causal self-attention module, the historical-time features stored in the keys K must be cached and updated at every moment. This caching technique greatly reduces repeated computation and improves system efficiency. At the next moment, Step 2 is entered again to update the prediction state.
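A minimal sketch of this cache for a single 3 × 1 × 1 temporal causal convolution layer, assuming its input arrives one time step at a time; the cold-start strategy of repeating the first frame's features is an assumption of this sketch.

```python
import torch

def causal_conv_step(conv, frame_feat, cache, key):
    """One cached step of a causal temporal convolution (kernel 3 in time).

    conv:       an nn.Conv3d with kernel_size=(3, 1, 1) and no time padding
    frame_feat: N x C x 1 x H x W features of the current time step
    cache:      dict holding the two most recent time steps per layer
    key:        hypothetical identifier of this layer in the cache
    """
    prev = cache.get(key)
    if prev is None:  # cold start: repeat the first frame's features twice
        prev = frame_feat.repeat(1, 1, 2, 1, 1)
    window = torch.cat([prev, frame_feat], dim=2)  # N x C x 3 x H x W
    cache[key] = window[:, :, 1:]  # keep current and previous steps for t+1
    return conv(window)            # output for the current time step only
```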
Unlike existing video behavior analysis and recognition methods, the invention takes into account the temporal causality of long videos and the computational cost of spatio-temporal feature modeling. It therefore designs a 3D neural network structure based on temporal causal convolution and a self-attention mechanism, and adds a behavior progress predictor. This greatly reduces model capacity, enables a more efficient training scheme, and alleviates problems such as the poor robustness of spatio-temporal feature learning and the impracticality of existing models, while obtaining accurate behavior features and progress from the video stream in real time. With these improvements, the video behavior recognition system based on the temporal causal convolutional network has stronger expressive power and higher efficiency, and can process online video streams in real time. The method offers high accuracy, high computational efficiency, and real-time operation; it is suitable for online real-time video behavior detection and analysis tasks, and can also be used for tasks such as offline video behavior recognition and abnormal-event monitoring.
The main innovations of the invention are:
1. Short-term and long-term spatio-temporal modeling are learned separately with a temporal causal convolution and a causal attention mechanism, so the online video behavior recognition task can be handled naturally. Computational cost is reduced through sparse placement of these modules and channel compression, achieving real-time performance. Temporal and spatial feature modeling are separated and stacked on top of each other, which is highly efficient; treating the time and space dimensions differently reduces model parameters and computation while facilitating parameter optimization;
2. Multi-task learning. Besides predicting the video behavior category, the system simultaneously regresses the progress of the behavior, which helps the network learn more refined behavior features; the multi-task supervision constraint improves the robustness and expressive power of the network model.
Drawings
FIG. 1 is a diagram of the temporal causal three-dimensional convolutional neural network processing system of the invention.
FIG. 2 is a structural diagram of the temporal-causal-convolution and spatial-convolution fusion module proposed by the invention.
FIG. 3 is a structural diagram of the causal-attention and spatial-convolution fusion module proposed by the invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
FIG. 1 shows the temporal causal three-dimensional convolutional neural network processing system for online behavior recognition according to the invention. The system comprises the input stream of video frame pictures, spatial convolution layers, the basic network modules for temporal causal convolution and causal attention, a behavior classifier, and a progress regressor.
FIG. 2 illustrates the temporal-causal-convolution and spatial-convolution fusion module of the invention, used for short-term spatio-temporal feature modeling. The input feature map X undergoes a temporal causal convolution with kernel 3 × 1 × 1 and a spatial convolution with kernel 1 × 3 × 3 to obtain two feature maps, whose elements are added to obtain the fused output feature map Y.
FIG. 3 illustrates the causal-attention and spatial-convolution fusion module of the invention, used for long-term spatio-temporal feature modeling. The input feature map X passes through three convolutions with 1 × 1 × 1 kernels and a shape adjustment to obtain the values V, keys K, and queries Q. Each query point of Q retrieves the relevance of all key-position features in K at times no later than that of the query point, yielding relevance under the causal constraint; this is typically implemented with a masked SoftMax function. The final feature expression produced by the attention mechanism can be written as

Attention(Q, K, V) = MaskedSoftMax(QKᵀ)V.

A subsequent convolution with a 1 × 1 × 1 kernel then produces the output feature map of the causal-attention path. This feature map is added element-wise to the feature map produced by the spatial-convolution path to obtain the final output feature map Y.
The specific operating steps are as follows:
Step 1: collect a large-scale dataset of long videos with labeled actions and their corresponding segments, and initialize the parameters of the temporal causal three-dimensional convolutional neural network;
Step 2: randomly select a long video from the dataset and feed its frames to the network in temporal order; once the maximum GPU-memory or memory budget is reached, compute the loss based on the action tag of each time point and back-propagate it. When all frames of the video sample have been consumed, update the network with a stochastic gradient descent optimizer using the accumulated parameter gradients. Training on mini-batches of samples proceeds similarly (a sketch of this procedure is given after Step 5);
Step 3: deploy the trained network model to the terminal, connect a real-time video stream, decode the video frame data and feed it into the network, extract the current spatio-temporal features, and cache the required intermediate-layer feature states;
Step 4: based on the extracted spatio-temporal feature expression, map it to the classification space through the behavior classifier to obtain the behavior class and, when a behavior is present, send it to the progress regressor to obtain the current progress of the behavior;
Step 5: display the recognition results of the network in real time, synchronized with the video frame stream and with essentially no delay. The structure can accurately judge common behaviors and can also cover more complex behavior categories.
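For illustration, a minimal PyTorch sketch of the training procedure of Step 2, with a fixed chunk size standing in for the memory budget that triggers back-propagation; the model's output shape and the per-time-point label format are assumptions of this sketch.

```python
import torch

def train_one_video(model, frames, labels, optimizer, chunk=32):
    """Hypothetical per-video training loop for Step 2.

    frames: T x C x H x W decoded frames of one long video, in temporal order
    labels: T per-time-point action tags (class indices)
    chunk:  stand-in for the GPU-memory budget that triggers back-propagation
    """
    criterion = torch.nn.CrossEntropyLoss()
    optimizer.zero_grad()
    for start in range(0, len(frames), chunk):
        clip = frames[start:start + chunk]
        target = labels[start:start + chunk]
        logits = model(clip.unsqueeze(0))   # assumed 1 x T' x num_classes output
        loss = criterion(logits.squeeze(0), target)
        loss.backward()                     # accumulate gradients per chunk
    optimizer.step()                        # SGD update once the video is used up
```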
References
[1] Carreira, Joao, and Andrew Zisserman. "Quo vadis, action recognition? A new model and the Kinetics dataset." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[2] Wang, Xiaolong, et al. "Non-local neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

Claims (6)

1. A video behavior recognition method based on a temporal causal convolutional network, characterized by using a temporal causal three-dimensional convolutional neural network to extract spatio-temporal semantic feature representations from multiple video segments and obtain predicted behavior categories, and by modeling the frame sequence up to the current moment and extracting high-level spatio-temporal semantic features for behavior localization and progress prediction; the method comprises the following specific steps:
Step 1: read the video stream data and decode it online to obtain a frame sequence I = {I₀, I₁, …}, each element of the sequence being a tensor representation of the frame picture data;
Step 2: at each moment t, send the current frame Iₜ₋₁ of the video stream into a pre-trained temporal causal three-dimensional convolutional neural network and extract a spatio-temporal feature representation, where t = 1, 2, …;
Step 3: send the extracted spatio-temporal feature representation into a behavior classifier to obtain the behavior category, and obtain the current progress of the behavior through a regression network;
Step 4: cache part of the hidden-layer features of the convolutional network at time t, set t = t + 1, and return to Step 2;
the temporal causal three-dimensional convolutional neural network in Step 2 comprises spatial convolution layers, a temporal-causal-convolution and spatial-convolution fusion module, and a causal-attention and spatial-convolution fusion module; the spatial convolution layers are the main building blocks of the network and extract the spatial semantic features of the current frame; the latter two modules are placed alternately and sparsely in the network to capture short-term and long-term historical information; the network is pre-trained on a large-scale labeled video behavior detection dataset;
the temporal-causal-convolution and spatial-convolution fusion module comprises a temporal causal convolution with kernel 3 × 1 × 1 and a spatial convolution module with kernel 1 × 3 × 3; an input feature map X passes through the two paths, the temporal causal convolution with kernel 3 × 1 × 1 and the spatial convolution with kernel 1 × 3 × 3, to obtain two feature maps, and the elements of the two feature maps are added to obtain a fused output feature map Y;
the causal-attention and spatial-convolution fusion module comprises three convolution layers with 1 × 1 × 1 kernels; an input feature map X passes through the three 1 × 1 × 1 convolutions and a shape adjustment to obtain values V, keys K, and queries Q; each query point of Q retrieves the relevance of all key-position features in K at times no later than that of the query point, yielding relevance under the causal constraint, implemented by a masked SoftMax function; the features at each position of V are combined through the relevance weights to obtain the final feature expression of each query point; a subsequent convolution with a 1 × 1 × 1 kernel yields the output feature map of the causal-attention path, which captures long-term historical semantic information; this feature map is added to the feature map obtained through the spatial-convolution path to obtain the final output feature map Y.
2. The video behavior recognition method based on a temporal causal convolutional network according to claim 1, wherein the temporal causal convolution has kernel size 1 along the height and width dimensions of the frame image and kernel size 3 along the time dimension, and the convolution at each time point fuses the features of that time point with those of its two preceding time points; this structure mines short-term historical motion information.
3. The video behavior recognition method based on a temporal causal convolutional network according to claim 1, wherein the spatial convolution has kernel size 3 along the height and width dimensions of the frame image and kernel size 1 along the time dimension, and the convolution at each spatial position fuses the features of that position with those of the 8 points in its spatial neighborhood; this structure learns spatial semantic information of the frame image.
4. The video behavior recognition method based on a temporal causal convolutional network according to any one of claims 1-3, wherein the behavior classifier in Step 3 is a linear classification layer covering common behavior classes and a no-behavior class; the extracted features are mapped into the classification space to obtain the probability of each class, the class probabilities are sorted in descending order, and the behavior class corresponding to the maximum probability is returned as the final behavior category.
5. The video behavior recognition method based on a temporal causal convolutional network according to claim 4, wherein the regression network in Step 3 comprises a linear layer and a Sigmoid function; when a predetermined behavior class is predicted to be occurring, the current progress of the behavior is a predicted value between 0 and 1 produced by the linear layer and the Sigmoid function, where 0 represents the start of the action and 1 represents its end; if no predetermined behavior class is predicted, the progress regression network returns 0, i.e., no progress.
6. The video behavior recognition method based on a temporal causal convolutional network according to claim 5, wherein, when caching part of the hidden-layer features of the convolutional network at time t in Step 4, the input feature map of each temporal causal convolution module caches the features of the current and the previous time point, and for each causal self-attention module the historical-time features stored in the keys K are cached and updated at every moment; at the next moment, Step 2 is entered again to update the prediction state.
CN201910459028.4A 2019-05-29 2019-05-29 Video behavior recognition method based on a temporal causal convolutional network Active CN110175580B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910459028.4A CN110175580B (en) 2019-05-29 2019-05-29 Video behavior recognition method based on a temporal causal convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910459028.4A CN110175580B (en) 2019-05-29 2019-05-29 Video behavior recognition method based on a temporal causal convolutional network

Publications (2)

Publication Number Publication Date
CN110175580A (en) 2019-08-27
CN110175580B (en) 2019-05-29 2020-10-30

Family

ID=67696573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910459028.4A Active CN110175580B (en) 2019-05-29 2019-05-29 Video behavior recognition method based on a temporal causal convolutional network

Country Status (1)

Country Link
CN (1) CN110175580B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516611B (en) * 2019-08-28 2022-03-01 中科人工智能创新技术研究院(青岛)有限公司 Autism detection system and autism detection device
CN110503076B (en) * 2019-08-29 2023-06-30 腾讯科技(深圳)有限公司 Video classification method, device, equipment and medium based on artificial intelligence
CN110765854B (en) * 2019-09-12 2022-12-02 昆明理工大学 Video motion recognition method
CN110807369B (en) * 2019-10-09 2024-02-20 南京航空航天大学 Short video content intelligent classification method based on deep learning and attention mechanism
CN110852295B (en) * 2019-10-15 2023-08-25 深圳龙岗智能视听研究院 Video behavior recognition method based on multitasking supervised learning
CN110839156A (en) * 2019-11-08 2020-02-25 北京邮电大学 Future frame prediction method and model based on video image
CN111259782B (en) * 2020-01-14 2022-02-11 北京大学 Video behavior identification method based on mixed multi-scale time sequence separable convolution operation
CN111460928B (en) * 2020-03-17 2023-07-21 中国科学院计算技术研究所 Human body action recognition system and method
CN111563417B (en) * 2020-04-13 2023-03-21 华南理工大学 Pyramid structure convolutional neural network-based facial expression recognition method
CN111506835B (en) * 2020-04-17 2022-12-23 北京理工大学 Data feature extraction method fusing user time features and individual features
CN112185352B (en) * 2020-08-31 2024-05-17 华为技术有限公司 Voice recognition method and device and electronic equipment
CN112257572B (en) * 2020-10-20 2022-02-01 神思电子技术股份有限公司 Behavior identification method based on self-attention mechanism
CN112434615A (en) * 2020-11-26 2021-03-02 天津大学 Time sequence action detection method based on Tensorflow deep learning framework
CN112487957A (en) * 2020-11-27 2021-03-12 广州华多网络科技有限公司 Video behavior detection and response method and device, equipment and medium
CN112651324A (en) * 2020-12-22 2021-04-13 深圳壹账通智能科技有限公司 Method and device for extracting semantic information of video frame and computer equipment
CN112288050B (en) * 2020-12-29 2021-05-11 中电科新型智慧城市研究院有限公司 Abnormal behavior identification method and device, terminal equipment and storage medium
CN112668495B (en) * 2020-12-30 2024-02-02 东北大学 Full-time space convolution module-based violent video detection algorithm
CN112883983A (en) * 2021-02-09 2021-06-01 北京迈格威科技有限公司 Feature extraction method and device and electronic system
CN112883929B (en) * 2021-03-26 2023-08-08 全球能源互联网研究院有限公司 On-line video abnormal behavior detection model training and abnormal detection method and system
CN113466852B (en) * 2021-06-08 2023-11-24 江苏科技大学 Millimeter wave radar dynamic gesture recognition method applied to random interference scene
CN113487247B (en) * 2021-09-06 2022-02-01 阿里巴巴(中国)有限公司 Digitalized production management system, video processing method, equipment and storage medium
CN114510966B (en) * 2022-01-14 2023-04-28 电子科技大学 End-to-end brain causal network construction method based on graph neural network
CN115100740B (en) * 2022-06-15 2024-04-05 东莞理工学院 Human motion recognition and intention understanding method, terminal equipment and storage medium
CN114898166B (en) * 2022-07-13 2022-09-27 合肥工业大学 Method for detecting glass cleanliness based on evolution causal model
CN115147935B (en) * 2022-09-05 2022-12-13 浙江壹体科技有限公司 Behavior identification method based on joint point, electronic device and storage medium
CN116797972A (en) * 2023-06-26 2023-09-22 中科(黑龙江)数字经济研究院有限公司 Self-supervision group behavior recognition method and recognition system based on sparse graph causal time sequence coding

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830157A (en) * 2018-05-15 2018-11-16 华北电力大学(保定) Human bodys' response method based on attention mechanism and 3D convolutional neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10929681B2 (en) * 2016-11-03 2021-02-23 Nec Corporation Surveillance system using adaptive spatiotemporal convolution feature representation with dynamic abstraction for video to language translation
CN107609460B (en) * 2017-05-24 2021-02-02 南京邮电大学 Human body behavior recognition method integrating space-time dual network flow and attention mechanism
CN109389055B (en) * 2018-09-21 2021-07-20 西安电子科技大学 Video classification method based on mixed convolution and attention mechanism

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830157A (en) * 2018-05-15 2018-11-16 华北电力大学(保定) Human bodys' response method based on attention mechanism and 3D convolutional neural networks

Also Published As

Publication number Publication date
CN110175580A (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN110175580B (en) Video behavior recognition method based on a temporal causal convolutional network
CN109446923B (en) Deep supervision convolutional neural network behavior recognition method based on training feature fusion
CN113936339B (en) Fighting identification method and device based on double-channel cross attention mechanism
CN109086873B (en) Training method, recognition method and device of recurrent neural network and processing equipment
CN109993102B (en) Similar face retrieval method, device and storage medium
CN113469289B (en) Video self-supervision characterization learning method and device, computer equipment and medium
CN111783712A (en) Video processing method, device, equipment and medium
CN111008337A (en) Deep attention rumor identification method and device based on ternary characteristics
CN110705412A (en) Video target detection method based on motion history image
CN113255625B (en) Video detection method and device, electronic equipment and storage medium
Wang et al. Ttpp: Temporal transformer with progressive prediction for efficient action anticipation
CN115240024A (en) Method and system for segmenting extraterrestrial pictures by combining self-supervised learning and semi-supervised learning
CN113591674A (en) Real-time video stream-oriented edge environment behavior recognition system
Bai et al. A survey on deep learning-based single image crowd counting: Network design, loss function and supervisory signal
CN114663798A (en) Single-step video content identification method based on reinforcement learning
CN115705706A (en) Video processing method, video processing device, computer equipment and storage medium
CN113936175A (en) Method and system for identifying events in video
CN117095460A (en) Self-supervision group behavior recognition method and system based on long-short time relation predictive coding
CN112926517B (en) Artificial intelligence monitoring method
CN115082840A (en) Action video classification method and device based on data combination and channel correlation
CN115188022A (en) Human behavior identification method based on consistency semi-supervised deep learning
CN114092746A (en) Multi-attribute identification method and device, storage medium and electronic equipment
CN114218434A (en) Automatic labeling method, automatic labeling device and computer readable storage medium
CN113011320A (en) Video processing method and device, electronic equipment and storage medium
Guhagarkar et al. DEEPFAKE DETECTION TECHNIQUES: A REVIEW

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant