CN110175580B - Video behavior identification method based on time sequence causal convolutional network
- Publication number: CN110175580B
- Application number: CN201910459028.4A
- Authority: CN (China)
- Prior art keywords: convolution, time, causal, behavior, space
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/253: Pattern recognition; fusion techniques of extracted features
- G06V20/41: Scenes; scene-specific elements in video content; higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42: Scenes; scene-specific elements in video content; higher-level, semantic clustering, classification or understanding of sport video content
Abstract
The invention belongs to the technical field of computer image analysis, and specifically relates to a video behavior recognition method based on a temporal causal convolutional network. The method uses a temporal causal three-dimensional convolutional neural network to extract spatiotemporal semantic feature representations from multiple video segments and obtain the predicted behavior category; it models the frame sequence up to the current moment and extracts high-level spatiotemporal semantic features for behavior localization and progress prediction. A fusion module of spatial convolution and temporal causal convolution and a causal spatiotemporal attention mechanism are designed. The method offers high accuracy, high computational efficiency, and real-time performance; it is suited to online real-time video behavior detection and analysis, and can also be used for offline video behavior recognition, abnormal-event monitoring, and similar tasks.
Description
Technical Field
The invention belongs to the technical field of computer image analysis, and specifically relates to a video behavior recognition method based on a temporal causal convolutional network.
Background
Video behavior detection and recognition is a classic computer vision task and a fundamental problem in the video-understanding subfield, and it has been studied for many years. Because video data are difficult to label and analyze, and spatiotemporal feature modeling is hard, progress in video behavior recognition has been slow. With the breakthrough of deep learning, learning high-level spatiotemporal semantic features with neural networks has become mainstream. However, because video data are voluminous and common deep network models are computationally expensive, practical video behavior recognition systems remain scarce, and the task still lacks a truly robust solution.
The system of the invention mainly targets video behavior recognition on online video streams. The main challenges faced by conventional recognition frameworks are: first, videos vary in length, and videos captured in open environments suffer from relative camera motion, irrelevant shots, and varying scale, so traditional recognition methods can only enumerate common cases and assumptions heuristically; second, video data consume large amounts of resources and common deep models are large, making end-to-end training and optimization difficult and time-consuming; third, the optimization objective is single-purpose, and the classification task can only be trained on trimmed short videos.
In recent years, related research efforts have attempted to address these problems.
Reference [1] proposes initializing a 3D convolutional network with pre-trained 2D network parameters and training a lighter-weight network structure on large-scale video datasets. However, this method can only process short videos, and the practicality of the model is low.
Reference [2] proposes learning global video features with a self-attention module to capture long-term spatiotemporal dependencies. However, this method can only process offline video and cannot be applied to real-time video streams; moreover, it is computationally expensive and slow to train.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a video behavior recognition method based on a temporal causal convolutional network.
Because a 3D convolutional neural network has many parameters and a high computational cost and cannot process long videos, the invention designs a video behavior recognition algorithm based on a temporal causal convolutional network, factorizing 3D convolution into temporal convolution and spatial convolution. The temporal convolution preserves the causal constraint; feature changes along the time dimension are modeled by combining short-term temporal convolution with a long-term self-attention mechanism, and these temporal modules are placed sparsely in the network. To better suit online streaming video, the invention adopts a history-feature caching mechanism that stores the historical features needed by future frames, which reduces computation so that the system runs faster and more efficiently and achieves real-time performance.
The invention provides a video behavior recognition method based on a temporal causal convolutional network: a temporal causal three-dimensional convolutional neural network extracts spatiotemporal semantic feature representations from multiple video segments to obtain the predicted behavior category; the frame sequence up to the current moment is modeled, and high-level spatiotemporal semantic features are extracted for behavior localization and progress prediction.
The specific steps of the method are as follows:
Step 1: read the video stream data and decode it online to obtain a frame sequence I = {I_0, I_1, …}, each element of which is a tensor representation of the frame picture data;
Step 2: at each time t (t = 1, 2, …), feed the current frame I_{t-1} of the video stream into the pre-trained temporal causal three-dimensional convolutional neural network and extract the spatiotemporal feature representation;
Step 3: feed the extracted spatiotemporal feature representation into a behavior classifier to obtain the behavior category, and obtain the current progress of the behavior through a regression network;
Step 4: cache part of the convolutional network's hidden-layer features at time t, set t = t + 1, and return to Step 2.
In Step 2 of the invention, the temporal causal three-dimensional convolutional neural network comprises spatial convolution layers, a temporal-causal-convolution and spatial-convolution fusion module, and a causal-attention and spatial-convolution fusion module. The spatial convolution layers are the main building blocks of the network and extract the spatial semantic features of the current frame; the two fusion modules are placed alternately and sparsely in the network to capture short-term and long-term historical information. The network is pre-trained on a large-scale labeled video behavior detection dataset.
The temporal-causal-convolution and spatial-convolution fusion module, shown in Fig. 2, comprises a temporal causal convolution with a 3 × 1 × 1 kernel and a spatial convolution with a 1 × 3 × 3 kernel. The input feature map X passes through these two parallel paths, and the elements of the two resulting feature maps are added to obtain the fused output feature map Y.
The temporal causal convolution has kernel size 1 along the height and width dimensions of the frame image and kernel size 3 along the time dimension, so the convolution at each time point fuses the features of that time point with those of its two preceding time points; this structure mines short-term historical motion information. The spatial convolution has kernel size 3 along height and width and kernel size 1 along time, so the convolution at each spatial position fuses the features of that position with those of its 8 neighboring points; this structure learns the spatial semantic information of the frame image.
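To make the fused module concrete, the following is a minimal PyTorch sketch of the structure described above and in Fig. 2. It is an illustration under stated assumptions (equal input and output channel counts, zero padding of the missing past), not the patented implementation itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalSpatialFusion(nn.Module):
    """Parallel 3x1x1 causal temporal conv + 1x3x3 spatial conv, summed."""

    def __init__(self, channels: int):
        super().__init__()
        # Temporal path: kernel 3 in time, 1x1 in space; causality comes from
        # padding the time axis on the left (past) side only.
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1))
        # Spatial path: kernel 1 in time, 3x3 in space, "same" spatial padding.
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width).
        # Left-pad two steps in time so step t sees only t, t-1, t-2.
        y_t = self.temporal(F.pad(x, (0, 0, 0, 0, 2, 0)))
        y_s = self.spatial(x)
        return y_t + y_s  # element-wise addition fuses the two paths
```

Left-padding only the past side of the time axis is what enforces causality here: the output at time t never depends on frames after t.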
In the invention, the structure of the causal-attention and spatial-convolution fusion module is shown in Fig. 3. The module contains three convolution layers with 1 × 1 × 1 kernels: the input feature map X undergoes these three convolutions and a shape adjustment to produce the values V, keys K, and queries Q. Each query point of Q retrieves its correlation with all key-position features in K that do not lie after the query point in time, yielding correlations under the causal constraint; this is typically implemented with a masked SoftMax function. The features at each position of V are then combined through the correlation weights to produce the final feature expression of each query point. A further convolution with a 1 × 1 × 1 kernel yields the output feature map of the causal-attention path, which captures long-term historical semantic information. This feature map is added to the feature map produced by the spatial convolution path to obtain the final output feature map Y.
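A hedged PyTorch sketch of this module follows. The reduced inner channel count `inner` (channel compression) and single-head attention are assumptions, and queries are allowed to attend to keys at the same time step as well as earlier ones:

```python
import torch
import torch.nn as nn

class CausalAttentionSpatialFusion(nn.Module):
    """Causal attention path (Q, K, V from 1x1x1 convs) + spatial conv path."""

    def __init__(self, channels: int, inner: int):
        super().__init__()
        self.q = nn.Conv3d(channels, inner, kernel_size=1)
        self.k = nn.Conv3d(channels, inner, kernel_size=1)
        self.v = nn.Conv3d(channels, inner, kernel_size=1)
        self.out = nn.Conv3d(inner, channels, kernel_size=1)  # closing 1x1x1 conv
        self.spatial = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, t, h, w = x.shape
        # Flatten (time, height, width) into one axis of key/query positions.
        q, k, v = self.q(x).flatten(2), self.k(x).flatten(2), self.v(x).flatten(2)
        scores = torch.einsum('nci,ncj->nij', q, k) / (q.shape[1] ** 0.5)
        # Causal mask: a query at time ti may only attend to keys at tj <= ti.
        time_idx = torch.arange(t, device=x.device).repeat_interleave(h * w)
        causal = time_idx[:, None] >= time_idx[None, :]
        scores = scores.masked_fill(~causal, float('-inf'))
        attn = scores.softmax(dim=-1)                       # masked SoftMax
        y = torch.einsum('nij,ncj->nci', attn, v)           # weight-combine V
        y = y.reshape(n, -1, t, h, w)
        return self.out(y) + self.spatial(x)                # fuse the two paths
```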
In step 3 of the present invention, the behavior classifier is a linear classification layer, which covers common behavior classes and no-behavior classes. Mapping the extracted features to a classification space to obtain the probability of each class, sequencing the probabilities of all the classes from large to small, returning the behavior class corresponding to the maximum probability value, and finally obtaining the behavior class.
In step 3 of the invention, the regression network comprises a linear layer and a Sigmoid function, the current progress of the behavior obtained through the regression network is a predicted value between 0 and 1 obtained through the linear layer and the Sigmoid function under the condition that the occurrence of a preset behavior type is predicted; 0 represents the beginning of the action and 1 represents the end of the action. If no predetermined behavior category is predicted, the return value of the progress regression network is 0, i.e., no progress.
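The two prediction heads can be sketched as follows; the pooled feature vector `feat`, the `BehaviorHeads` name, and the placement of the no-behavior class at the last index are assumptions for illustration:

```python
import torch
import torch.nn as nn

class BehaviorHeads(nn.Module):
    """Linear behavior classifier plus linear + Sigmoid progress regressor."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        # One extra logit for the explicit "no behavior" (background) class,
        # assumed here to be the last index.
        self.classifier = nn.Linear(dim, num_classes + 1)
        self.progress = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, feat: torch.Tensor):
        probs = self.classifier(feat).softmax(dim=-1)  # per-class probabilities
        label = probs.argmax(dim=-1)                   # class with max probability
        prog = self.progress(feat).squeeze(-1)         # 0 = start, 1 = end
        background = probs.shape[-1] - 1
        # Report zero progress when no predetermined behavior is predicted.
        prog = torch.where(label == background, torch.zeros_like(prog), prog)
        return label, prog
```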
In Step 4 of the invention, part of the convolutional network's hidden-layer features at time t are cached: for the temporal causal convolution module, the features of its input feature map at the current and previous time steps must be cached; for the causal attention module, the historical features stored in the keys K must be cached and updated at every time step. This caching greatly reduces repeated computation and improves system efficiency. At the next time step, Step 2 is re-entered to update the prediction state.
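The sketch below illustrates the caching idea for the temporal path only, under the assumption of a zero-initialized history; the keys K (and values) of the attention module would be maintained analogously, as a cache appended to at every step:

```python
from collections import deque
import torch
import torch.nn as nn

class StreamingCausalConv:
    """Runs a 3x1x1 temporal causal convolution one frame at a time."""

    def __init__(self, conv: nn.Conv3d):
        self.conv = conv              # expects kernel (3, 1, 1), no padding
        self.cache = deque(maxlen=2)  # input features at times t-2 and t-1

    def step(self, feat_t: torch.Tensor) -> torch.Tensor:
        # feat_t: (batch, channels, 1, height, width) for the current time t.
        while len(self.cache) < 2:    # cold start: zero-pad the missing past
            self.cache.appendleft(torch.zeros_like(feat_t))
        window = torch.cat(list(self.cache) + [feat_t], dim=2)  # time length 3
        self.cache.append(feat_t)     # keep (t-1, t) around for time t+1
        return self.conv(window)      # output for time t only, no recomputation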
Unlike existing video behavior analysis and recognition methods, the invention accounts for both the temporal causality of long videos and the computational cost of spatiotemporal feature modeling. It therefore designs a 3D neural network structure based on temporal causal convolution and a self-attention mechanism and adds a behavior progress predictor, which greatly reduces model capacity, permits more efficient training, alleviates problems such as non-robust spatiotemporal feature learning and impractical models, and obtains accurate behavior features and progress from the video stream in real time. With these improvements, the video behavior recognition system based on a temporal causal convolutional network has stronger expressive power and higher efficiency and can process online video streams in real time. The method offers high accuracy, high computational efficiency, and real-time performance; it is suited to online real-time video behavior detection and analysis and can also be used for offline video behavior recognition, abnormal-event monitoring, and similar tasks.
The main innovations of the invention are:
1. Short-term and long-term spatiotemporal modeling are learned separately with temporal causal convolution and a causal attention mechanism, which naturally handles the online video behavior recognition task. Computation is reduced through sparse placement of these modules and compression of the channel count, reaching real-time processing performance. Temporal and spatial feature modeling are separated and stacked, which is highly efficient; treating the time and space dimensions differently reduces model parameters and computation while easing parameter optimization.
2. Multi-task learning. Besides predicting the video behavior category, the system simultaneously regresses the progress of the behavior, helping the network learn more refined behavior features; the multi-task supervision constraint improves the robustness and expressive power of the network model.
Drawings
FIG. 1 is a diagram of the temporal causal three-dimensional convolutional neural network processing system of the invention.
FIG. 2 is a structural diagram of the temporal-causal-convolution and spatial-convolution fusion module proposed by the invention.
FIG. 3 is a structural diagram of the causal-attention and spatial-convolution fusion module proposed by the invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
FIG. 1 shows the temporal causal three-dimensional convolutional neural network processing system for online behavior recognition according to the invention. The system comprises the input stream of video frame pictures, spatial convolution layers, the temporal-causal-convolution and causal-attention basic network modules, a behavior classifier, and a progress regressor.
FIG. 2 illustrates the temporal-causal-convolution and spatial-convolution fusion module, used for short-term spatiotemporal feature modeling. The input feature map X passes through a temporal causal convolution with a 3 × 1 × 1 kernel and a spatial convolution with a 1 × 3 × 3 kernel, producing two feature maps whose elements are added to obtain the fused output feature map Y.
FIG. 3 illustrates the causal-attention and spatial-convolution fusion module, used for long-term spatiotemporal feature modeling. The input feature map X undergoes three convolutions with 1 × 1 × 1 kernels and a shape adjustment to obtain the values V, keys K, and queries Q; each query point of Q retrieves its correlation with all key-position features in K that do not lie after it in time, giving the correlations under the causal constraint, typically implemented by a masked SoftMax. The final feature expression produced by the attention mechanism can be written as Attention(Q, K, V) = SoftMax(mask(QKᵀ / √d)) V, where d is the key dimension and mask sets the scores of future key positions to −∞. A further convolution with a 1 × 1 × 1 kernel then yields the output feature map of the causal-attention path, which is added to the feature map produced by the spatial convolution path to obtain the final output feature map Y.
The specific steps of the operation are as follows:
Step 1: collect a large-scale dataset of long videos with labeled actions and their corresponding segments, and initialize the parameters of the temporal causal three-dimensional convolutional neural network;
Step 2: randomly select a long video from the dataset and feed its frames to the network in temporal order; once the maximum GPU-memory or memory occupancy is reached, compute the loss from the action label at each time point and back-propagate it. When all frames of the video sample have been consumed, update the network with a stochastic gradient descent optimizer using the accumulated network parameter gradients. Training on mini-batches of samples proceeds similarly;
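An illustrative sketch of this training scheme follows. The cross-entropy and mean-squared-error losses, the per-time-step model outputs, and the fixed chunk size standing in for "maximum memory occupancy" are all assumptions, and the carrying of cached features across chunks is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def train_one_video(model, frames, labels, progress, optimizer, chunk=64):
    """frames: (T, C, H, W); labels: (T,) long; progress: (T,) in [0, 1]."""
    optimizer.zero_grad()
    for s in range(0, frames.shape[0], chunk):
        # One chunk approximates "frames fed until memory is full".
        clip = frames[s:s + chunk].unsqueeze(0).transpose(1, 2)  # (1, C, T', H, W)
        cls_logits, prog = model(clip)    # assumed per-time-step outputs
        loss = (F.cross_entropy(cls_logits.flatten(0, 1), labels[s:s + chunk])
                + F.mse_loss(prog.flatten(), progress[s:s + chunk]))
        loss.backward()                   # gradients accumulate across chunks
    optimizer.step()                      # one SGD update per video sample
```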
Step 3: deploy the trained network model to the terminal, connect to a real-time video stream, decode the video frame data and feed it into the network, extract the current spatiotemporal features, and cache the required intermediate-layer feature states;
Step 4: map the extracted spatiotemporal feature expression into the classification space through the behavior classifier to obtain the behavior category and, when a behavior is present, feed it into the progress regressor to obtain the current progress of the behavior;
Step 5: display the network's recognition results in real time, synchronized with the video frame stream and with essentially no delay. The structure can accurately judge common behaviors and can also cover more complex behavior categories.
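Steps 3 through 5 can be tied together in a loop such as the following sketch, which assumes OpenCV for decoding and the hypothetical `model.step` (a one-frame streaming forward pass that updates its internal caches and returns a pooled feature) and `heads` objects from the sketches above:

```python
import cv2
import torch

def run_stream(model, heads, source=0):
    """Decode a live stream and report the class and progress per frame."""
    cap = cv2.VideoCapture(source)        # step 3: access the video stream
    with torch.no_grad():
        while True:
            ok, frame = cap.read()        # decode one frame
            if not ok:
                break
            x = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
            feat = model.step(x[None])    # extract features, update caches
            label, prog = heads(feat)     # step 4: classify + regress progress
            print(int(label), float(prog))  # step 5: report in real time
    cap.release()
```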
References
[1] Carreira, Joao, and Andrew Zisserman. "Quo vadis, action recognition? A new model and the Kinetics dataset." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[2] Wang, Xiaolong, et al. "Non-local neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
Claims (6)
1. A video behavior identification method based on a temporal causal convolutional network, characterized by using a temporal causal three-dimensional convolutional neural network to extract spatiotemporal semantic feature representations from multiple video segments and obtain the predicted behavior category, and by modeling the frame sequence up to the current moment and extracting high-level spatiotemporal semantic features for behavior localization and progress prediction; the method comprises the following specific steps:
step 1: reading video stream data and decoding it online to obtain a frame sequence I = {I_0, I_1, …}, each element of the sequence being a tensor representation of the frame picture data;
step 2: at each time t, feeding the current frame I_{t-1} of the video stream into a pre-trained temporal causal three-dimensional convolutional neural network and extracting the spatiotemporal feature representation, where t = 1, 2, …;
step 3: feeding the extracted spatiotemporal feature representation into a behavior classifier to obtain the behavior category, and obtaining the current progress of the behavior through a regression network;
step 4: caching part of the hidden-layer features of the convolutional network at time t, setting t = t + 1, and returning to step 2;
wherein the temporal causal three-dimensional convolutional neural network in step 2 comprises spatial convolution layers, a temporal-causal-convolution and spatial-convolution fusion module, and a causal-attention and spatial-convolution fusion module; the spatial convolution layers are the main building blocks of the network and extract the spatial semantic features of the current frame; the two fusion modules are placed alternately and sparsely in the network to capture short-term and long-term historical information; the network is pre-trained on a large-scale labeled video behavior detection dataset;
the temporal-causal-convolution and spatial-convolution fusion module comprises a temporal causal convolution with a 3 × 1 × 1 kernel and a spatial convolution with a 1 × 3 × 3 kernel; an input feature map X passes through these two parallel paths to obtain two feature maps, whose elements are added to obtain the fused output feature map Y;
the causal-attention and spatial-convolution fusion module comprises three convolution layers with 1 × 1 × 1 kernels; an input feature map X undergoes the three 1 × 1 × 1 convolutions and a shape adjustment to obtain values V, keys K, and queries Q; each query point of Q retrieves its correlation with all key-position features in K that do not lie after the query point in time, obtaining the correlations under the causal constraint, implemented by a masked SoftMax function; the features at each position of V are combined through the correlation weights to obtain the final feature expression of each query point; a further convolution with a 1 × 1 × 1 kernel then yields the output feature map of the causal-attention path, which captures long-term historical semantic information; this feature map is added to the feature map obtained from the spatial convolution path to obtain the final output feature map Y.
2. The video behavior identification method based on a temporal causal convolutional network of claim 1, wherein the temporal causal convolution has kernel size 1 along the height and width dimensions of the frame image and kernel size 3 along the time dimension, and the convolution at each time point fuses the features of that time point with those of its two preceding time points; this structure mines short-term historical motion information.
3. The video behavior identification method based on a temporal causal convolutional network of claim 1, wherein the spatial convolution has kernel size 3 along the height and width dimensions of the frame image and kernel size 1 along the time dimension, and the convolution at each spatial position fuses the features of that position with those of the 8 points in its spatial neighborhood; this structure learns the spatial semantic information of the frame image.
4. The video behavior identification method based on a temporal causal convolutional network of any one of claims 1-3, wherein the behavior classifier in step 3 is a linear classification layer covering the common behavior categories and a no-behavior category; the extracted features are mapped into the classification space to obtain the probability of each category, the probabilities of all categories are sorted in descending order, and the behavior category corresponding to the largest probability is returned as the final behavior category.
5. The video behavior identification method based on a temporal causal convolutional network of claim 4, wherein the regression network in step 3 comprises a linear layer and a Sigmoid function; when a predetermined behavior category is predicted to be occurring, the current progress of the behavior is a value between 0 and 1 obtained through the linear layer and the Sigmoid function, where 0 represents the beginning of the action and 1 represents its end; if no predetermined behavior category is predicted, the progress regression network returns 0, i.e., no progress.
6. The video behavior identification method based on a temporal causal convolutional network of claim 5, wherein, for the hidden-layer features of the convolutional network cached at time t in step 4, the temporal causal convolution module caches the features of its input feature map at the current and previous time steps, and the causal self-attention module caches and updates, at every time step, the historical features stored in the keys K; at the next time step, step 2 is re-entered to update the prediction state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910459028.4A CN110175580B (en) | 2019-05-29 | 2019-05-29 | Video behavior identification method based on time sequence causal convolutional network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110175580A (en) | 2019-08-27 |
CN110175580B (en) | 2020-10-30 |
Family
ID=67696573
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910459028.4A Expired - Fee Related CN110175580B (en) | 2019-05-29 | 2019-05-29 | Video behavior identification method based on time sequence causal convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110175580B (en) |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516611B (en) * | 2019-08-28 | 2022-03-01 | 中科人工智能创新技术研究院(青岛)有限公司 | Autism detection system and autism detection device |
CN110503076B (en) * | 2019-08-29 | 2023-06-30 | 腾讯科技(深圳)有限公司 | Video classification method, device, equipment and medium based on artificial intelligence |
CN110765854B (en) * | 2019-09-12 | 2022-12-02 | 昆明理工大学 | Video motion recognition method |
CN110807369B (en) * | 2019-10-09 | 2024-02-20 | 南京航空航天大学 | Short video content intelligent classification method based on deep learning and attention mechanism |
CN110852295B (en) * | 2019-10-15 | 2023-08-25 | 深圳龙岗智能视听研究院 | Video behavior recognition method based on multitasking supervised learning |
CN110839156A (en) * | 2019-11-08 | 2020-02-25 | 北京邮电大学 | Future frame prediction method and model based on video image |
CN111259782B (en) * | 2020-01-14 | 2022-02-11 | 北京大学 | Video behavior identification method based on mixed multi-scale time sequence separable convolution operation |
CN111460928B (en) * | 2020-03-17 | 2023-07-21 | 中国科学院计算技术研究所 | Human body action recognition system and method |
CN111563417B (en) * | 2020-04-13 | 2023-03-21 | 华南理工大学 | Pyramid structure convolutional neural network-based facial expression recognition method |
CN111506835B (en) * | 2020-04-17 | 2022-12-23 | 北京理工大学 | Data feature extraction method fusing user time features and individual features |
CN112185352B (en) * | 2020-08-31 | 2024-05-17 | 华为技术有限公司 | Voice recognition method and device and electronic equipment |
CN112257572B (en) * | 2020-10-20 | 2022-02-01 | 神思电子技术股份有限公司 | Behavior identification method based on self-attention mechanism |
CN112434615A (en) * | 2020-11-26 | 2021-03-02 | 天津大学 | Time sequence action detection method based on Tensorflow deep learning framework |
CN112487957A (en) * | 2020-11-27 | 2021-03-12 | 广州华多网络科技有限公司 | Video behavior detection and response method and device, equipment and medium |
CN112651324A (en) * | 2020-12-22 | 2021-04-13 | 深圳壹账通智能科技有限公司 | Method and device for extracting semantic information of video frame and computer equipment |
CN112288050B (en) * | 2020-12-29 | 2021-05-11 | 中电科新型智慧城市研究院有限公司 | Abnormal behavior identification method and device, terminal equipment and storage medium |
CN112668495B (en) * | 2020-12-30 | 2024-02-02 | 东北大学 | Full-time space convolution module-based violent video detection algorithm |
CN112883983B (en) * | 2021-02-09 | 2024-06-14 | 北京迈格威科技有限公司 | Feature extraction method, device and electronic system |
CN112883929B (en) * | 2021-03-26 | 2023-08-08 | 全球能源互联网研究院有限公司 | On-line video abnormal behavior detection model training and abnormal detection method and system |
CN113466852B (en) * | 2021-06-08 | 2023-11-24 | 江苏科技大学 | Millimeter wave radar dynamic gesture recognition method applied to random interference scene |
CN113487247B (en) * | 2021-09-06 | 2022-02-01 | 阿里巴巴(中国)有限公司 | Digitalized production management system, video processing method, equipment and storage medium |
CN114510966B (en) * | 2022-01-14 | 2023-04-28 | 电子科技大学 | End-to-end brain causal network construction method based on graph neural network |
CN115100740B (en) * | 2022-06-15 | 2024-04-05 | 东莞理工学院 | Human motion recognition and intention understanding method, terminal equipment and storage medium |
CN115019239A (en) * | 2022-07-04 | 2022-09-06 | 福州大学 | Real-time action positioning method based on space-time cross attention |
CN114898166B (en) * | 2022-07-13 | 2022-09-27 | 合肥工业大学 | Method for detecting glass cleanliness based on evolution causal model |
CN115147935B (en) * | 2022-09-05 | 2022-12-13 | 浙江壹体科技有限公司 | Behavior identification method based on joint point, electronic device and storage medium |
CN116797972B (en) * | 2023-06-26 | 2024-09-06 | 中科(黑龙江)数字经济研究院有限公司 | Self-supervision group behavior recognition method and recognition system based on sparse graph causal time sequence coding |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108830157A (en) * | 2018-05-15 | 2018-11-16 | 华北电力大学(保定) | Human bodys' response method based on attention mechanism and 3D convolutional neural networks |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10366292B2 (en) * | 2016-11-03 | 2019-07-30 | Nec Corporation | Translating video to language using adaptive spatiotemporal convolution feature representation with dynamic abstraction |
CN107609460B (en) * | 2017-05-24 | 2021-02-02 | 南京邮电大学 | Human body behavior recognition method integrating space-time dual network flow and attention mechanism |
CN109389055B (en) * | 2018-09-21 | 2021-07-20 | 西安电子科技大学 | Video classification method based on mixed convolution and attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20201030 |