CN112949544A - Action time sequence detection method based on 3D convolutional network - Google Patents
Action time sequence detection method based on 3D convolutional network
- Publication number
- CN112949544A (application CN202110285908.1A)
- Authority
- CN
- China
- Prior art keywords
- action
- network
- video
- time
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
Abstract
The invention relates to an action time sequence detection method based on a 3D convolutional network. The method extracts key frames in which the action changes significantly by K-means clustering, extracts action features with a 3D convolutional network, fuses a 3D convolution-deconvolution network with a spatio-temporal feature pyramid structure to obtain multi-scale frame-level action predictions, and fuses the prediction results by Kalman filtering to predict the action time sequence. The method makes frame-level predictions for actions occurring at any position and of any duration, enabling real-time operation. K-means clustering maximizes the information difference between action key frames, so the 3D convolutional network can extract rich action features more effectively and classification accuracy improves. The multi-scale fusion of the 3D convolution-deconvolution network and the spatio-temporal feature pyramid network overcomes the low prediction accuracy of a single scale: the prediction result preserves both the overall action and its details, and detection accuracy improves markedly.
Description
Technical Field
The invention relates to the technical field of human action feature extraction and classification prediction in video, and in particular to an action time sequence detection method based on a 3D convolutional network.
Background
With the rapid development of visual sensors and of computer image processing, a computer can acquire image and video information through visual sensors and, by analyzing the image content with artificial intelligence techniques such as image processing, pattern recognition, and machine learning, understand the human actions in the images. Effective human action time sequence detection technology is needed to analyze and understand action behaviors in large-scale video data. Action time sequence detection is a video processing task that finds the action segments in a raw video and predicts the start time, end time, and category of each action. It requires a computer to intelligently detect, classify, and identify human actions in video, processing two-dimensional image information and three-dimensional spatio-temporal information simultaneously, and has important application value in security monitoring, intelligent surveillance, medical care, video retrieval, human-computer interaction, intelligent robotics, and related fields.
Action time sequence detection comprises two stages: action feature extraction and time sequence proposal. Existing methods depend heavily on the ability to understand and identify actions, and their proposal stage struggles to locate target action regions because video data is structurally complex and target actions vary widely in duration. Two problems therefore need to be solved: effective extraction of action features from large-scale video data, and high-precision time sequence detection with frame-level boundary decisions.
Disclosure of Invention
The invention provides an action time sequence detection method based on a 3D convolutional network for extracting features from, and classifying, identifying, and predicting, human actions in video. The method is a basis for technologies such as security monitoring, intelligent surveillance, human-computer interaction, and intelligent robotics.
In order to achieve the purpose of the invention, the invention adopts the following inventive concept:
to detect the time sequence information of actions of arbitrary, unrestricted duration in video and to judge the action category, a key-frame-based action extraction method is designed, and a 3D convolutional network is combined with a spatio-temporal feature pyramid structure for multi-scale fusion, producing predictions of both the whole action and its details.
First, key frames in which the action changes significantly are extracted by K-means clustering, action features are extracted with a 3D convolutional network, and the 3D convolution-deconvolution network is fused with a spatio-temporal feature pyramid structure to obtain multi-scale frame-level action predictions; the prediction results are then fused by Kalman filtering to predict the action time sequence.
According to the inventive concept, the invention adopts the following technical scheme:
a motion time sequence detection method based on a 3D convolutional network is characterized in that: extracting key frames with obviously changed actions through K-means clustering, and extracting action characteristics by utilizing a 3D (three-dimensional) convolutional network;
then fusing the 3D convolution deconvolution network with the space-time characteristic pyramid structure to perform multi-scale action frame level prediction;
and finally, fusing the prediction results by Kalman filtering, and predicting the action time sequence to generate a proposal.
Preferably, the action feature extraction comprises the following steps:
1) dividing the video clips into training videos and testing videos, used as input in the training stage and the testing stage respectively;
2) clustering similar action frames in the video with K-means and selecting one frame from each cluster as a key frame;
3) inputting the resulting action key frame sequence into a 3D convolutional network and extracting spatio-temporal action features.
Preferably, the action time sequence proposal comprises the following steps:
1) inputting the feature data obtained by action feature extraction into a 3D convolution-deconvolution network and restoring the features to the original input length by upsampling in the time dimension, so that frame-level prediction is possible;
2) using the multi-scale property of the spatio-temporal pyramid to output action predictions of different scales independently from intermediate stages of the 3D convolution-deconvolution network, realizing overall prediction of the action;
3) applying Kalman filtering to the features obtained from each sliding window, improving the continuity of the predicted action across adjacent windows and generating a time sequence detection proposal.
Compared with the prior art, the invention has the following substantive characteristics and notable advantages:
1. the 3D-convolutional-network-based method predicts, at the frame level, actions occurring at any position and of any duration, enabling real-time operation;
2. K-means clustering maximizes the information difference between action key frames, so the 3D convolutional network extracts rich action features more effectively and classification accuracy improves;
3. the method fuses the frame-level prediction of the 3D convolution-deconvolution network with the multi-scale features of the spatio-temporal pyramid network, combining the frame-level and overall action prediction results, so the temporal position of an action is detected accurately and detection accuracy improves markedly over single-scale prediction.
Drawings
Fig. 1 is a block diagram of the structure of the method for detecting an action timing sequence based on a 3D convolutional network according to the present invention.
FIG. 2 is a schematic diagram of key frame extraction in the method of the present invention.
FIG. 3 is a schematic diagram of action feature extraction in the method of the present invention.
FIG. 4 is a schematic diagram of multi-scale frame-level action prediction in the method of the present invention.
FIG. 5 is a schematic diagram of time sequence action detection proposal generation in the method of the present invention.
Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
Example one
In this embodiment, referring to fig. 1, an action time sequence detection method based on a 3D convolutional network extracts key frames in which the action changes significantly by K-means clustering and extracts action features with the 3D convolutional network;
the 3D convolution-deconvolution network is then fused with a spatio-temporal feature pyramid structure for multi-scale frame-level action prediction;
finally, the prediction results are fused by Kalman filtering, and the action time sequence is predicted to generate a proposal.
The method extracts features from, and classifies, identifies, and predicts, human actions in video, and can support security monitoring, intelligent surveillance, and human-computer interaction.
Example two
This embodiment is substantially the same as the first embodiment, with the following particulars:
In this embodiment, the action feature extraction comprises the following steps:
1) dividing the video clips into training videos and testing videos, used as input in the training stage and the testing stage respectively;
2) clustering similar action frames in the video with K-means and selecting one frame from each cluster as a key frame;
3) inputting the resulting action key frame sequence into a 3D convolutional network and extracting spatio-temporal action features.
In this embodiment, the action time sequence proposal comprises the following steps:
1) inputting the feature data obtained by action feature extraction into a 3D convolution-deconvolution network and restoring the features to the original input length by upsampling in the time dimension, so that frame-level prediction is possible;
2) using the multi-scale property of the spatio-temporal pyramid to output action predictions of different scales independently from intermediate stages of the 3D convolution-deconvolution network, realizing overall prediction of the action;
3) applying Kalman filtering to the features obtained from each sliding window, improving the continuity of the predicted action across adjacent windows and generating a time sequence detection proposal.
In this method, key frames in which the action changes significantly are extracted by K-means clustering, action features are extracted with a 3D convolutional network, the 3D convolution-deconvolution network is fused with a spatio-temporal feature pyramid structure for multi-scale frame-level action prediction, and the prediction results are fused by Kalman filtering to predict the action time sequence. The method makes frame-level predictions for actions occurring at any position and of any duration, enabling real-time operation; K-means clustering maximizes the information difference between action key frames, so the 3D convolutional network extracts rich action features more effectively and classification accuracy improves; the multi-scale fusion of the 3D convolution-deconvolution network and the spatio-temporal feature pyramid network overcomes the low prediction accuracy of a single scale, the prediction result preserves both the overall action and its details, and detection accuracy improves markedly.
Example three
This embodiment is substantially the same as the above embodiments, with the following particulars:
in this embodiment, as shown in fig. 1, a method for detecting an action timing based on a 3D convolutional network includes the following steps:
step 1: a video clip is generated for the input video sliding window. The length of a real natural video in a time dimension is very long, so for detecting an action time sequence of a video with unlimited length, a sliding window with a fixed length needs to be performed on the video, so as to perform subsequent operation on each sliding window.
Step 2: extract video action key frames. After all frames of the video sequence are extracted at the video sampling rate, similar action frames are clustered by K-means and one frame is selected from each cluster as a key frame, yielding the key frame sequence of the video. Key frame extraction removes the redundancy of a lengthy video, eliminates similar redundant frames, and adjusts the video length while preserving action completeness.
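A toy sketch of the key-frame selection of Step 2, assuming each frame has already been reduced to a scalar descriptor (e.g. mean intensity); real frame features would be high-dimensional, and the descriptor choice, cluster count k, and iteration count here are assumptions:

```python
import random

def kmeans_keyframes(descriptors, k, iters=20, seed=0):
    """Cluster scalar frame descriptors with K-means; per cluster, keep the
    frame index nearest the cluster centre as that cluster's key frame."""
    rng = random.Random(seed)
    centers = rng.sample(descriptors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, d in enumerate(descriptors):
            nearest = min(range(k), key=lambda c: abs(d - centers[c]))
            clusters[nearest].append(i)
        # recompute centres; an empty cluster keeps its old centre
        centers = [sum(descriptors[i] for i in cl) / len(cl) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    keyframes = sorted(min(cl, key=lambda i: abs(descriptors[i] - centers[j]))
                       for j, cl in enumerate(clusters) if cl)
    return keyframes

# three visually distinct phases -> at most three key frames, in temporal order
keys = kmeans_keyframes([0.0, 0.1, 0.05, 5.0, 5.1, 9.0, 9.2, 9.1], k=3)
```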
Step 3: extract action features with a 3D convolutional network. The obtained action key frame sequence is fed into a 3D convolutional network to extract spatio-temporal action features. The network is initialized from a pre-trained model and fine-tuned before features are extracted. In the training stage, the loss from the softmax output layer is back-propagated layer by layer and the network parameters are adjusted by gradient descent, so that the 3D convolutional network adaptively learns features of the actions in the input video. In the testing stage, the input action key frame sequence is passed through the network up to the fifth pooling layer to obtain action features for the subsequent classification and prediction tasks.
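The core operation behind Step 3 is the 3D convolution, which slides a kernel jointly over time and space so that motion across frames is captured. Below is a minimal pure-Python sketch of one valid (no-padding) 3D convolution; the real feature extractor stacks many such layers with pooling and nonlinearities, and its kernel sizes and pre-trained weights are not specified here:

```python
def conv3d(volume, kernel):
    """Valid 3D convolution of a T x H x W volume with a t x h x w kernel.
    Output shape is (T-t+1) x (H-h+1) x (W-w+1)."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    t, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [[[sum(volume[i + a][j + b][k + c] * kernel[a][b][c]
                  for a in range(t) for b in range(h) for c in range(w))
              for k in range(W - w + 1)]
             for j in range(H - h + 1)]
            for i in range(T - t + 1)]

# a constant 3x3x3 clip convolved with an all-ones 2x2x2 kernel
clip = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
feat = conv3d(clip, [[[1.0] * 2 for _ in range(2)] for _ in range(2)])
```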
Step 4: multi-scale frame-level prediction based on 3D convolution-deconvolution and the spatio-temporal feature pyramid. The feature data obtained above are fed into a 3D convolution-deconvolution network, which downsamples in the spatial dimensions while upsampling in the time dimension, restoring the temporal length. On this basis, to counter the possible loss of overall action information in single-scale frame prediction, a spatio-temporal feature pyramid structure is introduced: action predictions of different scales are output independently from intermediate stages of the 3D convolution-deconvolution network, and the multi-scale features are fused to obtain the final frame-level action prediction.
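One simple way to realize the fusion in Step 4 is to upsample each scale's per-frame action scores to the input length and average them; plain nearest-neighbour upsampling and unweighted averaging are assumptions here, chosen only to make the multi-scale idea concrete:

```python
def upsample_nearest(scores, target_len):
    """Nearest-neighbour temporal upsampling of a coarse score sequence."""
    n = len(scores)
    return [scores[min(i * n // target_len, n - 1)] for i in range(target_len)]

def fuse_multiscale(predictions, target_len):
    """Average frame-level action scores produced at several temporal scales."""
    upsampled = [upsample_nearest(p, target_len) for p in predictions]
    return [sum(frame) / len(upsampled) for frame in zip(*upsampled)]

# a coarse 2-step prediction and a fine 4-step prediction fused at 4 frames
fused = fuse_multiscale([[0.2, 0.8], [0.0, 0.4, 0.8, 0.6]], target_len=4)
```

The coarse scale carries the overall shape of the action while the fine scale carries its details, which is the complementarity the pyramid structure exploits.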
Step 5: generate the time sequence action prediction proposal. In the frame-level prediction results generated above, actions spanning adjacent windows are cut by the sliding window, which harms the completeness of the generated proposals. Kalman filtering is therefore applied to the frame-level predictions: the optimal estimate for the current frame is made by combining the state from the historical sequence with the observation of the current frame, yielding the best proposal generation result.
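The Kalman smoothing of Step 5 can be sketched as a scalar constant-state Kalman filter over per-frame action scores; the process noise q and measurement noise r below are illustrative assumptions:

```python
def kalman_smooth(observations, q=1e-3, r=0.25):
    """Scalar Kalman filter: each estimate fuses the historical state with the
    current observation, damping jitter at sliding-window boundaries."""
    x, p = observations[0], 1.0          # initial state and covariance
    smoothed = [x]
    for z in observations[1:]:
        p += q                           # predict step: inflate uncertainty
        gain = p / (p + r)               # Kalman gain
        x += gain * (z - x)              # update with the current observation
        p *= 1.0 - gain
        smoothed.append(x)
    return smoothed

# a spurious dip at a window boundary is pulled back toward its neighbours
smooth = kalman_smooth([0.9, 0.1, 0.95, 0.9, 0.92])
```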
As shown in fig. 2, the video action key frame extraction in this embodiment proceeds as follows: similar action frames of the video sequence are clustered by K-means, and one frame is selected from each cluster as a key frame, yielding the key frame sequence of the video.
As shown in fig. 3, the action feature extraction in this embodiment proceeds as follows: the obtained action key frame sequence is fed into a 3D convolutional network to extract spatio-temporal action features.
As shown in fig. 4, the multi-scale frame-level action prediction in this embodiment proceeds as follows: the action feature data are fed into the 3D convolution-deconvolution network and the spatio-temporal feature pyramid structure to obtain action predictions of different scales, and the multi-scale features are fused into the final frame-level action prediction.
As shown in fig. 5, the time sequence action detection proposal in this embodiment is generated as follows: Kalman filtering is applied to the frame-level action predictions to make the optimal estimate for the current frame, yielding the best action proposal generation result.
In this embodiment, the 3D-convolutional-network-based action time sequence detection method predicts, at the frame level, actions occurring at any position and of any duration, enabling real-time operation; K-means clustering maximizes the information difference between action key frames, so the 3D convolutional network extracts rich action features more effectively and classification accuracy improves; the method fuses the frame-level prediction of the 3D convolution-deconvolution network with the multi-scale features of the spatio-temporal pyramid network, combining the frame-level and overall action prediction results, so the temporal position of an action is detected accurately and detection accuracy improves markedly over single-scale prediction.
The embodiments of the present invention have been described with reference to the accompanying drawings, but the invention is not limited to these embodiments. Various changes may be made according to the purpose of the invention; any change, modification, substitution, combination, or simplification made according to the spirit and principle of the technical solution of the invention, provided it meets the purpose of the invention and does not depart from its technical principle and inventive concept, shall be an equivalent substitution falling within the protection scope of the invention.
Claims (3)
1. An action time sequence detection method based on a 3D convolutional network, characterized in that: key frames in which the action changes significantly are extracted by K-means clustering, and action features are extracted with a 3D convolutional network;
the 3D convolution-deconvolution network is then fused with a spatio-temporal feature pyramid structure for multi-scale frame-level action prediction;
finally, the prediction results are fused by Kalman filtering, and the action time sequence is predicted to generate a proposal.
2. The 3D convolutional network-based action time sequence detection method according to claim 1, wherein the action feature extraction comprises the following steps:
1) dividing the video clips into training videos and testing videos, used as input in the training stage and the testing stage respectively;
2) clustering similar action frames in the video with K-means and selecting one frame from each cluster as a key frame;
3) inputting the resulting action key frame sequence into a 3D convolutional network and extracting spatio-temporal action features.
3. The 3D convolutional network-based action time sequence detection method according to claim 1, wherein the action time sequence proposal comprises the following steps:
1) inputting the feature data obtained by action feature extraction into a 3D convolution-deconvolution network and restoring the features to the original input length by upsampling in the time dimension, so that frame-level prediction is possible;
2) using the multi-scale property of the spatio-temporal pyramid to output action predictions of different scales independently from intermediate stages of the 3D convolution-deconvolution network, realizing overall prediction of the action;
3) applying Kalman filtering to the features obtained from each sliding window, improving the continuity of the predicted action across adjacent windows and generating a time sequence detection proposal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110285908.1A CN112949544A (en) | 2021-03-17 | 2021-03-17 | Action time sequence detection method based on 3D convolutional network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110285908.1A CN112949544A (en) | 2021-03-17 | 2021-03-17 | Action time sequence detection method based on 3D convolutional network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112949544A (en) | 2021-06-11 |
Family
ID=76229361
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110285908.1A Pending CN112949544A (en) | 2021-03-17 | 2021-03-17 | Action time sequence detection method based on 3D convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112949544A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113345061A (en) * | 2021-08-04 | 2021-09-03 | 成都市谛视科技有限公司 | Training method and device for motion completion model, completion method and device, and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109541583A (en) * | 2018-11-15 | 2019-03-29 | 众安信息技术服务有限公司 | A kind of leading vehicle distance detection method and system |
CN109947986A (en) * | 2019-03-18 | 2019-06-28 | 东华大学 | Infrared video timing localization method based on structuring sectional convolution neural network |
CN110688927A (en) * | 2019-09-20 | 2020-01-14 | 湖南大学 | Video action detection method based on time sequence convolution modeling |
CN111291647A (en) * | 2020-01-21 | 2020-06-16 | 陕西师范大学 | Single-stage action positioning method based on multi-scale convolution kernel and superevent module |
CN111898514A (en) * | 2020-07-24 | 2020-11-06 | 燕山大学 | Multi-target visual supervision method based on target detection and action recognition |
CN112101243A (en) * | 2020-09-17 | 2020-12-18 | 四川轻化工大学 | Human body action recognition method based on key posture and DTW |
Non-Patent Citations (2)
Title |
---|
LIU WANG等: ""Video action recognition based on improved 3D convolutional network and sparse representation classification"", 《PROCEEDINGS OF SPIE》 * |
刘望等: ""基于时空特征金字塔网络的动作时序检测方法"", 《系统仿真学报》 * |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210611 |