CN109583360B - Video human body behavior identification method based on spatio-temporal information and hierarchical representation - Google Patents
- Publication number
- CN109583360B CN109583360B CN201811418871.XA CN201811418871A CN109583360B CN 109583360 B CN109583360 B CN 109583360B CN 201811418871 A CN201811418871 A CN 201811418871A CN 109583360 B CN109583360 B CN 109583360B
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- spatio
- representation
- track
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the field of artificial intelligence, and in particular to a video human body behavior recognition method based on spatio-temporal information and hierarchical representation. The invention fully exploits the spatio-temporal information in a video: the hierarchical spatio-temporal bundle divides the video motion into several parts, thereby obtaining a higher-dimensional representation of the motion. Traditional video representation methods ignore mid- and high-level semantic information, focus only on the occurrence frequency of features, and use only zeroth-order information. Against these shortcomings, the video representation based on hierarchical spatio-temporal bundles effectively suppresses background noise interference, bridges the semantic gap between low-level and high-level features, and captures higher-order, more complex motion structure information. The hierarchical spatio-temporal bundle method extracts a more complex and expressive video representation in a higher dimension and effectively improves video recognition performance.
Description
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a video human body behavior recognition method based on spatio-temporal information and hierarchical representation.
Background
Video human behavior recognition is a frontier artificial-intelligence technology that automatically identifies and classifies video content by computer, and is widely applicable to intelligent surveillance, human-computer interaction, and video content retrieval. Specifically, machine learning is used to extract features from a video data set with labeled categories, train a classifier on those features, and then judge unknown videos. To obtain a high human behavior recognition rate, features with strong expressive power must first be extracted. Ideal features need to be robust to human appearance and size, scene illumination, shooting angle, and so on; in addition, the extracted features should contain rich scene context information, so that videos of other motion categories can be effectively distinguished.
From the perspective of feature extraction, current human behavior recognition techniques comprise video representation methods based on low-level features, hierarchical features, and deep features. Low-level methods divide into global-feature and local-feature representations, such as space-time interest points and trajectory features. Hierarchical methods divide into scene-context-based and space-time-segment-based video representations. These techniques currently have the following disadvantages:
1) Separation of foreground motion from background motion
Under a surveillance setting with a fixed background and little optical-flow variation, human behavior recognition achieves good results. In natural scenes, however, video is susceptible to many factors such as viewpoint change, camera shake, illumination, background occlusion, and rapid, irregular background motion.
2) Foreground feature extraction difficulties
Human motion video shot in natural scenes cannot avoid background illumination changes and camera motion. If features are extracted improperly, a large amount of background noise is mixed in, causing information redundancy, reducing the effectiveness of the extracted features, and degrading the recognition result.
3) Video representation construction
Even two videos in the same motion category exhibit different motion patterns: the speed at which each individual performs the action differs, and the same action category is shot in different scenes and from different angles.
Disclosure of Invention
The invention provides a video human body behavior identification method based on spatio-temporal information and hierarchical representation.
In order to realize the purpose of the invention, the technical scheme is as follows:
a video human body behavior identification method based on spatio-temporal information and hierarchical representation comprises the following steps:
step S1: extracting a foreground motion optical flow based on the whole optical flow of the camera motion compensation video clip, and forming a compensation track;
step S2: filtering to obtain a key frame with discrimination in the video through key frame selection;
and step S3: sampling and training the compensation track to obtain a Gaussian mixture model;
and step S4: selecting a key frame to obtain a video key frame set, and performing FV coding on the compensation track by combining a Gaussian mixture model to form a key track set;
step S5: carrying out a fragment segmentation and sequencing model on the whole video, and executing the steps S1-S4 on the segmented video fragments to obtain the hierarchical space-time beam characteristics of the segmented video fragments;
step S6: and taking the level space-time beam as video representation and as the input of a classifier, and obtaining a video classification label after SVM classification.
Preferably, the specific steps of step S1 are as follows:
step S101: simulating the motion of a camera by adopting a six-parameter affine model;
step S102: to video frameiPixelp i = (x i ,y i ) Affine optical flow vectorw A (p i ) The expression is shown in equation 3:
whereinu A (p i ) = c 1 (i) + a 1 (i)x i + a 2 (i)y i In the form of a horizontal affine optical flow vector,v A (p i ) = c 2 (i) + a 3 (i)x i + a 4 (i)y i in the form of a vertical affine optical flow vector,θ =[c 1 ,c 2 ,a 1 ,…,a 4 ] T a parameter vector being a six-parameter affine model, whereinc 1 , c 2 Representing the parameters of the translation of the camera,a 1 , a 2 , a 3 , a 4 representing camera rotation and zoom parameters, set pointsp i At the position of the next video frameAs shown in equation 4, whereinθ =[c a]The objective function to be solved is shown in equation 5, wheremThe number of the characteristic points in the video frame is shown;
wherein the content of the first and second substances,after the object is compensated by the camera, the motion of the camera is removed, and the displacement of the real world of the object is represented;
step S103: based on a real-time incremental multi-resolution Motion2D algorithm, an affine model parameter vector is obtained by calculation in a parameter incremental estimation modeθAnd defining a pixelp i = (x i ,y i ) A global optical flow vector ofw(p i ) = ( u(p i ),v(p i ) Compensating for the light fluxw F (p i ) As shown in equation 6:
defining compensated optical floww F The resulting improved dense tracks are tracked as compensated tracks.
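As an illustration, the camera-compensation of step S1 can be sketched as a least-squares fit of the six-parameter affine model followed by the subtraction of equation 6. This is a minimal sketch under stated assumptions, not the patent's Motion2D implementation: the helper names `fit_affine_flow` and `compensated_flow` are hypothetical, matched feature points and their observed flow are taken as given, and Motion2D's robust incremental estimation is replaced by plain least squares.

```python
import numpy as np

def fit_affine_flow(points, flow):
    """Least-squares fit of the six-parameter affine motion model
    u_A = c1 + a1*x + a2*y,  v_A = c2 + a3*x + a4*y
    to observed per-point flow vectors over m feature points."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([np.ones_like(x), x, y])          # design matrix
    cu, *_ = np.linalg.lstsq(A, flow[:, 0], rcond=None)   # [c1, a1, a2]
    cv, *_ = np.linalg.lstsq(A, flow[:, 1], rcond=None)   # [c2, a3, a4]
    return cu, cv

def compensated_flow(points, flow):
    """w_F(p) = w(p) - w_A(p): subtract the camera-induced affine flow
    from the observed global flow (equation 6)."""
    cu, cv = fit_affine_flow(points, flow)
    x, y = points[:, 0], points[:, 1]
    u_a = cu[0] + cu[1] * x + cu[2] * y
    v_a = cv[0] + cv[1] * x + cv[2] * y
    return flow - np.column_stack([u_a, v_a])
```

For a purely camera-induced (affine) flow field, the compensated flow is close to zero, so only genuine foreground motion survives to seed the compensation trajectories.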
Preferably, the specific steps of step S2 are as follows:
step S201: respectively calculating the time significance and the space significance of each input video frame;
step S202: the two saliency linear combinations are used to calculate the saliency of each pixel, and each
Defining the significance value of a video frame as the sum of the significance values of all pixel points of the frame;
step S203: video frames with higher than average significance are selected, and video frames with lower significance are filtered.
Preferably, the specific steps of step S3 are as follows:
step S301: randomly sampling a compensation track to construct a Gaussian mixture model GMM and create a visual vocabulary dictionary;
step S302: according to the analyzed compensated track characteristics in the video frame,Trj-HOG、Trj-HOF andTrj-MBH, estimating probability density functions corresponding to feature space points using FV coding;
step S303: all feature points are fitted by using Gaussian distribution to obtain the features of the key track set, and the GMM generation model can be expressed as formula 7:
whereinKIs the number of the gaussian kernels, and,model of parameters representing itWherein、Andrespectively representing prior mode probability, mean vector and covariance matrix,representDGaussian distribution of dimension whereinDAnd the compensated track dimension characteristics after dimension reduction.
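Steps S301-S303 can be sketched with scikit-learn's `GaussianMixture`. This is an illustrative approximation under assumed settings (diagonal covariances, as is usual for FV coding), and for brevity the Fisher vector below keeps only the gradients with respect to the means; a full FV encoding also stacks the weight and variance gradients.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(descriptors, K=8, seed=0):
    """Fit the K-kernel GMM of equation 7 on randomly sampled
    compensated-trajectory descriptors (the visual vocabulary)."""
    return GaussianMixture(n_components=K, covariance_type="diag",
                           random_state=seed).fit(descriptors)

def fisher_vector_means(gmm, X):
    """Simplified Fisher vector: normalized gradients of the GMM
    log-likelihood w.r.t. the component means only."""
    T = X.shape[0]
    q = gmm.predict_proba(X)                                   # (T, K) posteriors
    diff = (X[:, None, :] - gmm.means_[None]) / np.sqrt(gmm.covariances_)[None]
    g = (q[:, :, None] * diff).sum(axis=0)                     # (K, D)
    g /= (T * np.sqrt(gmm.weights_)[:, None])                  # FV normalization
    return g.ravel()                                           # (K*D,)
```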
Preferably, the specific steps of step S4 are as follows:
step S401: for input videoXSelecting a key frame to obtain a video key frame set;
step S402: FV coding yields the key trajectory set representation TB; if frame i precedes frame i+1, the temporal relationship is defined as TB_{i+1} > TB_i, and the following linear function is defined, as shown in equation 1:
f(TB) = w^T · TB (1)
Equation 1 defines an ordinal regression problem, in which the label y of a pair (x, y) represents a rank rather than a scalar category; x is defined as a frame feature of video X, and y as the index of the frame in which x lies. P is defined as the set of video frame feature pairs P = {(TB_i, TB_j): TB_i > TB_j}. Under the framework of structural risk minimization and the max-margin algorithm, the constrained objective function is defined as equation 2, where C is a penalty factor and ξ_ij are slack variables:
min_w (1/2)||w||² + C Σ_{(i,j)∈P} ξ_ij, s.t. w^T(TB_i − TB_j) ≥ 1 − ξ_ij, ξ_ij ≥ 0 (2)
The vector w encodes the spatio-temporal structure information between the video frame features TB and serves as the representation of video X.
Preferably, the specific steps of step S5 are as follows:
s501: dividing the input video into repeated fixed length of16The video clip of (1);
s502: executing the steps S1-S4 on the segmented segments to obtain segment representation of each video segment;
s503: the video segment is subjected to dimensionality reduction and whitening by PCA, and then is used as the input of a LIBLINEAR tool set to obtain parameters according to a formula (2)wNormalized to the final video representation, i.e., a hierarchical spatio-temporal bundle.
Compared with the prior art, the invention has the beneficial effects that:
the invention fully utilizes the spatiotemporal information in the video, and the hierarchy spatiotemporal bundle divides the video motion into a plurality of parts, thereby obtaining the higher-dimensional representation of the video motion. Aiming at the defects that the traditional video representation method ignores the semantic information of the middle and high layers of the video, only focuses on the occurrence frequency of the features, utilizes only 0-order information and the like, the video representation method based on the hierarchical space-time beam can effectively eliminate the background noise interference of the video, make up the semantic gap between the bottom-layer features and the high-layer features, and can capture higher-order and more complex motion structure information. The hierarchical space-time beam method can extract more complex and expressive video representation on a higher dimension, and can effectively improve the effect of video identification.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a histogram of the average recognition accuracy of the recognition method of the present invention on the Hollywood2 movie data set.
FIG. 3 is a confusion matrix on UCF Sports data set by the recognition method of the present invention.
Fig. 4 is a confusion matrix on the HMDB51 data set by the identification method of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent.
the invention is further illustrated below with reference to the figures and examples.
Example 1
A video human body behavior identification method based on spatio-temporal information and hierarchical representation comprises the following steps:
step S1: extracting a foreground motion light stream based on the whole light stream of the camera motion compensation video clip and forming a compensation track;
step S2: filtering to obtain a key frame with discrimination in the video through key frame selection;
and step S3: sampling and training the compensation track to obtain a Gaussian mixture model;
and step S4: selecting a key frame to obtain a video key frame set, and performing FV coding on the compensation track by combining a Gaussian mixture model to form a key track set;
step S5: performing a segment segmentation and sequencing model on the whole video, and executing the steps S1-S4 on the segmented video segments to obtain the level space-time beam characteristics of the segmented video segments;
step S6: and taking the level space-time beam as video representation and as the input of a classifier, and obtaining a video classification label after SVM classification.
Preferably, the specific steps of step S1 are as follows:
step S101: simulating the motion of a camera by adopting a six-parameter affine model;
step S102: to video frameiPixelp i = (x i ,y i ) Imitating thatVector of the incident lightw A (p i ) The expression is shown in equation 3:
whereinu A (p i ) = c 1 (i) + a 1 (i)x i + a 2 (i)y i In the form of a horizontal affine optical flow vector,v A (p i ) = c 2 (i) + a 3 (i)x i + a 4 (i)y i in the form of a vertical affine optical flow vector,θ =[c 1 ,c 2 ,a 1 ,…,a 4 ] T a parameter vector being a six-parameter affine model, whereinc 1 , c 2 Representing the parameters of the translation of the camera,a 1 , a 2 , a 3 , a 4 representing camera rotation and zoom parameters, set pointsp i At the position of the next video frameAs shown in equation 4, whereinθ =[c a]The objective function to be solved is shown in equation 5, wheremThe number of the characteristic points in the video frame is shown;
wherein the content of the first and second substances,supplementing the object with the cameraAfter compensation, the motion of the camera is removed, and the displacement of the real world of the object is represented;
step S103: based on a real-time incremental multiresolution Motion2D algorithm, an affine model parameter vector is obtained by calculation in a parameter incremental estimation modeθAnd define a pixelp i = (x i ,y i ) A global optical flow vector ofw(p i ) = ( u(p i ),v(p i ) Compensating for the light fluxw F (p i ) As shown in equation 6:
defining compensated optical floww F The resulting improved dense tracks are tracked as compensated tracks.
Preferably, the specific steps of step S2 are as follows:
step S201: respectively calculating the time significance and the space significance of each input video frame;
step S202: the two saliency linear combinations are used to calculate the saliency of each pixel, and each
Defining the significance value of the video frame as the sum of the significance values of all pixel points of the frame;
step S203: video frames with higher than average significance are selected, and video frames with lower significance are filtered.
Preferably, the specific steps of step S3 are as follows:
step S301: randomly sampling a compensation track to construct a Gaussian mixture model GMM and establish a visual vocabulary dictionary;
step S302: according to the compensated track characteristics analyzed from the video frames,Trj-HOG、Trj-HOF andTrj-MBH, estimating probability density functions corresponding to feature space points using FV coding;
step S303: all feature points are fitted by using Gaussian distribution to obtain the features of the key track set, and the GMM generation model can be expressed as formula 7:
whereinKIs the number of the gaussian kernels,model of parameters representing itIn which、Andrespectively representing the prior mode probability, the mean vector and the covariance matrix,to representDGaussian distribution of dimension whereinDAnd the dimension characteristic of the compensated track after dimension reduction.
Preferably, the specific steps of step S4 are as follows:
step S401: for input videoXSelecting a key frame to obtain a video key frame set;
step S402: FV coding yields the key trajectory set representation TB; if frame i precedes frame i+1, the temporal relationship is defined as TB_{i+1} > TB_i, and the following linear function is defined, as shown in equation 1:
f(TB) = w^T · TB (1)
Equation 1 defines an ordinal regression problem, in which the label y of a pair (x, y) represents a rank rather than a scalar category; x is defined as a frame feature of video X, and y as the index of the frame in which x lies. P is defined as the set of video frame feature pairs P = {(TB_i, TB_j): TB_i > TB_j}. Under the framework of structural risk minimization and the max-margin algorithm, the constrained objective function is defined as equation 2, where C is a penalty factor and ξ_ij are slack variables:
min_w (1/2)||w||² + C Σ_{(i,j)∈P} ξ_ij, s.t. w^T(TB_i − TB_j) ≥ 1 − ξ_ij, ξ_ij ≥ 0 (2)
The vector w encodes the spatio-temporal structure information between the video frame features TB and serves as the representation of video X.
Preferably, the specific steps of step S5 are as follows:
s501: dividing the input video into repeated fixed length of16The video clip of (a);
s502: executing the steps S1-S4 on the segmented segments to obtain segment representation of each video segment;
s503: the video segments are subjected to PCA dimensionality reduction and whitening, and then used as input of LIBLINEAR toolset to obtain parameters according to formula (2)wAnd after regularization, the video is used as a final video representation, namely a hierarchical spatio-temporal bundle.
Example 2
As shown in fig. 1 and fig. 2, the experimental environment of this embodiment is: a single-machine Linux operating system (Ubuntu 16.04 LTS) with a 32-core 2.10 GHz CPU, 64 GB of memory, and 15 TB of disk capacity. The experimental code is mainly C++ and Matlab, using open APIs and class libraries such as OpenCV 2.4.9, LibSVM, VLFeat, CUDA 8.0, and Caffe.
The experimental data sets of this example are three standard data sets: UCF Sports, Hollywood2, and HMDB51. The evaluation index for Hollywood2 is the mean Average Precision (mAP); for the other two data sets it is the mean Average Accuracy (mAA). The calculation of mAA and mAP is shown in equations 10 and 11:
mAA = (1/R) Σ_{r=1}^{R} AA_r (10)
mAP = (1/R) Σ_{r=1}^{R} AP_r (11)
where R is the total number of behavior classes, and AA_r and AP_r respectively denote the recognition accuracy and the average precision of the r-th behavior class.
The confusion matrix is defined as in equation 12, where any row of the matrix corresponds to the classification result of one behavior class and each row of the matrix sums to 1. The diagonal elements represent the percentage of correct classification, i.e. the accuracy of each class:
M_{rs} = (number of class-r samples classified as class s) / (total number of class-r samples) (12)
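The row-normalized confusion matrix and the mAA of equation 10 can be computed as in this short sketch; the function names are illustrative.

```python
import numpy as np

def confusion_matrix_rows(y_true, y_pred, R):
    """Row-normalized confusion matrix: each row sums to 1, and the
    diagonal entries are the per-class accuracies AA_r."""
    M = np.zeros((R, R))
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1
    M /= np.maximum(M.sum(axis=1, keepdims=True), 1)   # normalize rows
    return M

def mean_average_accuracy(y_true, y_pred, R):
    """mAA = (1/R) * sum_r AA_r  (equation 10)."""
    return confusion_matrix_rows(y_true, y_pred, R).diagonal().mean()
```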
Fig. 2 shows the average recognition precision histogram of the recognition method of the invention on the Hollywood2 movie data set; the average recognition precision is 66.71%. Figs. 3 and 4 show the confusion matrices of the method on the UCF Sports and HMDB51 data sets, with average recognition rates of 89.17% and 65.63%, respectively. The experimental results show that the recognition method of the invention achieves a good recognition effect and a clear improvement over existing methods.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
Claims (5)
1. A video human body behavior identification method based on spatio-temporal information and hierarchical representation is characterized by comprising the following steps:
step S1: based on the overall optical flow of the whole video clip of the motion compensation of the camera, extracting the foreground motion optical flow and forming a compensation track;
step S2: filtering to obtain a key frame with discrimination in the video through key frame selection;
and step S3: sampling and training the compensation track to obtain a Gaussian mixture model;
and step S4: selecting a key frame to obtain a video key frame set, and performing FV coding on the compensation track by combining a Gaussian mixture model to form a key track set;
step S5: carrying out a segment segmentation and sequencing model on the whole video, and executing the steps S1-S4 on the segmented video segments to obtain the level space-time beam characteristics of the segmented video segments;
step S6: taking a level space-time beam as video representation and as input of a classifier, and obtaining a video classification label after SVM classification;
the specific steps of step S4 are as follows:
step S401: for an input video clip, selecting a key frame to obtain a video key frame set;
step S402: performing FV coding to obtain a key track set representing TB, and if frame i precedes frame i +1, defining the timing relationship as TB i+1 >TB i As shown in equation 1, the following linear function is defined:
equation 1 is a sequential regression problem in which P is defined as the set of video frame feature pairs P = { (TB) i ,TB j ):TB i >TB j And defineUnder the framework of structure risk minimization and max-margin algorithm, a limited objective function is defined as formula 2, wherein C is a penalty factor and ξ ij For the relaxation variables, w represents spatio-temporal structure information between video frame features TB as a representation of video X
2. The video human body behavior recognition method based on spatio-temporal information and hierarchical representation according to claim 1, wherein the specific steps of step S1 are as follows:
step S101: simulating the motion of a camera by adopting a six-parameter affine model;
step S102: for the ith pixel p of the video frame i =(x i ,y i ) Affine optical flow vector w A (p i ) The expression is shown in equation 3:
wherein u is A (p i )=c 1 (i)+a 1 (i)x i +a 2 (i)y i For horizontal affine optical flow vectors, v A (p i )=c 2 (i)+a 3 (i)x i +a 4 (i)y i For vertical affine optical flow vectors, θ = [ c = 1 ,c 2 ,a 1 ,…,a 4 ] T A parameter vector being a six-parameter affine model, wherein c 1 ,c 2 Representing camera translation parameters, a 1 ,a 2 ,a 3 ,a 4 Representing camera rotation and zoom parameters, set point p i At the position of the next video frame is p i ', as shown in equation 4, the objective function to be solved is as shown in equation 5Wherein m is the number of the characteristic points in the video frame;
wherein gamma (theta) is the displacement of the real world of the object, wherein the motion of the camera is removed after the object is compensated by the camera;
step S103: based on a real-time incremental multiresolution Motion2D algorithm, an affine model parameter vector theta is calculated by adopting a parameter incremental estimation mode, and a pixel p is defined i =(x i ,y i ) Has a global optical flow vector of w (p) i )=(u(p i ),v(p i ) Compensating the luminous flux w) F (p i ) As shown in equation 6:
w F (p i )=w(p i )-w A (p i ) (6)
defining a compensated optical flow w F The resulting improved dense track is tracked as a compensated track.
3. The video human body behavior recognition method based on spatio-temporal information and hierarchical representation according to claim 2, wherein the specific steps of step S2 are as follows:
step S201: respectively calculating the time significance and the space significance of each input video frame;
step S202: the two significance linear combinations are used for calculating the significance of each pixel, and the significance value of each video frame is defined as the sum of the significance values of all pixel points of the frame;
step S203: video frames with higher than average significance are selected, and video frames with lower significance are filtered.
4. The video human body behavior recognition method based on spatio-temporal information and hierarchical representation according to claim 3, wherein the specific steps of step S3 are as follows:
step S301: randomly sampling a compensation track to construct a Gaussian mixture model GMM and create a visual vocabulary dictionary;
step S302: estimating probability density functions corresponding to the feature space points by adopting FV coding according to the compensation track features, trj-HOG, trj-HOF and Trj-MBH, analyzed and obtained from the video frame;
step S303: all feature points are fitted by utilizing Gaussian distribution to obtain the features of the key track set, and the GMM generation formula model can be expressed as a formula 7:
where K is the number of Gaussian kernels and θ represents its parametric model { π k ,μ k ,σ k K =1,.., K }, wherein pi & k 、μ k And σ k Respectively representing the prior mode probability, the mean vector and the covariance matrix, zeta (x; mu) k And Σ k) represents a gaussian distribution in the D dimension, where D is the compensated trajectory dimensional feature after dimensionality reduction.
5. The method for recognizing the human body behaviors based on the videos of the spatio-temporal information and the hierarchical representations according to claim 4, wherein the specific steps of the step S5 are as follows:
s501: dividing an input video into repeated video segments with fixed length of 16;
s502: executing the steps S1 to S4 on the segmented segments to obtain segment representation of each video segment;
s503: the video segments are subjected to dimensionality reduction and whitening by PCA, then used as input of a LIBLINEAR tool set, a parameter w is obtained according to a formula (2), and the parameter w is used as final video representation, namely a level space-time beam after regularization.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811418871.XA CN109583360B (en) | 2018-11-26 | 2018-11-26 | Video human body behavior identification method based on spatio-temporal information and hierarchical representation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811418871.XA CN109583360B (en) | 2018-11-26 | 2018-11-26 | Video human body behavior identification method based on spatio-temporal information and hierarchical representation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109583360A CN109583360A (en) | 2019-04-05 |
CN109583360B true CN109583360B (en) | 2023-01-10 |
Family
ID=65924617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811418871.XA Active CN109583360B (en) | 2018-11-26 | 2018-11-26 | Video human body behavior identification method based on spatio-temporal information and hierarchical representation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109583360B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188733A (en) * | 2019-06-10 | 2019-08-30 | 电子科技大学 | Timing behavioral value method and system based on the region 3D convolutional neural networks |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106529477A (en) * | 2016-11-11 | 2017-03-22 | 中山大学 | Video human behavior recognition method based on significant trajectory and time-space evolution information |
CN106682258A (en) * | 2016-11-16 | 2017-05-17 | 中山大学 | Method and system for multi-operand addition optimization in high-level synthesis tool |
CN106778854A (en) * | 2016-12-07 | 2017-05-31 | 西安电子科技大学 | Activity recognition method based on track and convolutional neural networks feature extraction |
CN107563345A (en) * | 2017-09-19 | 2018-01-09 | 桂林安维科技有限公司 | A kind of human body behavior analysis method based on time and space significance region detection |
CN108256434A (en) * | 2017-12-25 | 2018-07-06 | 西安电子科技大学 | High-level semantic video behavior recognition methods based on confusion matrix |
CN109508684A (en) * | 2018-11-21 | 2019-03-22 | 中山大学 | A kind of method of Human bodys' response in video |
- 2018-11-26: CN application CN201811418871.XA filed; patent CN109583360B, status Active
Non-Patent Citations (1)
Title |
---|
Natural environment video behavior recognition based on adaptive feature fusion; Guo Zixin et al.; 《计算机学报》 (Chinese Journal of Computers); 2013-11-15; Vol. 36, No. 11; full text *
Also Published As
Publication number | Publication date |
---|---|
CN109583360A (en) | 2019-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109508684B (en) | Method for recognizing human behavior in video | |
Rössler et al. | Faceforensics: A large-scale video dataset for forgery detection in human faces | |
Wang et al. | Generative neural networks for anomaly detection in crowded scenes | |
CN106778854B (en) | Behavior identification method based on trajectory and convolutional neural network feature extraction | |
CN110147743B (en) | Real-time online pedestrian analysis and counting system and method under complex scene | |
Chung et al. | You said that? | |
Wang et al. | A robust and efficient video representation for action recognition | |
Islam et al. | Efficient two-stream network for violence detection using separable convolutional LSTM | |
Zhao et al. | Dynamic texture recognition using local binary patterns with an application to facial expressions | |
CN106709419B (en) | Video human behavior recognition method based on significant trajectory spatial information | |
CN110889375B (en) | Hidden-double-flow cooperative learning network and method for behavior recognition | |
Chen et al. | End-to-end learning of object motion estimation from retinal events for event-based object tracking | |
Fernando et al. | Exploiting human social cognition for the detection of fake and fraudulent faces via memory networks | |
Huang et al. | Deepfake mnist+: a deepfake facial animation dataset | |
Vignesh et al. | Abnormal event detection on BMTT-PETS 2017 surveillance challenge | |
Rong et al. | Scene text recognition in multiple frames based on text tracking | |
CN108629301B (en) | Human body action recognition method | |
Oluwasammi et al. | Features to text: a comprehensive survey of deep learning on semantic segmentation and image captioning | |
CN113312973A (en) | Method and system for extracting features of gesture recognition key points | |
Zhang et al. | Contrastive spatio-temporal pretext learning for self-supervised video representation | |
CN113705490A (en) | Anomaly detection method based on reconstruction and prediction | |
Hirschorn et al. | Normalizing flows for human pose anomaly detection | |
Katircioglu et al. | Self-supervised human detection and segmentation via background inpainting | |
CN109583360B (en) | Video human body behavior identification method based on spatio-temporal information and hierarchical representation | |
CN105893967B (en) | Human behavior classification detection method and system based on time sequence retention space-time characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||