CN105893967B - Human behavior classification detection method and system based on time sequence retention space-time characteristics - Google Patents

Human behavior classification detection method and system based on time sequence retention space-time characteristics

Info

Publication number
CN105893967B
CN105893967B (application CN201610201446.XA)
Authority
CN
China
Prior art keywords
time
video
space
features
interest points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610201446.XA
Other languages
Chinese (zh)
Other versions
CN105893967A (en)
Inventor
Hong Liu (刘宏)
Mengyuan Liu (刘梦源)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Gandong Smart Technology Co ltd
Original Assignee
Shenzhen Gandong Smart Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Gandong Smart Technology Co ltd filed Critical Shenzhen Gandong Smart Technology Co ltd
Priority to CN201610201446.XA
Publication of CN105893967A
Application granted
Publication of CN105893967B
Expired - Fee Related (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a human behavior classification detection method and system based on time-sequence-preserving spatio-temporal features. In a system consisting of a video input end, a time-sequence feature extraction output end and an offline training classifier, the method comprises the following steps: 1) detect the human target in a video sequence; 2) extract spatio-temporal interest points from the spatio-temporal domain containing the human target, and cluster them into K categories using a K-means clustering method; 3) for pairs of spatio-temporal interest points, compute the time-axis distribution features of the time-sequence-preserving spatio-temporal features; 4) fuse the time-sequence features with the bag-of-interest-points features by weighting; 5) train human behavior templates using a bag-of-words model and a classifier, and perform classification. By describing the temporal-order relationships among feature points of the same category to build the human behavior model, the invention effectively improves the discrimination between different human behaviors.

Description

Human behavior classification detection method and system based on time sequence retention space-time characteristics
Technical Field
The invention belongs to the fields of object recognition and intelligent human-computer interaction in machine vision, and particularly relates to a robust human behavior classification detection method based on time-sequence-preserving spatio-temporal features.
Background
Human behavior analysis includes human behavior detection, human behavior classification, abnormal behavior analysis and the like. According to the number of human bodies involved, it can be divided into single-person behavior analysis, multi-person behavior analysis and group behavior analysis. The detection, tracking and identification of human bodies all belong to the category of behavior analysis. The behavior analysis discussed here refers to human behavior classification: for a given video sequence containing a certain motion, the video sequence is labeled with the category of that motion. Human behavior analysis began as early as the 1930s. Early successful research focused primarily on rigid-body motion. Around the 1950s, research on non-rigid bodies gradually developed. Human motion analysis in particular has broad application prospects in fields such as intelligent video surveillance, web video retrieval, assisted healthcare and sports video analysis. For example, in virtual reality, the posture of a user in the real physical space is analyzed and understood; in human-computer interaction, a computer or robot can use visual information to interact with people more effectively; in training such as dance and gymnastics, the movements of a practitioner are guided and corrected by analyzing the movements of the joints.
In real-world scenarios, human behavior classification faces a number of difficulties. The performers of human motions are often of different ages and appearances, and the speed of motion and the degree of spatio-temporal variation differ from person to person; at the same time, different motions can look very similar (inter-class similarity), which is a hard problem on top of the intra-class diversity just described. Human behavior classification also faces many classic difficulties of image processing, such as occlusion of the human body, shadows in outdoor scenes, illumination changes and crowd congestion. Facing these difficulties, achieving robust human behavior classification for intelligent surveillance in real scenes has important research significance. We focus on how to describe human behavior in a video sequence, in other words, the process of extracting feature vectors from the video to represent the original video. The feature vector should have the following characteristics: first, the extraction process should be as efficient as possible to meet real-time requirements; second, the vector dimension should be as low as possible to improve classification efficiency; finally, the vector should be representative and robust, with good discrimination against inter-class similarity and good tolerance of intra-class diversity.
In view of the above requirements, human behavior description methods can be divided into two major categories: global features and local features. Global feature extraction is a top-down process: the human behavior is treated as a whole, from which the motion description is extracted. Global features are strong features that can encode most of the information of the motion. However, they are extremely sensitive to view angle, occlusion and noise, and their extraction presupposes that the motion foreground can be segmented well, which places very demanding requirements on the preprocessing needed for human behavior description in complex scenes. Considering these shortcomings of global features, local features have been proposed for describing human behavior in complex scenes. Local feature extraction is a bottom-up process: first, spatio-temporal interest points are detected; then local texture blocks around the interest points are extracted; finally the descriptions of these blocks are combined into the final descriptor. With the introduction of the bag-of-words model, the framework of classifying human behavior with local features has been widely adopted. Unlike global features, local features are less sensitive to noise and partial occlusion, and their extraction requires no foreground segmentation or tracking, so they are well suited to human behavior analysis in complex scenes. The main disadvantage of local features is that the global constraint relationships between points are ignored, so a higher-level description of the spatial relationships is needed to improve the classification performance of the existing bag-of-words model.
Disclosure of Invention
The invention uses local feature points and establishes a human behavior model by describing the temporal-order relationships of the feature points, finally realizing the classification of human behaviors. The extraction and description of the local feature points follow "Evaluation of local spatio-temporal features for action recognition" (2009), H. Wang, M. M. Ullah, A. Kläser, I. Laptev and C. Schmid, in Proc. BMVC'09. The method effectively improves the accuracy and robustness of the traditional approach by describing the temporal-order relationships of the local feature points.
The technical scheme adopted by the invention is as follows:
a human behavior classification detection method based on time sequence retention space-time characteristics comprises the following steps:
1) detecting a human target in a video sequence;
2) extracting space-time interest points from a space-time domain containing a human body target;
3) extracting the characteristics of the space-time interest points, and clustering the space-time interest points into a plurality of categories by using a mean value clustering method;
4) counting distribution information on a time axis of the time-space interest points belonging to each category to obtain time axis distribution characteristics;
5) combining time axis distribution characteristics corresponding to different types of space-time interest points to obtain time sequence characteristics of the video;
6) calculating bag-of-word characteristics of the space-time interest points, and fusing the bag-of-word characteristics with the time sequence characteristics to obtain a fusion characteristic histogram corresponding to the video;
7) using a bag-of-words model in which the histogram features are replaced by the fusion feature histograms obtained through steps 1)-6), and training for the different behavior categories to obtain the feature description templates corresponding to the different behavior categories;
8) when a video to be detected is input, the feature description of the video (namely its fusion feature histogram) is first extracted through steps 1)-6) and then matched by nearest neighbor matching against the feature description templates of the different behavior categories; the behavior category with the highest matching degree is the category of the video (an illustrative sketch of steps 1)-6) in code follows this list).
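For illustration only, the following Python sketch strings steps 1)-6) together. The helper names detect_human, extract_interest_points and timeline_features are assumptions introduced here and are not defined in this disclosure; the weighted concatenation mirrors the fusion of step 6).

```python
import numpy as np

def describe_video(frames, kmeans, alpha=0.5, beta=0.5, L=3):
    """Hypothetical sketch of steps 1)-6): fused feature histogram for one video.

    detect_human() and extract_interest_points() are assumed helpers (steps 1-2);
    timeline_features() stands in for the time-axis statistics of steps 4)-5).
    """
    rois = detect_human(frames)                               # step 1: human target detection
    points = extract_interest_points(frames, rois)            # step 2: spatio-temporal interest points
    descriptors = np.array([p["descriptor"] for p in points])
    labels = kmeans.predict(descriptors)                      # step 3: assign each point to one of K clusters
    frame_ids = np.array([p["frame"] for p in points])
    timeline = timeline_features(frame_ids, labels, kmeans.n_clusters, L)  # steps 4)-5)
    bow = np.bincount(labels, minlength=kmeans.n_clusters).astype(float)   # step 6: bag-of-words counts
    bow /= max(bow.sum(), 1.0)
    return np.concatenate([alpha * timeline, beta * bow])     # step 6: weighted fusion
```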
Furthermore, the human behavior classification in the above method is performed on human behaviors that can be detected in the video.
Further, the spatio-temporal interest points extracted in step 2) refer to points where the gray level changes sharply in the spatio-temporal domain.
Further, the features of the spatio-temporal interest points extracted in step 3) are HOG (histogram of oriented gradients) and HOF (histogram of optical flow) features, or 3D SIFT features, or 3D HOG features.
Furthermore, the time sequence of the spatio-temporal interest points in the above method refers to the relative positional relationships, on the time axis, of all spatio-temporal interest points carrying the same category label. Step 4) further optimizes the extracted time-axis distribution features by ignoring video frames that contain no spatio-temporal interest points, thereby reducing the influence of static video frames on the interest point distribution and obtaining a more robust temporal-order relationship.
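A minimal sketch of this optimization, assuming each interest point is represented only by the index of the frame in which it occurs; frames without interest points are dropped and timestamps re-indexed before any distribution is computed:

```python
import numpy as np

def reindex_timestamps(frame_ids):
    """Map each interest point's frame index to its rank among the frames that
    actually contain interest points, ignoring empty (static) frames."""
    frame_ids = np.asarray(frame_ids)
    active_frames = np.unique(frame_ids)              # frames with at least one interest point
    rank = {f: i for i, f in enumerate(active_frames)}
    return np.array([rank[f] for f in frame_ids]), len(active_frames)

# Example: interest points occur only in frames 2, 5 and 9 of the video
ts, M = reindex_timestamps([2, 2, 5, 9])
print(ts, M)   # [0 0 1 2] 3
```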
Further, step 7) averages the histogram features of the plurality of corresponding videos for each behavior category, and takes the averaged histogram feature as the feature corresponding to the behavior category.
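A minimal sketch of the template construction of step 7) and the matching of step 8), assuming each training video has already been converted into its fused feature histogram; the Euclidean distance used here as the "matching degree" is an assumption for illustration:

```python
import numpy as np

def build_templates(histograms, labels):
    """Average the fused feature histograms of all training videos of each class."""
    templates = {}
    for c in set(labels):
        members = [h for h, l in zip(histograms, labels) if l == c]
        templates[c] = np.mean(members, axis=0)
    return templates

def nearest_template(hist, templates):
    """Return the class whose averaged template best matches the test histogram."""
    return min(templates, key=lambda c: np.linalg.norm(hist - templates[c]))
```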
The invention also provides a human behavior classification detection system based on time-sequence-preserving spatio-temporal features, which comprises: a video input end, a time-sequence feature extraction output end and an offline training classifier;
the video input end comprises a camera device capable of acquiring RGB images;
the time sequence feature extraction output end is used for acquiring the RGB image sequence from the video input end, and extracting and outputting time sequence features corresponding to human behaviors in the video;
the offline training classifier: a) calculating bag-of-word characteristics of an input video, and fusing the bag-of-word characteristics with the time sequence characteristics output by the time sequence characteristic extraction output end to obtain a fusion characteristic histogram corresponding to the video; b) training different classes of behaviors by utilizing a bag-of-words model and a classifier to obtain characteristic description templates corresponding to different classes of behaviors; c) and obtaining the feature description corresponding to the human body behavior in the input test video, carrying out nearest neighbor matching on the feature description corresponding to different behavior categories, wherein the feature description with the highest matching degree is the behavior category corresponding to the test video, and outputting a category label.
The invention realizes a robust human behavior classification method and system based on time-sequence-preserving spatio-temporal features: the temporal structure of the local spatio-temporal interest points is encoded through the order in which they occur, which increases the discrimination between different behavior categories. The invention is an extension of the framework that classifies behavior with a bag-of-words model and local feature points. FIG. 4 shows comparative results, in which the human behavior classification performance of the invention is the best.
Drawings
FIG. 1 is a flow chart of video descriptor (i.e., temporal-spatial feature histogram with order preservation) extraction according to the present invention;
FIG. 2 is a flow chart for combining a time-preserving spatiotemporal feature histogram with conventional bag-of-words features;
FIG. 3 shows samples from the databases used by the present invention;
FIG. 4 is a graph comparing the accuracy of the human behavior classification method of the present invention with the conventional bag-of-words features; wherein, the abscissa represents the number of clustering categories K, and the ordinate is the recognition rate.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1 and FIG. 2, the extraction steps of the time-sequence-preserving spatio-temporal feature histogram for a video containing human behavior are as follows:
1) Extraction and description of spatio-temporal interest points. The invention uses the spatio-temporal interest point detector and descriptor of C. Schüldt, I. Laptev, and B. Caputo, "Recognizing human actions: a local SVM approach," in ICPR, pp. 32-36, 2004. The parameters of the spatio-temporal interest point detector are kept consistent with those in the above reference. The interest point descriptor consists of a 90-dimensional HOG (histogram of oriented gradients) feature and a 72-dimensional HOF (histogram of optical flow) feature, concatenated into a 162-dimensional descriptor.
2) Clustering of spatio-temporal interest points. The invention adopts K-means clustering and sets different numbers of clusters for different databases in the experiments. The UT-Interaction and Rochester databases used in the experiments were proposed in M. S. Ryoo, "Human activity prediction: Early recognition of ongoing activities from streaming videos," in ICCV, pp. 1036-1043, 2011 and R. Messing, C. Pal, and H. Kautz, "Activity recognition using the velocity histories of tracked keypoints," in ICCV, pp. 104-111, 2009, respectively; the KTH database was introduced in the Schüldt et al. reference cited above. For the KTH database the number of clusters is set to 900; for the Rochester database, 2300; for the UT-Interaction #1 database, 2100; and for the UT-Interaction #2 database, 1900. It should be noted that FIG. 4 illustrates the relationship between the number of clusters and the recognition rate, from which it can be seen that the method still obtains a high recognition rate when the number of clusters varies over a wide range.
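As an illustrative sketch of this step (not the original implementation), per-point HOG and HOF blocks can be concatenated into 162-dimensional descriptors and clustered with scikit-learn's K-means, using the per-database cluster counts quoted above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster counts quoted in the text, keyed by database name (names are illustrative)
CLUSTERS = {"KTH": 900, "Rochester": 2300, "UT-Interaction#1": 2100, "UT-Interaction#2": 1900}

def build_codebook(hog, hof, dataset="KTH", seed=0):
    """hog: (N, 90) array and hof: (N, 72) array of per-point descriptors."""
    descriptors = np.hstack([hog, hof])            # 90 + 72 = 162-dimensional descriptor
    kmeans = KMeans(n_clusters=CLUSTERS[dataset], random_state=seed, n_init=10)
    labels = kmeans.fit_predict(descriptors)       # cluster label for every interest point
    return kmeans, labels
```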
3) Video feature extraction. The video feature extraction method mainly comprises the following steps (a code sketch follows this list):
a) detecting a time-space interest point from an input video;
b) clustering the space-time interest points into K types;
c) counting the number of spatio-temporal interest points on each frame, and removing video frames which do not contain the spatio-temporal interest points;
d) respectively counting a distribution histogram of each type of time-space interest points on a time axis;
e) softly assigning the distribution histogram of each class of spatio-temporal interest points into an L-dimensional vector; the invention designs a new distance-weighted soft assignment method, shown in formula (9) below;
f) counting a K-class time-space interest point number distribution histogram;
g) fusing, by weighting, the L-dimensional vector representing the time-sequence features with the count distribution histogram to obtain the final video descriptor.
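The following sketch illustrates steps c), d), f) and g) with plain hard binning of the timestamps; the distance-weighted soft assignment of step e) (formula (9)) is sketched separately after the formula definitions below. The array shapes and the equal-width interval choice are assumptions for illustration, not the original implementation.

```python
import numpy as np

def simple_video_descriptor(frame_ids, labels, K, L=3, alpha=0.5, beta=0.5):
    """Steps c), d), f), g) with hard binning; the patent's distance-weighted
    soft assignment of step e) is sketched separately further below."""
    frame_ids = np.asarray(frame_ids)
    labels = np.asarray(labels)
    # step c): keep only frames containing at least one interest point, re-index time
    active = np.unique(frame_ids)
    t = np.searchsorted(active, frame_ids)          # re-indexed timestamps in [0, M)
    M = len(active)
    # step d): per-class distribution over L equal-width time intervals
    timeline = np.zeros((K, L))
    for k in range(K):
        tk = t[labels == k]
        if len(tk):
            hist, _ = np.histogram(tk, bins=L, range=(0, M))
            timeline[k] = hist / len(tk)
    # step f): count histogram of the K classes of interest points
    bow = np.bincount(labels, minlength=K).astype(float)
    bow /= max(bow.sum(), 1.0)
    # step g): weighted fusion into the final video descriptor
    return np.concatenate([alpha * timeline.ravel(), beta * bow])
```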
Let S = {S_1, ..., S_k, ..., S_K} contain all spatio-temporal interest points extracted from one video V, where S_k contains all interest points with label k and k ranges from 1 to the number of clusters K; an interest point with label k that occurs in the n-th frame is denoted p_k^n. The parameter α denotes the weight of the time-sequence feature, β denotes the weight of the bag-of-words feature, and L denotes the dimension of the time-sequence-preserving feature; the default values of α and L are set to 0.5 and 3, respectively.
The feature extraction algorithm involves formulas (1) through (10), which appear as images in the original publication; the quantities they define are described by the accompanying text as follows.
Formula (1): p_k^n denotes the n-th of the N_k interest points of class k; its components are the abscissa, the ordinate, the timestamp and the category of the interest point.
Formula (2): Bow denotes a count distribution histogram; the function η computes the count distribution histogram B over the K classes of spatio-temporal interest points.
Formula (3): M denotes the number of video frames in the video that contain at least one spatio-temporal interest point.
Formula (4): the function δ indicates whether its two arguments are equal, taking the value 1 if they are and 0 otherwise; it is used to record the number of class-k spatio-temporal interest points contained in the i-th frame.
Formula (5): the number of all spatio-temporal interest points contained in the i-th frame is computed, and R_i indicates whether that number is greater than 0.
Formula (6): the set of class-k spatio-temporal interest points has its timestamps taken relative to the video obtained after the interference frames are removed, an interference frame being a video frame that contains no spatio-temporal interest point.
Formula (7): a probability density function over L intervals is estimated; its value on the i-th interval is the number of points observed in that interval divided by the number of sampling points, and the indicator function I determines whether a variable x falls within interval B_i, taking the value 1 if it does and 0 otherwise.
Formula (8): defines the intervals B_i for i from 1 to L.
Formula (9): l_i denotes the number of variables falling within interval B_i, and the function w(l_i) accounts for the influence of the surrounding intervals on the current interval B_i; this influence diminishes as the relative distance between the two intervals increases.
Formula (10): index(l_j) denotes the rank position of the variable l_j in the ordered list concerned, Length denotes the length of that list, and the result records the contribution of all class-k spatio-temporal interest points to interval i.
Substituting formula (10) into formula (7) yields the L-dimensional time-sequence feature of the spatio-temporal interest points, abbreviated as Q. The dimension of Q is reduced to D by the PCA method to obtain the reduced-dimension time-sequence feature.
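Because formulas (9) and (10) appear only as images in the original publication, the following sketch merely illustrates a distance-weighted soft assignment with the stated property that the influence of a neighboring interval diminishes as the interval distance grows, followed by the PCA reduction of Q; the exponential decay and the choice of D are assumptions, not the original formulas.

```python
import numpy as np
from sklearn.decomposition import PCA

def soft_timeline(t, L, M, decay=0.5):
    """Distance-weighted soft assignment of timestamps t (values in [0, M)) to L intervals.

    The exponential decay with interval distance is an assumption chosen only to
    satisfy the stated property that neighboring intervals contribute less as the
    relative distance between intervals increases.
    """
    t = np.asarray(t, dtype=float)
    counts, _ = np.histogram(t, bins=L, range=(0, M))     # hard counts per interval
    q = np.zeros(L)
    for i in range(L):
        for j in range(L):
            q[i] += counts[j] * decay ** abs(i - j)        # farther intervals contribute less
    return q / max(q.sum(), 1e-12)

def reduce_timeline(Q, D=10):
    """Reduce stacked per-video time-sequence features Q (n_videos x dim) to D dimensions."""
    return PCA(n_components=D).fit_transform(np.asarray(Q))
```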
As shown in FIG. 3, the KTH, Rochester and UT-Interaction databases were used for the experiments. KTH contains 6 human behavior actions: "boxing", "hand clapping", "hand waving", "jogging", "running" and "walking", performed 4 times by each of 25 persons, for a total of 600 video segments. Rochester contains 10 human behavior actions: "answer a phone", "chop a banana", "dial a phone", "drink water", "eat a banana", "eat snacks", "look up a phone number in a phone book", "peel a banana", "eat food with silverware" and "write on a whiteboard", performed 3 times by each of 5 persons, for a total of 150 videos. UT-Interaction contains 6 human behaviors: "hug", "kick", "point", "punch", "push" and "shake-hands", each performed 10 times in two scenes, for a total of 120 videos. From top to bottom, FIG. 3 shows behavior examples from the KTH, Rochester, UT-Interaction #1 and UT-Interaction #2 databases, respectively.
FIG. 4 shows the classification results on (a) the KTH database, (b) the Rochester database, (c) the UT database including UT-Interaction #1 and UT-Interaction #2, and (d) the UT-Interaction #2 database. The offline training and classification module adopts leave-one-out cross-validation and uses a support vector machine as the classifier to compare the matching degree between the test sample and the template obtained by training. The support vector machine employs the Chebyshev kernel. FIG. 4 compares the conventional bag-of-words model, the time-sequence feature proposed by the invention, and the combination of the two. The abscissa is the number of clusters K and the ordinate is the recognition rate. It can be seen that when K varies over a wide range, the classification accuracy of the proposed time-sequence feature is higher than that of the bag-of-words model, and the recognition rate obtained by combining the two features is the highest.
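A sketch of this evaluation protocol with scikit-learn; kernel_matrix stands in for whatever histogram kernel is used (the kernel named above) and is passed in as a parameter, since the exact kernel implementation is not reproduced here:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def leave_one_out_accuracy(features, labels, kernel_matrix):
    """features: (n_videos, dim) fused histograms; kernel_matrix: callable returning a Gram matrix."""
    X = np.asarray(features)
    y = np.asarray(labels)
    K_full = kernel_matrix(X, X)                    # precomputed Gram matrix over all videos
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="precomputed")
        clf.fit(K_full[np.ix_(train_idx, train_idx)], y[train_idx])
        pred = clf.predict(K_full[np.ix_(test_idx, train_idx)])
        correct += int(pred[0] == y[test_idx][0])
    return correct / len(y)
```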
It should be noted that the HOG/HOF features used in the invention can be replaced by 3D SIFT (3D Scale-Invariant Feature Transform) or 3D HOG (3D Histogram of Oriented Gradients) features. The PCA dimensionality reduction method used can be replaced by the LDA (Linear Discriminant Analysis) method.
The above examples are merely illustrative of the present invention and although the preferred embodiments of the present invention and the accompanying drawings have been disclosed for illustrative purposes, those skilled in the art will appreciate that: various substitutions, changes and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. Therefore, the present invention should not be limited to the disclosure of the preferred embodiments and the accompanying drawings.

Claims (8)

1. A robust human behavior classification detection method based on time sequence retention space-time characteristics is characterized by comprising the following steps:
1) detecting a human target in a video sequence;
2) extracting space-time interest points from a space-time domain containing a human body target;
3) extracting the characteristics of the space-time interest points, and clustering the space-time interest points into a plurality of categories by using a mean value clustering method;
4) counting a distribution histogram on a time axis of the time-space interest points belonging to each category to obtain time axis distribution characteristics;
the time axis distribution characteristics comprise complete precedence order among the time-space interest points of the same class labels;
the precedence order considers the relative appearance order between the spatio-temporal interest points, rather than the exact number of frames apart, to increase the robustness of the extracted timing sequence information;
5) combining time axis distribution characteristics corresponding to different types of space-time interest points to obtain time sequence characteristics of the video;
6) calculating bag-of-word characteristics of the space-time interest points, and fusing the bag-of-word characteristics with the time sequence characteristics to obtain a fusion characteristic histogram corresponding to the video;
7) utilizing a bag-of-words model and converting the histogram features in the model into the fusion feature histograms obtained in the steps 1) to 6), and training aiming at different behavior categories to obtain feature description templates corresponding to the different behavior categories;
8) when a video to be detected is input, firstly extracting the feature description of the video from the steps 1) to 6), and then carrying out nearest neighbor matching with feature description templates of different behavior categories, wherein the highest matching degree is the behavior category corresponding to the video.
2. The method of claim 1, wherein the spatio-temporal interest points of step 2) are points in the spatio-temporal domain where the gray level changes sharply.
3. The method of claim 1, wherein the features of the spatio-temporal interest points extracted in step 3) are HOG and HOF features, or 3D SIFT features, or 3D HOG features.
4. The method of claim 1, wherein step 4) describes the time-axis distribution features with the spatio-temporal interest points having the same class label rather than with all spatio-temporal interest points.
5. The method as claimed in claim 4, wherein the step 4) optimizes the extracted time-axis distribution characteristics by omitting video frames containing no spatio-temporal interest points to reduce the influence of the still video frames on the spatio-temporal interest point distribution and obtain a more robust timing relationship.
6. The method of claim 1, wherein step 6) uses a weighted fusion method to perform weighted fusion of the bag-of-words model and the reduced-dimension time-series features.
7. The method according to claim 6, wherein step 7) averages the histogram features of the corresponding plurality of videos for each behavior class, and takes the averaged histogram feature as the feature corresponding to the behavior class.
8. A human behavior classification detection system based on time sequence space-time characteristics and adopting the method of claim 1 is characterized by comprising a video input end, a time sequence characteristic extraction output end and an off-line training classifier;
the video input end comprises a camera device capable of acquiring RGB images;
the time sequence feature extraction output end is used for acquiring an RGB image sequence from the video input end, and extracting and outputting time sequence features corresponding to human behaviors in the video;
the offline training classifier calculates the bag-of-word features of the input video, and fuses the bag-of-word features with the time sequence features output by the time sequence feature extraction output end to obtain a fusion feature histogram corresponding to the video; then training different classes of behaviors by utilizing a bag-of-words model and a classifier to obtain characteristic description templates corresponding to different classes of behaviors; and obtaining the feature description corresponding to the human body behavior in the input test video, carrying out nearest neighbor matching on the feature description corresponding to different behavior categories, wherein the feature description with the highest matching degree is the behavior category corresponding to the test video, and outputting a category label.
CN201610201446.XA 2016-04-01 2016-04-01 Human behavior classification detection method and system based on time sequence retention space-time characteristics Expired - Fee Related CN105893967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610201446.XA CN105893967B (en) 2016-04-01 2016-04-01 Human behavior classification detection method and system based on time sequence retention space-time characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610201446.XA CN105893967B (en) 2016-04-01 2016-04-01 Human behavior classification detection method and system based on time sequence retention space-time characteristics

Publications (2)

Publication Number Publication Date
CN105893967A CN105893967A (en) 2016-08-24
CN105893967B true CN105893967B (en) 2020-04-10

Family

ID=57012137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610201446.XA Expired - Fee Related CN105893967B (en) 2016-04-01 2016-04-01 Human behavior classification detection method and system based on time sequence retention space-time characteristics

Country Status (1)

Country Link
CN (1) CN105893967B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650617A (en) * 2016-11-10 2017-05-10 江苏新通达电子科技股份有限公司 Pedestrian abnormity identification method based on probabilistic latent semantic analysis
CN108764026B (en) * 2018-04-12 2021-07-30 杭州电子科技大学 Video behavior detection method based on time sequence detection unit pre-screening
CN111339980B (en) * 2020-03-04 2020-10-09 镇江傲游网络科技有限公司 Action identification method and device based on space-time histogram

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605986A (en) * 2013-11-27 2014-02-26 天津大学 Human motion recognition method based on local features
CN103854016A (en) * 2014-03-27 2014-06-11 北京大学深圳研究生院 Human body behavior classification and identification method and system based on directional common occurrence characteristics
CN104021381A (en) * 2014-06-19 2014-09-03 天津大学 Human movement recognition method based on multistage characteristics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities; M. S. Ryoo et al.; 2009 IEEE 12th International Conference on Computer Vision; 2010-05-06; pp. 1593-1600 *
A survey of human motion description methods for video sequences; Sun Qianru et al.; CAAI Transactions on Intelligent Systems; 2013-06-30; vol. 8, no. 3; pp. 189-198 *

Also Published As

Publication number Publication date
CN105893967A (en) 2016-08-24

Similar Documents

Publication Publication Date Title
Zhang et al. Animal detection from highly cluttered natural scenes using spatiotemporal object region proposals and patch verification
de Melo et al. Combining global and local convolutional 3d networks for detecting depression from facial expressions
Soomro et al. Action recognition in realistic sports videos
Özyer et al. Human action recognition approaches with video datasets—A survey
Wang et al. Hierarchical attention network for action recognition in videos
CN103854016B (en) Jointly there is human body behavior classifying identification method and the system of feature based on directivity
CA3046035A1 (en) System and method for cnn layer sharing
Xian et al. Evaluation of low-level features for real-world surveillance event detection
Song et al. Unsupervised Alignment of Actions in Video with Text Descriptions.
CN106709419B (en) Video human behavior recognition method based on significant trajectory spatial information
Zhang et al. A survey on face anti-spoofing algorithms
Chen et al. Recognition of aggressive human behavior using binary local motion descriptors
Sekma et al. Human action recognition based on multi-layer fisher vector encoding method
Safaei et al. Still image action recognition by predicting spatial-temporal pixel evolution
Xu et al. Action recognition by saliency-based dense sampling
Gammulle et al. Coupled generative adversarial network for continuous fine-grained action segmentation
Symeonidis et al. Neural attention-driven non-maximum suppression for person detection
Yi et al. Mining human movement evolution for complex action recognition
CN105893967B (en) Human behavior classification detection method and system based on time sequence retention space-time characteristics
Wang et al. Action recognition using edge trajectories and motion acceleration descriptor
Khan et al. Robust head detection in complex videos using two-stage deep convolution framework
Wang et al. Pig face recognition model based on a cascaded network
El‐Henawy et al. Action recognition using fast HOG3D of integral videos and Smith–Waterman partial matching
Bukht et al. A novel framework for human action recognition based on features fusion and decision tree
Bhattacharya et al. Covariance of motion and appearance featuresfor spatio temporal recognition tasks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20181220

Address after: 518000 Guangdong Province Nanshan District Taoyuan Street Xili Town Lishan Road Sangtai Building University Town Pioneer Park 506

Applicant after: SHENZHEN GANDONG SMART TECHNOLOGY Co.,Ltd.

Address before: 518055 North University Campus, Shenzhen University Town, Xili Town, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: PEKING University SHENZHEN GRADUATE SCHOOL

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200410

CF01 Termination of patent right due to non-payment of annual fee