WO2019019678A1 - Method for detecting violent events in a video based on hypergraph transition - Google Patents

Method for detecting violent events in a video based on hypergraph transition

Info

Publication number
WO2019019678A1
Authority
WO
WIPO (PCT)
Prior art keywords
hypergraph
foreground
equation
interest
points
Prior art date
Application number
PCT/CN2018/080770
Other languages
English (en)
French (fr)
Inventor
李革
黄靖佳
李楠楠
Original Assignee
北京大学深圳研究生院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学深圳研究生院 filed Critical 北京大学深圳研究生院
Priority to US16/476,898 priority Critical patent/US10679067B2/en
Publication of WO2019019678A1 publication Critical patent/WO2019019678A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • G06F18/295Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/422Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation for representing the structure of the pattern or shape of an object therefor
    • G06V10/426Graphical representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47Detecting features for summarising video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/44Event detection

Definitions

  • the invention relates to a video data processing technology, in particular to a method for detecting a violent event in a video based on a hypergraph transition.
  • the existing mainstream methods for behavior recognition and detection in video mostly use deep learning as the core technology, and use the depth model to automatically extract and identify video content.
  • deep learning models that require massive amounts of data as support are not effective on this issue. Therefore, for the detection of violent incidents, the method based on local spatiotemporal feature descriptors is still prevalent.
  • the main idea is to reflect the behavioral features by modeling the relationship between local feature descriptors (eg, spatio-temporal interest points).
  • the present invention provides a method for detecting violent events in a video based on hypergraph transition, and proposes the feature descriptor HVC (Histogram of Velocity Changing) for violent event detection.
  • HVC: Histogram of Velocity Changing
  • the spatial relationship of the feature points and the transition of the feature point groups are analyzed separately, and then jointly analyzed.
  • the method of the invention can effectively reflect the intensity and stability of the action, is sensitive to the disordered irregular behavior, and has certain universality for the detection of violent events.
  • the principle of the invention is: Spatial Temporal Interest Points (STIP) are extracted and tracked in the video, while the moving foreground is detected and segmented; the extracted interest points are partitioned according to the foreground block to which they belong, possible noise interest points are filtered out by the foreground blocks, and the motion trajectory of each foreground block is estimated from the interest-point tracking results.
  • STIP: Spatial Temporal Interest Point
  • the invention analyzes the action pose information in each foreground block and the transition information of the pose along the trajectory, as follows: first, a hypergraph is used to model the interest points in each foreground block, so that the hypergraph encodes the action pose information of that block.
  • the hypergraphs and HVC descriptors are then chained in foreground-block trajectory order (H-T Chain, Hypergraph-Transition Chain) to represent the entire motion sequence.
  • H-T Chain: Hypergraph-Transition Chain
  • HMM Hidden Markov Model
  • a method for detecting violent incidents in video based on hypergraph transition includes the construction of the hypergraph model and the hypergraph transition process.
  • the hypergraph is used to describe the spatial relationship of feature points, thereby reflecting the pose information of the motion.
  • the feature descriptor HVC for violent event detection first analyzes the spatial relationship of the feature points and the transitions of the feature-point groups separately, and then performs a joint analysis; the method specifically includes the following steps:
  • Steps 1) to 2) constitute the foreground target trajectory extraction process. Specifically, in video, especially in low-resolution surveillance video, tracking interest points is simpler and more robust than tracking targets.
  • the present invention estimates the trajectory of the foreground target from the trajectories of the interest points. The foreground blocks of the t-th frame are denoted S_t by Equation 1:
  • S_t = {s_{t,i}} (i = 1…n) (Equation 1), where n is the number of foreground blocks segmented from frame t; each block is s_{t,i} = {p_{t,j}} (j = 1…m) (Equation 2).
  • p_{t,j} is the j-th interest point in the foreground block.
  • two foreground blocks s^(1) (in frame t) and s^(2) (in frame t+1) are considered to belong to the same sequence when Equation 3 holds; m_1 is the number of interest points in s^(1), thd is an adjustable probability threshold, and the indicator for a point p_j^(1) equals 1 when p_j^(1) is tracked to a point inside s^(2) in the next frame, and 0 otherwise.
  • Each foreground block trajectory corresponds to a set of motion sequences. For each sequence, the action pose information in the foreground block and the transition information of the pose in the trajectory are used to analyze it; the details are as follows:
  • the present invention uses a hypergraph to model the motion poses contained in the foreground block, and the hypergraph has good affine invariance;
  • for a foreground block s_i, a hypergraph structure G_i is defined for it, expressed as Equation 4: G_i = (V_i, E_i, F_i)
  • V i is a set of points of interest in the graph
  • E i is the set of hyperedges in the graph; each hyperedge is a triple composed of three points
  • F i is a feature corresponding to the point set.
  • for two hypergraphs G 1 and G 2, a matching matrix A represents the correspondence between them (Equation 5); N 1 and N 2 are the numbers of points in the two hypergraphs, respectively.
  • the similarity of the poses contained in different blocks is represented by the score value under the best match, and is calculated by Equation 6:
  • Score(A) = Σ_{i,i′,j,j′,k,k′} H_{i,i′,j,j′,k,k′} X_{i,i′} X_{j,j′} X_{k,k′} (Equation 6), where H_{i,i′,j,j′,k,k′} is the similarity measure between hyperedges E = {i, j, k} and E′ = {i′, j′, k′} (Equation 7); the greater the similarity, the higher the score.
  • HVC Histogram of Velocity Changing
  • the HVC is constructed as shown in Fig. 2.
  • v i is the optical flow at the point of interest p i
  • s is the step size at which the optical flow is calculated.
  • the intensity of the velocity change from point p i to point p i+s is expressed as Equation 9:
  • finally, the hypergraphs and HVC descriptors are chained in foreground-block trajectory order (H-T Chain) to describe the whole motion sequence; the H-T Chain is modeled to obtain a model of violent behavior, realizing violent-event detection in the video.
  • H-T Chain: Hypergraph-Transition Chain
  • the H-T Chain can be modeled using the Hidden Markov Model (HMM) to obtain a violent HMM model.
  • HMM Hidden Markov Model
  • the method of the invention models the relationship between local feature descriptors, and provides a method for detecting violence events in video based on hypergraph transition.
  • the present invention adopts a hyper-graph to describe the spatial relationship of feature points to reflect the posture information of the motion.
  • hypergraphs have good affine invariance and have better performance in the case of surveillance video, which has multiple perspectives.
  • the method of the present invention models the transition between the associated hypergraphs in the time series and proposes a new feature descriptor HVC.
  • HVC can effectively reflect the intensity and stability of the action, sensitive to the disorderly irregular behavior, and is suitable for the detection of violent events.
  • the method of the invention has the following characteristics:
  • a feature descriptor HVC that can effectively reflect the intensity and stability of the action and is sensitive to the disorderly irregular behavior is applicable to the detection of violent events.
  • the algorithm described in the present invention is tested on the UT-Interaction data set and the BEHAVE data set. The results show that the detection effect of the algorithm is better than the existing methods.
  • Figure 1 is a block diagram showing the overall flow of the method of the present invention.
  • v i is the optical flow at the point of interest p i
  • s is the step size at which the optical flow is calculated
  • Frames is a video frame
  • G k and G k+1 are the hypergraphs numbered k and k+1, respectively.
  • (a)-(d) correspond to the HVC descriptors of fighting, running, walking together, and standing (loitering) behaviors, respectively.
  • Figure 4 is a schematic illustration of the detection implementation using the method of the present invention.
  • the upper layer is a schematic diagram of the H-T Chain, where G i is the i-th hypergraph in the sequence;
  • the lower layer is the observation sequence in the HMM model corresponding to the H-T Chain, and o is the observation value;
  • the arrows between the upper and lower layers indicate the correspondence between each part of the H-T Chain and each observation value in the HMM observation sequence.
  • the invention provides a method for detecting violent events in a video based on hypergraph transition, approaching the detection from the pose information and the transition process of the poses.
  • the space-time points of interest are extracted from the video clips and the points of interest are tracked.
  • the moving foreground objects are segmented from the background using a foreground segmentation or foreground detection algorithm, and each interest point is associated with the foreground block to which it belongs.
  • the points of interest in each foreground block are modeled by the hypergraph, and the motion trajectory of the target corresponding to the foreground block is estimated according to the tracking result of the point of interest.
  • HVC is used to characterize the transition of the hypergraph.
  • a block diagram of the overall process of the method of the present invention is shown in FIG. 1.
  • the core of the hypergraph-transition-based method for detecting violent events in video is the construction of the hypergraph transition model.
  • the method includes: foreground target trajectory extraction, hypergraph construction and similarity measurement, and hypergraph transition descriptor (HVC) construction.
  • the present invention estimates the trajectory of the foreground object by the trajectory of the point of interest.
  • S_t = {s_{t,i}} (i = 1…n) (Equation 1), where n is the number of foreground blocks segmented from frame t; each block is s_{t,i} = {p_{t,j}} (j = 1…m) (Equation 2).
  • p_{t,j} is the j-th interest point in the foreground block.
  • two foreground blocks s^(1) (in frame t) and s^(2) (in frame t+1) are considered to belong to the same sequence when Equation 3 holds; m_1 is the number of interest points in s^(1), thd is an adjustable probability threshold, and the indicator for a point p_j^(1) equals 1 when p_j^(1) is tracked to a point inside s^(2) in the next frame, and 0 otherwise.
  • the present invention uses a hypergraph to model the motion poses contained in the foreground block.
  • for a foreground block s_i, a hypergraph structure G_i is defined, as in Equation 4: G_i = (V_i, E_i, F_i)
  • V i is a set of points of interest in the graph
  • E i is the set of hyperedges in the graph; each hyperedge is a triple composed of three points
  • F i is a feature corresponding to the point set.
  • for two hypergraphs G 1 and G 2, a matching matrix A represents the correspondence between them (Equation 5); N 1 and N 2 are the numbers of points in the two hypergraphs, respectively.
  • the similarity of the poses contained in different blocks is represented by the score value under the best match, and is calculated by Equation 6:
  • Score(A) = Σ_{i,i′,j,j′,k,k′} H_{i,i′,j,j′,k,k′} X_{i,i′} X_{j,j′} X_{k,k′} (Equation 6), where H_{i,i′,j,j′,k,k′} is the similarity measure between hyperedges E = {i, j, k} and E′ = {i′, j′, k′} (Equation 7); the greater the similarity, the higher the score.
  • HVC (Histogram of Velocity Changing): the hypergraph transition descriptor
  • the intensity of the velocity change from point p i to point p i+s is expressed as Equation 9, and the average magnitude of the velocities along a trajectory is computed by Equation 10.
  • Figure 3 shows examples of HVC descriptor visualizations for different behaviors, corresponding, from left to right, to fighting, running, walking together, and standing (loitering).
  • the BEHAVE data set is taken as an example to illustrate how the algorithm of the present invention can be used in practice.
  • BEHAVE is a video dataset of an outdoor surveillance scene with multi-person interactions, mainly including gathering, separating, loitering, walking together, and chasing/fighting behaviors.
  • a separate HMM model is trained for each action category, and a binary classification is made on the final recognition results. In the trajectory extraction stage the thd value is set to 0.2; the spectral clustering algorithm is used to cluster the hypergraphs, with 25 cluster centers, thus constructing a dictionary.
  • in the HVC computation, the quantization levels are set to M = 5 and I = 12, and a dictionary of 15 words is built. A window of 80 frames is slid through the video in steps of 20 frames. Within each block trajectory, a hypergraph is constructed at an interval of 3 frames, and HVC is computed over 9-frame intervals with a 3-frame step.
  • the H-T Chain is used as the feature description of the block trajectory, and the H-T Chain is fed into the HMM model for processing, so that all sequences in the video are analyzed.
  • the detection implementation process is shown in Figure 4.
  • (1) is a motion track tracked from the video segment currently observed by the sliding window.
  • (2) the upper layer is a schematic diagram of the H-T Chain, where G i is the i-th hypergraph in the sequence; the lower layer is the observation sequence in the HMM model corresponding to the H-T Chain, and o is the observed value.
  • the arrows between the upper and lower layers indicate the correspondence between the parts of the H-T Chain and the observed values in the HMM observation sequence.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A method for detecting violent events in video based on a hypergraph transition model, comprising a foreground target trajectory extraction process, a hypergraph construction and similarity measurement process, and a hypergraph transition descriptor construction process. Hypergraphs are used to describe the spatial relationship of feature points and thereby reflect the pose information of the motion; the transitions between associated hypergraphs in the time series are modeled, and the feature descriptor HVC is proposed, which can effectively reflect the intensity and stability of actions. The method first analyzes the spatial relationship of the feature points and the transitions of the feature-point groups separately, and then analyzes them jointly. The method is sensitive to disordered, irregular behavior in video and is suitable for the detection of violent events.

Description

Method for detecting violent events in video based on hypergraph transition
TECHNICAL FIELD
The present invention relates to video data processing technology, and in particular to a method for detecting violent events in video based on hypergraph transition.
BACKGROUND
With the massive growth of video data in modern society, video content understanding has become an important research topic, and the detection of violent events in surveillance video is of great significance for maintaining public safety. Violent-event detection technology enables automatic screening and recognition of violent events in video: on the one hand, it allows timely discovery of violent events; on the other hand, it allows efficient offline screening of behaviors in video big data that may endanger public safety. However, detecting violent events in video is technically very difficult, with the following main difficulties:
(1) violent events are highly polymorphic, making it hard to extract universally applicable feature descriptions;
(2) too few positive samples are available for training models;
(3) the resolution of surveillance video is low.
Most existing mainstream methods for behavior recognition and detection in video take deep learning as the core technology and use deep models to automatically extract features from and recognize video content. However, because of the polymorphism of violent events and the scarcity of available training data, deep learning models, which require massive amounts of data as support, are hard to make effective on this problem. Therefore, for violent-event detection, methods based on local spatio-temporal feature descriptors still prevail; their main idea is to model the relationships between local feature descriptors (e.g., spatio-temporal interest points) so as to reflect behavioral features.
SUMMARY OF THE INVENTION
To overcome the above shortcomings of the prior art, the present invention provides a method for detecting violent events in video based on hypergraph transition, and proposes the feature descriptor HVC (Histogram of Velocity Changing, a hypergraph transition descriptor) for violent-event detection, which first analyzes the spatial relationship of the feature points and the transitions of the feature-point groups separately, and then analyzes them jointly. The method of the invention can effectively reflect the intensity and stability of actions, is sensitive to disordered, irregular behavior, and has a certain universality for violent-event detection.
The principle of the invention is as follows: Spatial Temporal Interest Points (STIP) in the video are extracted and tracked, while the moving foreground in the video is detected and segmented; the extracted interest points are partitioned according to the foreground block to which they belong, possible noise interest points are filtered out by the foreground blocks, and the motion trajectory of each foreground block is estimated from the trajectories obtained by interest-point tracking. Each foreground-block trajectory corresponds to a motion sequence; for each sequence, the invention uses the action pose information in the foreground blocks and the pose transition information along the trajectory to analyze it, as follows: first, a hypergraph is used to model the interest points in each foreground block, so that the hypergraph encodes the action pose information of the block; the transition process of the hypergraphs is then described with a new feature descriptor proposed in this invention, the Histogram of Velocity Changing (HVC); finally, the hypergraphs and HVC descriptors are chained in foreground-block trajectory order (H-T Chain, Hypergraph-Transition Chain) to represent the entire motion sequence. A Hidden Markov Model (HMM) can be used to model the H-T Chain, yielding an HMM model of violent behavior.
The technical solution provided by the present invention is:
A method for detecting violent events in video based on hypergraph transition, in which the hypergraph transition comprises the construction of the hypergraph model and the hypergraph transition process; a hypergraph is adopted to describe the spatial relationship of feature points, thereby reflecting the pose information of the motion, and the feature descriptor HVC for violent-event detection is proposed; the spatial relationship of the feature points and the transitions of the feature-point groups are first analyzed separately and then jointly. The method specifically includes the following steps:
1) Extract and track Spatial Temporal Interest Points (STIP) in the video, while detecting and segmenting the moving foreground in the video;
2) Partition the extracted interest points according to the foreground block to which they belong, filter possible noise interest points with the foreground blocks, and estimate the motion trajectory of each foreground block from the trajectories obtained by interest-point tracking;
Steps 1) to 2) constitute the foreground target trajectory extraction process. Specifically, in video, especially in low-resolution surveillance video, tracking interest points is simpler and more robust than tracking targets. The present invention estimates the trajectory of the foreground target from the trajectories of the interest points. The foreground blocks of the t-th frame are denoted S_t by Equation 1:
S_t = {s_{t,i}} (i = 1…n)  (Equation 1)
where n is the number of foreground blocks segmented from frame t; s_{t,i} is expressed as Equation 2:
s_{t,i} = {p_{t,j}} (j = 1…m)  (Equation 2)
where p_{t,j} is the j-th interest point in the foreground block.
When foreground block
Figure PCTCN2018080770-appb-000001
and foreground block
Figure PCTCN2018080770-appb-000002
satisfy Equation 3, the two foreground blocks are considered to belong to the same sequence:
Figure PCTCN2018080770-appb-000003
where m_1 is the number of interest points in
Figure PCTCN2018080770-appb-000004
and thd is an adjustable probability threshold. When
Figure PCTCN2018080770-appb-000005
is a point of
Figure PCTCN2018080770-appb-000006
tracked in the next frame,
Figure PCTCN2018080770-appb-000007
equals 1; otherwise it is 0.
3) Each foreground-block trajectory corresponds to a motion sequence; for each sequence, the action pose information in the foreground blocks and the pose transition information along the trajectory are used to analyze it, as follows:
31) Construct hypergraphs and the similarity measure: a hypergraph is used to model the interest points in each foreground block, so that the hypergraph encodes the action pose information of the block.
Hypergraph construction and similarity measurement, specifically:
The present invention uses a hypergraph to model the motion pose contained in a foreground block; the hypergraph has good affine invariance. For a foreground block s_i, we define for it a hypergraph structure G_i, expressed as Equation 4:
G_i = (V_i, E_i, F_i)  (Equation 4)
where V_i is the set of interest points in the graph; E_i is the set of hyperedges in the graph, each hyperedge being a triple composed of three points; and F_i is the set of features corresponding to the points. For two hypergraphs (e.g., G_1 and G_2), we define a matching matrix A to represent the matching relationship between them, as in Equation 5:
Figure PCTCN2018080770-appb-000008
N_1 and N_2 are the numbers of points in the two hypergraphs, respectively. The similarity of the poses contained in different blocks is represented by the score value under the best match, calculated by Equation 6:
score(A) = Σ_{i,i′,j,j′,k,k′} H_{i,i′,j,j′,k,k′} X_{i,i′} X_{j,j′} X_{k,k′}  (Equation 6)
where H_{i,i′,j,j′,k,k′} is the similarity measure function between hyperedges E = {i, j, k} and E′ = {i′, j′, k′}, expressed as Equation 7; the greater the similarity, the higher the score.
Figure PCTCN2018080770-appb-000009
where
Figure PCTCN2018080770-appb-000010
is the feature vector, in which a_ijk defines the spatial relationship of three points, as in Equation 8:
Figure PCTCN2018080770-appb-000011
where sin(·) is the trigonometric sine function, and
Figure PCTCN2018080770-appb-000012
is the vector pointing from point p_i to p_j.
32) Construct the new feature descriptor Histogram of Velocity Changing (HVC) to describe the transition process of the hypergraphs; HVC can effectively reflect the intensity and stability of actions, is sensitive to disordered, irregular behavior, and is suitable for violent-event detection.
The construction of HVC is shown in Fig. 2, where v_i is the optical flow at interest point p_i and s is the step size at which the optical flow is computed. The intensity of the velocity change from point p_i to point p_{i+s} is expressed as Equation 9:
Figure PCTCN2018080770-appb-000013
Meanwhile, the average magnitude of the velocities of the points along trajectory
Figure PCTCN2018080770-appb-000014
is computed by Equation 10:
Figure PCTCN2018080770-appb-000015
Finally, from all trajectories
Figure PCTCN2018080770-appb-000016
the HVC descriptor from G_k to G_{k+1} is obtained.
33) Finally, the hypergraphs and HVC descriptors are chained in foreground-block trajectory order (H-T Chain) to describe the entire motion sequence; the H-T Chain is modeled to obtain a model of violent behavior, realizing the detection of violent events in video.
A Hidden Markov Model (HMM) can be used to model the H-T Chain, yielding an HMM model of violent behavior.
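To make the H-T Chain modeling step concrete: each chain becomes a sequence of discrete observations (hypergraph cluster ids and quantized HVC words), and a violence HMM is trained on such sequences; at detection time, a window is flagged when its sequence scores highly under that model. The patent does not specify an HMM implementation, so the sketch below is a minimal stand-in for an HMM library: the standard forward algorithm for scoring an observation sequence under a discrete HMM, with toy parameters.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Score a discrete observation sequence under an HMM (forward algorithm).

    obs:   list of observation symbol indices o_1..o_T
    start: start[i]    = P(state_1 = i)
    trans: trans[i][j] = P(state_{t+1} = j | state_t = i)
    emit:  emit[i][o]  = P(observation = o | state = i)
    Returns log P(obs | model).
    """
    # initialization: alpha_1(i) = P(state_1 = i) * P(o_1 | i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(len(start))]
    # recursion: alpha_{t+1}(j) = P(o_{t+1} | j) * sum_i alpha_t(i) * trans[i][j]
    for o in obs[1:]:
        alpha = [emit[j][o] * sum(alpha[i] * trans[i][j]
                                  for i in range(len(alpha)))
                 for j in range(len(alpha))]
    return math.log(sum(alpha))
```

In practice one HMM would be trained per behavior class (as in the BEHAVE experiment below) and a test H-T Chain assigned by comparing its log-likelihood under each model.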
Compared with the prior art, the beneficial effects of the present invention are:
The method of the present invention models the relationships between local feature descriptors and provides a method for detecting violent events in video based on hypergraph transition. Unlike existing methods, the present invention adopts a hypergraph to describe the spatial relationship of feature points and thereby reflect the pose information of the motion. Compared with other graph models, the hypergraph has good affine invariance and performs better in the multi-view scenario of surveillance video. Meanwhile, to further mine motion information, the method models the transitions between associated hypergraphs in the time series and proposes a new feature descriptor, HVC, for them. HVC can effectively reflect the intensity and stability of actions, is sensitive to disordered, irregular behavior, and is suitable for violent-event detection. The method of the invention has the following characteristics:
(1) it is the first to use the hypergraph transition process for detecting violent events in video;
(2) it first analyzes the spatial relationship of the feature points and the transitions of the feature-point groups separately, and then analyzes them jointly;
(3) it proposes the feature descriptor HVC, which effectively reflects the intensity and stability of actions, is sensitive to disordered, irregular behavior, and is applicable to violent-event detection.
The algorithm described in the present invention has been tested on the UT-Interaction dataset and the BEHAVE dataset; the results show that its detection performance is better than that of existing methods.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram of the overall flow of the method of the present invention.
Figure 2 is a schematic diagram of the construction of HVC, where v_i is the optical flow at interest point p_i, s is the step size at which the optical flow is computed, Frames denotes the video frames, and G_k and G_{k+1} are the hypergraphs numbered k and k+1, respectively.
Figure 3 shows visualization examples of the HVC descriptors of different behaviors in a specific embodiment of the invention, where (a)-(d) correspond to the HVC descriptors of fighting, running, walking together, and standing (loitering) behaviors, respectively.
Figure 4 is a schematic diagram of a detection implementation using the method of the present invention, where (1) is a motion trajectory tracked from the video segment currently observed by the sliding window; in (2), the upper layer is a schematic diagram of the H-T Chain, in which G_i is the i-th hypergraph in the sequence, the lower layer is the observation sequence in the HMM model corresponding to the H-T Chain, with o denoting an observed value, and the arrows between the upper and lower layers indicate the correspondence between the parts of the H-T Chain and the observed values in the HMM observation sequence.
DETAILED DESCRIPTION
The present invention is further described below through embodiments with reference to the accompanying drawings, without limiting the scope of the invention in any way.
The present invention provides a method for detecting violent events in video based on hypergraph transition, which approaches the detection from the pose information and the transition process of the poses. First, spatio-temporal interest points are extracted from the video segment and tracked. At the same time, a foreground segmentation or foreground detection algorithm is used to segment the moving foreground objects from the background, and the interest points are associated with the foreground blocks to which they belong. Then, the interest points in each foreground block are modeled with a hypergraph, and the motion trajectory of the target corresponding to each foreground block is estimated from the interest-point tracking results. For each motion trajectory, HVC is used to characterize the transitions of the hypergraphs. Finally, an H-T Chain model is built for each motion trajectory, and a Hidden Markov Model is used for violence detection. The overall flow of the method is shown in Figure 1. The core of the hypergraph-transition-based method for detecting violent events in video is the construction of the hypergraph transition model; in a specific implementation, the method includes: foreground target trajectory extraction, hypergraph construction and similarity measurement, and hypergraph transition descriptor (HVC) construction.
1) Foreground target trajectory extraction:
In video, especially in low-resolution surveillance video, tracking interest points is simpler and more robust than tracking targets. The present invention estimates the trajectory of the foreground target from the trajectories of the interest points. The foreground blocks of the t-th frame are denoted S_t by Equation 1:
S_t = {s_{t,i}} (i = 1…n)  (Equation 1)
where n is the number of foreground blocks segmented from frame t; s_{t,i} is expressed as Equation 2:
s_{t,i} = {p_{t,j}} (j = 1…m)  (Equation 2)
where p_{t,j} is the j-th interest point in the foreground block.
When foreground block
Figure PCTCN2018080770-appb-000017
and foreground block
Figure PCTCN2018080770-appb-000018
satisfy Equation 3, the two foreground blocks are considered to belong to the same sequence:
Figure PCTCN2018080770-appb-000019
where m_1 is the number of interest points in
Figure PCTCN2018080770-appb-000020
and thd is an adjustable probability threshold. When
Figure PCTCN2018080770-appb-000021
is a point of
Figure PCTCN2018080770-appb-000022
tracked in the next frame,
Figure PCTCN2018080770-appb-000023
equals 1; otherwise it is 0.
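The block-association rule above can be sketched in a few lines. Equation 3 itself is rendered only as an image in the source, so the exact inequality is an assumption; the natural reading of the surrounding text is that a next-frame block continues the sequence when the fraction of interest points tracked into it reaches the threshold thd. The function name and argument types below are illustrative only.

```python
def same_sequence(block1_points, tracked_into_block2, thd=0.2):
    """Decide whether two foreground blocks belong to the same sequence.

    block1_points:      interest-point ids in foreground block s^(1) (frame t)
    tracked_into_block2: set of those ids whose tracked position in frame t+1
                         falls inside the candidate block s^(2) (indicator = 1)
    thd:                adjustable probability threshold (0.2 in the BEHAVE
                        experiment described below)
    """
    m1 = len(block1_points)
    if m1 == 0:
        return False
    # fraction of points tracked into the candidate block, compared to thd
    hits = sum(1 for p in block1_points if p in tracked_into_block2)
    return hits / m1 >= thd
```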
2) Hypergraph construction and similarity measurement:
The present invention uses a hypergraph to model the motion pose contained in a foreground block. For a foreground block s_i, we define for it a hypergraph structure G_i, expressed as Equation 4:
G_i = (V_i, E_i, F_i)  (Equation 4)
where V_i is the set of interest points in the graph; E_i is the set of hyperedges in the graph, each hyperedge being a triple composed of three points; and F_i is the set of features corresponding to the points. For two hypergraphs (e.g., G_1 and G_2), we define a matching matrix A to represent the matching relationship between them, as in Equation 5:
Figure PCTCN2018080770-appb-000024
N_1 and N_2 are the numbers of points in the two hypergraphs, respectively. The similarity of the poses contained in different blocks is represented by the score value under the best match, calculated by Equation 6:
score(A) = Σ_{i,i′,j,j′,k,k′} H_{i,i′,j,j′,k,k′} X_{i,i′} X_{j,j′} X_{k,k′}  (Equation 6)
where H_{i,i′,j,j′,k,k′} is the similarity measure function between hyperedges E = {i, j, k} and E′ = {i′, j′, k′}, expressed as Equation 7; the greater the similarity, the higher the score.
Figure PCTCN2018080770-appb-000025
where
Figure PCTCN2018080770-appb-000026
is the feature vector, in which a_ijk defines the spatial relationship of three points, as in Equation 8:
Figure PCTCN2018080770-appb-000027
where sin(·) is the trigonometric sine function, and
Figure PCTCN2018080770-appb-000028
is the vector pointing from point p_i to p_j.
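Equation 8 is also shown only as an image, so the exact triplet feature is not recoverable from the text; based on the surrounding description (the sine function applied to vectors between the three points), a plausible reading is that a_ijk collects the sines of the interior angles of the triangle formed by the three interest points, a quantity invariant to translation, rotation, and uniform scaling, consistent with the affine robustness the patent attributes to hypergraphs. A sketch under that assumption:

```python
import math

def angle_sines(p_i, p_j, p_k):
    """Sines of the three interior angles of triangle (p_i, p_j, p_k).

    Assumed reading of Equation 8: the triplet feature a_ijk encodes the
    spatial relation of three interest points through the sines of the
    triangle's angles. Points are (x, y) tuples.
    """
    def sine_at(a, b, c):
        # sine of the angle at vertex a, between vectors a->b and a->c
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        return abs(cross) / (math.hypot(*v1) * math.hypot(*v2))

    return (sine_at(p_i, p_j, p_k),
            sine_at(p_j, p_k, p_i),
            sine_at(p_k, p_i, p_j))
```

Comparing such feature vectors between a hyperedge E and a candidate E′ would then give the similarity H of Equation 7, whose exact form (likely a kernel on the feature difference) is not specified in the text.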
3) Hypergraph transition descriptor (Histogram of Velocity Changing, HVC) construction:
Figure 2 is a schematic diagram of the construction of HVC, where v_i is the optical flow at interest point p_i and s is the step size at which the optical flow is computed. We define the intensity of the velocity change from point p_i to point p_{i+s}, expressed as Equation 9:
Figure PCTCN2018080770-appb-000029
Meanwhile, the average magnitude of the velocities of the points along trajectory
Figure PCTCN2018080770-appb-000030
is computed by Equation 10:
Figure PCTCN2018080770-appb-000031
Finally, from all trajectories
Figure PCTCN2018080770-appb-000032
the HVC descriptor from G_k to G_{k+1} is obtained. Figure 3 shows visualization examples of the HVC descriptors of different behaviors, corresponding, from left to right, to fighting, running, walking together, and standing (loitering).
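A minimal sketch of an HVC-style descriptor follows. Since Equations 9 and 10 appear only as images, the binning layout is an assumption: velocity changes between flow samples s steps apart are quantized by magnitude into M levels and by direction into I bins (M = 5 and I = 12 follow the BEHAVE experiment below), and the counts form the histogram. The parameter `max_mag` is an illustrative normalization not present in the source.

```python
import math

def hvc_descriptor(flows, s=3, M=5, I=12, max_mag=10.0):
    """Sketch of a Histogram of Velocity Changing over one trajectory.

    flows: list of (vx, vy) optical-flow vectors sampled along one
           interest-point trajectory
    s:     step between compared samples (the patent computes HVC over
           9-frame intervals with a 3-frame step)
    M, I:  quantization levels for magnitude and direction
    """
    hist = [0] * (M * I)
    for a, b in zip(flows, flows[s:]):
        dvx, dvy = b[0] - a[0], b[1] - a[1]
        mag = math.hypot(dvx, dvy)                    # velocity-change intensity
        ang = math.atan2(dvy, dvx) % (2 * math.pi)    # velocity-change direction
        m = min(int(mag / max_mag * M), M - 1)        # magnitude bin
        i = min(int(ang / (2 * math.pi) * I), I - 1)  # direction bin
        hist[m * I + i] += 1
    return hist
```

A stable motion (walking together, standing) concentrates its mass in low-magnitude bins, while fighting spreads counts across high-magnitude bins in many directions, which matches the qualitative contrast shown in Figure 3.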
The BEHAVE dataset is taken below as an example to illustrate how the algorithm of the present invention is used in practice.
BEHAVE is a video dataset of an outdoor surveillance scene with multi-person interactions, mainly including gathering, separating, loitering, walking together, and chasing/fighting behaviors. We train a separate HMM model for each action category and perform a binary classification on the final recognition results. In the trajectory extraction stage, we set the thd value to 0.2; the spectral clustering algorithm is used to cluster the hypergraphs, with 25 cluster centers, thus constructing a dictionary. In the computation of HVC, we set the quantization levels M = 5 and I = 12 and build a dictionary of 15 words. We slide a window of 80 frames through the video in steps of 20 frames. Within each block trajectory, a hypergraph is constructed at an interval of 3 frames, and HVC is computed over 9-frame intervals with a 3-frame step. Finally, the H-T Chain is used as the feature description of the block trajectory and fed into the HMM model for processing, so that all sequences in the video are analyzed. The detection implementation process is shown in Figure 4, where (1) is a motion trajectory tracked from the video segment currently observed by the sliding window; in (2), the upper layer is a schematic diagram of the H-T Chain, in which G_i is the i-th hypergraph in the sequence, and the lower layer is the observation sequence in the HMM model corresponding to the H-T Chain, with o denoting an observed value. The arrows between the upper and lower layers indicate the correspondence between the parts of the H-T Chain and the observed values in the HMM observation sequence.
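The sliding-window setup of the BEHAVE experiment (an 80-frame window advanced in 20-frame steps) can be sketched directly; whether a trailing partial window is kept is not stated in the source, so this sketch drops windows that would run past the end of the video.

```python
def sliding_windows(n_frames, window=80, step=20):
    """Yield (start, end) frame ranges for the detection sliding window.

    Matches the BEHAVE setup: an 80-frame window slid through the video
    in 20-frame steps; windows extending past the last frame are dropped.
    """
    for start in range(0, n_frames - window + 1, step):
        yield start, start + window
```

Each yielded range is the video segment from which block trajectories are tracked, H-T Chains are built, and the per-class HMMs are evaluated.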
It should be noted that the purpose of disclosing the embodiments is to help further understand the present invention, but those skilled in the art will understand that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention should not be limited to what is disclosed in the embodiments; the scope of protection claimed by the invention is defined by the claims.

Claims (5)

  1. A method for detecting violent events in video based on hypergraph transition, comprising a foreground target trajectory extraction process, a hypergraph construction and similarity measurement process, and a hypergraph transition descriptor construction process, with the following steps:
    the foreground target trajectory extraction process, comprising steps 1) to 2):
    1) extracting and tracking Spatial Temporal Interest Points (STIP) in the video, while detecting and segmenting the moving foreground in the video;
    2) partitioning the extracted interest points according to the foreground block to which they belong, filtering possible noise interest points with the foreground blocks, and estimating the motion trajectory of each foreground block from the trajectories obtained by interest-point tracking;
    the hypergraph construction and similarity measurement process and the hypergraph transition descriptor construction process establishing a correspondence between each foreground-block trajectory and a motion sequence; for each sequence, the action pose information in the foreground blocks and the pose transition information along the trajectory being used to analyze the motion sequence; comprising steps 3) to 4):
    3) constructing hypergraphs and the similarity measure, using a hypergraph to model the interest points in each foreground block, so that the hypergraph encodes the action pose information of the block;
    defining for a foreground block s_i a hypergraph structure G_i, expressed as Equation 4:
    G_i = (V_i, E_i, F_i)  (Equation 4)
    where V_i is the set of interest points in the graph; E_i is the set of hyperedges in the graph, each hyperedge being a triple composed of three points; and F_i is the set of features corresponding to the points;
    defining, for two hypergraphs G_1 and G_2, a matching matrix A to represent the matching relationship between them, as in Equation 5:
    Figure PCTCN2018080770-appb-100001
    N_1 and N_2 being the numbers of points in the two hypergraphs, respectively; the similarity of the poses contained in different blocks being represented by the score value under the best match, calculated by Equation 6:
    score(A) = Σ_{i,i′,j,j′,k,k′} H_{i,i′,j,j′,k,k′} X_{i,i′} X_{j,j′} X_{k,k′}  (Equation 6)
    where H_{i,i′,j,j′,k,k′} is the similarity measure function between hyperedges E = {i, j, k} and E′ = {i′, j′, k′}, expressed as Equation 7; the greater the similarity, the higher the score;
    Figure PCTCN2018080770-appb-100002
    where
    Figure PCTCN2018080770-appb-100003
    is the feature vector, in which a_ijk defines the spatial relationship of three points, as in Equation 8:
    Figure PCTCN2018080770-appb-100004
    where sin(·) is the trigonometric sine function, and
    Figure PCTCN2018080770-appb-100005
    is the vector pointing from point p_i to p_j;
    4) constructing the feature descriptor HVC to describe the transition process of the hypergraphs;
    expressing the intensity of the velocity change from point p_i to point p_{i+s} as Equation 9:
    Figure PCTCN2018080770-appb-100006
    and computing by Equation 10 the average magnitude of the velocities of the points along trajectory
    Figure PCTCN2018080770-appb-100007
    :
    Figure PCTCN2018080770-appb-100008
    obtaining, from all trajectories
    Figure PCTCN2018080770-appb-100009
    the HVC descriptor from G_k to G_{k+1};
    5) chaining the hypergraphs and feature descriptors HVC in foreground-block trajectory order into a hypergraph transition chain (H-T Chain) to describe the entire motion sequence; modeling the H-T Chain to obtain a model of violent behavior, thereby realizing the detection of violent events in video.
  2. The method for detecting violent events in video according to claim 1, characterized in that step 5) specifically uses a Hidden Markov Model (HMM) to model the H-T Chain, obtaining an HMM model of violent behavior.
  3. The method for detecting violent events in video according to claim 1, characterized in that, in the foreground target trajectory extraction process, the trajectory of the foreground target is estimated from the trajectories of the interest points; specifically, the foreground blocks of the t-th frame are denoted S_t by Equation 1:
    S_t = {s_{t,i}} (i = 1…n)  (Equation 1)
    where n is the number of foreground blocks segmented from frame t; s_{t,i} is expressed as Equation 2:
    s_{t,i} = {p_{t,j}} (j = 1…m)  (Equation 2)
    where p_{t,j} is the j-th interest point in the foreground block;
    when foreground block
    Figure PCTCN2018080770-appb-100010
    and foreground block
    Figure PCTCN2018080770-appb-100011
    satisfy Equation 3, the two foreground blocks belong to the same sequence:
    Figure PCTCN2018080770-appb-100012
    where m_1 is the number of interest points in
    Figure PCTCN2018080770-appb-100013
    and thd is an adjustable probability threshold; when
    Figure PCTCN2018080770-appb-100014
    is a point of
    Figure PCTCN2018080770-appb-100015
    tracked in the next frame,
    Figure PCTCN2018080770-appb-100016
    equals 1; otherwise it is 0.
  4. The method for detecting violent events in video according to claim 3, characterized in that the thd value is set to 0.2.
  5. The method for detecting violent events in video according to claim 1, characterized in that a spectral clustering algorithm is used to cluster the hypergraphs.
PCT/CN2018/080770 2017-07-26 2018-03-28 Method for detecting violent events in video based on hypergraph transition WO2019019678A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/476,898 US10679067B2 (en) 2017-07-26 2018-03-28 Method for detecting violent incident in video based on hypergraph transition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710618257.7 2017-07-26
CN201710618257.7A CN107451553B (zh) 2017-07-26 2017-07-26 Method for detecting violent events in video based on hypergraph transition

Publications (1)

Publication Number Publication Date
WO2019019678A1 true WO2019019678A1 (zh) 2019-01-31

Family

ID=60489051

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/080770 WO2019019678A1 (zh) 2017-07-26 2018-03-28 Method for detecting violent events in video based on hypergraph transition

Country Status (3)

Country Link
US (1) US10679067B2 (zh)
CN (1) CN107451553B (zh)
WO (1) WO2019019678A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472201A (zh) * 2019-07-26 2019-11-19 阿里巴巴集团控股有限公司 Blockchain-based text similarity detection method and apparatus, and electronic device
US10909317B2 (en) 2019-07-26 2021-02-02 Advanced New Technologies Co., Ltd. Blockchain-based text similarity detection method, apparatus and electronic device

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451553B (zh) * 2017-07-26 2019-08-02 北京大学深圳研究生院 Method for detecting violent events in video based on hypergraph transition
US11295455B2 (en) * 2017-11-16 2022-04-05 Sony Corporation Information processing apparatus, information processing method, and program
CN109785214A (zh) * 2019-03-01 2019-05-21 宝能汽车有限公司 Safety alarm method and device based on the Internet of Vehicles
CN110427806A (zh) * 2019-06-20 2019-11-08 北京奇艺世纪科技有限公司 Video recognition method and device, and computer-readable storage medium
US11282509B1 (en) 2019-08-22 2022-03-22 Facebook, Inc. Classifiers for media content
USD988349S1 (en) 2019-08-22 2023-06-06 Meta Platforms, Inc. Display screen or portion thereof with a graphical user interface
US11354900B1 (en) * 2019-08-22 2022-06-07 Meta Platforms, Inc. Classifiers for media content
CN111967362B (zh) * 2020-08-09 2022-03-15 电子科技大学 Human behavior recognition method based on hypergraph feature fusion and ensemble learning for wearable devices
CN112102475B (zh) * 2020-09-04 2023-03-07 西北工业大学 Three-dimensional sparse reconstruction method for space targets based on image-sequence trajectory tracking
CN114445913A (zh) * 2021-12-31 2022-05-06 中原动力智能机器人有限公司 Violent behavior detection method, apparatus, device, and storage medium
CN114529849A (zh) * 2022-01-14 2022-05-24 清华大学 Pedestrian re-identification method and device based on a pose time-series hypergraph network
CN114996520A (zh) * 2022-05-05 2022-09-02 清华大学 Method and device for generating hypergraph structures of time-series data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279737A (zh) * 2013-05-06 2013-09-04 上海交通大学 Fighting behavior detection method based on spatio-temporal interest points
CN106339716A (zh) * 2016-08-16 2017-01-18 浙江工业大学 Moving-trajectory similarity matching method based on weighted Euclidean distance
CN106529477A (zh) * 2016-11-11 2017-03-22 中山大学 Video human behavior recognition method based on significant trajectories and spatio-temporal evolution information
CN107451553A (zh) * 2017-07-26 2017-12-08 北京大学深圳研究生院 Method for detecting violent events in video based on hypergraph transition

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102473238B (zh) * 2009-08-20 2014-08-06 皇家飞利浦电子股份有限公司 System and method for image analysis
US8520975B2 (en) * 2009-10-30 2013-08-27 Adobe Systems Incorporated Methods and apparatus for chatter reduction in video object segmentation using optical flow assisted gaussholding
US20120137367A1 (en) * 2009-11-06 2012-05-31 Cataphora, Inc. Continuous anomaly detection based on behavior modeling and heterogeneous information analysis
KR101348904B1 (ko) * 2012-01-20 2014-01-09 한국과학기술원 Image segmentation method using higher-order correlation clustering, and system and recording medium for processing the same
KR20130091596A (ko) * 2012-02-08 2013-08-19 한국전자통신연구원 Method for predicting human behavior from video information
EP2763077B1 (en) * 2013-01-30 2023-11-15 Nokia Technologies Oy Method and apparatus for sensor aided extraction of spatio-temporal features
US9787640B1 (en) * 2014-02-11 2017-10-10 DataVisor Inc. Using hypergraphs to determine suspicious user activities
US9430840B1 (en) * 2015-07-23 2016-08-30 Mitsubishi Electric Research Laboratories, Inc. Method and system for segmenting an image based on motion vanishing points
US20180365575A1 (en) * 2017-07-31 2018-12-20 Seematics Systems Ltd System and method for employing inference models based on available processing resources

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279737A (zh) * 2013-05-06 2013-09-04 上海交通大学 Fighting behavior detection method based on spatio-temporal interest points
CN106339716A (zh) * 2016-08-16 2017-01-18 浙江工业大学 Moving-trajectory similarity matching method based on weighted Euclidean distance
CN106529477A (zh) * 2016-11-11 2017-03-22 中山大学 Video human behavior recognition method based on significant trajectories and spatio-temporal evolution information
CN107451553A (zh) * 2017-07-26 2017-12-08 北京大学深圳研究生院 Method for detecting violent events in video based on hypergraph transition

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472201A (zh) * 2019-07-26 2019-11-19 阿里巴巴集团控股有限公司 Blockchain-based text similarity detection method and apparatus, and electronic device
CN110472201B (zh) * 2019-07-26 2020-07-21 阿里巴巴集团控股有限公司 Blockchain-based text similarity detection method and apparatus, and electronic device
US10909317B2 (en) 2019-07-26 2021-02-02 Advanced New Technologies Co., Ltd. Blockchain-based text similarity detection method, apparatus and electronic device
US11100284B2 (en) 2019-07-26 2021-08-24 Advanced New Technologies Co., Ltd. Blockchain-based text similarity detection method, apparatus and electronic device

Also Published As

Publication number Publication date
US20200117907A1 (en) 2020-04-16
CN107451553A (zh) 2017-12-08
US10679067B2 (en) 2020-06-09
CN107451553B (zh) 2019-08-02

Similar Documents

Publication Publication Date Title
WO2019019678A1 (zh) Method for detecting violent events in video based on hypergraph transition
Chen et al. Localizing visual sounds the hard way
CN109961051B (zh) Pedestrian re-identification method based on clustering and block feature extraction
Xiao et al. End-to-end deep learning for person search
CN108230364B (zh) Foreground-object motion-state analysis method based on neural networks
CN105069434B (zh) Human action behavior recognition method in video
CN104915655A (zh) Management method and device for multi-channel surveillance video
Li et al. Robust people counting in video surveillance: Dataset and system
Avgerinakis et al. Recognition of activities of daily living for smart home environments
CN103235944A (zh) Crowd-flow segmentation and abnormal crowd-flow behavior recognition method
CN106446922B (zh) Crowd abnormal-behavior analysis method
CN103902966B (zh) Video interaction-event analysis method and device based on sequence spatio-temporal cube features
CN106327526A (zh) Image target tracking method and system
CN111738218B (zh) Human abnormal-behavior recognition system and method
CN107491749A (zh) Method for detecting global and local abnormal behaviors in crowd scenes
CN103996051A (zh) Automatic detection method for abnormal behaviors of moving video objects based on motion-feature changes
CN103854016A (zh) Human behavior classification and recognition method and system based on directional co-occurrence features
CN109934852B (zh) Video description method based on object-attribute relationship graphs
CN103699874A (zh) Crowd abnormal-behavior recognition method based on SURF flow and LLE sparse representation
CN108986143A (zh) Target detection and tracking method in video
WO2013075295A1 (zh) Clothing recognition method and system for low-resolution video
CN111027377A (zh) Two-stream neural-network temporal action localization method
CN104200218B (zh) Cross-view action recognition method and system based on temporal information
Farooq et al. Unsupervised video surveillance for anomaly detection of street traffic
CN103577804A (zh) Crowd abnormal-behavior recognition method based on SIFT flow and hidden conditional random fields

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18837434

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18837434

Country of ref document: EP

Kind code of ref document: A1