CN106503647A

CN106503647A - The accident detection method that structural sparse is represented is approached based on low-rank

Info

Publication number: CN106503647A
Application number: CN201610915766.1A
Authority: CN
Inventors: 刘亚洲; 余博思; 刘柯柯; 孙权森
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2016-10-21
Filing date: 2016-10-21
Publication date: 2017-03-15

Abstract

The invention discloses an abnormal event detection method based on low-rank approximation structured sparse representation, which includes three processes of feature extraction, training and testing. 1) Extract the multi-scale 3D gradient features of the video sequence; 2) Reduce the dimensionality of the multi-scale 3D gradient features to form a training feature set and a test feature set; 3) Initialize the remaining training features and related parameters; 4) Perform training on the remaining training features Iteratively learn the group sparse dictionary to obtain the normal mode dictionary set; 5) use the group sparse dictionary set obtained by the training process to perform sparse reconstruction on the test features; 6) judge whether the test features are abnormal features according to the reconstruction error. The invention solves the shortcomings of not fully exploiting the low-rank characteristics of video data and slow detection rate in the abnormality detection technology.

Description

Anomaly Event Detection Method Based on Low Rank Approximation Structured Sparse Representation

技术领域technical field

本发明涉及模式识别和视频分析领域，更具体地说，涉及一种基于低秩逼近结构化稀疏表示的异常事件检测方法。The invention relates to the fields of pattern recognition and video analysis, and more specifically, relates to a method for detecting abnormal events based on low-rank approximation structured sparse representation.

背景技术Background technique

视频序列异常事件检测是计算机视觉中的一个活跃的研究课题，已被广泛使用在许多应用中，如人群监控，公共场所检测，交通安全和个人行为异常。面对海量的视频数据，传统的人工标记异常事件费时、低效。因此，自动化和快速的视频序列异常检测方法是迫切需要的。Anomaly event detection in video sequences is an active research topic in computer vision and has been widely used in many applications, such as crowd monitoring, public place detection, traffic safety, and abnormal personal behavior. In the face of massive video data, the traditional manual marking of abnormal events is time-consuming and inefficient. Therefore, automated and fast anomaly detection methods for video sequences are urgently needed.

虽然关于异常事件检测的研究在特征提取、行为建模和异常测量等方面已经取得了很大的进展，但是视频序列异常事件检测仍然是一个非常具有挑战性的任务。首先，对于视频中的异常事件没有精确的定义。一种常见的异常行为识别方法是异常行为模式聚类，另一种是把那些发生率低的检测样本作为异常。第一种方法的困难之处在于没有足够的先验知识来描述异常行为模式；第二种方法需要建立概率模型，异常检测依赖于正常模式的定义和特征的多尺度变化。其次，密集场景中的异常检测要求行为模型可以处理高密度的运动目标，这需要考虑多个目标之间遮掩和相互作用的影响。Although research on anomalous event detection has made great progress in feature extraction, behavior modeling, and anomaly measurement, anomalous event detection in video sequences is still a very challenging task. First, there is no precise definition of an anomalous event in a video. A common abnormal behavior identification method is abnormal behavior pattern clustering, and the other is to treat those detection samples with low occurrence rate as anomalies. The difficulty of the first method is that there is not enough prior knowledge to describe the abnormal behavior pattern; the second method needs to build a probabilistic model, and anomaly detection relies on the definition of normal patterns and the multi-scale variation of features. Second, anomaly detection in dense scenes requires that the behavior model can handle high-density moving targets, which needs to consider the effects of occlusion and interaction between multiple targets.

从特征提取的角度看，异常事件检测方法可以分为基于目标轨迹的方法和基于低层次特征的方法。基于目标轨迹的方法首先进行运动目标跟踪，然后利用目标轨迹来检测异常事件。基于目标轨迹的方法可以清晰地表示各时刻目标的空间状态，但是这一类方法对噪声、遮掩和跟踪错误敏感，不能对密集场景进行异常检测。基于低层次特征的方法通过提取视频序列中像素级别的运动特征和形态特征，可以克服目标轨迹法的缺点。From the perspective of feature extraction, abnormal event detection methods can be divided into object trajectory-based methods and low-level feature-based methods. The method based on object trajectory first tracks the moving object, and then utilizes the object trajectory to detect anomalous events. The method based on the target trajectory can clearly represent the spatial state of the target at each moment, but this type of method is sensitive to noise, occlusion and tracking errors, and cannot detect anomalies in dense scenes. Low-level feature-based methods can overcome the shortcomings of object trajectory methods by extracting pixel-level motion features and morphological features in video sequences.

目前，异常事件检测的主流方法包括动态贝叶斯网络(DBNs)，概率主题模型(PTMs)和稀疏表示模型。在DBNs中，隐藏马尔可夫模型(HMM)和马尔可夫随机场(MRF)随着检测目标的增加会成几何级数地提高建模代价，导致这些模型不足以处理密集的场景。与DBNs相比，PTMs，如PLSA和LDA，只关注空间上共生的视觉单词，却忽略了特征的时序信息，使得概率主题模型不能在时空上定位异常事件。近年来，针对异常检测的稀疏表示模型引起了人们的关注。大多数的稀疏表示模型通过训练得到一个过完备字典，但是没有充分挖掘视频数据的低秩特性和内在结构冗余。Currently, mainstream methods for anomalous event detection include Dynamic Bayesian Networks (DBNs), Probabilistic Topic Models (PTMs) and Sparse Representation Models. In DBNs, Hidden Markov Models (HMMs) and Markov Random Fields (MRFs) increase the modeling cost geometrically as the number of detected objects increases, making these models insufficient to handle dense scenes. Compared with DBNs, PTMs, such as PLSA and LDA, only focus on spatially co-occurring visual words, but ignore the temporal information of features, making probabilistic topic models unable to localize anomalous events in space and time. In recent years, sparse representation models for anomaly detection have attracted attention. Most sparse representation models are trained with an over-complete dictionary, but do not fully exploit the low-rank characteristics and inherent structural redundancy of video data.

发明内容Contents of the invention

本发明的目的在于，针对上述异常检测技术中未充分挖掘视频数据的低秩特性和检测速率较慢的缺点，提出一种基于低秩逼近结构化稀疏表示的异常事件检测方法。The object of the present invention is to propose an abnormal event detection method based on low-rank approximation structured sparse representation in view of the shortcomings of the low-rank characteristics of video data and the slow detection rate in the above-mentioned anomaly detection technology.

实现本发明目的的技术解决方案为：一种基于低秩逼近结构化稀疏表示的异常事件检测方法，包括特征提取、训练和测试三个过程：The technical solution to realize the purpose of the present invention is: a method for abnormal event detection based on low-rank approximation structured sparse representation, including three processes of feature extraction, training and testing:

特征提取过程包括以下步骤：The feature extraction process includes the following steps:

1)提取视频序列的多尺度三维梯度特征；1) Extract multi-scale three-dimensional gradient features of video sequences;

2)对多尺度三维梯度特征进行降维，形成训练特征集和测试特征集。2) Dimensionality reduction is performed on multi-scale 3D gradient features to form training feature sets and test feature sets.

训练过程包括以下步骤：The training process includes the following steps:

3)初始化剩余训练特征和相关参数；3) Initialize the remaining training features and related parameters;

4)对剩余训练特征进行迭代学习组稀疏字典，获得正常模式字典集。4) Iteratively learn the group sparse dictionary on the remaining training features to obtain the normal mode dictionary set.

测试过程包括以下步骤：The testing process includes the following steps:

5)利用由训练过程获取的组稀疏字典集，对测试特征进行稀疏重建；5) Sparsely reconstruct the test features using the group sparse dictionary set obtained by the training process;

6)根据重建误差，判断测试特征是否为异常特征。6) According to the reconstruction error, it is judged whether the test feature is an abnormal feature.

上述方法中，所述步骤1)包括以下具体步骤：In the above method, said step 1) includes the following specific steps:

1.1)对视频序列的每一帧图像进行不同尺度的缩放，形成一个三层图像金字塔。1.1) Each frame of the video sequence is scaled to different scales to form a three-layer image pyramid.

1.2)对每一层图像进行时空立方体采样，提取空间上不重叠区域的三维梯度特征。1.2) Sampling the spatio-temporal cube of each layer image to extract the 3D gradient features of spatially non-overlapping regions.

1.3)针对每一层视频序列，将同一空间区域上连续5帧的三维梯度特征叠加在一起，组成一个时空特征。1.3) For each layer of video sequence, the 3D gradient features of 5 consecutive frames on the same spatial region are superimposed together to form a spatio-temporal feature.

上述方法中，所述步骤2)包括以下具体步骤：In the above method, said step 2) includes the following specific steps:

2.1)利用主成分分析(PCA)，对上述提取的每一个时空特征进行降维。2.1) Use Principal Component Analysis (PCA) to reduce the dimensionality of each of the above extracted spatio-temporal features.

2.2)利用上述方法，将训练视频序列和测试视频序列转换为训练特征集和测试特征集。2.2) Using the above method, convert the training video sequence and the testing video sequence into a training feature set and a testing feature set.

上述方法中，所述步骤3)包括以下具体步骤：In the above method, said step 3) includes the following specific steps:

3.1)将步骤2.2)获取的训练特征集初始化为剩余训练特征集；3.1) Initialize the training feature set obtained in step 2.2) as the remaining training feature set;

3.2)初始化正则化参数、误差阈值、迭代次数和正常模式字典集；3.2) Initialize regularization parameters, error threshold, number of iterations and normal mode dictionary set;

上述方法中，所述步骤4)包括以下具体步骤：In the above method, said step 4) includes the following specific steps:

4.1)如果剩余特征集为空，训练过程结束；如果剩余特征集不为空，确定聚类数目，对剩余特征集进行K均值聚类。4.1) If the remaining feature set is empty, the training process ends; if the remaining feature set is not empty, determine the number of clusters, and perform K-means clustering on the remaining feature set.

4.2)分别对每一个特征聚类进行字典学习，得到组稀疏字典。4.2) Carry out dictionary learning for each feature cluster respectively to obtain group sparse dictionary.

4.3)挑选合适字典去表示剩余特征。如果字典可以表示剩余特征，则将该字典加入正常模式字典集，并将可以用该字典表示的特征从剩余训练特征集中移除；如果字典不可以表示任意一个剩余特征，则丢弃该字典。4.3) Choose a suitable dictionary to represent the remaining features. If the dictionary can represent the remaining features, the dictionary is added to the normal mode dictionary set, and the features that can be represented by the dictionary are removed from the remaining training feature set; if the dictionary cannot represent any of the remaining features, the dictionary is discarded.

4.4)迭代次数加1，跳到步骤4.1)；4.4) Add 1 to the number of iterations, skip to step 4.1);

上述方法中，所述步骤5)包括以下具体步骤：In the above method, said step 5) includes the following specific steps:

5.1)对于每一个测试特征，遍历由训练过程获取的正常模式字典集，估算字典对应的稀疏系数；5.1) For each test feature, traverse the normal pattern dictionary set obtained by the training process, and estimate the sparse coefficient corresponding to the dictionary;

5.2)根据特定字典及其对应的稀疏系数，计算该测试特征对应于该组稀疏字典的重建误差；5.2) According to specific dictionary and its corresponding sparse coefficient, calculate the reconstruction error corresponding to this group of sparse dictionary of this test feature;

上述方法中，所述步骤6)包括以下具体步骤：In the above method, said step 6) includes the following specific steps:

6.1)对于某一测试特征，如果找到一个对应重建误差小于重建阈值的字典，则判断该测试特征为正常特征；6.1) For a certain test feature, if a dictionary whose corresponding reconstruction error is less than the reconstruction threshold is found, the test feature is judged to be a normal feature;

6.2)对于某一测试特征，如果其对应的任意字典集的重建误差均大于重建阈值，则判断该测试特征为异常特征；6.2) For a certain test feature, if the reconstruction error of any dictionary set corresponding to it is greater than the reconstruction threshold, then it is judged that the test feature is an abnormal feature;

本发明与现有技术相比，其显著优点：其一，因为每个视频特征聚类的低秩特性，通过利用高效的低秩解决算法，如SVD阈值算法，该方法可以学习正常行为模式的组稀疏字典；其二，该方法自适应地决定每个正常行为模式的字典基数目，可以更准确地进行动态场景语义理解；其三，不同于传统的稀疏表示方法，该方法通过在一个字典集中挑选一个合适的字典，来表示测试样本，使得视频事件的稀疏重建更精准，显著提高了检测速度，保证实时性。Compared with the prior art, the present invention has significant advantages: First, because of the low-rank characteristic of each video feature cluster, by utilizing an efficient low-rank solution algorithm, such as the SVD threshold algorithm, the method can learn the normal behavior pattern group sparse dictionary; second, this method adaptively determines the number of dictionary bases for each normal behavior pattern, which can more accurately understand the semantics of dynamic scenes; third, different from traditional sparse representation methods, this method uses a dictionary Concentrate on selecting an appropriate dictionary to represent the test samples, making the sparse reconstruction of video events more accurate, significantly improving the detection speed and ensuring real-time performance.

附图说明Description of drawings

图1为自适应异常事件检测方法概览图。Figure 1 is an overview of adaptive anomaly event detection methods.

图2为三维时空梯度特征提取流程图。Fig. 2 is a flow chart of three-dimensional spatio-temporal gradient feature extraction.

图3为多尺度视频帧。Figure 3 shows multi-scale video frames.

图4为三维梯度特征。Figure 4 is a three-dimensional gradient feature.

图5为空间区域上重叠的时空立方体。Figure 5 is a space-time cube overlaid on a spatial region.

图6为组稀疏字典学习流程图。Fig. 6 is a flow chart of group sparse dictionary learning.

图7为异常事件检测流程图。Fig. 7 is a flow chart of abnormal event detection.

具体实施方式detailed description

下面结合附图对本发明作进一步详细的说明。The present invention will be described in further detail below in conjunction with the accompanying drawings.

本发明的异常事件检测方法包括特征提取过程、训练过程和测试过程三个主要过程，如图1所示。The abnormal event detection method of the present invention includes three main processes: a feature extraction process, a training process and a testing process, as shown in FIG. 1 .

特征提取过程如图2所示，包括以下具体步骤：The feature extraction process is shown in Figure 2, including the following specific steps:

视频序列图像帧转换为三维图像金字塔过程21。将视频序列的每一帧图像转换为灰度图像，并将每一帧灰度图像缩放为三种不同尺度：20×20,30×40和120×160，形成一个三层图像金字塔。对于图像金字塔的每个尺度，每一帧被划分为同样空间大小(10×10)的不重叠区域，如图3所示。Video sequence image frames are converted into 3D image pyramid process 21 . Convert each frame of the video sequence to a grayscale image, and scale each frame of the grayscale image to three different scales: 20×20, 30×40 and 120×160, forming a three-level image pyramid. For each scale of the image pyramid, each frame is divided into non-overlapping regions of the same spatial size (10×10), as shown in Figure 3.

提取视频序列三维梯度特征过程22。为了兼顾密集场景中目标的形态特征和运动特征，采用三维时空梯度作为特征提取的目标，如图4所示。假设x,y和t分别表示视频序列中的水平、垂直和时间方向，G是一个像素的三维梯度，它对应的三维投影分别是G_x、G_y和G_t。The process of extracting the 3D gradient feature of the video sequence 22. In order to take into account the morphological features and motion features of objects in dense scenes, the three-dimensional space-time gradient is used as the target of feature extraction, as shown in Figure 4. Assuming that x, y and t represent the horizontal, vertical and time directions in the video sequence respectively, G is a three-dimensional gradient of a pixel, and its corresponding three-dimensional projections are G _x , G _y and G _t respectively.

利用时空立方体采样形成时空特征过程23。连续5帧上的同一空间区域组成时空立方体，每一个时空立方体由一个大小为10×10×5的立方体中的所有像素构成，l_x×l_y＝10×10和l_t＝5分别表示时空立方体的空间大小和时序长度。一个时空立方体中的所有像素的三维梯度构成一个三维时空梯度特征。为了保持场景事件的时序信息，在每一个空间区域上采样时序上重叠的时空立方体，如图5所示。假设一个时空立方体的体心为(S_x,S_y,S_t)，那么其时序上相邻的时空立方体位于(S_x,S_y,S_t-l_t/2)和(S_x,S_y,S_t+l_t/2)。A spatiotemporal feature process 23 is formed using spatiotemporal cube sampling. The same spatial region on 5 consecutive frames forms a space-time cube, each space-time cube is composed of all pixels in a cube with a size of 10×10×5, l _x ×l _y =10×10 and l _t =5 represent the space-time The spatial size and timing length of the cube. The 3D gradients of all pixels in a spatiotemporal cube constitute a 3D spatiotemporal gradient feature. To preserve the timing information of scene events, temporally overlapping space-time cubes are sampled on each spatial region, as shown in Figure 5. Assuming that the body center of a space-time cube is (S _x ,S _y ,S _t ), then its temporally adjacent space-time cubes are located at (S _x ,S _y ,S _t -l _t /2) and (S _x ,S _y , S _t + l _t /2).

利用时空特征降维形成训练特征集和测试特征集过程24。由于上述三维梯度特征的维数(10×10×5×3＝1500)过高，会导致训练过程和测试过程计算量过大，影响检测的实时性，所以使用PCA将时空特征降到100维。最后，得到训练特征集和测试特征集。Form training feature set and test feature set process 24 by using spatio-temporal feature dimensionality reduction. Since the dimension of the above-mentioned three-dimensional gradient feature (10×10×5×3=1500) is too high, it will lead to too much calculation in the training process and testing process, which will affect the real-time performance of detection, so PCA is used to reduce the spatio-temporal feature to 100 dimensions . Finally, the training feature set and test feature set are obtained.

需要注意的是，训练视频序列中只包含正常事件；测试视频序列既包含正常事件，也包含异常事件。It should be noted that only normal events are included in the training video sequence; the test video sequence contains both normal events and abnormal events.

训练过程是指通过已知的正常行为样本集，学习正常行为模式的过程。包括以下具体步骤：The training process refers to the process of learning normal behavior patterns through known normal behavior sample sets. Include the following specific steps:

本发明采用迭代的方式，分别对每个空间位置上的所有特征进行训练，得到每个空间位置上的一个组稀疏字典集，用来表示该空间位置上的所有训练特征。对于每个空间位置上的所有特征，利用K均值得到相似特征聚类，然后对每个相似特征聚类采用低秩逼近算法进行字典学习。每次迭代将可以用字典表示的特征去除，然后对剩余特征继续下一次迭代的字典学习，直到所有特征都可以被训练得到的字典表示。The present invention adopts an iterative method to train all the features on each spatial position respectively, and obtain a group of sparse dictionary sets on each spatial position, which is used to represent all the training features on the spatial position. For all features at each spatial location, similar feature clusters are obtained using K-means, and then a low-rank approximation algorithm is used for dictionary learning for each similar feature cluster. Each iteration removes the features that can be represented by the dictionary, and then continues the dictionary learning of the next iteration for the remaining features until all the features can be represented by the trained dictionary.

初始化训练参数过程61。剩余训练特征集初始化为由特征提取过程得到的训练特征集，即X＝[x₁,x₂,...,x_n]，其中n为特征数目，m为每个特征的维数，正则化参数τ＝0.015、误差阈值T＝0.01、迭代次数j＝1和正常模式字典集 Initialize training parameters process 61 . The remaining training feature set is initialized as the training feature set obtained by the feature extraction process, that is, X=[x ₁ ,x ₂ ,...,x _n ], where n is the number of features, m is the dimension of each feature, the regularization parameter τ=0.015, the error threshold T=0.01, the number of iterations j=1 and the normal mode dictionary set

利用K均值进行剩余训练特征集聚类过程62。根据剩余训练特征的数目，首先自适应地选择聚类数目，再对剩余特征集进行K均值聚类。假设第j^th次迭代中第c^th个聚类的特征表示为其中j＝1,2,...,N，表示第c^th个聚类的特征数目，函数f(·)把聚类中的特征序号映射到该特征在初始特征集中的序号。The remaining training feature set clustering process 62 is performed using K-means. According to the number of remaining training features, first adaptively select the number of clusters, and then perform K-means clustering on the remaining feature sets. Assume that the feature representation of the ^cth cluster in the ^jth iteration is where j=1,2,...,N, Indicates the number of features of the c ^th cluster, and the function f(·) maps the serial number of the feature in the cluster to the serial number of the feature in the initial feature set.

如图6，利用低秩逼近算法学习组稀疏字典过程63。不同于传统的稀疏表示模型，本发明利用视频特征聚类的低秩结构，采用低秩逼近算法，学习包含高度相关字典基的组稀疏字典，从而抛弃视频数据中的冗余信息。组稀疏字典学习的目标函数如下：As shown in FIG. 6 , the process 63 of learning a group sparse dictionary using a low-rank approximation algorithm. Different from the traditional sparse representation model, the present invention utilizes the low-rank structure of video feature clustering and adopts a low-rank approximation algorithm to learn group sparse dictionaries containing highly correlated dictionary bases, thereby discarding redundant information in video data. The objective function for group sparse dictionary learning is as follows:

其中，采用奇异值分解，τ是正则化参数，ω_i是第i个奇异值λ_i的权重。上述加权低秩优化问题采用奇异值阈值化算法，公式(1)的闭合解如下：in, Using singular value decomposition, τ is the regularization parameter, and ω _i is the weight of the ith singular value λ _i . The above weighted low-rank optimization problem adopts the singular value thresholding algorithm, and the closed solution of formula (1) is as follows:

其中，r是通过软阈值操作估算的秩。软阈值操作作用于每个奇异值，小于τ·ω_i的奇异值被置为0，大于τ·ω_i的奇异值仍保持原值。in, r is operated by soft thresholding Estimated rank. The soft threshold operation acts on each singular value, the singular value smaller than τ·ω _i is set to 0, and the singular value greater than τ·ω _i remains the original value.

选择组稀疏字典过程64。一旦确定对于任一特征目标函数如下：Select group sparse dictionary process 64 . Once determined for any feature The objective function is as follows:

其中，表示的稀疏系数向量，表示特征是否能够被字典表示，约束项∑γ_i,j＝1和γ_i,j＝{0,1}用于保证只能挑选一个字典来表示表示用字典表示特征聚类的系数向量，T表示误差阈值。γ的闭合解如下：in, express The sparse coefficient vector of , Express features Is it possible to be a dictionary Indicates that the constraint items ∑γ _i,j =1 and γ _i,j ={0,1} are used to ensure that only one dictionary can be selected to represent represent a dictionary Represent feature clustering A vector of coefficients, T represents the error threshold. The closed solution of γ is as follows:

如果可以表示则把字典加入字典集合D，保留该字典对应的奇异值信息，使得迭代次数j＝j+1，并把稀疏系数的特征从剩余特征集X^j中移除；如果不可以表示任意一个剩余特征则丢弃字典 if can express then the dictionary Add the dictionary set D, retain the singular value information corresponding to the dictionary, so that the number of iterations j=j+1, and the sparse coefficient Characteristics Removed from the remaining feature set X ^j ; if Cannot represent any of the remaining features then discard the dictionary

迭代进行过程63和64，直到剩余特征集为空集。Processes 63 and 64 are iteratively performed until the remaining feature set is an empty set.

最后，取得正常特征的组稀疏字典集，每个字典用于表示正常行为模式。注意，由于训练特征集对应的视频序列只包含正常事件，所以学习到的字典均表示正常行为模式。Finally, a set of sparse dictionaries of normal features is obtained, each dictionary is used to represent a normal behavior pattern. Note that since the video sequences corresponding to the training feature set only contain normal events, the learned dictionaries all represent normal behavior patterns.

如图7，测试过程是指通过训练过程学习到的正常模式字典集，检测测试样本是否为异常样本的过程。包括以下具体步骤：As shown in Figure 7, the test process refers to the process of detecting whether the test sample is an abnormal sample from the normal pattern dictionary set learned through the training process. Include the following specific steps:

初始化测试参数过程71。初始化正常行为模式字典集为由训练过程得到的字典集，即D＝[D₁,D₂,...D_N]，重建误差阈值为。Initialize the test parameters process 71 . Initialize the dictionary set of normal behavior patterns as the dictionary set obtained by the training process, that is, D=[D ₁ , D ₂ ,...D _N ], and the reconstruction error threshold is .

计算测试特征的重建误差过程72。针对任意一个测试特征x，对每个字典D_k计算重建误差，计算方法如下：Compute the reconstruction error process 72 for the test features. For any test feature x, calculate the reconstruction error for each dictionary D _k , the calculation method is as follows:

公式(5)中β_k的最优解为:The optimal solution of β _k in formula (5) is:

此时重建误差为：At this point the reconstruction error is:

判断测试特征是否为异常特征过程73。通过搜索一个合适的字典来表示该测试样本，进而利用重建误差来判断该测试样本是否为异常事件。如果重建误差小于重建误差阈值，则表明测试特征x可以被字典D_k表示，即测试特征x属于正常事件；否则，测试特征x不能被字典D_k表示。如果正常模式字典集中的所有字典均不能表示测试特征x，则测试特征x属于异常事件。Judging whether the test feature is an abnormal feature process 73 . A suitable dictionary is searched to represent the test sample, and then the reconstruction error is used to judge whether the test sample is an abnormal event. If the reconstruction error is less than the reconstruction error threshold, it indicates that the test feature x can be represented by the dictionary _Dk , that is, the test feature x belongs to a normal event; otherwise, the test feature x cannot be represented by the dictionary _Dk . If none of the dictionaries in the normal pattern dictionary set can represent the test feature x, the test feature x is an abnormal event.

这里需要着重指出，相比于目前最先进的算法，本发明采用迭代低秩逼近方法，取得了至少2％的检测正确率提升。通过检索正常模式字典集的方法，本发明的检测速度可以较传统异常检测方法提升20倍以上。另外，相比于传统稀疏表示方法，本发明采用SVD阈值算法可以减少至少10倍的训练时间。It should be pointed out here that, compared with the most advanced algorithms at present, the present invention adopts an iterative low-rank approximation method, which improves the detection accuracy by at least 2%. Through the method of retrieving the normal mode dictionary set, the detection speed of the present invention can be increased by more than 20 times compared with the traditional anomaly detection method. In addition, compared with the traditional sparse representation method, the present invention can reduce the training time by at least 10 times by adopting the SVD threshold algorithm.

Claims

1. An abnormal event detection method based on low-rank approximation structured sparse representation is characterized by comprising the following steps of: the method comprises three processes of feature extraction, training and testing:

the feature extraction process comprises the following steps:

1) extracting multi-scale three-dimensional gradient features of a video sequence;

2) reducing the dimension of the multi-scale three-dimensional gradient feature to form a training feature set and a testing feature set;

the training process comprises the following steps:

3) initializing residual training characteristics and related parameters;

4) performing iterative learning on the residual training features to form a sparse dictionary, and obtaining a normal mode dictionary set;

the test process comprises the following steps:

5) carrying out sparse reconstruction on the test features by utilizing a group sparse dictionary set obtained in the training process;

6) and judging whether the test feature is an abnormal feature or not according to the reconstruction error.

2. The method for detecting the abnormal events based on the low-rank approximation structured sparse representation as claimed in claim 1, wherein: the step 1) comprises the following specific steps:

1.1) zooming each frame image of the video sequence in different scales to form a three-layer image pyramid;

1.2) performing space-time cube sampling on each layer of image, and extracting three-dimensional gradient features of non-overlapped regions in space;

1.3) for each layer of video sequence, superposing three-dimensional gradient features of 5 continuous frames in the same spatial region together to form a space-time feature.

3. The abnormal event detection method based on the low-rank approximation structured sparse representation as claimed in claims 1 and 2, wherein: the step 2) comprises the following specific steps:

2.1) carrying out dimensionality reduction on each extracted space-time feature by using Principal Component Analysis (PCA);

2.2) converting the training video sequence and the test video sequence into a training characteristic set and a test characteristic set by using the method.

4. The method for detecting abnormal events based on the low-rank approximation structured sparse representation according to claim 1 or 3, wherein: the step 3) comprises the following specific steps:

3.1) initializing the training feature set obtained in the step 2.2) into a residual training feature set;

3.2) initializing a regularization parameter, an error threshold, iteration times and a normal pattern dictionary set.

5. The method for detecting abnormal events based on the low-rank approximation structured sparse representation as claimed in claim 1, wherein: the step 4) comprises the following specific steps:

assuming that the number of iterations j is 1, the remaining training features are denoted as X^j＝[x₁,x₂,...,x_n]Whereinn is the number of features, m is the dimension of each feature, the regularization parameter is τ, the normal pattern dictionary set is

4.1) determining the number of clusters C^jFor the remaining feature set X^jPerforming K-means clustering, at this time j^thC in the sub-iteration^thThe characteristics of the individual clusters are represented asWherein j is 1, 2., N,denotes the c^thThe feature number of each cluster, and a function f (-) maps the feature serial number in each cluster to the serial number of the feature in the initial feature set;

4.2) using formula (1) and formula (2):

(D_{c}^{j}, Λ_{c}^{j}, V_{c}^{j}) = \arg \underset{D, Λ, V}{m i n} | | X_{c}^{j} - D_{c}^{j} Λ_{c}^{j} {(V_{c}^{j})}^{T} | |_{2}^{2} + τ Σ_{i = 1}^{n_{c}^{j}} ω_{i} λ_{i} - - - (1)

\{\begin{matrix} (D_{c}^{j}, Λ_{c}^{j}, V_{c}^{j}) = s v d (X_{c}^{j}) \\ {\hat{λ}}_{i} = S_{τ \cdot ω_{i}} (λ_{i}) \end{matrix} - - - (2)

performing dictionary learning on all the clusters of the residual features to obtain the jth^thIn the next iteration C^jDictionary set for individual feature clusteringWherein,using singular value decomposition, τ being a regularization parameter, ω_iIs the ith singular value λ_iThe weight of (a) is determined,r is operated by soft thresholdAn estimated rank; soft threshold operation on each singular value, less than τ · ω_iIs set to 0 and is greater than tau omega_iThe singular value of (A) still keeps the original value;

4.3) Once determinedFor any remaining featureSelecting a dictionary by using formula (3) and formula (4)To represent the remaining features;

\begin{matrix} \min_{γ} Σ_{i = 1}^{n_{c}^{j}} γ_{f (i, c)}^{j} (| | x_{f (i, c)}^{j} - D_{c}^{j} β_{f (i, c)}^{j} | |_{2}^{2} - T) \\ s . t . Σ_{j = 1}^{N} γ_{f (i, c)}^{j} = 1, γ_{f (i, c)}^{j} = {0, 1} \end{matrix} - - - (3)

wherein,to representThe sparse coefficient vector of (a) is,representation featureWhether or not to be able to be usedRepresents, constraint term ∑ gamma_i,j1 and γ_i,jWith {0,1} being used to ensure that only one dictionary can be picked for representation Dictionary for representationRepresenting feature clustersT denotes an error threshold;

if it is notCan representThen the dictionaryAdding dictionary set D and thinning out coefficientIs characterized byFrom the remaining feature set X^jRemoving; if it is notMay not represent any of the remaining featuresThen the dictionary is discarded

4.4) the number of iterations plus 1, j ═ j + 1;

4.5) when the residual feature set is an empty set, the group sparse dictionary learning process is ended.

6. The method for detecting abnormal events based on the low-rank approximation structured sparse representation as claimed in claim 1, wherein: the step 5) comprises the following specific steps:

suppose the normal pattern dictionary set is D ═ D₁,D₂,...D_N]；

5.1) given a test feature x, for a dictionary D_k(k ═ 1, 2.. N) the reconstruction error was calculated:

\underset{β_{k}}{m i n} | | x - D_{k} β_{k} | |_{2}^{2} - - - (5)

5.2) solving β in equation (5)_kThe optimal solution of (2):

{\hat{β}}_{k} = {({D_{k}}^{T} D_{k})}^{- 1} {D_{k}}^{T} x - - - (6)

5.3) calculating the reconstruction error of the test feature x by using the formula (7):

E_{k} = | | x - D_{k} {\hat{β}}_{k} | |_{2}^{2} = | | x - D_{k} {({D_{k}}^{T} D_{k})}^{- 1} {D_{k}}^{T} x | |_{2}^{2} - - - (7)

if the reconstruction error is less than the reconstruction error threshold, it indicates that the test feature x can be stored in the dictionary D_kIndicating that the test feature x belongs to a normal event; otherwise, the test feature x cannot be used by dictionary D_kRepresenting that k is k +1, return to step 5.1);

5.4) if all dictionaries in the normal pattern dictionary set can not represent the test feature x, the test feature x belongs to the abnormal event.