CN106033548B - Crowd abnormity detection method based on improved dictionary learning - Google Patents

Crowd abnormity detection method based on improved dictionary learning

Info

Publication number
CN106033548B
CN106033548B (application number CN201510112141.7A)
Authority
CN
China
Prior art keywords
training
video
dictionary
learning
codes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510112141.7A
Other languages
Chinese (zh)
Other versions
CN106033548A (en)
Inventor
袁媛
卢孝强
冯亚闯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XiAn Institute of Optics and Precision Mechanics of CAS
Original Assignee
XiAn Institute of Optics and Precision Mechanics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XiAn Institute of Optics and Precision Mechanics of CAS filed Critical XiAn Institute of Optics and Precision Mechanics of CAS
Priority to CN201510112141.7A priority Critical patent/CN106033548B/en
Publication of CN106033548A publication Critical patent/CN106033548A/en
Application granted granted Critical
Publication of CN106033548B publication Critical patent/CN106033548B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a crowd abnormity detection method for crowded scenes based on improved dictionary learning, which mainly addresses two problems: typical event codes are not emphasized during dictionary learning, and similar samples can receive very different codes. The method comprises the following implementation steps: (1) extracting event features; (2) mining the typical event types in the training data; (3) learning the relations among the training samples; (4) constructing an objective function and learning the dictionary; (5) detecting abnormal samples on the test video; (6) counting the experimental results and calculating the accuracy of the anomaly detection algorithm. Compared with existing methods, the method explores the potential typical event types in the video data, so that the learned dictionary is better suited to a specific video set and abnormal events become more distinguishable. At the same time, the spatial information of the training data is used effectively, which improves the validity of the coding and the precision of anomaly detection. The method can be used in fields such as intelligent public-safety management, military reconnaissance, and criminal investigation assistance.

Description

Crowd abnormity detection method based on improved dictionary learning
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to an abnormal event detection technology which can be used in fields such as intelligent public-safety management, military reconnaissance, and criminal investigation assistance.
Background
In recent years, with society's growing awareness of security and the rapid development of computer technologies such as image processing, machine vision, and network transmission, intelligent video surveillance has developed vigorously. Current video surveillance, however, still relies mainly on human operators. Studies have shown that when security personnel face dozens or even hundreds of camera feeds at once, fatigue and lapses of attention set in after about 10 minutes and up to 90% of the video information is missed. At the same time, manually searching massive amounts of footage after an incident is time-consuming and labor-intensive (studies suggest that 99.9% of the manpower is wasted) and may come too late to be useful. Video abnormal-event detection is a key topic in intelligent video surveillance; it is applied in specific scenes such as hospitals, traffic intersections, banks, parking lots, shopping malls, airports, forests, and other crowded public places. Its aim is to use techniques such as image processing and computer vision to detect the small number of abnormal events among the mostly normal events in video data and to raise an alarm in time so that staff can respond.
Crowded scenes are the most challenging setting for abnormal event detection, mainly because such scenes contain many moving objects, the objects move in complex ways, and they frequently occlude one another. Existing anomaly detection algorithms for crowded scenes fall into the following two types:
One is the hybrid dynamic-texture-based approach, which attempts to extract a set of latent dynamic textures to model a given video sequence. Mahadevan et al. proposed such a method in "V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, Anomaly Detection in Crowded Scenes. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2010, pages 1975-1981". The method uses mixtures of dynamic textures to express the appearance and dynamics of video blocks. In the temporal dimension, video blocks that the model expresses with low probability are considered abnormal; in the spatial dimension, a video block whose representation coefficients do not agree with those of the surrounding blocks is considered abnormal. The method fully considers the appearance and dynamic changes of the video and analyzes the temporal and spatial dimensions simultaneously. However, because crowded scenes are complex, many dynamic textures have to be mixed to express them adequately, so the computational complexity is very high.
The second is the sparse-representation-based method. Its core idea is to learn an over-complete set of normal bases on which the reconstruction error of the training samples is very small. Cong et al. proposed a dictionary learning method in "Y. Cong, J. Yuan, and J. Liu, Sparse reconstruction cost for abnormal event detection. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2011, pages 3449-3456". The method considers that bases selected with high frequency are more likely to be the required normal bases, and accordingly lowers the weight of those basis coefficients in the sparsity constraint of the objective function. However, the method does not consider which events the video mainly contains or how those events are encoded, and it ignores the influence of the sample distribution on the coding. Because the learned dictionary is over-complete, similar samples can receive very different codes, which reduces the accuracy of anomaly detection.
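As an illustration of the general sparse-representation idea described above (not of the cited method's exact algorithm), the following sketch scores a test sample by its reconstruction error on a dictionary learned from normal data, using off-the-shelf scikit-learn routines; the dictionary size, sparsity weight, and threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, SparseCoder

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 32))    # toy "normal" feature vectors (rows)

# Learn an over-complete dictionary of normal bases (illustrative settings).
dl = DictionaryLearning(n_components=64, alpha=0.5, max_iter=200, random_state=0)
dl.fit(X_train)
D = dl.components_                      # shape: (n_atoms, n_features)

# Sparse-code a test sample on the normal bases, then score it by reconstruction error.
coder = SparseCoder(dictionary=D, transform_algorithm="lasso_lars", transform_alpha=0.5)
y = rng.normal(size=(1, 32))
s = coder.transform(y)                  # sparse code of the test sample
recon_error = float(np.sum((y - s @ D) ** 2))
is_abnormal = recon_error > 10.0        # threshold chosen purely for illustration
```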
Disclosure of Invention
The invention aims to provide an anomaly detection method based on improved dictionary learning that remedies the defects of the existing methods. The method learns typical event types from the raw training data while mining the distribution information of the sample space.
The technical scheme for realizing the purpose of the invention is as follows:
a crowd abnormity detection method based on improved dictionary learning is characterized in that: the method comprises the following steps:
(1) extracting video event characteristics;
(1a) dividing the video into a training video and a testing video;
(1b) dividing each frame of training video and test video into a plurality of superpixels by using a superpixel division algorithm;
(1c) calculating optical flow graphs of each frame of training video and testing video, and counting optical flow information in each super pixel to obtain optical flow histogram features; respectively forming a training sample set X and a test sample set Y by the optical flow histogram features of the training video and the test video;
(2) mining typical event types in the training sample set X data obtained in the step (1 c);
(2a) training a Gaussian mixture model on a training sample set X; the class center mu in the Gaussian mixture model is a potential typical event type in the data; each type of typical event has a corresponding typical event code beta;
(2b) the code alpha of each training sample should be similar to the code beta of the corresponding typical event, and the degree of similarity is determined by the correlation coefficient between the two, i.e.
[formula shown as an image in the original publication]
where j ∈ C_k; the resulting code should satisfy the following objective formula
[formula shown as an image in the original publication]
where C_k represents the k-th event type;
(3) learning the spatial distribution of the samples in the training sample set X obtained in the step (1 c);
calculating the distance between training samples, and selecting n training samples closest to each training sample to construct a graph model; the weights of the edges in the graph model are calculated by
[formula shown as an image in the original publication]
where σ is a scale parameter of the Gaussian distribution;
extending the graph-model relation among the training samples to their codes α, so that similar training samples have similar codes; the degree of similarity is determined by W_ij, and a larger W_ij indicates that x_i and x_j are more similar, which is expressed as the following objective formula
[formula shown as an image in the original publication]
(4) Constructing a dictionary learning framework and learning the improved dictionary;
synthesizing the target formulas in the steps (2) and (3) to obtain a final target function as follows
[formula shown as an image in the original publication]
where D, A = {α_j, j = 1, 2, …} and β are, respectively, the required dictionary, the codes of the training sample set, and the codes of the typical events;
[formula shown as an image in the original publication]
is the graph Laplacian matrix; λ, η and ν are weight parameters for the respective terms;
optimizing the above formula with an accelerated proximal gradient (APG) algorithm to obtain the desired dictionary D, the codes α of the training samples and the codes β of the typical events;
(5) estimating the event type of each test sample y in the test sample set Y obtained in step (1), and calculating its code;
(5a) substituting the test sample y into the Gaussian mixture model obtained in the step (2), and estimating the corresponding event type;
(5b) encoding the test sample according to the event type estimated in the step (5a), and utilizing the following target formula
[formula shown as an image in the original publication]
solving for the code s of the test sample y, where D and β are, respectively, the dictionary and the typical event codes learned in step (4); λ and η are weight parameters for the respective terms; and c is the correlation coefficient between the code s and β.
Preferably, the method further comprises a step (6) after step (5):
calculating the reconstruction error of the test sample, and calculating the detection precision of the algorithm;
(6a) the reconstruction error of the test sample y can be expressed as
[formula shown as an image in the original publication]
(6b) Calculating the reconstruction error of each super pixel through the above formula, and giving a threshold value xi to judge whether each super pixel is abnormal;
the detection precision of the algorithm adopts the following two indexes:
(6b1) frame-level precision: an abnormal frame is considered correctly detected as long as at least one abnormal pixel is detected in the frame;
(6b2) pixel-level precision: an abnormal frame is considered correctly detected only if more than 40% of its abnormal pixels are accurately detected.
Preferably, in step (3) above, n is equal to 10.
The invention has the advantages that:
Because typical event types are mined from the training data, normal samples in test data from the same source are expressed better, the difference between the codes of abnormal and normal samples is increased, and abnormal samples are easier to distinguish. At the same time, the similarity among samples is exploited so that similar samples receive similar codes, which improves the reliability of the coding.
The steps performed by the present invention are described in further detail below with reference to the following figures:
drawings
FIG. 1 is a flow chart of an anomaly detection method based on improved dictionary learning according to the present invention;
FIG. 2(a) is a frame-level ROC curve obtained by running the present invention on a data set;
FIG. 2(b) is a graph of the ROC curve for a pixel layer obtained by running the present invention on a data set;
FIG. 3 is a visualization of the results of the present invention running on a data set.
Detailed Description
Referring to fig. 1, the steps implemented by the present invention are as follows:
step 1, extracting event characteristics.
(1a) Dividing the video into a training video and a testing video;
(1b) Each video frame is divided into a number of superpixels using a superpixel segmentation algorithm, so that the content within each superpixel is largely homogeneous.
(1c) Compute the optical flow field of each video frame and accumulate the optical flow information within each superpixel to obtain a histogram-of-optical-flow (HOF) feature. The HOF features of the superpixels in the training and testing videos form the training sample set X and the test sample set Y, respectively.
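A minimal sketch of this feature-extraction step is given below, assuming SLIC superpixels (scikit-image) and Farneback dense optical flow (OpenCV); the superpixel count and histogram bin count are illustrative choices, not values fixed by the patent.

```python
import cv2
import numpy as np
from skimage.segmentation import slic

def hof_features(prev_frame, frame, n_segments=200, n_bins=8):
    """Return one optical-flow histogram (HOF) per superpixel of `frame`."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Dense optical flow between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Superpixel segmentation of the current frame.
    labels = slic(frame, n_segments=n_segments, compactness=10, start_label=0)

    feats = []
    for sp in np.unique(labels):
        mask = labels == sp
        # Orientation histogram weighted by flow magnitude, then normalized.
        hist, _ = np.histogram(ang[mask], bins=n_bins, range=(0, 2 * np.pi),
                               weights=mag[mask])
        feats.append(hist / (hist.sum() + 1e-8))
    return np.asarray(feats)             # shape: (num_superpixels, n_bins)
```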
Step 2: mining typical event types.
(2a) Cluster the training sample set X with a Gaussian mixture model to obtain the model parameters. The Gaussian mixture model considers not only the relation between a sample and the class centers but also the size of each cluster. The resulting class centers μ are taken as the potential typical event types in the training video, and every training sample is assigned an event category.
(2b) The code β of a typical event μ is learned, and the code α of each sample should be similar to the corresponding β. The degree of similarity is determined by the correlation coefficient between the two, i.e.
[formula shown as an image in the original publication]
where j ∈ C_k (C_k is the k-th event type). A larger value of c indicates that x is more similar to μ. Therefore, the desired code should satisfy the following formula
[formula shown as an image in the original publication]
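The sketch below illustrates step 2: a Gaussian mixture model is fitted to the training features to obtain the typical event types, and a cosine-style correlation relates a sample code to its event code. The exact correlation formula appears only as an image in the original publication, so the form used here is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mine_typical_events(X, n_events=5, seed=0):
    """Step 2a: fit a Gaussian mixture to the training features X (N x d) and
    return the model, each sample's event label, and the class centers mu."""
    gmm = GaussianMixture(n_components=n_events, covariance_type="diag",
                          random_state=seed)
    labels = gmm.fit_predict(X)          # event type of every training sample
    return gmm, labels, gmm.means_       # means_ are the typical event types mu

def correlation(alpha, beta, eps=1e-8):
    """Step 2b (assumed form): correlation between a sample code alpha and the
    code beta of its typical event."""
    return float(alpha @ beta) / (np.linalg.norm(alpha) * np.linalg.norm(beta) + eps)
```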
Step 3: learning the spatial distribution of the samples.
Calculating the distance between training samples, and selecting 10 training samples closest to each training sample to construct a graph model; the weights of the edges in the graph model are calculated by
[formula shown as an image in the original publication]
where σ is a scale parameter of the Gaussian distribution;
extending the graph-model relation among the training samples to their codes α, so that similar training samples have similar codes; the degree of similarity is determined by W_ij, and a larger W_ij indicates that x_i and x_j are more similar, which is expressed as the following objective formula
[formula shown as an image in the original publication]
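A sketch of the graph construction in step 3 follows. The Gaussian edge weight exp(-||x_i - x_j||^2 / σ^2) and the Laplacian L = M - W (with M the diagonal degree matrix) are standard choices assumed here, since the patent shows these formulas only as images.

```python
import numpy as np

def knn_graph_laplacian(X, n_neighbors=10, sigma=1.0):
    """Build the k-nearest-neighbor graph of step 3 and its Laplacian."""
    N = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(d2[i])[1:n_neighbors + 1]   # nearest neighbors, self excluded
        W[i, nn] = np.exp(-d2[i, nn] / sigma ** 2)  # assumed Gaussian edge weight
    W = np.maximum(W, W.T)                          # symmetrize the graph
    L = np.diag(W.sum(axis=1)) - W                  # Laplacian L = M - W
    return W, L
```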
Step 4: constructing the dictionary learning framework and learning the improved dictionary.
The final objective function obtained by integrating steps 2 and 3 is as follows
[formula shown as an image in the original publication]
where D, A = {α_j, j = 1, 2, …} and β are, respectively, the required dictionary, the codes of the training samples, and the codes of the typical events, and
[formula shown as an image in the original publication]
is the graph Laplacian matrix; λ, η and ν are weight parameters for the respective terms.
The desired dictionary D, the codes α of the training samples and the typical event codes β are obtained by optimizing the above formula with an accelerated proximal gradient (APG) algorithm.
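The dictionary-learning step can be sketched as follows. Because the objective function appears only as an image, the sketch assumes a plausible form combining a reconstruction term, an l1 sparsity term, the graph-smoothness term, and a pull toward the typical event codes; it also substitutes plain proximal-gradient (ISTA) updates and a quadratic pull toward β for the patent's APG solver and correlation term, so it illustrates the structure of the method rather than the patented algorithm itself.

```python
import numpy as np

def soft_threshold(Z, t):
    """Proximal operator of the l1 norm."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def learn_dictionary(X, L, labels, n_atoms=64, lam=0.1, eta=0.1, nu=0.1,
                     n_iter=50, step=1e-3, seed=0):
    """Alternating minimization of an objective of the assumed form
        ||X - D A||_F^2 + lam*||A||_1 + nu*tr(A L A^T) + eta*sum_j ||a_j - b_k(j)||^2
    X: d x N feature matrix (columns are samples), L: N x N graph Laplacian,
    labels: event type of each training sample."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    K = int(labels.max()) + 1
    D = rng.normal(size=(d, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    A = np.zeros((n_atoms, N))                  # codes alpha of the training samples
    B = np.zeros((n_atoms, K))                  # typical event codes beta

    for _ in range(n_iter):
        # Code update: one proximal-gradient step on A.
        Bfull = B[:, labels]                    # event code assigned to each sample
        grad = (-2 * D.T @ (X - D @ A)          # reconstruction term
                + 2 * nu * A @ L                # graph-smoothness term
                + 2 * eta * (A - Bfull))        # pull toward the event code
        A = soft_threshold(A - step * grad, step * lam)

        # Typical event codes: mean code of each event type (an assumption).
        for k in range(K):
            if np.any(labels == k):
                B[:, k] = A[:, labels == k].mean(axis=1)

        # Dictionary update by regularized least squares, then renormalization.
        D = X @ A.T @ np.linalg.inv(A @ A.T + 1e-6 * np.eye(n_atoms))
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-8
    return D, A, B
```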
Step 5: estimating the event type of the test sample and calculating its code.
(5a) Substitute the test sample y into the Gaussian mixture model obtained in step 2 and estimate the corresponding event type.
(5b) The test sample is encoded according to the event type estimated in step (5a), with the corresponding typical event code β, using the following objective formula
[formula shown as an image in the original publication]
to solve for the code s of the test sample y, where D and β are, respectively, the dictionary and the typical event code learned in step 4; λ and η are weight parameters for the respective terms; and c is the correlation coefficient between the code s and β.
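Continuing the previous sketch (and reusing its soft_threshold helper and the learned D and B), a test sample can be coded and scored as follows; the quadratic pull toward β again stands in for the correlation term c of the patent.

```python
import numpy as np

def encode_test_sample(y, D, B, gmm, lam=0.1, eta=0.1, n_iter=200, step=1e-3):
    """Step 5: estimate the event type of test feature y with the trained GMM,
    then code y against the learned dictionary D (simplified objective)."""
    k = int(gmm.predict(y.reshape(1, -1))[0])      # step 5a: estimated event type
    beta = B[:, k]
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):                        # plain ISTA iterations
        grad = -2 * D.T @ (y - D @ s) + 2 * eta * (s - beta)
        s = soft_threshold(s - step * grad, step * lam)
    recon_error = float(np.sum((y - D @ s) ** 2))  # anomaly score used in step 6
    return s, recon_error
```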
Step 6: calculating the reconstruction error of the test sample and the detection precision of the algorithm.
(6a) The reconstruction error of the test sample y can be expressed as
[formula shown as an image in the original publication]
(6b) The reconstruction error of each super pixel can be calculated by the above formula, and whether each super pixel is abnormal or not can be judged by giving a threshold value xi. The detection accuracy of the algorithm generally adopts the following two indexes:
(6b1) Frame-level precision: an abnormal frame is counted as a correct detection (TP) whenever at least one abnormal pixel is detected in it;
(6b2) Pixel-level precision: an abnormal frame is considered correctly detected only if more than 40% of its abnormal pixels are accurately detected.
In both cases, a normal frame in which even one pixel is falsely detected is counted as a false detection (FP). Let P be the number of abnormal frames and N the number of normal frames in the test video; the true positive rate (TPR) and the false positive rate (FPR) are then TPR = TP/P and FPR = FP/N. By varying the threshold ξ, a series of TPR and FPR values is obtained, and an ROC (receiver operating characteristic) curve is drawn with FPR on the horizontal axis and TPR on the vertical axis. The quantitative comparison indices are the area under the ROC curve and the FPR and TPR values at the point where the ROC curve crosses the diagonal, denoted AUC (area under curve), EDR (equal detection rate) and EER (equal error rate), respectively.
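The frame-level evaluation described above can be implemented as in the following sketch; the threshold sweep, the trapezoidal AUC, and locating the EER at the point where FPR = 1 - TPR are standard choices assumed here (the detection rate at that point corresponds to the EDR).

```python
import numpy as np

def frame_level_roc(frame_scores, frame_labels):
    """Sweep the threshold xi over per-frame anomaly scores (e.g. the maximum
    superpixel reconstruction error in each frame) and return the ROC curve,
    AUC, and EER.  frame_labels: 1 = abnormal frame, 0 = normal frame."""
    scores = np.asarray(frame_scores, dtype=float)
    labels = np.asarray(frame_labels, dtype=int)
    P, N = labels.sum(), (labels == 0).sum()

    tpr, fpr = [0.0], [0.0]
    for t in np.sort(np.unique(scores))[::-1]:     # decreasing thresholds
        flagged = scores >= t
        tpr.append((flagged & (labels == 1)).sum() / P)
        fpr.append((flagged & (labels == 0)).sum() / N)
    tpr, fpr = np.array(tpr + [1.0]), np.array(fpr + [1.0])

    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1])) / 2.0)  # trapezoid rule
    eer_idx = int(np.argmin(np.abs(fpr - (1.0 - tpr))))             # FPR = 1 - TPR
    return fpr, tpr, auc, float(fpr[eer_idx])
```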
The effects of the present invention can be further explained by the following experiments.
1. Simulation conditions
The simulations were carried out in MATLAB on a WINDOWS 7 operating system with an Intel(R) Core i3-2130 3.4 GHz central processing unit and 16 GB of memory.
The video database used in the experiments is the UCSD anomaly detection database of the University of California, San Diego, whose videos were captured by a fixed camera on campus. The recorded content was not staged in any way; everything occurs in natural situations. Normal videos contain only pedestrians walking on the walkway, while abnormal videos may contain behaviors such as skateboarding, cycling, and walking on the lawn, as well as non-pedestrian objects such as cars and small carts.
2. Simulation content
First, the algorithm of the invention (the anomaly detection algorithm based on improved dictionary learning) was evaluated on the UCSD data set. To demonstrate its effectiveness, 6 comparison methods were selected, covering both widely used and recent approaches. The frame-level and pixel-level ROC curves are shown in FIG. 2, and the quantitative detection accuracies are listed in Tables 1 and 2.
TABLE 1  Frame-level detection accuracy
[table shown as an image in the original publication]
TABLE 2  Pixel-level detection accuracy
[table shown as an image in the original publication]
The comparison algorithms are as follows:
The MDT and SF-MPPCA results are from V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, Anomaly Detection in Crowded Scenes. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2010, pages 1975-1981.
The MPPCA results are from J. Kim and K. Grauman, Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 2921-2928.
The SF results are from R. Mehran, A. Oyama, and M. Shah, Abnormal crowd behavior detection using social force model. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 935-942.
The Adam results are from A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz, Robust real-time unusual event detection using multiple fixed-location monitors. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 555-560, 2008.
The SRC results are from Y. Cong, J. Yuan, and J. Liu, Sparse reconstruction cost for abnormal event detection. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2011, pages 3449-3456.
As can be seen from FIG. 2 and Tables 1 and 2, the detection accuracy of the present invention is better than that of the existing anomaly detection methods at both the frame level and the pixel level. This is because the invention mines the potential typical event types in the training set and the relationship between samples and typical events, which makes the difference between abnormal and normal samples more significant. Meanwhile, similar samples are constrained to have similar codes, which improves the reliability of the coding and therefore the precision of anomaly detection.
A comparison of some visualized results is shown in FIG. 3, where (I) marks the true abnormal targets, (II) is the result of MDT, (III) is the result of SF-MPPCA, (IV) is the result of SRC, and (V) is the result of our method. It can be seen that our method localizes anomalies more accurately than most of the compared algorithms. In the second image the localization result of SRC is better than ours because SRC learns a separate dictionary at every position; under the same conditions, our method obtains more accurate detection.

Claims (3)

1. A crowd abnormity detection method based on improved dictionary learning is characterized in that: the method comprises the following steps:
(1) extracting video event characteristics;
(1a) dividing the video into a training video and a testing video;
(1b) dividing each frame of training video and test video into a plurality of superpixels by using a superpixel division algorithm;
(1c) calculating optical flow graphs of each frame of training video and testing video, and counting optical flow information in each super pixel to obtain optical flow histogram features; respectively forming a training sample set X and a test sample set Y by the optical flow histogram features of the training video and the test video;
(2) mining typical event types in the training sample set X data obtained in the step (1 c);
(2a) training a Gaussian mixture model on a training sample set X; the class center mu in the Gaussian mixture model is a potential typical event type in the data; each type of typical event has a corresponding typical event code beta;
(2b) the code alpha of each training sample should be similar to the code beta of the corresponding typical event, and the degree of similarity is determined by the correlation coefficient between the two, i.e.
[formula shown as an image in the original publication]
The resulting code should satisfy the following objective
[formula shown as an image in the original publication]
where C_k represents the k-th event type;
(3) learning the spatial distribution of the samples in the training sample set X obtained in the step (1 c);
calculating the distance between training samples, and selecting n training samples closest to each training sample to construct a graph model; the weights of the edges in the graph model are calculated by
[formula shown as an image in the original publication]
where σ is a scale parameter of the Gaussian distribution;
extending the graph-model relation among the training samples to their codes α, so that similar training samples have similar codes; the degree of similarity is determined by W_ij, and a larger W_ij indicates that x_i and x_j are more similar, which is expressed as the following objective formula
[formula shown as an image in the original publication]
(4) Constructing a dictionary learning framework and learning the improved dictionary;
synthesizing the target formulas in the steps (2) and (3) to obtain a final target function as follows
[formula shown as an image in the original publication]
where D, A = {α_j, j = 1, 2, …} and β are, respectively, the required dictionary, the codes of the training sample set, and the codes of the typical events;
[formula shown as an image in the original publication]
is the graph Laplacian matrix; λ, η and ν are weight parameters for the respective terms;
optimizing the above formula with an accelerated proximal gradient (APG) algorithm to obtain the desired dictionary D, the codes α of the training samples and the codes β of the typical events;
(5) estimating the event type of each test sample y in the test sample set Y obtained in step (1), and calculating its code;
(5a) substituting the test sample y into the Gaussian mixture model obtained in the step (2), and estimating the corresponding event type;
(5b) encoding the test sample according to the event type estimated in the step (5a), and utilizing the following target formula
[formula shown as an image in the original publication]
solving for the code s of the test sample y, where D and β are, respectively, the dictionary and the typical event codes learned in step (4); λ and η are weight parameters for the respective terms; and c is the correlation coefficient between the code s and β;
(6) calculating the reconstruction error of the test sample, and calculating the detection precision of the algorithm;
(6a) the reconstruction error of the test sample y can be expressed as
[formula shown as an image in the original publication]
(6b) The reconstruction error of each super pixel is calculated by the above formula, and a threshold value xi is given to judge whether each super pixel is abnormal or not.
2. The crowd abnormality detection method based on the improved dictionary learning according to claim 1, characterized in that: in the step (6b), the detection precision of the algorithm adopts the following two indexes:
(6b1) frame-level precision: an abnormal frame is considered correctly detected as long as at least one abnormal pixel is detected in the frame;
(6b2) pixel-level precision: an abnormal frame is considered correctly detected only if more than 40% of its abnormal pixels are accurately detected.
3. The crowd abnormality detection method based on the improved dictionary learning according to claim 1, characterized in that: in the step (3), n is equal to 10.
CN201510112141.7A 2015-03-13 2015-03-13 Crowd abnormity detection method based on improved dictionary learning Active CN106033548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510112141.7A CN106033548B (en) 2015-03-13 2015-03-13 Crowd abnormity detection method based on improved dictionary learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510112141.7A CN106033548B (en) 2015-03-13 2015-03-13 Crowd abnormity detection method based on improved dictionary learning

Publications (2)

Publication Number Publication Date
CN106033548A CN106033548A (en) 2016-10-19
CN106033548B true CN106033548B (en) 2021-04-20

Family

ID=57150132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510112141.7A Active CN106033548B (en) 2015-03-13 2015-03-13 Crowd abnormity detection method based on improved dictionary learning

Country Status (1)

Country Link
CN (1) CN106033548B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491749B (en) * 2017-08-11 2020-11-17 南京邮电大学 Method for detecting global and local abnormal behaviors in crowd scene
CN110895705B (en) * 2018-09-13 2024-05-14 富士通株式会社 Abnormal sample detection device, training device and training method thereof
CN110163122A (en) * 2019-04-30 2019-08-23 中国科学院西安光学精密机械研究所 A kind of crowded crowd's method for detecting abnormality and system based on semi-supervised dictionary learning
CN113836976A (en) * 2020-06-23 2021-12-24 江苏翼视智能科技有限公司 Method for detecting global abnormal event in surveillance video
CN113473124B (en) * 2021-05-28 2024-02-06 北京达佳互联信息技术有限公司 Information acquisition method, device, electronic equipment and storage medium
CN117576785B (en) * 2024-01-15 2024-04-16 杭州巨岩欣成科技有限公司 Swim guest behavior detection method and device, computer equipment and storage medium


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103098050A (en) * 2010-01-29 2013-05-08 因迪普拉亚公司 Systems and methods for word offensiveness detection and processing using weighted dictionaries and normalization
DE102013113953A1 (en) * 2012-12-13 2014-06-18 Denso Corporation Method and device for detecting moving objects
WO2014207991A1 (en) * 2013-06-28 2014-12-31 NEC Corporation Teaching data generating device, method, and program, and crowd state recognition device, method, and program
CN103514398A (en) * 2013-10-18 2014-01-15 中国科学院信息工程研究所 Real-time online log detection method and system
CN103617637A (en) * 2013-12-16 2014-03-05 中国人民解放军国防科学技术大学 Dictionary learning-based low-illumination motion detection method
CN103810723A (en) * 2014-02-27 2014-05-21 西安电子科技大学 Target tracking method based on inter-frame constraint super-pixel encoding
CN103839085A (en) * 2014-03-14 2014-06-04 中国科学院自动化研究所 Train carriage abnormal crowd density detection method
CN104066057A (en) * 2014-06-25 2014-09-24 北京交通大学 Method for carrying out active passenger information acquisition and service by using intelligent mobile phone
CN104268594A (en) * 2014-09-24 2015-01-07 中安消技术有限公司 Method and device for detecting video abnormal events
CN104281845A (en) * 2014-10-29 2015-01-14 中国科学院自动化研究所 Face recognition method based on rotation invariant dictionary learning model

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Abnormal Event Detection in Crowded Scenes using Sparse Representation; Junsong Yuan et al.; Preprint submitted to Pattern Recognition; Jan. 30, 2013; pp. 1-35 *
Sparse reconstruction cost for abnormal event detection; Yang Cong et al.; Conference on Computer Vision and Pattern Recognition 2011; Aug. 22, 2011; pp. 3449-3456 *
Research on visual target tracking algorithms based on statistical learning (基于统计学习的视觉目标跟踪算法研究); 刘荣利; China Doctoral Dissertations Full-text Database, Information Science and Technology; Dec. 15, 2014; vol. 2014, no. 12; I138-52 *
A survey of video-based crowd abnormal event detection (基于视频的人群异常事件检测综述); 吴新语 et al.; Journal of Electronic Measurement and Instrumentation (电子测量与仪器学报); Jun. 30, 2014; vol. 28, no. 6; pp. 575-584 *
Research on video abnormal event detection in crowded scenes (拥挤场景下视频异常事件检测技术研究); 独大为; China Masters' Theses Full-text Database, Information Science and Technology; Jan. 15, 2014; vol. 2014, no. 1; I138-1874 *

Also Published As

Publication number Publication date
CN106033548A (en) 2016-10-19

Similar Documents

Publication Publication Date Title
CN106033548B (en) Crowd abnormity detection method based on improved dictionary learning
CN106778595B (en) Method for detecting abnormal behaviors in crowd based on Gaussian mixture model
Li et al. Spatio-temporal context analysis within video volumes for anomalous-event detection and localization
CN103839065B (en) Extraction method for dynamic crowd gathering characteristics
Chriki et al. Deep learning and handcrafted features for one-class anomaly detection in UAV video
CN110059581A (en) People counting method based on depth information of scene
Wang et al. Detecting abnormality without knowing normality: A two-stage approach for unsupervised video abnormal event detection
Liu et al. Enhancing spectral unmixing by local neighborhood weights
CN112396027A (en) Vehicle weight recognition method based on graph convolution neural network
CN105528794A (en) Moving object detection method based on Gaussian mixture model and superpixel segmentation
CN107659754B (en) Effective concentration method for monitoring video under condition of tree leaf disturbance
Subbiah et al. An extensive study and comparison of the various approaches to object detection using deep learning
Gong et al. Local distinguishability aggrandizing network for human anomaly detection
Ratre et al. Tucker visual search-based hybrid tracking model and Fractional Kohonen Self-Organizing Map for anomaly localization and detection in surveillance videos
Hu et al. Parallel spatial-temporal convolutional neural networks for anomaly detection and location in crowded scenes
Lamba et al. Detecting anomalous crowd scenes by oriented Tracklets’ approach in active contour region
Hu et al. Video anomaly detection based on 3D convolutional auto-encoder
Guan et al. Abnormal behavior recognition using 3D-CNN combined with LSTM
CN103971100A (en) Video-based camouflage and peeping behavior detection method for automated teller machine
Xia et al. Abnormal event detection method in surveillance video based on temporal CNN and sparse optical flow
Mantini et al. Camera Tampering Detection using Generative Reference Model and Deep Learned Features.
Gnouma et al. A two-stream abnormal detection using a cascade of extreme learning machines and stacked auto encoder
Wu et al. RepISD-Net: Learning Efficient Infrared Small-target Detection Network via Structural Re-parameterization
Sonkar et al. Crowd Abnormal behaviour detection using deep learning
Cao et al. A long-memory pedestrian target tracking algorithm incorporating spatiotemporal trajectory feature enhancement model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant