CN107330410B

CN107330410B - Anomaly detection method based on deep learning in complex environment

Info

Publication number: CN107330410B
Application number: CN201710535492.8A
Authority: CN
Inventors: 邱鹏; 霍瑛; 黄陈蓉; 陈行
Original assignee: Nanjing Institute of Technology
Current assignee: Nanjing electronic Mdt InfoTech Ltd.
Priority date: 2017-07-03
Filing date: 2017-07-03
Publication date: 2020-06-30
Anticipated expiration: 2037-07-03
Also published as: CN107330410A

Abstract

The invention provides an anomaly detection method based on deep learning in a complex environment, which comprises the steps of inputting object space-time characteristics extracted by a convolutional neural network regression method into an LSTM model, and tracking the motion tracks of multiple objects in the complex environment; capturing nonlinear space-time actions of adjacent individuals under the condition that multiple objects move irregularly, evaluating the dependency of motion tracks between the adjacent individuals, and predicting the future motion tracks of the adjacent individuals; and completing anomaly detection according to the anomaly probability of the future motion trail of the individual. The LSTM model predicts the future motion trail of the object by using a coding and decoding framework through evaluating the dependency among the coherent individuals, thereby obtaining a more accurate result when carrying out abnormal detection on the motion of a plurality of objects.

Description

Anomaly detection method based on deep learning in complex environment

Technical Field

The invention relates to an anomaly detection method based on deep learning in a complex environment.

Background

In general, anomaly detection refers to detecting anomalous or unexpected behavior data in an environment. With the popularization of deep learning in the field of artificial intelligence, anomaly detection in complex environments such as subways, stadiums, airports and the like is widely applied to computer vision technology, but the high-density environments bring huge challenges to anomaly detection. In the face of continuous and irregular motion of a large number of objects, how to solve the problem of mutual interference between the objects and how to detect abnormality under the condition that motion tracks of a plurality of objects mutually influence each other becomes an important problem in the current abnormality detection research.

The anomaly may be an unusual shape or action, and in the existing research results, a normal video frame region is obtained through machine learning, and the region is used as a reference model, and the reference model comprises a normal event or a normal training data set. In the testing phase, the researcher treats regions other than the reference model as anomalies. However, such non-standard reference models are often difficult to define accurately because of the presence of normal events with special attributes, and furthermore, it is difficult to prepare large data sets for training covering different domains.

At present, the methods for anomaly detection in a complex environment mainly comprise the following two methods: 1) based on a trajectory method: an object is said to be abnormal if it does not follow a normal trajectory or if it occurs less frequently; 2) based on the dynamic method: compared with the normal moving object, the abnormal object has a obviously different action mode.

The problem of anomaly detection in a complex environment can be summarized as the problem of anomaly detection by constructing a deep learning model, and the research difficulty is how to design a learning model with static data, sequence data and spatial data and how to effectively apply the data to the learning model. Particularly, the prior art is mainly responsible for detecting the space-time characteristics of a single object, and does not consider the condition that the motion tracks of adjacent individuals interfere with each other in a complex environment, so that the anomaly detection effect is not ideal. Therefore, the method for carrying out anomaly detection in the complex environment by designing the deep learning model has important theoretical significance and application value. However, there is no description in the prior art.

Disclosure of Invention

The invention aims to provide an anomaly detection method based on deep learning in a complex environment, which can reduce the false detection rate of images under the condition that the motions of adjacent individuals have mutual influence, has better performance particularly when anomaly detection is carried out in a crowded environment, provides a new idea for solving the problem of anomaly detection in the complex environment, and solves the problem that the anomaly detection effect is not ideal because the motion tracks of the adjacent individuals interfere with each other under the condition that the complex environment is not considered in the prior art.

The technical solution of the invention is as follows:

a method for detecting abnormality based on deep learning in a complex environment comprises the following steps of tracking multiple object trajectories through a long-term and short-term memory model, capturing nonlinear space-time actions between adjacent individuals, predicting future motion trajectories of the adjacent individuals, and completing abnormality detection according to the abnormality probability of the future motion trajectories of the individuals, wherein the method comprises the following steps:

step 1, inputting the object space-time characteristics extracted by a convolutional neural network regression method into an LSTM model, and tracking the motion tracks of multiple objects in a complex environment;

step 2, capturing nonlinear space-time actions of adjacent individuals under the condition that multiple objects move irregularly, evaluating the dependency of motion tracks between the adjacent individuals, and predicting the future motion tracks of the adjacent individuals;

and 3, completing anomaly detection according to the anomaly probability of the future motion trail of the individual.

Further, step 1 specifically comprises:

step 11, inputting the object space-time characteristics extracted by the convolutional neural network regression method into an LSTM model; introducing YOLO to solve the object detection as a regression problem, finishing the input from an original image to the output of the positions, the categories and the corresponding confidence probabilities of all objects in the image, and taking a feature vector obtained through the YOLO as an input frame of an LSTM model; function of input frame as

Wherein phi (x)_t) Input frame function, x, representing an LSTM model_tFor the original image input frame at time t, conv_θc(.) is the parameter θ_cThe convolutional neural network of (a) is,

representing the previous frame x_t-1A predicted object position;

step 12, tracking the motion tracks of multiple objects in a complex environment; the LSTM model is a deep recursive network for regressing the pixel brightness and position of the object bounding box, and using them as the original input frame to perform frame-by-frame detection and tracking, and the mathematical expression of the whole-course trajectory tracking probability is

Wherein, B_TAnd X_TRespectively the position of the object at the maximum time TAnd input frame, B_tThen it is the position of the object at time T, and 1. ltoreq. t.ltoreq.T, B_<TIs all positions, X, before time T_≤TAre all input frames up to time T.

Further, step 2 specifically comprises:

step 21, capturing nonlinear space-time motion of adjacent individuals under the condition that multiple objects move irregularly, specifically:

step 211, judging whether the two motion tracks of the two adjacent objects in the time-space domain are coherent, specifically if the motion tracks of the two dynamic objects are coherent in the time-space domain, that is, when the relative speed of the adjacent objects is kept unchanged, the two dynamic objects have similar hidden states;

step 212, for the motion trajectory of each object, the LSTM model will create and track a series of nonlinear spatio-temporal actions of the object, and the states of the object and the neighboring objects are integrated through coherent regularization, so as to update the states of the storage unit of the LSTM model, where the coherent regularization expression is:

wherein, c_tAccumulator representing the state information of the memory cell, f_tFor forgetting the gate, for resetting the state of the memory cell, if the forgetting gate is activated, the state c of the memory cell at the last time_t-1It will be forgotten that the user is,

representing an array operation; i.e. i_tFor the input gate, x is input at the current time_tAnd a time-last hidden layer h_t-1Is activated; w is a weight matrix, W_xcFor recursive memory cell state input matrices, W_hcRecursive hidden state input matrix, b_cIs a deviation vector;

representing the trajectory and spatio-temporal characteristics of adjacent objects, λ_j(t) is the weight of dependence between objects, f^j _tAnd c^j _t-1Respectively, the current heritage of coherent objects of the LSTM modelForgetting a door state and a last time storage unit state;

step 22, evaluating the dependency of the motion trail between the adjacent individuals, which specifically comprises the following steps:

step 221, obtaining time-varying characteristics of adjacent individual movements through the hidden state information of the LSTM model;

step 222, evaluating the dependency of the motion trail between the adjacent individuals by using the pairwise velocity correlation, and the weight lambda of the dependency of the motion trail between the adjacent individuals_jThe expression of (t) is:

where i and j represent the motion trajectories of adjacent individuals, v_i(t) and v_j(t) represents the respective velocities of the adjacent objects, σ is a normalization constant, the two velocity values are multiplied and normalized by the normalization constant, γ_iFor obtaining the dependency weight, when the deviation of the motion tracks i and j of the adjacent individuals is larger, lambda is_jThe closer the value of (t) is to 0, the higher the similarity between the motion trajectories i and j of the adjacent individuals, the higher the λ_jThe closer the value of (t) approaches 1;

and step 23, training the LSTM model by adopting an encoding-decoding framework to predict the future motion trail of the object.

Further, step 23 specifically includes:

231, mapping the input of the motion trajectory to a fixed-length vector by an LSTM model-based encoder through learning training, wherein an implicit vector in the encoding stage is described as follows by an expression: h is_T＝LSTM_e(Z_T,h_T-1)，h_TFor implicit vectors of current time, LSTM_eInput Z representing motion trajectory of object by LSTM model-based encoder_TMapping to a last-in-time implicit vector h_T-1；

Step 232, in the learning training, the decoder based on the LSTM model predicts the future motion trajectory of the object by using a fixed-length implicit vector, where the implicit vector expression is:

LSTM_drepresenting decoder based on LSTM model using last-time implicit vector h_T-1Obtaining an implicit vector h of the current time_TThen, inputting Z through the motion track of the object at the current time_TPredicting next-time object motion trajectory as output

The invention has the beneficial effects that: compared with the prior art, the anomaly detection method based on deep learning in the complex environment has the remarkable advantages that:

firstly, the false detection rate of the prior art is high, the object space-time characteristics extracted by the convolutional neural network regression method are input into an LSTM model, and the object detection is solved as a regression problem by introducing YOLO, so that the false detection rate of the image is reduced.

The LSTM model predicts the future motion trail of the object by using a coding and decoding framework through evaluating the dependency among the coherent individuals by evaluating the dependency among the coherent individuals, thereby optimizing the space-time robustness of an abnormal detection system and obtaining a more accurate result when performing abnormal detection on the motion of multiple objects.

Drawings

Fig. 1 is an overview of the tracking process of the motion trajectory of an object.

FIG. 2 is a graph of the relationship between adjacent individuals in the LSTM model.

Fig. 3 is a diagram of an encoding-decoding framework of the LSTM model.

FIG. 4 is a diagram of a convolutional neural network tracing invisible sequences over crowded streets.

Fig. 5 is a time robustness tre (temporal robustness evaluation) evaluation diagram of the anomaly detection method based on deep learning in a complex environment according to the embodiment.

Fig. 6 is a spatial robustness sre (spatial robustness evaluations) evaluation diagram of the anomaly detection method based on deep learning in the complex environment according to the embodiment.

Fig. 7 is an abnormality detection accuracy chart of the abnormality detection method based on deep learning in the complex environment according to the embodiment.

Detailed Description

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

The anomaly detection method based on deep learning in the complex environment carries out multi-object trajectory tracking through a long-short term memory model, captures nonlinear space-time actions of adjacent individuals and predicts two parts of future motion trajectories of the adjacent individuals. In the process of tracking multiple object tracks through a long-short term memory model (LSTM), by introducing YOLO (only one eye of YouOnly Look Once-a real-time rapid target detection technology) detection, object space-time characteristics extracted by a convolutional neural network regression method are input into the LSTM model, and the motion tracks of multiple objects in a complex environment are tracked; in capturing nonlinear space-time motion between adjacent individuals and predicting future motion trajectories thereof, under the condition that multiple objects move irregularly, the dependency of the motion trajectories between the adjacent individuals is evaluated, the future motion trajectories of the objects are predicted, and the abnormality detection is completed according to the abnormality probability of the future motion trajectories of the individuals. The anomaly detection method provided by the invention can reduce the false detection rate of the image under the condition that the motions of adjacent individuals have mutual influence, has better performance particularly when the anomaly detection is carried out in a crowded environment, and provides a new thought for solving the problem of anomaly detection in a complex environment.

Examples

An anomaly detection method based on deep learning in a complex environment comprises the steps of firstly tracking multiple object tracks through a long-short term memory model (LSTM), collecting space-time characteristics of objects and preliminarily deducing the object running tracks by introducing YOLO training and detection, inputting the object space-time characteristics extracted by a convolutional neural network regression method into the LSTM, and tracking the motion tracks of multiple objects in the complex environment; and then capturing nonlinear space-time motion between adjacent individuals, evaluating the dependency of motion tracks between the adjacent individuals under the condition that the multiple objects move irregularly, predicting the future motion track of the object, and completing anomaly detection according to the anomaly probability of the future motion track of the individual. The method specifically comprises the following steps:

step 1, inputting the object space-time characteristics extracted by the convolutional neural network regression method into an LSTM model, and tracking the motion tracks of multiple objects in a complex environment, wherein FIG. 1 is an overview diagram of the tracking process of the motion tracks of the objects.

representing the previous frame x_t-1A predicted object position;

Wherein, B_TAnd X_TRespectively object position and input frame at maximum time T, B_tThen it is the position of the object at time T, and 1. ltoreq. t.ltoreq.T, B_<TIs all positions, X, before time T_≤TAre all input frames up to time T.

And 2, capturing nonlinear space-time actions of adjacent individuals under the condition that multiple objects move irregularly, evaluating the dependency of motion tracks between the adjacent individuals, and predicting the future motion tracks of the adjacent individuals.

Step 21, capturing nonlinear space-time motion of adjacent individuals under the condition of irregular motion of multiple objects.

And step 211, judging whether the two motion tracks of the two adjacent objects in the time-space domain are coherent, specifically if the motion tracks of the two dynamic objects are coherent in the time-space domain, that is, when the relative speed of the adjacent objects is kept unchanged, the two motion tracks have similar hidden states.

representing the trajectory and spatio-temporal characteristics of adjacent objects, λ_j(t) is the weight of dependence between objects, f^j _tAnd c^j _t-1Respectively representing the current forgetting gate state and the last time storage unit state of the LSTM model coherent object;

and step 22, evaluating the dependency of the motion trail between adjacent individuals, wherein the relation between the adjacent individuals in the LSTM model is described in figure 2.

and step 23, as shown in fig. 3, training the LSTM model by using an encoding-decoding framework to predict the future motion trajectory of the object.

231, mapping the input of the motion trajectory to a fixed-length vector by an LSTM model-based encoder through learning training, wherein an implicit vector in the encoding stage is described as follows by an expression: h is_T＝LSTM_e(Z_T,h_T-1)，h_TFor implicit vectors of current time, LSTM_eInput Z representing motion trajectory of object by LSTM model-based encoder_TMapping to a last-in-time implicit vector h_T-1。

LSTM_drepresenting decoder based on LSTM model using last-time implicit vector h_T-1Obtaining an implicit vector h of the current time_TThen, the motion track of the object is input according to the current timeZ_TPredicting next-time object motion trajectory as output

And 3, obtaining the abnormal probability of the future motion trail of the individual according to the future motion trail of the individual obtained in the step 2, and completing abnormal detection according to the abnormal probability of the future motion trail of the individual.

Experimental validation analysis was as follows:

different data sets are run by using an artificial intelligence learning system Tensorflow to detect, track and identify abnormal objects, and the screens for training in the YOLO detection and LSTM models are 20FPS (Frames Per second) and 60FPS respectively. Tracking the motion tracks of multiple objects in a complex environment according to step 1, and in order to avoid an over-fitting problem, a deep learning structure requires training a large number of data sets, so that a convolutional neural network is put in advance in the ImageNet with the largest image recognition in the world for data training, and then an LSTM model is used for Fine-tuning (Fine-tuning) on a smaller data set PASCAL VOC 2012, so that the data set PASCAL VOC 2012 can detect 20 different types of objects. FIG. 4 is a diagram of tracing invisible sequence traces over crowded streets through a convolutional neural network.

Step 2, capturing nonlinear space-time actions of adjacent individuals under the condition that multiple objects move irregularly, evaluating the dependency of motion tracks between the adjacent individuals, and predicting the future motion tracks of the adjacent individuals specifically comprises the following steps:

The LSTM model was trained and tested in the OTB-30 dataset. 80% of the data was used in the training phase and the remaining 20% in the experimental testing phase, and the model was trained item by item for better experimental results. The results of the experiment are shown in fig. 5 and 6. As can be seen from fig. 5 and 6, the proposed method is superior to other trajectory tracking methods in both temporal and spatial robustness evaluation.

Step 22, evaluating the dependency of the motion tracks between adjacent individuals, step 23, training the LSTM model to predict the future motion track of the object by using an encoding-decoding framework, wherein a step size (step size) concept is used in the experiment, the step size represents the number of all previous frames considered when the encoding-decoding framework based on the LSTM model predicts the future track, and the experimental result is shown in fig. 7. Fig. 7 shows an abnormality detection accuracy chart according to the detection evaluation function interaction-over-unity (iou) in the proposed method under the condition of step size change.

Therefore, the embodiment method has advantages compared with other algorithms in both space-time robustness and false detection rate, and particularly has more obvious advantages in a complex environment in which motion tracks of multiple objects influence each other, so that the effectiveness of the embodiment method is verified.

Claims

1. An anomaly detection method based on deep learning in a complex environment is characterized in that: the method comprises the following steps of tracking multiple object tracks through a long-term and short-term memory model, capturing nonlinear space-time actions between adjacent individuals, predicting future motion tracks of the adjacent individuals, and completing anomaly detection according to the anomaly probability of the future motion tracks of the individuals, wherein the method specifically comprises the following steps:

step 2, capturing nonlinear space-time actions of adjacent individuals under the condition that multiple objects move irregularly, evaluating the dependency of motion tracks between the adjacent individuals, and predicting the future motion tracks of the adjacent individuals; the step 2 specifically comprises the following steps:

wherein, c_tAccumulator representing the state information of the memory cell, f_tFor forgetting the gate, for resetting the state of the memory cell, if the forgetting gate is activated, the state c of the memory cell at the last time_t-1Will be forgotten, ⊙ represents an array operation, i_tFor the input gate, x is input at the current time_tAnd a time-last hidden layer h_t-1Is activated; w is a weight matrix, W_xcFor recursive memory cell state input matrices, W_hcRecursive hidden state input matrix, b_cIs a deviation vector;

where i and j represent the motion trajectories of adjacent individuals, v_i(t) and v_j(t) represents the respective velocities of adjacent objects, σ is a normalization constant, and the two velocity values are multiplied and passedNormalization with an over-normalization constant, gamma_iFor obtaining the dependency weight, when the deviation of the motion tracks i and j of the adjacent individuals is larger, lambda is_jThe closer the value of (t) is to 0, the higher the similarity between the motion trajectories i and j of the adjacent individuals, the higher the λ_jThe closer the value of (t) approaches 1;

step 23, adopting an encoding-decoding frame to train an LSTM model to predict the future motion track of the object; step 23 specifically comprises:

2. The anomaly detection method based on deep learning in the complex environment according to claim 1, wherein the step 1 specifically comprises:

step 11, inputting the object space-time characteristics extracted by the convolutional neural network regression method into an LSTM model; introducing YOLO derivativesThe body detection is used as regression problem solving to finish the output of the positions, the categories and the corresponding confidence probabilities of all objects in the image from the input of the original image, and the feature vector obtained by the YOLO is used as an input frame of the LSTM model; function of input frame as

representing the previous frame x_t-1A predicted object position;