CN113807178A

CN113807178A - Video anomaly detection method based on memory-enhanced automatic encoder

Info

Publication number: CN113807178A
Application number: CN202110932291.8A
Authority: CN
Inventors: 倪伟; 王汉奇; 张冠华; 胡兴; 宋梁
Original assignee: Shanghai Guanghua Zhichuang Network Technology Co ltd
Current assignee: Shanghai Guanghua Zhichuang Network Technology Co ltd
Priority date: 2021-08-13
Filing date: 2021-08-13
Publication date: 2021-12-17

Abstract

The invention provides a video anomaly detection method based on a memory-enhanced automatic encoder, and belongs to the technical field of video anomaly detection. Randomly initializing memory entries in a video sample training set and a memory library, and embedding the output of an encoder after a sample is input into the encoder; a reading module in the external memory module calculates the embedding after reconstruction; obtaining a reconstructed video frame by using the reconstructed embedded part as the input of a decoding module; designing a new loss function; and carrying out abnormity scoring on the test samples to obtain abnormal values, and marking the corresponding test samples as abnormal videos when the abnormal values are larger than a threshold value. An external memory module is added in the original automatic encoder frame, and a typical normal sample in the memory for embedding is reconstructed and then decoded, so that the reconstruction result is close to the normal sample, the reconstruction error of the abnormal sample is increased, and the problems of small reconstruction error of the abnormal sample and poor abnormal detection effect caused by generalization capability are solved.

Description

Video anomaly detection method based on memory-enhanced automatic encoder

Technical Field

The invention relates to the technical field of video anomaly detection, in particular to a video anomaly detection method based on a memory-enhanced automatic encoder.

Background

The video anomaly detection is a computer vision technology for positioning the anomaly occurrence position in a video in space/time, has great application value in the fields of video monitoring, intelligent traffic and the like, and has wide research prospect. Since it is usually easy to obtain normal samples rather than abnormal samples in real production life, most video abnormality detection tasks are unsupervised learning tasks. Therefore, researchers firstly propose the idea of finding a hyperplane which can wrap normal samples to separate abnormal samples for classification, and in 2001, Yunqiang Chen and Bernhard Scholkopf et al propose a method of using a classification SVM classifier to perform abnormality detection; subsequently, Liang Xiong, Arthur Zimek proposed methods using a mixed gaussian model in 2011 and 2012, respectively, but their essence still models normal samples. Then, in 2018, Bo Zong and Qi Song propose a new idea of carrying out anomaly detection based on reconstruction, and the method assumes that a model capable of carrying out abstract characterization and reconstructing an original video frame can be obtained by learning on a normal sample. In abstract characterization, researchers have proposed various techniques, such as PCA principal component analysis, sparse characterization, and the like. With the rise of the neural network, the automatic encoder based on the neural network architecture becomes one of the hot spots of the research in recent years.

The method is an emerging method in the field of video anomaly detection, and the basic idea of the existing implementation method is to use a convolutional neural network as an encoding module and a decoding module, use reconstruction loss as a loss function, and learn on a normal sample training set to obtain model parameters. During testing, an input video frame is encoded to obtain an abstract representation (also called embedding) of an original sample, and then an image is reconstructed in a decoding module according to the embedding obtained by the encoder. Because the model is obtained based on normal sample training, the learning result of the model is generally considered to be better close to the normal sample rather than the abnormal sample, the normal sample subjected to model encoding-reconstruction is better reconstructed, and the reconstruction result of the abnormal sample has larger error with the original video frame.

However, because the neural network has a certain generalization capability, even if there is an abnormality in the input video frame, the training model can reconstruct the video frame well in some cases. Therefore, we need to propose new methods to weaken the generalization capability of neural network models. Meanwhile, due to the normal diversity, the distribution of normal samples should be around multiple centers, and we design a new loss function to encourage model multiple center distribution.

Disclosure of Invention

In view of the problems identified in the background art, the present invention provides a video anomaly detection method based on a memory-enhanced automatic encoder.

The technical scheme of the invention is realized as follows:

a video anomaly detection method based on a memory-enhanced automatic encoder comprises the following steps:

s1, training set in video sample

In the memory bank size N, randomly initializing memory entries in the memory bank, and when a sample x is input into an encoder, outputting the encoder to obtain an embedded z;

s2, embedding after calculation and reconstruction of reading module in external memory module

S3, embedding after reconstruction is utilized

As input to a decoding module, obtaining a reconstructed video frame

S4, designing a model to train a multi-center distributed loss function;

and S5, performing model training by adopting reverse propagation and gradient descent to obtain a final model, so as to perform abnormity scoring on the test sample to obtain an abnormal value, and marking the corresponding test sample as an abnormal video when the abnormal value is greater than a threshold value.

According to one embodiment of the invention, the embedding z is obtained by the following formula:

wherein

Representing the encoder parameters.

According to one embodiment of the invention, the reconstructed embeddings are calculated by the following formula

Where w represents the weight and M represents the memory matrix.

According to one embodiment of the invention, the weight w is calculated by softmax, among others:

according to one embodiment of the invention, the reconstructed video frame is obtained by the following formula

Wherein

Representing the decoding parameters.

According to one embodiment of the invention, the loss function is designed by the following formula:

wherein m is^pIs shown and

memory entry nearest, mⁿIs shown and

the item that is the second closest.

According to one embodiment of the present invention, the outlier is obtained by the following equation:

where i, j is the spatial index of the video frame.

In conclusion, the invention has the beneficial effects that:

an external memory module is added in the original automatic encoder frame, and a typical normal sample in the memory for embedding is reconstructed and then decoded, so that the reconstruction result is close to the normal sample, the reconstruction error of the abnormal sample is increased, and the problems of small reconstruction error of the abnormal sample and poor abnormal detection effect caused by generalization capability are solved. Meanwhile, a new loss function is designed to fit the essence of multi-center distribution of normal samples.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is an overall block diagram of an embodiment of the present invention;

fig. 2 is an internal structural diagram of an external memory module according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention is illustrated below with reference to fig. 1 and 2:

fig. 1 is a general block diagram of an embodiment of the present invention.

The embodiment of the invention provides a video anomaly detection method based on a memory enhanced automatic encoder, which comprises the following steps:

s1, training set in video sample

in one embodiment of the invention, the embedding z is obtained by the following formula:

wherein

Representing the encoder parameters.

In one embodiment of the invention, the reconstructed embeddings are calculated by the following formula

Where w represents the weight, M represents the memory matrix, and M is 1000.

Embedding after reconstruction

In the calculation formula (2), the element of the weight w calculates the weight w by softmax:

s3, embedding after reconstruction is utilized

As input to a decoding module, obtaining a reconstructed video frame

In one embodiment of the invention, the reconstructed video frame is obtained by the following formula

Wherein

Representing the decoding parameters.

S4, designing a model to train a multi-center distributed loss function;

in one embodiment of the present invention,

the loss function is designed by the following formula:

wherein m is^pIs shown and

memory entry nearest, mⁿIs shown and

the entry from the second closest is λ c ═ 0.01, and λ s ═ 0.01.

Of a loss function

Representing the reconstruction error of the reconstructed sample and the original sample;

for intra-class differential penalties, it will encourage recombination to embed into its closest memory entries as close as possible, which will force the sample point distribution to tightly surround multiple centers, conforming to the nature of the abnormal sample multi-center distribution;

for inter-class similarity penalties, it will force the sample point to be as far away as possible from the second similar memory entry while being close to the closest memory entry, eventually rendering the result as the center points of the classes are each far away.

And S5, performing model training by adopting reverse propagation and gradient descent to obtain a final model, and performing abnormity scoring on the test sample to obtain an abnormal value, wherein when the abnormal value is greater than a threshold value, the threshold value gamma is 0.015, and the corresponding test sample is marked as an abnormal video.

In one embodiment of the present invention, the outlier is obtained by the following equation:

where i, j is the spatial index of the video frame.

In summary, as shown in fig. 1, an external memory module is added to the original automatic encoder framework, and this module stores the learned typical normal samples. When the samples are embedded by the autoencoder encoding, the memory is "read" and reconstructed by looking for a similar typical memory based on similarity, which can be referred to as content-based addressing. And then, a new embedded input decoding module formed by reconstructing the typical memory is decoded, because the new embedded is formed by typical normal memory, the reconstructed output obtained by decoding the new embedded input decoding module is similar to a normal sample, so that the reconstruction error of the abnormal sample can be increased, and the problem of over-good effect when the abnormal sample is reconstructed is solved.

Fig. 2 is a diagram illustrating an internal structure of an external memory module according to an embodiment of the present invention. A memory bank in the external memory module provides a memory item to be updated to the updating module; embedded in a reading module for reading; embedded in an updating module to update some memories in the memory bank, and then covering the original entries in the memory bank; the reading module reads the memory entries needed by the reorganization embedding and outputs the reorganization embedding.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A video anomaly detection method based on a memory-enhanced automatic encoder is characterized by comprising the following steps:

s1, training set in video sample

S3, embedding after reconstruction is utilized

As input to a decoding module, obtaining a reconstructed video frame

S4, designing a model to train a multi-center distributed loss function;

2. The method of claim 1, wherein the embedding z is obtained by the following formula:

wherein

Representing the encoder parameters.

3. The method of claim 2, wherein the embedding after reconstruction is calculated by the following formula

Where w represents the weight and M represents the memory matrix.

4. The method of claim 3, wherein the weight w is calculated by softmax:

5. the method of claim 3, wherein the reconstructed video frame is obtained by the following formula

Wherein

Representing decoding parameters。

6. The video anomaly detection method based on the memory-enhanced automatic encoder according to claim 1, wherein the loss function is designed by the following formula:

wherein m is^pIs shown and

memory entry nearest, mⁿIs shown and

the item that is the second closest.

7. The method of claim 1, wherein the outlier is obtained by the following formula:

where i, j is the spatial index of the video frame.