CN113807178A - Video anomaly detection method based on memory-enhanced automatic encoder - Google Patents
- Publication number
- CN113807178A (application CN202110932291.8A)
- Authority
- CN
- China
- Prior art keywords
- memory
- sample
- video
- abnormal
- encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention provides a video anomaly detection method based on a memory-enhanced automatic encoder and belongs to the technical field of video anomaly detection. Given a training set of video samples and a memory bank, the memory entries in the memory bank are randomly initialized, and a sample input to the encoder yields an embedding as the encoder output; a reading module in the external memory module calculates the reconstructed embedding; the reconstructed embedding is used as the input of a decoding module to obtain a reconstructed video frame; a new loss function is designed; and anomaly scoring is performed on the test samples to obtain anomaly scores, with a test sample marked as an abnormal video when its anomaly score is greater than a threshold. An external memory module is added to the original automatic encoder framework, and the embedding is reconstructed from typical normal samples stored in the memory before being decoded, so that the reconstruction result stays close to normal samples and the reconstruction error of abnormal samples increases. This addresses the problem that generalization capability yields small reconstruction errors for abnormal samples and therefore poor anomaly detection.
Description
Technical Field
The invention relates to the technical field of video anomaly detection, in particular to a video anomaly detection method based on a memory-enhanced automatic encoder.
Background
Video anomaly detection is a computer vision technique for locating, in space and time, where anomalies occur in a video. It has great application value in fields such as video surveillance and intelligent transportation, and broad research prospects. Because normal samples are usually much easier to obtain than abnormal samples in real production and daily life, most video anomaly detection tasks are unsupervised learning tasks. Researchers therefore first proposed finding a hyperplane that encloses the normal samples and separates them from abnormal samples: in 2001, Yunqiang Chen, Bernhard Schölkopf, et al. proposed methods that use a one-class SVM classifier for anomaly detection. Subsequently, Liang Xiong and Arthur Zimek proposed methods based on Gaussian mixture models in 2011 and 2012, respectively, but in essence these still model the normal samples. Then, in 2018, Bo Zong and Qi Song proposed a new idea of reconstruction-based anomaly detection, which assumes that a model capable of forming an abstract representation and reconstructing the original video frame can be learned from normal samples. For the abstract representation, researchers have proposed various techniques, such as principal component analysis (PCA) and sparse representation. With the rise of neural networks, automatic encoders based on neural network architectures have become a research hotspot in recent years.
The automatic-encoder-based method is an emerging method in the field of video anomaly detection. The basic idea of existing implementations is to use convolutional neural networks as the encoding and decoding modules, use the reconstruction loss as the loss function, and learn the model parameters on a training set of normal samples. During testing, an input video frame is encoded to obtain an abstract representation (also called an embedding) of the original sample, and the decoding module then reconstructs an image from that embedding. Because the model is trained on normal samples, it is generally assumed to fit normal samples better than abnormal ones: a normal sample passed through encoding and reconstruction is reconstructed well, whereas the reconstruction of an abnormal sample differs more from the original video frame.
However, because neural networks have a certain generalization capability, the trained model can in some cases reconstruct a video frame well even when the frame contains an anomaly. A new method is therefore needed to weaken the generalization capability of the neural network model. In addition, because normal samples are diverse, they should be distributed around multiple centers, so a new loss function is designed to encourage the model to learn such a multi-center distribution.
Disclosure of Invention
In view of the problems identified in the background art, the present invention provides a video anomaly detection method based on a memory-enhanced automatic encoder.
The technical scheme of the invention is realized as follows:
A video anomaly detection method based on a memory-enhanced automatic encoder comprises the following steps:
S1, given a training set of video samples and a memory bank of size N, randomly initializing the memory entries in the memory bank, and, when a sample x is input to the encoder, taking the encoder output as the embedding z;
S2, calculating the reconstructed embedding with a reading module in the external memory module;
S3, using the reconstructed embedding as the input to a decoding module to obtain a reconstructed video frame;
S4, designing a loss function that trains the model toward a multi-center distribution;
S5, training the model with backpropagation and gradient descent to obtain the final model, performing anomaly scoring on the test samples to obtain anomaly scores, and marking a test sample as an abnormal video when its anomaly score is greater than a threshold.
According to one embodiment of the invention, the embedding z is obtained by the following formula:
According to one embodiment of the invention, the reconstructed embedding is calculated by the following formula:
where w represents the weight and M represents the memory matrix.
According to one embodiment of the invention, the weight w is calculated by softmax:
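The encoding and memory-read formulas referred to above can be written, under assumed notation, roughly as follows. Only z, x, w, M, N and the use of softmax come from the text; the encoder function f_e with parameters θ_e, the hat on the reconstructed embedding, the memory rows m_i, and the similarity measure d(·,·) are assumptions introduced for illustration.

```latex
z = f_e(x;\theta_e), \qquad
\hat{z} = wM = \sum_{i=1}^{N} w_i\, m_i, \qquad
w_i = \frac{\exp\big(d(z, m_i)\big)}{\sum_{j=1}^{N} \exp\big(d(z, m_j)\big)}
```

Because the weights are non-negative and sum to one, the reconstructed embedding in this form is a convex combination of memory entries, which is what keeps the decoder input close to stored normal patterns.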
According to one embodiment of the invention, the reconstructed video frame is obtained by the following formula:
According to one embodiment of the invention, the loss function is designed by the following formula:
According to one embodiment of the present invention, the anomaly score is obtained by the following equation:
where i and j are the spatial indices of the video frame.
In conclusion, the invention has the beneficial effects that:
An external memory module is added to the original automatic encoder framework, and the embedding is reconstructed from typical normal samples stored in the memory before being decoded, so that the reconstruction result stays close to normal samples and the reconstruction error of abnormal samples increases. This addresses the problem that generalization capability yields small reconstruction errors for abnormal samples and therefore poor anomaly detection. In addition, a new loss function is designed to fit the multi-center distribution that characterizes normal samples.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is an overall block diagram of an embodiment of the present invention;
FIG. 2 is an internal structural diagram of an external memory module according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is illustrated below with reference to fig. 1 and 2:
FIG. 1 is an overall block diagram of an embodiment of the present invention.
The embodiment of the invention provides a video anomaly detection method based on a memory enhanced automatic encoder, which comprises the following steps:
S1, given a training set of video samples and a memory bank of size N, randomly initializing the memory entries in the memory bank, and, when a sample x is input to the encoder, taking the encoder output as the embedding z;
In one embodiment of the invention, the embedding z is obtained by the following formula:
S2, calculating the reconstructed embedding with a reading module in the external memory module;
In one embodiment of the invention, the reconstructed embedding is calculated by the following formula:
where w represents the weight, M represents the memory matrix, and the memory bank size N is 1000.
In formula (2) for the reconstructed embedding, each element of the weight w is calculated by softmax:
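A minimal NumPy sketch of this read operation is given here for orientation, not as the patented implementation. The function name `read_memory`, the use of cosine similarity inside the softmax, and the embedding dimension of 64 are assumptions; the text only states that the weights come from a softmax and that the reconstructed embedding is the weights applied to the memory matrix. The bank holds 1000 entries, taking the value 1000 stated above as the bank size.

```python
import numpy as np

def read_memory(z, M):
    """Content-based read: softmax over similarities, then a weighted sum of memory rows.

    z: (C,) embedding produced by the encoder.
    M: (N, C) memory matrix, one memory entry per row.
    Cosine similarity is an assumption; the text only says w is computed by softmax.
    """
    eps = 1e-12
    sim = (M @ z) / (np.linalg.norm(M, axis=1) * np.linalg.norm(z) + eps)  # shape (N,)
    sim = sim - sim.max()               # shift for numerical stability; softmax is unchanged
    w = np.exp(sim) / np.exp(sim).sum() # non-negative weights that sum to 1
    z_hat = w @ M                       # reconstructed embedding, shape (C,)
    return z_hat, w

rng = np.random.default_rng(0)
M = rng.standard_normal((1000, 64))     # randomly initialised memory bank (dimension 64 is hypothetical)
z = rng.standard_normal(64)
z_hat, w = read_memory(z, M)
```

Since `w` is a probability vector, `z_hat` always lies inside the convex hull of the memory entries, however unusual the input embedding is.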
S3, using the reconstructed embedding as the input to a decoding module to obtain a reconstructed video frame;
In one embodiment of the invention, the reconstructed video frame is obtained by the following formula:
S4, designing a loss function that trains the model toward a multi-center distribution;
In one embodiment of the present invention, the loss function is designed by the following formula:
where $m_p$ denotes the memory entry nearest to the reconstructed embedding, $m_n$ denotes the second-nearest memory entry, $\lambda_c = 0.01$, and $\lambda_s = 0.01$.
In the loss function, the reconstruction term represents the reconstruction error between the reconstructed sample and the original sample. The intra-class difference penalty encourages the reconstructed embedding to be as close as possible to its nearest memory entry, which forces the sample points to cluster tightly around multiple centers, consistent with the multi-center distribution of normal samples. The inter-class similarity penalty forces a sample point to stay as far as possible from its second-nearest memory entry while remaining close to its nearest memory entry, so that the class centers end up far apart from one another.
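The three terms described above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the patent's exact formula: the function name `multi_center_loss`, the squared-L2 distances, the hinge (triplet-style) form of the inter-class term, and the `margin` parameter are all assumptions; only the roles of the nearest and second-nearest entries and the two weights of 0.01 come from the text. The argument `q` stands for whichever embedding is compared against the memory.

```python
import numpy as np

def multi_center_loss(x, x_hat, q, M, lam_c=0.01, lam_s=0.01, margin=1.0):
    """Reconstruction error plus intra-class and inter-class penalties.

    x, x_hat: original and reconstructed frames (same shape).
    q: (C,) embedding compared against the memory.
    M: (N, C) memory matrix with N >= 2.
    Squared-L2 distances, the hinge form, and `margin` are assumptions.
    """
    recon = np.mean((x - x_hat) ** 2)          # reconstruction term
    d = np.sum((M - q) ** 2, axis=1)           # squared distance to every memory entry
    order = np.argsort(d)
    d_p, d_n = d[order[0]], d[order[1]]        # nearest (m_p) and second-nearest (m_n)
    compact = d_p                              # intra-class: pull q toward m_p
    separate = max(d_p - d_n + margin, 0.0)    # inter-class: keep q farther from m_n than m_p
    return recon + lam_c * compact + lam_s * separate

# Two memory entries far apart; a query close to the first one incurs no inter-class penalty.
M_demo = np.array([[0.0, 0.0], [10.0, 0.0]])
frame = np.zeros((4, 4))
loss_near = multi_center_loss(frame, frame, np.array([1.0, 0.0]), M_demo)
# A query equidistant from both entries is penalised by the margin.
loss_mid = multi_center_loss(frame, frame, np.array([5.0, 0.0]), M_demo)
```

With the hinge form, the inter-class term vanishes once the second-nearest entry is at least `margin` farther away than the nearest one, so well-separated centers are not pushed apart indefinitely.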
S5, training the model with backpropagation and gradient descent to obtain the final model, and performing anomaly scoring on the test samples to obtain anomaly scores; when the anomaly score of a test sample is greater than the threshold γ = 0.015, the corresponding test sample is marked as an abnormal video.
In one embodiment of the present invention, the anomaly score is obtained by the following equation:
where i and j are the spatial indices of the video frame.
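As a rough illustration of the scoring step, the sketch below averages the squared reconstruction error over the spatial indices and compares it with the threshold 0.015 given in the text. The mean-squared-error form, the function names `anomaly_score` and `is_abnormal`, and the assumption that pixel values are scaled to [0, 1] are not taken from the source.

```python
import numpy as np

GAMMA = 0.015  # threshold value stated in the text

def anomaly_score(x, x_hat):
    """Mean squared reconstruction error over all spatial positions (i, j).

    Mean squared error is an assumption; the text only says the score is
    indexed by the spatial indices i, j of the video frame.
    """
    return float(np.mean((x - x_hat) ** 2))

def is_abnormal(x, x_hat, threshold=GAMMA):
    """Mark a frame as abnormal when its score exceeds the threshold."""
    return anomaly_score(x, x_hat) > threshold

frame = np.zeros((4, 4))
good_recon = np.full((4, 4), 0.1)  # small per-pixel error
bad_recon = np.full((4, 4), 0.2)   # larger per-pixel error
flags = (is_abnormal(frame, good_recon), is_abnormal(frame, bad_recon))
```

Any threshold only makes sense relative to the pixel scaling, so the value 0.015 would need to be revisited if frames are not normalised the way the original model expects.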
In summary, as shown in FIG. 1, an external memory module is added to the original automatic encoder framework, and this module stores the learned typical normal samples. After a sample is encoded into an embedding by the encoder, the memory is "read": similar typical memory entries are retrieved according to their similarity to the embedding and combined into a reconstruction, which can be called content-based addressing. The new embedding reconstructed from the typical memory entries is then input to the decoding module. Because the new embedding is composed of typical normal memories, the decoded output resembles a normal sample, so the reconstruction error of abnormal samples increases, which solves the problem of abnormal samples being reconstructed too well.
FIG. 2 is an internal structural diagram of the external memory module according to an embodiment of the present invention. The memory bank in the external memory module provides the memory entries to be updated to the updating module. The embedding is fed to the reading module for reading and to the updating module, which updates some of the memory entries and then overwrites the original entries in the memory bank. The reading module reads the memory entries required for the reconstructed embedding and outputs the reconstructed embedding.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (7)
1. A video anomaly detection method based on a memory-enhanced automatic encoder is characterized by comprising the following steps:
S1, given a training set of video samples and a memory bank of size N, randomly initializing the memory entries in the memory bank, and, when a sample x is input to the encoder, taking the encoder output as the embedding z;
S2, calculating the reconstructed embedding with a reading module in the external memory module;
S3, using the reconstructed embedding as the input to a decoding module to obtain a reconstructed video frame;
S4, designing a loss function that trains the model toward a multi-center distribution;
S5, training the model with backpropagation and gradient descent to obtain the final model, performing anomaly scoring on the test samples to obtain anomaly scores, and marking a test sample as an abnormal video when its anomaly score is greater than a threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110932291.8A CN113807178A (en) | 2021-08-13 | 2021-08-13 | Video anomaly detection method based on memory-enhanced automatic encoder |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110932291.8A CN113807178A (en) | 2021-08-13 | 2021-08-13 | Video anomaly detection method based on memory-enhanced automatic encoder |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113807178A (en) | 2021-12-17 |
Family
ID=78942872
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110932291.8A Pending CN113807178A (en) | 2021-08-13 | 2021-08-13 | Video anomaly detection method based on memory-enhanced automatic encoder |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113807178A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114743153A (en) * | 2022-06-10 | 2022-07-12 | 北京航空航天大学杭州创新研究院 | Non-sensory dish-taking model establishing and dish-taking method and device based on video understanding |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106991355B (en) | Face recognition method of analytic dictionary learning model based on topology maintenance | |
Yang et al. | Skeletonnet: A hybrid network with a skeleton-embedding process for multi-view image representation learning | |
CN111461232A (en) | Nuclear magnetic resonance image classification method based on multi-strategy batch type active learning | |
CN112036513B (en) | Image anomaly detection method based on memory-enhanced potential spatial autoregression | |
Wu | Masked face recognition algorithm for a contactless distribution cabinet | |
CN112418292B (en) | Image quality evaluation method, device, computer equipment and storage medium | |
CN111275175A (en) | Neural network training method, neural network training device, image classification method, image classification equipment and medium | |
CN113535953B (en) | Meta learning-based few-sample classification method | |
CN111475622A (en) | Text classification method, device, terminal and storage medium | |
CN116089838B (en) | Training method and recognition method for intelligent recognition model of electricity stealing user | |
CN113807178A (en) | Video anomaly detection method based on memory-enhanced automatic encoder | |
CN116245513A (en) | Automatic operation and maintenance system and method based on rule base | |
CN116821646A (en) | Data processing chain construction method, data reduction method, device, equipment and medium | |
CN112633315A (en) | Electric power system disturbance classification method | |
Peng et al. | Virtual samples and sparse representation‐based classification algorithm for face recognition | |
Kong et al. | A brief summary of dictionary learning based approach for classification (revised) | |
CN117458440A (en) | Method and system for predicting generated power load based on association feature fusion | |
Yang et al. | Deep hashing network for material defect image classification | |
CN115035455A (en) | Cross-category video time positioning method, system and storage medium based on multi-modal domain resisting self-adaptation | |
CN110069666A (en) | The Hash learning method and device kept based on Near-neighbor Structure | |
CN117011741A (en) | Training method, device, equipment and storage medium of video detection model | |
Jia et al. | Research on multi-label classification problems based on neural networks and label correlation | |
CN117591942B (en) | Power load data anomaly detection method, system, medium and equipment | |
CN114387623B (en) | Unsupervised pedestrian re-identification method based on multi-granularity block features | |
CN117349637A (en) | Communication power supply health degree assessment method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||