CN113807178A - Video anomaly detection method based on memory-enhanced automatic encoder - Google Patents

Video anomaly detection method based on memory-enhanced automatic encoder

Info

Publication number
CN113807178A
Authority
CN
China
Prior art keywords
memory
sample
video
abnormal
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110932291.8A
Other languages
Chinese (zh)
Inventor
倪伟
王汉奇
张冠华
胡兴
宋梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Guanghua Zhichuang Network Technology Co ltd
Original Assignee
Shanghai Guanghua Zhichuang Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Guanghua Zhichuang Network Technology Co ltd filed Critical Shanghai Guanghua Zhichuang Network Technology Co ltd
Priority to CN202110932291.8A
Publication of CN113807178A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a video anomaly detection method based on a memory-enhanced automatic encoder, and belongs to the technical field of video anomaly detection. A video sample training set is prepared and the memory entries in a memory bank are randomly initialized; when a sample is input into the encoder, the encoder outputs an embedding. A reading module in the external memory module computes a reconstructed embedding, which is then used as the input of the decoding module to obtain a reconstructed video frame. A new loss function is designed for training. The trained model scores each test sample for abnormality to obtain an outlier value, and the corresponding test sample is marked as an abnormal video when the outlier value is greater than a threshold. An external memory module is added to the original autoencoder framework, and the embedding is reconstructed from the typical normal samples stored in the memory before being decoded, so that the reconstruction result stays close to normal samples and the reconstruction error of abnormal samples increases. This solves the problem that the generalization capability of the autoencoder otherwise yields small reconstruction errors for abnormal samples and therefore poor anomaly detection.

Description

Video anomaly detection method based on memory-enhanced automatic encoder
Technical Field
The invention relates to the technical field of video anomaly detection, in particular to a video anomaly detection method based on a memory-enhanced automatic encoder.
Background
Video anomaly detection is a computer vision technique that localizes, in space and time, where anomalies occur in a video. It has great application value in fields such as video surveillance and intelligent transportation, and a broad research prospect. Because normal samples are usually much easier to obtain than abnormal samples in real production and daily life, most video anomaly detection tasks are unsupervised learning tasks. Researchers therefore first proposed finding a hyperplane that wraps the normal samples and separates the abnormal ones for classification; in 2001, Yunqiang Chen, Bernhard Scholkopf et al. proposed performing anomaly detection with a one-class SVM classifier. Subsequently, Liang Xiong and Arthur Zimek proposed methods based on Gaussian mixture models in 2011 and 2012, respectively, but these still essentially model the normal samples. Later, in 2018, Bo Zong and Qi Song proposed the new idea of reconstruction-based anomaly detection, which assumes that a model able to abstractly characterize and reconstruct the original video frames can be learned from normal samples. For the abstract characterization, researchers have proposed various techniques, such as principal component analysis (PCA) and sparse representation. With the rise of neural networks, autoencoders based on neural network architectures have become one of the research hotspots of recent years.
The autoencoder is an emerging method in the field of video anomaly detection. The basic idea of existing implementations is to use convolutional neural networks as the encoding and decoding modules, use a reconstruction loss as the loss function, and learn the model parameters on a training set of normal samples. At test time, an input video frame is encoded to obtain an abstract representation (also called an embedding) of the original sample, and the decoding module then reconstructs an image from the embedding produced by the encoder. Because the model is trained only on normal samples, its learned behavior is generally assumed to fit normal samples better than abnormal ones: a normal sample that passes through encoding and reconstruction is reconstructed well, whereas the reconstruction of an abnormal sample shows a larger error with respect to the original video frame.
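For illustration, a minimal sketch of such a conventional convolutional autoencoder is given below in Python/PyTorch. The layer sizes, the 256x256 grayscale input, and the MSE reconstruction loss are illustrative assumptions, not a description of any particular prior-art system.

# Minimal sketch of a conventional convolutional autoencoder for
# reconstruction-based frame anomaly detection (illustrative sizes only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                           # frame x -> embedding z
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                           # embedding -> reconstructed frame
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = ConvAutoencoder()
x = torch.rand(8, 1, 256, 256)              # a batch of (assumed) normal grayscale frames
x_hat, z = model(x)
rec_loss = F.mse_loss(x_hat, x)             # reconstruction loss minimized during training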
However, because neural networks have a certain generalization capability, the trained model can in some cases still reconstruct an input video frame well even when it contains an anomaly. New methods are therefore needed to weaken the generalization capability of the neural network model. Meanwhile, because normal behavior is diverse, the distribution of normal samples should lie around multiple centers, and we design a new loss function to encourage the model to learn such a multi-center distribution.
Disclosure of Invention
In view of the problems identified in the background art, the present invention provides a video anomaly detection method based on a memory-enhanced automatic encoder.
The technical solution of the invention is implemented as follows:
a video anomaly detection method based on a memory-enhanced automatic encoder comprises the following steps:
s1, training set in video sample
Figure BDA0003211468390000021
In the memory bank size N, randomly initializing memory entries in the memory bank, and when a sample x is input into an encoder, outputting the encoder to obtain an embedded z;
s2, embedding after calculation and reconstruction of reading module in external memory module
Figure BDA0003211468390000022
S3, the reconstructed embedding $\hat{z}$ is used as the input of the decoding module to obtain the reconstructed video frame $\hat{x}$;
S4, design a loss function that trains the model toward a multi-center distribution;
and S5, perform model training with backpropagation and gradient descent to obtain the final model, use it to score each test sample for abnormality and obtain an outlier value, and mark the corresponding test sample as an abnormal video when the outlier value is greater than a threshold.
According to one embodiment of the invention, the embedding z is obtained by the following formula:
$z = f_e(x; \theta_e)$
where $\theta_e$ denotes the encoder parameters.
According to one embodiment of the invention, the reconstructed embedding $\hat{z}$ is calculated by the following formula:
$\hat{z} = wM = \sum_{i=1}^{N} w_i m_i$
where w denotes the weight vector and M denotes the memory matrix whose rows $m_i$ are the memory entries.
According to one embodiment of the invention, the weight w is calculated by softmax:
$w_i = \dfrac{\exp(\mathrm{sim}(z, m_i))}{\sum_{j=1}^{N} \exp(\mathrm{sim}(z, m_j))}$
where $\mathrm{sim}(\cdot,\cdot)$ is a similarity measure between the embedding z and memory entry $m_i$.
according to one embodiment of the invention, the reconstructed video frame is obtained by the following formula
Figure BDA0003211468390000036
Figure BDA0003211468390000037
Wherein
Figure BDA0003211468390000038
Representing the decoding parameters.
According to one embodiment of the invention, the loss function is designed by the following formula:
$L = \|\hat{x} - x\|_2^2 + \lambda_c \|\hat{z} - m_p\|_2^2 + \lambda_s \big[\|\hat{z} - m_p\|_2 - \|\hat{z} - m_n\|_2 + \alpha\big]_+$
where $m_p$ denotes the memory entry nearest to $\hat{z}$, $m_n$ denotes the second-nearest memory entry, $\lambda_c$ and $\lambda_s$ are weighting coefficients, and $\alpha$ is a margin.
According to one embodiment of the present invention, the outlier value is obtained by the following equation:
$S(x) = \frac{1}{HW} \sum_{i,j} \|\hat{x}_{i,j} - x_{i,j}\|_2^2$
where i, j are the spatial indices of the video frame and H, W are its height and width.
In conclusion, the invention has the following beneficial effects:
An external memory module is added to the original autoencoder framework, and the embedding is reconstructed from the typical normal samples stored in the memory before being decoded, so that the reconstruction result stays close to normal samples and the reconstruction error of abnormal samples increases. This solves the problem that the generalization capability of the autoencoder otherwise yields small reconstruction errors for abnormal samples and hence poor anomaly detection. Meanwhile, a new loss function is designed to fit the multi-center nature of the normal-sample distribution.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is an overall block diagram of an embodiment of the present invention;
fig. 2 is an internal structural diagram of an external memory module according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is illustrated below with reference to fig. 1 and 2:
fig. 1 is a general block diagram of an embodiment of the present invention.
The embodiment of the invention provides a video anomaly detection method based on a memory-enhanced automatic encoder, which comprises the following steps:
S1, given a video sample training set $\{x_t\}$ and a memory bank of size N, randomly initialize the memory entries in the memory bank; when a sample x is input into the encoder, the encoder output yields an embedding z.
in one embodiment of the invention, the embedding z is obtained by the following formula:
Figure BDA0003211468390000051
wherein
Figure BDA0003211468390000052
Representing the encoder parameters.
S2, a reading module in the external memory module calculates the reconstructed embedding $\hat{z}$.
In one embodiment of the invention, the reconstructed embedding $\hat{z}$ is calculated by the following formula:
$\hat{z} = wM = \sum_{i=1}^{N} w_i m_i$
where w denotes the weight vector and M denotes the memory matrix whose rows $m_i$ are the memory entries; in this embodiment the memory bank contains N = 1000 entries.
In the calculation formula for the reconstructed embedding $\hat{z}$, each element of the weight w is calculated by softmax:
$w_i = \dfrac{\exp(\mathrm{sim}(z, m_i))}{\sum_{j=1}^{N} \exp(\mathrm{sim}(z, m_j))}$
where $\mathrm{sim}(\cdot,\cdot)$ is a similarity measure between the embedding z and memory entry $m_i$.
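As a non-limiting sketch of this reading step, the content-based addressing can be written as follows; the use of cosine similarity inside the softmax is an assumption, since the embodiment only fixes the softmax form.

# Sketch of the memory "read" step (S2): softmax addressing over the memory bank.
# Cosine similarity is assumed; the text only specifies a softmax over similarities.
import torch
import torch.nn.functional as F

def read_memory(z, M):
    """z: (B, C) query embeddings; M: (N, C) memory matrix, one entry per row."""
    sim = F.normalize(z, dim=1) @ F.normalize(M, dim=1).t()   # (B, N) similarities sim(z, m_i)
    w = F.softmax(sim, dim=1)                                 # addressing weights w
    z_hat = w @ M                                             # reconstructed embedding z_hat = w M
    return z_hat, w

M = torch.randn(1000, 64, requires_grad=True)   # memory bank with N = 1000 randomly initialized entries
z = torch.randn(8, 64)                          # example encoder embeddings
z_hat, w = read_memory(z, M)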
S3, the reconstructed embedding $\hat{z}$ is used as the input of the decoding module to obtain the reconstructed video frame $\hat{x}$.
In one embodiment of the invention, the reconstructed video frame $\hat{x}$ is obtained by the following formula:
$\hat{x} = f_d(\hat{z}; \theta_d)$
where $\theta_d$ denotes the decoder parameters.
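Putting S1 to S3 together, a hedged sketch of the full memory-augmented autoencoder is shown below, reusing the ConvAutoencoder and read_memory sketches given earlier; treating each spatial location of the encoder output as a separate query is an assumed design choice not fixed by the embodiment.

# Sketch assembling encoder, memory read, and decoder (S1-S3).
import torch
import torch.nn as nn

class MemoryAugmentedAE(nn.Module):
    def __init__(self, n_entries=1000, dim=64):
        super().__init__()
        self.ae = ConvAutoencoder()                              # encoder/decoder sketched above
        self.memory = nn.Parameter(torch.randn(n_entries, dim))  # randomly initialized memory bank (S1)

    def forward(self, x):
        z = self.ae.encoder(x)                                   # (B, C, H, W) embedding
        b, c, h, w = z.shape
        q = z.permute(0, 2, 3, 1).reshape(-1, c)                 # one query per spatial location
        q_hat, _ = read_memory(q, self.memory)                   # reconstructed embedding from memory (S2)
        z_hat = q_hat.reshape(b, h, w, c).permute(0, 3, 1, 2)
        x_hat = self.ae.decoder(z_hat)                           # decode the reconstructed embedding (S3)
        return x_hat, q, q_hat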
S4, design a loss function that trains the model toward a multi-center distribution.
in one embodiment of the present invention,
the loss function is designed by the following formula:
Figure BDA00032114683900000513
wherein m ispIs shown and
Figure BDA00032114683900000514
memory entry nearest, mnIs shown and
Figure BDA00032114683900000515
the entry from the second closest is λ c ═ 0.01, and λ s ═ 0.01.
In the loss function, the term $\|\hat{x} - x\|_2^2$ represents the reconstruction error between the reconstructed sample and the original sample. The term $\lambda_c \|\hat{z} - m_p\|_2^2$ is an intra-class difference penalty: it encourages the reorganized embedding to be as close as possible to its nearest memory entry, which forces the sample points to distribute tightly around multiple centers, conforming to the multi-center nature of the normal-sample distribution. The term $\lambda_s [\|\hat{z} - m_p\|_2 - \|\hat{z} - m_n\|_2 + \alpha]_+$ is an inter-class similarity penalty: it forces a sample point to stay as far as possible from its second-nearest memory entry while remaining close to its nearest one, so that the class centers end up far apart from one another.
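The sketch below illustrates one way to implement this loss with $\lambda_c = \lambda_s = 0.01$ as in the embodiment; the concrete margin value and the triplet form of the separateness term are assumptions, since the text above only fixes the qualitative behavior of the two penalties.

# Sketch of the training loss (S4): reconstruction error + intra-class difference
# penalty + inter-class similarity penalty. The margin value is an assumption.
import torch
import torch.nn.functional as F

def training_loss(x, x_hat, q_hat, memory, lam_c=0.01, lam_s=0.01, margin=1.0):
    rec = F.mse_loss(x_hat, x)                               # reconstruction error term
    d = torch.cdist(q_hat, memory)                           # distances to all memory entries
    d2, _ = d.topk(2, dim=1, largest=False)                  # nearest (m_p) and second-nearest (m_n)
    compact = d2[:, 0].pow(2).mean()                         # pull q_hat toward its nearest entry
    separate = F.relu(d2[:, 0] - d2[:, 1] + margin).mean()   # push it away from the second-nearest entry
    return rec + lam_c * compact + lam_s * separate

At training time (S5), this loss would be minimized with backpropagation and gradient descent, e.g. loss.backward() followed by an optimizer step.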
S5, model training is performed with backpropagation and gradient descent to obtain the final model, which is then used to score each test sample for abnormality and obtain an outlier value; when the outlier value is greater than the threshold $\gamma = 0.015$, the corresponding test sample is marked as an abnormal video.
In one embodiment of the present invention, the outlier value is obtained by the following equation:
$S(x) = \frac{1}{HW} \sum_{i,j} \|\hat{x}_{i,j} - x_{i,j}\|_2^2$
where i, j are the spatial indices of the video frame and H, W are its height and width.
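A minimal sketch of this test-time scoring follows; averaging the squared per-pixel error over the spatial indices i, j and comparing it against the threshold $\gamma = 0.015$ matches the equation above, while the averaging over channels is an assumption.

# Sketch of test-time anomaly scoring (S5): frame-level outlier value from the
# pixel-wise reconstruction error, thresholded at gamma = 0.015.
import torch

def anomaly_score(x, x_hat, gamma=0.015):
    err = (x_hat - x).pow(2).mean(dim=1)      # (B, H, W) squared error, averaged over channels
    score = err.flatten(1).mean(dim=1)        # average over spatial indices i, j
    return score, score > gamma               # mark the frame as abnormal when the score exceeds gamma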
In summary, as shown in FIG. 1, an external memory module is added to the original autoencoder framework, and this module stores learned typical normal samples. After a sample is encoded into an embedding by the autoencoder, the memory is "read": a similar typical memory is looked up according to similarity and used to reconstruct the embedding, which can be called content-based addressing. The new embedding, recombined from typical memories, is then fed into the decoding module for decoding. Because the new embedding is composed of typical normal memories, the reconstruction output obtained by decoding it resembles a normal sample; this increases the reconstruction error of abnormal samples and solves the problem of abnormal samples being reconstructed too well.
FIG. 2 shows the internal structure of the external memory module according to an embodiment of the present invention. The memory bank inside the external memory module provides the memory entries to be updated to the updating module. The embedding is fed into the reading module for reading, and into the updating module to update some of the memories in the memory bank, after which the updated entries overwrite the original entries in the bank. The reading module reads the memory entries required for the reorganized embedding and outputs the reorganized embedding.
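The patent does not spell out the rule used by the updating module, so the following is only a hedged sketch under the assumption of a moving-average update of the nearest memory entry; the nearest-entry assignment and the update rate are hypothetical.

# Hypothetical sketch of the memory "update" step: each embedding updates its
# nearest memory entry, and the updated entry overwrites the original one.
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_memory(memory, q, rate=0.1):
    idx = torch.cdist(q, memory).argmin(dim=1)       # nearest memory entry for each embedding
    for k in idx.unique():
        agg = q[idx == k].mean(dim=0)                # aggregate the embeddings addressed to entry k
        memory[k] = F.normalize((1 - rate) * memory[k] + rate * agg, dim=0)
    return memory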
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A video anomaly detection method based on a memory-enhanced automatic encoder is characterized by comprising the following steps:
S1, given a video sample training set $\{x_t\}$ and a memory bank of size N, randomly initializing the memory entries in the memory bank, and when a sample x is input into the encoder, obtaining an embedding z from the encoder output;
S2, calculating a reconstructed embedding $\hat{z}$ with a reading module in an external memory module;
S3, using the reconstructed embedding $\hat{z}$ as the input of a decoding module to obtain a reconstructed video frame $\hat{x}$;
S4, designing a loss function that trains the model toward a multi-center distribution;
and S5, performing model training with backpropagation and gradient descent to obtain the final model, scoring each test sample for abnormality to obtain an outlier value, and marking the corresponding test sample as an abnormal video when the outlier value is greater than a threshold.
2. The method of claim 1, wherein the embedding z is obtained by the following formula:
$z = f_e(x; \theta_e)$
where $\theta_e$ denotes the encoder parameters.
3. The method of claim 2, wherein the reconstructed embedding $\hat{z}$ is calculated by the following formula:
$\hat{z} = wM = \sum_{i=1}^{N} w_i m_i$
where w denotes the weight vector and M denotes the memory matrix whose rows $m_i$ are the memory entries.
4. The method of claim 3, wherein the weight w is calculated by softmax:
$w_i = \dfrac{\exp(\mathrm{sim}(z, m_i))}{\sum_{j=1}^{N} \exp(\mathrm{sim}(z, m_j))}$
where $\mathrm{sim}(\cdot,\cdot)$ is a similarity measure between the embedding z and memory entry $m_i$.
5. The method of claim 3, wherein the reconstructed video frame $\hat{x}$ is obtained by the following formula:
$\hat{x} = f_d(\hat{z}; \theta_d)$
where $\theta_d$ denotes the decoder parameters.
6. The video anomaly detection method based on the memory-enhanced automatic encoder according to claim 1, wherein the loss function is designed by the following formula:
$L = \|\hat{x} - x\|_2^2 + \lambda_c \|\hat{z} - m_p\|_2^2 + \lambda_s \big[\|\hat{z} - m_p\|_2 - \|\hat{z} - m_n\|_2 + \alpha\big]_+$
where $m_p$ denotes the memory entry nearest to $\hat{z}$, $m_n$ denotes the second-nearest memory entry, $\lambda_c$ and $\lambda_s$ are weighting coefficients, and $\alpha$ is a margin.
7. The method of claim 1, wherein the outlier value is obtained by the following formula:
$S(x) = \frac{1}{HW} \sum_{i,j} \|\hat{x}_{i,j} - x_{i,j}\|_2^2$
where i, j are the spatial indices of the video frame and H, W are its height and width.
CN202110932291.8A 2021-08-13 2021-08-13 Video anomaly detection method based on memory-enhanced automatic encoder Pending CN113807178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110932291.8A CN113807178A (en) 2021-08-13 2021-08-13 Video anomaly detection method based on memory-enhanced automatic encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110932291.8A CN113807178A (en) 2021-08-13 2021-08-13 Video anomaly detection method based on memory-enhanced automatic encoder

Publications (1)

Publication Number Publication Date
CN113807178A true CN113807178A (en) 2021-12-17

Family

ID=78942872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110932291.8A Pending CN113807178A (en) 2021-08-13 2021-08-13 Video anomaly detection method based on memory-enhanced automatic encoder

Country Status (1)

Country Link
CN (1) CN113807178A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743153A (en) * 2022-06-10 2022-07-12 北京航空航天大学杭州创新研究院 Non-sensory dish-taking model establishing and dish-taking method and device based on video understanding


Similar Documents

Publication Publication Date Title
CN106991355B (en) Face recognition method of analytic dictionary learning model based on topology maintenance
Yang et al. Skeletonnet: A hybrid network with a skeleton-embedding process for multi-view image representation learning
CN111461232A (en) Nuclear magnetic resonance image classification method based on multi-strategy batch type active learning
CN112036513B (en) Image anomaly detection method based on memory-enhanced potential spatial autoregression
Wu Masked face recognition algorithm for a contactless distribution cabinet
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN111275175A (en) Neural network training method, neural network training device, image classification method, image classification equipment and medium
CN113535953B (en) Meta learning-based few-sample classification method
CN111475622A (en) Text classification method, device, terminal and storage medium
CN116089838B (en) Training method and recognition method for intelligent recognition model of electricity stealing user
CN113807178A (en) Video anomaly detection method based on memory-enhanced automatic encoder
CN116245513A (en) Automatic operation and maintenance system and method based on rule base
CN116821646A (en) Data processing chain construction method, data reduction method, device, equipment and medium
CN112633315A (en) Electric power system disturbance classification method
Peng et al. Virtual samples and sparse representation‐based classification algorithm for face recognition
Kong et al. A brief summary of dictionary learning based approach for classification (revised)
CN117458440A (en) Method and system for predicting generated power load based on association feature fusion
Yang et al. Deep hashing network for material defect image classification
CN115035455A (en) Cross-category video time positioning method, system and storage medium based on multi-modal domain resisting self-adaptation
CN110069666A (en) The Hash learning method and device kept based on Near-neighbor Structure
CN117011741A (en) Training method, device, equipment and storage medium of video detection model
Jia et al. Research on multi-label classification problems based on neural networks and label correlation
CN117591942B (en) Power load data anomaly detection method, system, medium and equipment
CN114387623B (en) Unsupervised pedestrian re-identification method based on multi-granularity block features
CN117349637A (en) Communication power supply health degree assessment method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination