CN116385935A - Abnormal event detection algorithm based on unsupervised domain adaptation - Google Patents

Abnormal event detection algorithm based on unsupervised domain adaptation

Info

Publication number
CN116385935A
Authority
CN
China
Prior art keywords
domain
video frame
abnormal
normal
abnormal event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310369508.8A
Other languages
Chinese (zh)
Inventor
李璐
路文
伍凌帆
Current Assignee
Suzhou Haiyuhong Intelligent Technology Co ltd
Original Assignee
Suzhou Haiyuhong Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Haiyuhong Intelligent Technology Co ltd filed Critical Suzhou Haiyuhong Intelligent Technology Co ltd
Priority to CN202310369508.8A
Publication of CN116385935A
Legal status: Pending

Classifications

    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06N 3/0475: Generative networks
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G06N 3/094: Adversarial learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Abstract

The invention discloses an abnormal event detection algorithm based on unsupervised domain adaptation, comprising the following steps: step 1) performing experimental verification on the UCSD and ShanghaiTech Campus datasets; step 2) constructing the abnormal event detection algorithm based on unsupervised domain adaptation, where the model comprises a pre-training module and a domain adaptation module; step 3) iteratively training the abnormal event detection model; and step 4) obtaining the detection results of the model. The algorithm performs supervised pre-training on a source-domain dataset and proposes a perceptual contrast loss that promotes the reconstruction of normal samples by the video frame reconstruction network while suppressing its reconstruction of abnormal samples, thereby clarifying the discrimination boundary between normal and abnormal events in the source domain and transferring this prior knowledge of normal and abnormal events to the target domain.

Description

Abnormal event detection algorithm based on unsupervised domain adaptation
Technical Field
The invention relates to abnormal event detection in surveillance video, and in particular to an abnormal event detection algorithm based on unsupervised domain adaptation.
Background
The continuous development and wide application of abnormal event detection in surveillance video have played an important role in advancing intelligent monitoring. In recent years, deep-learning-based abnormal event detection algorithms for surveillance video have made good progress in improving detection performance. Supervised abnormal event detection algorithms generally depend on a large amount of labeled data: every video frame must be manually annotated before the network model is trained, and the collected data often suffer from class imbalance, which degrades detection. Unsupervised abnormal event detection algorithms need no labels; by learning feature representations of a large number of normal events, they judge data that do not conform to the normal-event feature distribution as abnormal events, saving substantial manual annotation cost. However, because unsupervised algorithms train only on normal samples, they lack prior information about abnormal events, so the discrimination boundary between normal and abnormal events is unclear and false detections arise easily. In addition, existing abnormal event detection algorithms for surveillance video lack scene applicability: an algorithm that performs well in one scene is often mediocre in others.
Computer-vision-based methods for detecting abnormal events in surveillance video can be divided into traditional methods and deep learning methods. Deep learning is widely used across fields because of its strong learning ability, and researchers commonly adopt it for the many challenging problems in surveillance-video abnormal event detection. Compared with traditional algorithms, deep-learning-based abnormal event detection algorithms perform markedly better and are the mainstream in this research direction.
Abnormal event detection algorithms for surveillance video can be divided into fully supervised, weakly supervised, and unsupervised learning algorithms according to whether and how the data are labeled. In real video surveillance scenes, abnormal events are relatively rare, so normal-event and abnormal-event data are imbalanced, which reduces model detection performance; moreover, abnormal events are diverse and cannot be exhaustively enumerated. Unsupervised learning algorithms need no labels; by learning feature representations of a large number of normal events, they judge data that do not conform to the normal-event feature distribution as abnormal. However, existing unsupervised abnormal event detection algorithms train only on normal-event data and lack prior information about abnormal events, so the discrimination boundary between normal and abnormal events is unclear and false detections arise easily.
To improve a network model's ability to discriminate between normal and abnormal events, Park H., Noh J., and Ham B. proposed "Learning Memory-Guided Normality for Anomaly Detection" at the IEEE Conference on Computer Vision and Pattern Recognition, introducing a memory module together with a feature compactness loss and a feature separateness loss to train it, so that the normal-event features stored in the memory module remain diverse and the model's discrimination ability improves.
However, existing abnormal event detection algorithms have two main disadvantages:
(1) Supervised abnormal event detection algorithms generally rely on a large amount of labeled data: every video frame must be manually annotated before the network model is trained, and the collected data often suffer from class imbalance.
(2) Unsupervised abnormal event detection algorithms train only on normal samples and lack prior information about abnormal events, leaving the discrimination boundary between normal and abnormal events unclear.
Disclosure of Invention
The invention aims to solve the technical problem that unsupervised abnormal event detection algorithms lack prior information about abnormal events, so that the discrimination boundary between normal and abnormal events is unclear and false detections arise easily; to this end, the invention introduces well-defined prior knowledge from a source domain into a target domain.
To solve the above technical problems, the invention is realized by the following technical scheme. An abnormal event detection algorithm based on unsupervised domain adaptation comprises the following steps:
step 1) performing experimental verification on the UCSD and ShanghaiTech Campus datasets, using the ShanghaiTech Campus dataset as source-domain data and the UCSD Ped1 and UCSD Ped2 datasets as target-domain data respectively;
step 2) constructing the abnormal event detection algorithm based on unsupervised domain adaptation: the model comprises a pre-training module and a domain adaptation module;
the pre-training module performs supervised training on the source-domain data, and only normal samples are used to train the reconstruction network when reconstructing an input video frame; in the pre-training stage, a supervised learning mode inputs the reconstructed video frame, a normal sample, and an abnormal sample into a feature extraction network to extract the corresponding features, and reduces the distance between the reconstructed-frame features and the normal-sample features while enlarging the distance between the reconstructed-frame features and the abnormal-sample features, thereby promoting the reconstruction of normal samples by the video frame reconstruction network, suppressing its reconstruction of abnormal samples, and clarifying the discrimination boundary between normal and abnormal events in the source domain;
step 3) iteratively training the abnormal event detection model based on unsupervised domain adaptation: the pre-training module performs back propagation with the loss function L = λ_res·L_res + λ_per·L_per and updates the weight parameters of the reconstruction network; in the target domain, the video frame reconstruction network performs back propagation with the loss function L_rec and updates the domain discriminator and video frame reconstruction network parameters;
step 4) obtaining the detection results of the abnormal event detection model based on unsupervised domain adaptation: the test sample set is fed as input to the trained model for forward inference, yielding the detection result for each test sample.
Further, in step 1), the source-domain data differ from the target-domain data in scene or environmental conditions; the training set of the source-domain dataset contains both normal-event and abnormal-event video frames and is labeled, while the training set of the target-domain dataset contains only normal-event video frames and is unlabeled.
Further, in step 2), the VGG19 pretrained model in PyTorch is adopted as the feature extraction network of the abnormal event detection algorithm based on unsupervised domain adaptation: the reconstructed video frame, the normal sample, and the abnormal sample are input into the network, and the features of different scales at layers 4, 9, 14, 23, and 32 are extracted for each; a perceptual contrast loss is proposed that reduces the distance between the reconstructed-frame features and the normal-sample features while enlarging the distance between the reconstructed-frame features and the abnormal-sample features, thereby promoting the reconstruction of normal samples by the video frame reconstruction network, suppressing its reconstruction of abnormal samples, and clarifying the discrimination boundary between normal and abnormal events in the source domain; the perceptual contrast loss function is as follows:
[The perceptual contrast loss L_per appears only as an equation image (BDA0004168122840000041) in the original document.]
where C_i, H_i, and W_i denote the number of channels, the height, and the width of the i-th feature map respectively, I_p denotes a normal sample, I_n an abnormal sample, and I_r the reconstructed video frame, and f_i(·) denotes the i-th feature map obtained by feeding a video frame into the feature extraction network; under the constraint of the perceptual contrast loss, the reconstructed video frame becomes more similar to the normal sample and more different from the abnormal sample in texture detail and semantic information;
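The perceptual contrast loss itself is printed only as an equation image in this text, so its exact form cannot be verified here. The sketch below is a hypothetical NumPy rendering consistent with the surrounding description: per-layer feature distances normalised by C_i·H_i·W_i, with the distance to the normal sample minimised and the distance to the abnormal sample subtracted (pushed apart). The function name, the choice of l1 distance, and the sign convention are assumptions, not taken from the patent.

```python
import numpy as np

def perceptual_contrast_loss(feats_r, feats_p, feats_n):
    """Hypothetical sketch of the perceptual contrast loss L_per.

    feats_r, feats_p, feats_n are lists of multi-scale feature maps
    (each of shape C_i x H_i x W_i) for the reconstructed frame I_r,
    a normal sample I_p, and an abnormal sample I_n. Each term is
    normalised by C_i * H_i * W_i; the normal-sample distance is
    minimised while the abnormal-sample distance is maximised.
    """
    loss = 0.0
    for fr, fp, fn in zip(feats_r, feats_p, feats_n):
        norm = fr.size  # C_i * H_i * W_i
        loss += np.abs(fr - fp).sum() / norm - np.abs(fr - fn).sum() / norm
    return loss
```

A reconstruction that matches the normal sample yields a negative (low) loss, while one that matches the abnormal sample yields a positive (high) loss, which is the behaviour the text describes.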
therefore, in the pre-training module, the total loss function formula is as follows:
L = λ_res·L_res + λ_per·L_per
where λ_res and λ_per denote the weight coefficients of the MSE loss function and the perceptual contrast loss function respectively; during pre-training the parameters of the feature extraction network are kept fixed while the reconstruction network continually updates its parameters, thereby promoting the reconstruction of normal samples by the video frame reconstruction network, suppressing its reconstruction of abnormal samples, and clarifying the discrimination boundary between normal and abnormal events in the source domain.
Further, with λ_res and λ_per denoting the weight coefficients of the MSE loss function and the perceptual contrast loss function respectively, the l1 loss function may be used in place of the MSE loss function to minimize the difference between the predicted video frame and the real video frame.
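As a small executable illustration of the total pre-training loss, the snippet below combines an MSE (or, per the alternative embodiment, l1) reconstruction term with a perceptual contrast term. The weight values for λ_res and λ_per are placeholders, since the patent does not disclose them, and the function names are illustrative.

```python
import numpy as np

def mse_loss(pred, target):
    # Mean squared error between predicted and real video frame.
    return float(np.mean((pred - target) ** 2))

def l1_loss(pred, target):
    # l1 alternative mentioned in the text.
    return float(np.mean(np.abs(pred - target)))

def total_pretrain_loss(pred, target, l_per, lam_res=1.0, lam_per=0.1, use_l1=False):
    """Total loss L = lam_res * L_res + lam_per * L_per (weights illustrative)."""
    l_res = l1_loss(pred, target) if use_l1 else mse_loss(pred, target)
    return lam_res * l_res + lam_per * l_per
```

Swapping `use_l1=True` switches the reconstruction term from MSE to l1 without changing the overall structure of L.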
Further, in step 2), through alternating training of the video frame reconstruction network and the domain discriminator, the parameters of both are continually optimized so that the video frame reconstruction network learns domain-invariant features that represent both domains well, aligning the data distributions of the two domains, while the domain discriminator tries as hard as possible to determine whether an input video frame comes from the source domain or the target domain.
Further, the target domain adopts an unsupervised learning mode, and normal samples from the target domain and normal samples from the source domain are input into the pre-trained reconstruction network and trained together; the whole training process is based on adversarial learning, alternately training the video frame reconstruction network and the domain discriminator to continually optimize both and align the data distributions of the source and target domains.
Further, in the specific training process, the video frame reconstruction network is fixed while training the domain discriminator, so that the discriminator distinguishes as well as possible whether the data come from the source domain or the target domain; the domain discriminator is fixed while training the video frame reconstruction network, so that the reconstruction network produces results the discriminator cannot resolve;
the loss function in training the domain arbiter is shown as the formula
L dis =f(d(g(x s )),t s )+f(d(g(x t )),t t )
The loss function when training the video frame reconstruction network is shown as the following formula, the optimized domain discriminator and the video frame reconstruction network parameters are continuously updated through back propagation, and finally, the target domain data distribution is aligned with the source domain data distribution;
L rec =f(d(g(x s )),t t )+f(d(g(x t )),t s )
wherein x is s Representing source domain data, x t Representing target domain data, g (·) representing a video frame reconstruction network, d (·) representing a domain arbiter, t s A tag representing source domain data, defined as 0, t t A label representing target domain data, defined as 1, f (·) represents a binary cross entropy loss function, the formula of which is shown below:
f(p,q)=-w×[p×log(q)+(1-p)×log(1-q)]
where p represents a label value, q represents an actual predicted value, and w represents a weight coefficient.
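The adversarial losses above can be sketched directly in NumPy. The helper names are illustrative, and the small epsilon guard inside the cross-entropy is an implementation convenience that is not part of the patent's formula.

```python
import numpy as np

def bce(p, q, w=1.0):
    """Weighted binary cross-entropy f(p, q) = -w*[p*log(q) + (1-p)*log(1-q)]."""
    eps = 1e-12  # numerical guard, not in the original formula
    return -w * (p * np.log(q + eps) + (1 - p) * np.log(1 - q + eps))

T_S, T_T = 0.0, 1.0  # source-domain label t_s = 0, target-domain label t_t = 1

def discriminator_loss(d_src, d_tgt):
    """L_dis: the discriminator should output t_s for source data and t_t
    for target data. d_src and d_tgt stand for d(g(x_s)) and d(g(x_t)),
    probabilities in (0, 1)."""
    return bce(T_S, d_src) + bce(T_T, d_tgt)

def reconstruction_adv_loss(d_src, d_tgt):
    """L_rec: the same terms with the labels swapped, so minimising it
    pushes the reconstruction network to fool the discriminator and
    align the two domain distributions."""
    return bce(T_T, d_src) + bce(T_S, d_tgt)
```

When the discriminator classifies both domains confidently and correctly, L_dis is small and L_rec is large; training alternates between the two objectives as the text describes.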
Furthermore, the target domain adopts an unsupervised learning mode requiring only the normal samples in the target domain for training; the prior knowledge that well defines normal and abnormal events in the pre-training module can be introduced into the target domain, clarifying the discrimination boundary between normal and abnormal events there and improving the model's abnormal event detection performance in the target domain; and adversarial training can align the source-domain and target-domain data distributions, reducing domain shift so that the model has better scene applicability.
Compared with the prior art, the invention has the following advantages. First, supervised pre-training is performed on the source-domain dataset and a perceptual contrast loss is proposed; shortening the distance between the reconstructed-frame features and the normal-sample features while enlarging the distance to the abnormal-sample features promotes the reconstruction of normal events by the video frame reconstruction network, suppresses its reconstruction of abnormal events, and clarifies the discrimination boundary between normal and abnormal events in the source domain. Second, an unsupervised learning mode is adopted in the target domain; through unsupervised domain adaptation based on adversarial learning, the prior knowledge of normal and abnormal events defined in the pre-training stage is introduced into the target domain, clarifying the discrimination boundary there and improving the model's abnormal event detection performance in the target domain. Finally, adversarial training aligns the source-domain and target-domain data distributions and reduces domain shift, so the model has better scene applicability and well-defined prior knowledge of normal and abnormal events can be introduced into the target domain.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart of the abnormal event detection algorithm based on unsupervised domain adaptation.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
An abnormal event detection algorithm based on unsupervised domain adaptation, as shown in FIG. 1, comprises the following steps:
step 1) performing experimental verification on the UCSD and ShanghaiTech Campus datasets, using the ShanghaiTech Campus dataset as source-domain data and the UCSD Ped1 and UCSD Ped2 datasets as target-domain data respectively; the source-domain data differ from the target-domain data in scene or environmental conditions; the training set of the source-domain dataset contains both normal-event and abnormal-event video frames and is labeled, while the training set of the target-domain dataset contains only normal-event video frames and is unlabeled;
step 2) constructing the abnormal event detection algorithm based on unsupervised domain adaptation: the model comprises a pre-training module and a domain adaptation module;
the pre-training module performs supervised training on the source-domain data, and only normal samples are used to train the reconstruction network when reconstructing an input video frame; in the pre-training stage, a supervised learning mode inputs the reconstructed video frame, a normal sample, and an abnormal sample into a feature extraction network to extract the corresponding features, and reduces the distance between the reconstructed-frame features and the normal-sample features while enlarging the distance between the reconstructed-frame features and the abnormal-sample features, thereby promoting the reconstruction of normal samples by the video frame reconstruction network, suppressing its reconstruction of abnormal samples, and clarifying the discrimination boundary between normal and abnormal events in the source domain;
the VGG19 pretrained model in PyTorch is adopted as the feature extraction network of the abnormal event detection algorithm based on unsupervised domain adaptation: the reconstructed video frame, the normal sample, and the abnormal sample are input into the network, and the features of different scales at layers 4, 9, 14, 23, and 32 are extracted for each; a perceptual contrast loss is proposed that reduces the distance between the reconstructed-frame features and the normal-sample features at the corresponding scales while enlarging the distance between the reconstructed-frame features and the abnormal-sample features, thereby promoting the reconstruction of normal samples by the video frame reconstruction network, suppressing its reconstruction of abnormal samples, and clarifying the discrimination boundary between normal and abnormal events in the source domain; the perceptual contrast loss function is as follows:
[The perceptual contrast loss L_per appears only as an equation image (BDA0004168122840000081) in the original document.]
where C_i, H_i, and W_i denote the number of channels, the height, and the width of the i-th feature map respectively, I_p denotes a normal sample, I_n an abnormal sample, and I_r the reconstructed video frame, and f_i(·) denotes the i-th feature map obtained by feeding a video frame into the feature extraction network; under the constraint of the perceptual contrast loss, the reconstructed video frame becomes more similar to the normal sample and more different from the abnormal sample in texture detail and semantic information;
therefore, in the pre-training module, the total loss function formula is as follows:
L = λ_res·L_res + λ_per·L_per
where λ_res and λ_per denote the weight coefficients of the MSE loss function and the perceptual contrast loss function respectively; alternatively, the l1 loss function may be used in place of the MSE loss function to minimize the difference between the predicted video frame and the real video frame. During pre-training the parameters of the feature extraction network are kept fixed while the reconstruction network continually updates its parameters, thereby promoting the reconstruction of normal samples by the video frame reconstruction network, suppressing its reconstruction of abnormal samples, and clarifying the discrimination boundary between normal and abnormal events in the source domain.
The abnormal event detection algorithm performs unsupervised domain adaptation based on generative adversarial learning; through alternating training of the video frame reconstruction network and the domain discriminator, the parameters of both are continually optimized so that the video frame reconstruction network learns domain-invariant features that represent both domains well, aligning the data distributions of the two domains, while the domain discriminator tries as hard as possible to determine whether an input video frame comes from the source domain or the target domain;
In an unsupervised learning mode, normal samples from the target domain and normal samples from the source domain are input into the pre-trained reconstruction network and trained together; the whole training process is based on adversarial learning, alternately training the video frame reconstruction network and the domain discriminator to continually optimize both and align the data distributions of the source and target domains. In the specific training process, the video frame reconstruction network is fixed while training the domain discriminator, so that the discriminator distinguishes as well as possible whether the data come from the source domain or the target domain; the domain discriminator is fixed while training the video frame reconstruction network, so that the reconstruction network produces results the discriminator cannot resolve;
the loss function in training the domain arbiter is shown as the formula
L dis =f(d(g(x s )),t s )+f(d(g(x t )),t t )
The loss function when training the video frame reconstruction network is shown as the following formula, the optimized domain discriminator and the video frame reconstruction network parameters are continuously updated through back propagation, and finally, the target domain data distribution is aligned with the source domain data distribution;
L rec =f(d(g(x s )),t t )+f(d(g(x t )),t s )
wherein x is s Representing source domain data, x t Representing target domain data, g (·) representing a video frame reconstruction network, d (·) representing a domain arbiter, t s A tag representing source domain data, defined as 0, t t A label representing target domain data, defined as 1, f (·) represents a binary cross entropy loss function, the formula of which is shown below:
f(p,q)=-w×[p×log(q)+(1-p)×log(1-q)]
wherein p represents a tag value, q represents an actual predicted value, and w represents a weight coefficient;
Through the unsupervised domain adaptation module, only the normal samples in the target domain are needed for training in an unsupervised learning mode; the prior knowledge that well defines normal and abnormal events in the pre-training module is introduced into the target domain, clarifying the discrimination boundary between normal and abnormal events there and improving the model's abnormal event detection performance in the target domain; adversarial training aligns the source-domain and target-domain data distributions, reducing domain shift so that the model has better scene applicability;
step 3) iteratively training the abnormal event detection model based on unsupervised domain adaptation: the pre-training module performs back propagation with the loss function L = λ_res·L_res + λ_per·L_per and updates the weight parameters of the reconstruction network; in the target domain, the video frame reconstruction network performs back propagation with the loss function L_rec and updates the domain discriminator and video frame reconstruction network parameters;
step 4) obtaining the detection results of the abnormal event detection model based on unsupervised domain adaptation: the test sample set is fed as input to the trained model for forward inference, yielding the detection result for each test sample.
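Step 4 describes forward inference on the test set but does not spell out the scoring rule. A common choice for reconstruction-based detectors, shown here only as an assumed sketch, is to take a frame's reconstruction error as its anomaly score and threshold it; the function names and the threshold value are illustrative, not from the patent.

```python
import numpy as np

def anomaly_score(frame, reconstructed):
    """Illustrative per-frame anomaly score: mean squared reconstruction
    error, assuming the trained network reconstructs normal frames well
    and abnormal frames poorly (the patent does not give the exact rule)."""
    return float(np.mean((frame - reconstructed) ** 2))

def detect(frames, reconstruct, threshold):
    """Label each test frame: 1 = abnormal if its score exceeds the threshold."""
    return [1 if anomaly_score(f, reconstruct(f)) > threshold else 0 for f in frames]
```

In practice the threshold would be calibrated on validation data, or the scores used directly to compute a frame-level AUC.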
First, supervised pre-training is performed on the source-domain dataset and a perceptual contrast loss is proposed; shortening the distance between the reconstructed-frame features and the normal-sample features while enlarging the distance to the abnormal-sample features promotes the reconstruction of normal events by the video frame reconstruction network, suppresses its reconstruction of abnormal events, and clarifies the discrimination boundary between normal and abnormal events in the source domain. Then, an unsupervised learning mode is adopted in the target domain; through unsupervised domain adaptation based on adversarial learning, the prior knowledge of normal and abnormal events defined in the pre-training stage is introduced into the target domain, clarifying the discrimination boundary there and improving the model's abnormal event detection performance in the target domain. Finally, adversarial training aligns the source-domain and target-domain data distributions and reduces domain shift, so the model has better scene applicability and well-defined prior knowledge of normal and abnormal events can be introduced into the target domain.
It should be emphasized that the above embodiments are merely preferred embodiments of the present invention and do not limit the present invention in any way; any simple modification, equivalent variation or alteration made to the above embodiments according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims (8)

1. An abnormal event detection algorithm based on unsupervised domain self-adaption is characterized by comprising the following steps:
step 1) performing experimental verification on the UCSD and ShanghaiTech Campus data sets: the ShanghaiTech Campus data set is used as source domain data, and the UCSD Ped1 and UCSD Ped2 data sets are used as target domain data respectively;
step 2) constructing an abnormal event detection algorithm based on unsupervised domain self-adaption: the model comprises a pre-training module and a domain self-adapting module;
the pre-training module performs supervised training on the source domain data, and only normal samples are used to train the reconstruction network when reconstructing an input video frame; in the pre-training stage, a supervised learning mode is adopted: the reconstructed video frame, the normal sample and the abnormal sample are input into a feature extraction network to extract corresponding features, and the distance between the reconstructed video frame features and the normal sample features is reduced while the distance between the reconstructed video frame features and the abnormal sample features is enlarged, which promotes the reconstruction of normal samples by the video frame reconstruction network, suppresses the reconstruction of abnormal samples, and defines the discrimination boundary between normal and abnormal events in the source domain;
step 3) carrying out iterative training on the abnormal event detection algorithm model based on unsupervised domain self-adaption: the pre-training module adopts the loss function L = λ_res·L_res + λ_per·L_per for back propagation and updates the weight parameters of the reconstruction network; in the target domain, the video frame reconstruction network adopts the loss function L_rec for back propagation, updating the domain discriminator and video frame reconstruction network parameters;
step 4) obtaining the detection result of the abnormal event detection model based on unsupervised domain self-adaption: the test sample set is taken as the input of the trained unsupervised-domain-adaptive abnormal event detection model for forward inference, yielding the detection result of each test sample.
2. The abnormal event detection algorithm based on unsupervised domain adaptation according to claim 1, wherein the source domain data in step 1) are data whose scenes or environmental conditions differ from those of the target domain data; the training set of the source domain data set includes both normal event video frames and abnormal event video frames, while the training set of the target domain data set includes only normal event video frames and is unlabeled.
3. The abnormal event detection algorithm based on unsupervised domain self-adaption according to claim 1, wherein the algorithm in step 2) adopts a VGG19 model pretrained in PyTorch as the feature extraction network; the reconstructed video frame, the normal sample and the abnormal sample are input into this network, and features of different scales are extracted from layers 4, 9, 14, 23 and 32 respectively; a perceptual contrast loss is proposed which, at the corresponding scales, reduces the distance between the reconstructed video frame features and the normal sample features while enlarging the distance between the reconstructed video frame features and the abnormal sample features, thereby promoting the reconstruction of normal samples by the video frame reconstruction network, suppressing the reconstruction of abnormal samples, and defining the discrimination boundary between normal and abnormal events in the source domain; the perceptual contrast loss function formula is as follows:
L_per = Σ_i [1/(C_i·H_i·W_i)] · ( ‖f_i(I_r) − f_i(I_p)‖_1 − ‖f_i(I_r) − f_i(I_n)‖_1 )
wherein C_i, H_i, W_i respectively denote the channel number, height and width of the i-th feature map; I_p denotes a normal sample, I_n an abnormal sample, and I_r a reconstructed video frame; f_i(·) denotes the i-th feature map obtained by inputting a video frame into the feature extraction network; the constraint of the perceptual contrast loss makes the reconstructed video frame more similar to the normal sample and more different from the abnormal sample in texture detail and semantic information;
therefore, in the pre-training module, the total loss function formula is as follows:
L = λ_res·L_res + λ_per·L_per
wherein λ_res and λ_per respectively denote the weight coefficients of the MSE loss function and the perceptual contrast loss function; the feature extraction network parameters are kept fixed during pre-training while the reconstruction network continuously updates its parameters, thereby promoting the reconstruction of normal samples by the video frame reconstruction network, suppressing the reconstruction of abnormal samples, and defining the discrimination boundary between normal and abnormal events in the source domain.
4. The abnormal event detection algorithm based on unsupervised domain self-adaption according to claim 3, wherein λ_res and λ_per respectively denote the weight coefficients of the MSE loss function and the perceptual contrast loss function, and the MSE loss function may be replaced by an l1 loss function to minimize the difference between the predicted video frame and the actual video frame.
5. The abnormal event detection algorithm based on unsupervised domain adaptation according to claim 1, wherein in step 2) the video frame reconstruction network and the domain discriminator are trained alternately to continuously optimize their parameters: the domain discriminator attempts to distinguish, as far as possible, whether an input video frame comes from the source domain or the target domain, while the video frame reconstruction network learns domain-invariant features that perform well in both domains, thereby aligning the data distributions of the two domains.
6. The abnormal event detection algorithm based on unsupervised domain adaptation according to claim 5, wherein the target domain adopts an unsupervised learning mode: the normal samples in the target domain and the normal samples in the source domain are input together into the pre-trained reconstruction network for training; the whole training process is based on adversarial learning, in which the video frame reconstruction network and the domain discriminator are trained alternately and continuously optimized, aligning the data distributions of the source and target domains.
7. The abnormal event detection algorithm based on unsupervised domain adaptation according to claim 6, wherein the specific training process is: when training the domain discriminator, the video frame reconstruction network is fixed, so that the domain discriminator can distinguish as far as possible whether the data come from the source domain or the target domain; when training the video frame reconstruction network, the domain discriminator is fixed, so that the video frame reconstruction network generates results the domain discriminator cannot resolve;
the loss function in training the domain arbiter is shown as the formula
L dis =f(d(g(x s )),t s )+f(d(g(x t )),t t )
The loss function when training the video frame reconstruction network is shown as the following formula, the optimized domain discriminator and the video frame reconstruction network parameters are continuously updated through back propagation, and finally, the target domain data distribution is aligned with the source domain data distribution;
L rec =f(d(g(x s )),t t )+f(d(g(x t )),t s )
wherein x is s Representing source domain data, x t Representing target domain data, g (·) representing a video frame reconstruction network, d (·) representing a domain arbiter, t s A tag representing source domain data, defined as 0, t t A label representing target domain data, defined as 1, f (·) represents a binary cross entropy loss function, the formula of which is shown below:
f(p,q)=-w×[p×log(q)+(1-p)×log(1-q)]
where p represents a label value, q represents an actual predicted value, and w represents a weight coefficient.
8. The abnormal event detection algorithm based on unsupervised domain self-adaption according to claim 6, wherein the unsupervised domain adaptation module adopted in the target domain only requires the normal samples in the target domain for training; the prior knowledge of normal and abnormal events established in the pre-training module can be introduced into the target domain, defining the discrimination boundary between normal and abnormal events in the target domain and improving the abnormal event detection performance of the algorithm model there; through adversarial training, the source-domain and target-domain data distributions can be aligned, reducing domain shift, so that the algorithm model has better scene applicability.
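The adversarial objectives of claim 7, with f(p, q) the weighted binary cross entropy defined above (p the label, q the prediction), can be checked numerically with toy discriminator outputs; the concrete probability values are illustrative:

```python
import math

def bce(p, q, w=1.0):
    """f(p, q) = -w * [p*log(q) + (1-p)*log(1-q)], as in claim 7."""
    return -w * (p * math.log(q) + (1 - p) * math.log(1 - q))

def discriminator_loss(d_src, d_tgt, t_s=0.0, t_t=1.0):
    """L_dis: the discriminator is rewarded for labelling source
    outputs with t_s = 0 and target outputs with t_t = 1."""
    return bce(t_s, d_src) + bce(t_t, d_tgt)

def reconstruction_adv_loss(d_src, d_tgt, t_s=0.0, t_t=1.0):
    """L_rec: the labels are swapped, so the reconstruction network is
    rewarded when the discriminator cannot tell the domains apart."""
    return bce(t_t, d_src) + bce(t_s, d_tgt)

# A discriminator that separates the domains well (source→0.1, target→0.9)
# has a low L_dis but hands the reconstruction network a high L_rec,
# which drives the alternating optimization described in claim 7.
good_d = discriminator_loss(d_src=0.1, d_tgt=0.9)
fooled = reconstruction_adv_loss(d_src=0.1, d_tgt=0.9)
print(good_d, fooled)
```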
CN202310369508.8A 2023-04-08 2023-04-08 Abnormal event detection algorithm based on unsupervised domain self-adaption Pending CN116385935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310369508.8A CN116385935A (en) 2023-04-08 2023-04-08 Abnormal event detection algorithm based on unsupervised domain self-adaption


Publications (1)

Publication Number Publication Date
CN116385935A true CN116385935A (en) 2023-07-04

Family

ID=86978392



Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116743646A (en) * 2023-08-15 2023-09-12 云南省交通规划设计研究院有限公司 Tunnel network anomaly detection method based on domain self-adaptive depth self-encoder
CN116743646B (en) * 2023-08-15 2023-12-19 云南省交通规划设计研究院股份有限公司 Tunnel network anomaly detection method based on domain self-adaptive depth self-encoder


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination