CN111767826B

CN111767826B - Timing and fixed-point scene anomaly detection method

Info

Publication number: CN111767826B
Application number: CN202010589246.2A
Authority: CN
Inventors: 董亚波; 方盛凯; 吕一帆
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2020-06-24
Filing date: 2020-06-24
Publication date: 2023-06-06
Anticipated expiration: 2040-06-24
Also published as: CN111767826A

Abstract

The invention discloses a timing and fixed-point scene anomaly detection method, which comprises the following steps: acquiring a scene image at fixed time and fixed point by adopting a wireless image sensor, and preprocessing the scene image to obtain a sample image and a test image; constructing an image detection model, and training the image detection model by a training system, wherein the training system comprises an encoder, a decoder and a discriminator; during training, taking the sum of content consistency loss, reconstruction loss, antagonism loss and cycle consistency loss as the loss of a training system, carrying out parameter optimization on the training system, and extracting an encoder with determined parameters as an image detection model; screening an anchoring image group from the sample image by using an image detection model; and carrying out outlier test on the test image according to the anchoring image group and the image detection model to obtain an outlier test result. The timing fixed point scene anomaly detection method is used for automatically detecting scene anomalies based on the timing fixed point static image, and is simple in operation and low in power consumption.

Description

Timing and fixed-point scene anomaly detection method

Technical Field

The invention belongs to the field of computer vision and pattern recognition, and particularly relates to a timing and fixed-point scene anomaly detection method.

Background

In recent years, with the progress of technologies such as artificial intelligence and hardware, video media technology has been rapidly developed and widely used in various fields. However, conventional video monitoring systems often look at the monitoring video by a person without interruption, so as to find problems and deal with them in time. However, with the rise of labor cost and the continuous increase of the number of monitoring cameras, the application of the monitoring system is gradually limited by the lagging monitoring means. The intelligent video monitoring system automatically analyzes video data in real time by utilizing technologies such as computer vision, image processing and the like, so that the dependence of the monitoring system on manpower is gradually relieved. When the intelligent video monitoring system discovers that abnormal data occurs in a monitoring scene, an abnormal signal is sent immediately, so that relevant departments are prompted to respond timely. Considering that for scene monitoring, the definition of an abnormal scene is wider than that of a normal scene, a method of modeling with a normal scene has the following advantages compared with a method of starting from an abnormal scene: 1. the dependence on abnormal data is reduced, and the difficulty of the abnormal data is relieved; 2. the modeling is more flexible and convenient, and the type and the number of the anomalies are not required to be explicitly declared.

On the other hand, the conventional monitoring system is often based on video unfolding operation, so that the conventional monitoring system has high requirements on the working environment and hardware equipment. However, in some special situations, such as in a field environment, which cannot afford uninterrupted power to the camera, video-based monitoring systems are not an effective solution. For this reason, a monitoring system based on a low-power wireless image sensor with timing shooting can be regarded as an effective replacement of a video-based monitoring system in some special cases;

the patent application with the publication number of CN105335703A discloses a traffic scene anomaly detection method based on a motion reconstruction technology, and the patent application with the publication number discloses a video anomaly event detection method under a crowded scene. Both of these two solutions are applicable to detection of motion scenes.

Disclosure of Invention

The invention aims to provide a timing and fixed-point scene anomaly detection method which is used for automatically detecting scene anomalies based on static images with timing and fixed points, and is simple to operate and low in power consumption.

The technical scheme of the invention is as follows:

a timing fixed point scene anomaly detection method comprises the following steps:

step 1, acquiring a scene image at fixed time and fixed point by adopting a wireless image sensor, and preprocessing the scene image to obtain a sample image and a test image;

step 2, constructing an image detection model, wherein the image detection model is obtained through training based on a training system constructed by generating an countermeasure network, the training system comprises an encoder, a decoder and a discriminator, the encoder and the decoder form a generator, the generator is used for generating a reconstructed image according to an input sample image, and the discriminator is used for discriminating the authenticity of the input sample image and the reconstructed image; during training, taking the sum of content consistency loss, reconstruction loss, antagonism loss and cycle consistency loss as the loss of a training system, carrying out parameter optimization on the training system, and extracting an encoder with determined parameters as an image detection model;

step 3, screening an anchor image group from the sample image by using an image detection model;

and 4, performing outlier test on the test image according to the anchoring image group and the image detection model to obtain an outlier test result.

Compared with the prior art, the invention has the beneficial effects that at least the following steps are included:

according to the timing and fixed-point scene anomaly detection method provided by the invention, scene images are acquired at fixed points at fixed time, the power consumption is low, interference of light and shadow changes on image content can be filtered out based on the image detection model constructed by the generated type countermeasure network, the detection precision of the image detection model is improved, scene anomaly value detection is automatically carried out by using the image detection model, the scene anomaly detection accuracy is improved, and the full-automatic detection is realized, so that the limitation of relying on manpower and severe working environment is overcome.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a timing and fixed point scene anomaly detection method provided by an embodiment of the present invention;

FIG. 2 is a block flow diagram of the screening of the anchor image group and the testing step in the timing and fixed point scene anomaly detection method provided by the embodiment of the invention;

FIG. 3 is a schematic diagram of an encoder according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a decoder according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of a discriminator according to the embodiment of the invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.

In order to solve the limitation problems that the monitoring of the existing video monitoring system depends on manpower, depends on abnormal sample data and has a severe working environment, the embodiment provides a timing fixed-point monitoring scene abnormality detection method based on discontinuous frames.

As shown in fig. 1 and 2, the method for detecting the timing fixed point scene anomaly provided by the embodiment includes the following steps:

s101, acquiring a scene image at fixed time and fixed point by adopting a wireless image sensor, and preprocessing the scene image to obtain a sample image and a test image.

The method adopts the low-power consumption wireless image sensor to collect the scene images of the monitored scene at fixed time and fixed point, the scene images are continuous frame images, and the scene images cannot be directly input into the model detection, so that the scene images need to be preprocessed, and the method comprises the following steps:

firstly, labeling each pixel of a scene image by using abnormal pixels, screening the scene image with the abnormal pixel duty ratio exceeding the duty ratio threshold as an abnormal image, and the rest being normal images;

in general, the abnormal pixels in the scene image are marked as 1, and other normal pixels are marked as 0, so that the abnormal pixels in the scene image are marked. The duty threshold is used as a boundary line for screening the abnormal image, and is limited according to the actual application scenario, and may be 5% for example. And when the number of the abnormal pixel points in the scene image exceeds the duty ratio threshold value, the scene image is an abnormal image.

And then, calculating the mean value and variance of the normal image on the content information channel, the light intensity information channel and the point light source incidence rate information channel, carrying out normalization processing on each channel of the normal image and the abnormal image by utilizing the mean value and variance of each channel, extracting the normalization result of the normal image as a sample image, and the normalization result of the normal image and the normalization result of the abnormal image as test images.

The content information reflects scene semantic information presented by the scene image, and the light intensity information and the point light source incidence rate information reflect illumination information presented by the scene image. The illumination information includes light intensity information and incidence rate of the point light source. The incidence rate of a point light source is cos theta information indicating an angle formed by incident light of the point light source and a normal vector of a point in space.

In an embodiment, the z-score normalization process may be performed on each channel of the normal image and the abnormal image according to the mean μ and the variance σ of each channel.

S102, constructing an image detection model.

In the embodiment, the image detection model is mainly used for extracting the content information of the image, a training system is constructed according to the generated countermeasure network in order to prevent the interference of the illumination information on the content information, the training process of the training system is continuously optimized on the network parameters, the image detection model can further filter the interference of the illumination information when extracting the content information,

the training system comprises an encoder, a decoder and a discriminator, wherein the encoder and the decoder form a generator and are used for generating a reconstructed image according to an input sample image, and the discriminator is used for discriminating the authenticity of the input sample image and the reconstructed image.

Specifically, in the training system, block images of the same corresponding positions of any two sample images are used as the input of an encoder and a discriminator;

the encoder is used for extracting first content information, second content information, first light intensity information and second light intensity information from two input block images, and first point light source incidence rate information and second point light source incidence rate information;

the decoder is used for generating a first reconstruction image according to the first content information, the first light intensity information and the first point light source incidence rate information, generating a second reconstruction image according to the second content information, the second light intensity information and the second point light source incidence rate information, generating a first illumination migration image according to the first content information, the second light intensity information and the second point light source incidence rate information, and generating a second illumination migration image according to the second content information, the first light intensity information and the first point light source incidence rate information, wherein the first reconstruction image, the second reconstruction image, the first illumination migration image and the second illumination migration image are collectively called as a reconstruction image;

the discriminator is used for discriminating the authenticity of the two input block images, the first illumination migration image and the second illumination migration image.

In order to extract the content information and the illumination information of the image, the encoder comprises three branches, which respectively correspond to the content information, the light intensity information and the incidence rate information of the point light source of the extracted image;

the first branch circuit for extracting the light intensity information comprises a feature extraction module, a residual error network module, a global pooling layer and a convolution layer;

the second branch for extracting the content information comprises a feature extraction module and a residual error network module;

the third branch for extracting the incidence rate information of the point light source comprises a characteristic extraction module, a residual error network module and a convolution layer;

the feature extraction module comprises a convolution layer, an instance regularization layer and an activation layer.

The decoder comprises a first feature extraction module, a residual error network module, a second feature extraction module and a third feature extraction module;

the first feature extraction module and the third feature extraction module comprise a convolution layer, an instance regularization layer and an activation layer, the computation processes of the instance regularization layers of the first feature extraction module and the third feature extraction module are different, and the activation functions adopted by the activation layer are different;

the second feature extraction module comprises an upsampling layer, a convolution layer, an instance regularization layer and an activation layer;

the arbiter comprises a plurality of feature extraction units and a fully connected layer, each feature extraction unit comprising a convolution layer and an activation layer.

Fig. 3 to 5 exemplarily show specific structural diagrams of the encoder, the decoder and the discriminator.

The method comprises the following steps of aiming at an encoder, including three branches, wherein the three branches comprise three feature extraction modules, a residual error network module residual block_1, a residual error network module residual block_3 and a residual error network module residual block_3 which are sequentially connected; each feature extraction module comprises a convolution layer, an instance regularization layer IN and an activation layer Relu; the difference is that the first branch for extracting the light intensity information i is also connected with a global pooling layer GlobalPooling and a convolution layer Conv4 after the residual network module residual Block_3; the third branch for extracting the point light source incidence information α is further connected with a convolutional layer Conv5 after the residual network module residual block_3.

The method comprises the following steps of aiming at a decoder, including two first feature extraction modules, a residual error network module residual block_1, a residual error network module residual block_3, two second feature extraction modules and a third feature extraction module which are connected in sequence; each first feature extraction module comprises a convolution layer, a modified instance regularization layer Illumination IN and an activation layer leak Relu; the input of the convolution layer Conv1 of the first feature extraction module is content information k _d The input of a convolution layer Conv1 of the second first characteristic extraction module is the output of the first characteristic extraction module, the light intensity information i and the point light source incidence information alpha, and the second characteristic extraction module comprises an up-sampling layer Upsample, a convolution layer, an instance regularization layer IN and an activation layer leak Relu; the third feature extraction module comprises a convolution layer, an instance regularization layer IN and an activation layer Tanh.

Aiming at the discriminator, the method comprises 7 feature extraction units, a full connection layer FC and a classification evaluation unit which are sequentially connected, wherein each feature extraction unit comprises a convolution layer and an activation layer leak Relu, and the classification evaluation unit adopts a Logit model.

Among them, the example regularization layer IN employs Adaptive Instance Normalization (AdaIN). AdaIN resets the mapping parameters based on the original Instance Normalization, see Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization for details.

Considering that AdaIN only considers global light intensity information and does not consider the incidence rate of point light sources for each point IN space, a modified example regularization layer Illumination IN is proposed and used IN the decoder, and its implementation expression is:

wherein, the liquid crystal display device comprises a liquid crystal display device,

then it is indicated that in the kth channel the normalized content feature value with spatial coordinates i, j, and +.>

Then it is calculated from Instance Normalization:

/>

while

α _ij Respectively representing the light intensity of the ambient light in a kth channel, the light intensity of a point light source in the kth channel and the incidence rate of the point light source at the spatial position i and j; x is x _ijk For the content eigenvalues at the k-th channel with spatial coordinates i, j, μ _k 、/>

To avoid the variance of 0 during normalization, the divisor is 0, so a small error term e is added, typically a value of 10 ^-5 . Normalization of the content feature map is performed to further refine information within the input image, improving the quality of the migration. 10 ^-5

On the basis of constructing a training system, constructing the loss of the training system, carrying out parameter optimization on the training system by utilizing the loss, ensuring that an encoder can accurately extract content information, light intensity information and point light source incidence rate information, ensuring that a decoder can generate a reconstructed image which is as real as possible by utilizing the provided content information, light intensity information and point light source incidence rate information, ensuring that a discriminator can accurately judge whether an input image is a real image, and extracting the encoder with the determined parameters as an image detection model.

Wherein, the loss L of the training system is:

L＝ωL ₁ +βL ₂ +σL ₃ +δL ₄ ；

wherein omega, beta, sigma and delta are weight coefficients, L ₁ 、L ₂ 、L ₃ 、L ₄ Content consistency loss, reconstruction loss, fight loss, and loop consistency loss, respectively;

content consistency loss L ₁ The method comprises the following steps:

considering the collected fixed scene, the content information is basically consistent although the illumination conditions received by the same position in different time periods are different, so that the content consistency loss L is set ₁ To ensure the extracted content informationSubstantially uniform, but allows isolated points to exist.

Reconstruction loss L ₂ The method comprises the following steps:

reconstruction loss L ₂ For ensuring that the encoder and decoder can restore the input image.

Countering loss L ₃ The method comprises the following steps:

by countering losses L ₃ On one hand, the method can promote the encoder and the decoder to continuously generate illumination migration images which are enough to confuse the discriminator, and on the other hand, the accuracy of the discriminator to identify whether the input image is a real image or not is continuously improved. The capacity of the encoder, decoder and arbiter is increased to converge by continuous optimization.

Cycle consistency penalty L ₄ The method comprises the following steps:

utilizing cyclic consistency loss L ₄ It is ensured that the encoder and decoder do not fool the arbiter by memorizing part of the training set image, thereby alleviating collapse.

In the above loss, E denotes an encoder, C denotes a decoder, D denotes a decoder,

representing content information of the block image X and the block image Y extracted by the encoder, respectively, I ^X ,I ^Y The light intensity information, α, extracted by the encoder for each of the block image X and the block image Y ^X ,α ^Y Point light sources extracted by encoder for block image X and block image Y, respectivelyIncidence information; />

Representing a first reconstructed image and a second reconstructed image generated by the decoder according to the block image X and the block image Y, respectively; l (L) _X ,l _Y ,/>

Respectively representing a block image X, a block image Y and a first illumination shift image +.>

Second illumination transition image->

A score obtained by the discriminator D; />

Respectively indicate->

Content information, light intensity information, and point light source incidence information extracted by the encoder.

S103, screening the sample images by using an image detection model to obtain an anchor image group.

In this embodiment, the screening the anchor image from the sample image by using the image detection model includes:

obtaining content information of all sample images by using an image detection model;

screening K Zhang Yangben images as initial anchor image groups, wherein each initial anchor image and the rest sample images form a cluster;

when each sample image in the cluster is taken as a central sample image, the distance between the rest sample images and the central sample image is calculated according to the content information, and the central sample image with the smallest sum of the distances from the rest sample images to the central sample point is selected as an anchor image to form an anchor image group.

In the embodiment, use is made ofThe content information extracted by the encoder measures the difference between the two images (X, Y), i.e

As a distance measure. When abnormal value detection is carried out on the test image, taking the sum of the distances of the test image and all the anchor images in the anchor image group about content information as the distance between the test image and the anchor image group;

the two norms between the content information of the test image and the content information of the anchor image are adopted as the distance between the test image and the anchor image with respect to the content information.

S104, performing outlier test on the test image according to the anchoring image group and the image detection model to obtain an outlier test result.

In an embodiment, performing outlier testing on a test image from an anchor image set and an image detection model includes:

inputting the test image into an image detection model to obtain content information of the test image;

and calculating the minimum distance between the test image and the anchor image group relative to the content information according to the content information of the test image and the content information of the anchor image, and when the minimum distance is larger than a distance threshold value, considering the detected image as an abnormal image.

Specifically, the two-norm distances between the content information of the tested image and the content information of all the anchor images are calculated according to an anomaly metric criterion, and the minimum two-norm distance and a distance threshold value in all the two-norm distances are selected for judgment so as to screen the anomaly images.

In an embodiment, the distance threshold is obtained by:

and calculating the distance between the content information of all the sample images and the content information of the anchor image group, sequencing all the distances from large to small, and selecting a distance value at 5% of the maximum distance as a distance threshold. Assuming that 100 distance values are ordered from large to small, the fifth largest distance value ordered at 5 is selected as the distance threshold.

According to the timing and fixed-point scene anomaly detection method, scene images are collected at fixed points at fixed time, the power consumption is low, interference of light and shadow changes on detection in an outdoor environment can be avoided based on the image detection model constructed by the generated countermeasure network, the detection precision of the image detection model is improved, scene anomaly value detection is automatically carried out by using the image detection model, the scene anomaly detection accuracy is improved, and meanwhile, the permanent changes generated by fixed scenes can be flexibly adapted through changing image samples of an anchoring image group. In addition, the timing fixed-point scene anomaly detection method is fully automatic in detection, and overcomes the limitation of relying on manpower and harsh working environment.

The foregoing detailed description of the preferred embodiments and advantages of the invention will be appreciated that the foregoing description is merely illustrative of the presently preferred embodiments of the invention, and that no changes, additions, substitutions and equivalents of those embodiments are intended to be included within the scope of the invention.

Claims

1. The timing fixed-point scene anomaly detection method is characterized by comprising the following steps of:

step 1, acquiring a scene image at fixed time and fixed point by adopting a wireless image sensor, and preprocessing the scene image to obtain a sample image and a test image, wherein preprocessing the scene image comprises the following steps:

each pixel of the scene image is marked with an abnormal pixel, the scene image with the abnormal pixel duty ratio exceeding the duty ratio threshold is screened to be the abnormal image, and the rest is the normal image; calculating the mean value and variance of the normal image about the content information channel, the light intensity information channel and the point light source incidence rate information channel, carrying out normalization processing on each channel of the normal image and the abnormal image by utilizing the mean value and variance of each channel, extracting the normalization result of the normal image as a sample image, and extracting the normalization result of the normal image and the normalization result of the abnormal image as test images;

in the training system, block images of the same corresponding positions of any two sample images are used as the input of an encoder and a discriminator;

the discriminator is used for discriminating the authenticity of the two input block images, the first illumination migration image and the second illumination migration image;

step 3, screening an anchor image group from a sample image by using an image detection model, wherein the step comprises the following steps:

obtaining content information of all sample images by using an image detection model; screening K Zhang Yangben images as initial anchor image groups, wherein each initial anchor image and the rest sample images form a cluster; when each sample image in the cluster is taken as a central sample image, calculating the distance between the rest sample images and the central sample image according to the content information, and selecting the central sample image with the smallest sum of the distances from the rest sample images to the central sample image as an anchor image to form an anchor image group;

2. The method for detecting abnormal scene at regular time and fixed point according to claim 1, wherein the encoder comprises three branches, which respectively correspond to the content information, the light intensity information and the incidence rate information of the point light source of the extracted image;

3. The timing fixed point scene anomaly detection method of claim 1 or 2, wherein the decoder comprises a first feature extraction module, a residual network module, a second feature extraction module, and a third feature extraction module;

4. The method for detecting abnormal scene at regular time and fixed point according to claim 1, wherein the loss L of the training system is:

L＝ωL ₁ +βL ₂ +σL ₃ +δL ₄ ；

content consistency loss L ₁ The method comprises the following steps:

reconstruction loss L ₂ The method comprises the following steps:

countering loss L ₃ The method comprises the following steps:

cycle consistency penalty L ₄ The method comprises the following steps:

wherein E represents an encoder, C represents a decoder, D represents a decoder,

respectively representing block imagesContent information of X and block image Y extracted by encoder, I ^X ,I ^Y The light intensity information, α, extracted by the encoder for each of the block image X and the block image Y ^X ,α ^Y Point light source incidence rate information extracted by an encoder for the block image X and the block image Y respectively; />

Representing a first reconstructed image and a second reconstructed image generated by the decoder according to the block image X and the block image Y, respectively; l (L) _X ,l _Y ,

Second illumination transition image->

A score obtained by the discriminator D; />

Respectively indicate->

5. The method for detecting anomalies in a timed fixed point scene according to claim 1, wherein said performing anomaly value testing on test images from the set of anchor images and the image detection model includes:

6. The timing and fixed point scene anomaly detection method of claim 5, wherein the distance threshold is obtained by:

and calculating the distance between the content information of all the sample images and the content information of the anchor image group, sequencing all the distances from large to small, and selecting a distance value at 5% of the maximum distance as a distance threshold.

7. The method for detecting abnormal scene at regular time and fixed point according to claim 5 or 6, wherein the sum of the distances of the test image and all the anchor images in the anchor image group with respect to the content information is taken as the distance of the test image and the anchor image group when the abnormal value of the test image is detected;