CN112418149A - Abnormal behavior detection method based on deep convolutional neural network - Google Patents
Abnormal behavior detection method based on deep convolutional neural network
- Publication number
- CN112418149A (application number CN202011408898.8A)
- Authority
- CN
- China
- Prior art keywords
- frame
- image
- optical flow
- motion
- appearance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
A method for detecting abnormal behavior based on a deep convolutional neural network, the method comprising: A1: encoding an input video frame; A2: decoding the encoded stream to obtain an appearance stream and a motion stream; A3: scoring each frame with an anomaly detection module, comparing the score against a threshold, and flagging abnormal behavior. The method makes full use of the structural and motion information extracted from video frames and can detect abnormal behaviors accurately and efficiently.
Description
Technical Field
The invention relates to the field of computer vision and video detection and analysis, in particular to an abnormal behavior detection method based on a deep convolutional neural network.
Background
A practical anomaly monitoring system should raise an alert promptly when an anomaly occurs and identify its type. In general, anomaly detection can be viewed as a coarse form of video understanding that only distinguishes anomalous events from normal ones. Once an anomaly is detected, further classification techniques are used to identify and categorize the abnormal behavior.
Online detection of abnormal behaviors in video surveillance must overcome three difficulties: the algorithm must meet real-time requirements; it must make effective use of long, untrimmed video datasets; and it must cope with the complexity of the environment in which the surveillance camera is located.
To date, image-based tasks such as image classification and object detection have been revolutionized by deep learning, especially convolutional neural networks. Compared with traditional methods, deep learning offers higher recognition accuracy and stronger robustness. Progress in video analysis, however, has been less satisfactory, suggesting that learning representations of spatiotemporal data is very difficult. The main difficulty is that capturing the motion information present in video requires new network designs that have yet to be found and validated.
Previous research has learned features by performing convolutions simultaneously in the spatial and temporal dimensions. Optical flow features are widely and effectively used in video analysis: applying optical flow to video understanding tasks allows motion cues to be modeled explicitly and conveniently. However, this approach is inefficient, as computing and storing the estimated optical flow is costly.
One application of abnormal behavior detection in video surveillance is detecting littering. Household garbage discarded at random releases large amounts of harmful gases such as ammonia and sulfides, pollutes water bodies, and breeds bacteria and pests; such careless disposal is a major cause of urban environmental pollution, which is why household garbage classification measures are necessary. An abnormal behavior detection method based on an intelligent computer vision algorithm is therefore desirable, one that can accurately and efficiently detect abnormal behaviors such as littering.
Disclosure of Invention
The main purpose of the present invention is to overcome the problems in the background art, and to provide an abnormal behavior detection method based on a deep convolutional neural network, so as to achieve accurate and efficient intelligent detection of abnormal behaviors.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for detecting abnormal behavior based on a deep convolutional neural network, the method comprising:
a1: encoding an input video frame;
a2: decoding the encoded stream to obtain an appearance stream and a motion stream;
A3: scoring each frame with an anomaly detection module, comparing the score against a threshold, and judging abnormal behavior.
Further:
the step a1 specifically includes:
A11: adding an Inception module after the input layer to determine low-level features;
a12: video is encoded using a convolutional auto-encoder.
In the step A11:
An Inception module is added after the input layer to determine low-level features as early as possible, allowing the model to select suitable convolution operations automatically; this is preferably applied to processing surveillance video shot from a fixed angle.
In the step A12:
The encoder uses a convolutional auto-encoder (Conv-AE) that learns to detect abnormal targets from templates of normal appearance; the encoder is a sequence of layer blocks, each comprising three layers: convolution, batch normalization, and a Leaky-ReLU activation function, applying strided convolution directly rather than pooling layers to reduce the resolution of the feature maps;
wherein reducing the spatial resolution of the feature maps through learned parameters allows the network to find an informative way to downsample, with the corresponding up-sampling learned in the decoding phase.
The step a2 specifically includes:
a21: decoding the coded stream by an appearance decoder to obtain an appearance stream;
a22: and decoding the coded stream by a motion decoder to obtain the motion stream.
In the step A21:
the appearance decoder learns appearance information from a static image and outputs probability distribution of different abnormal behavior categories, wherein the appearance information comprises textures, contours and interest points; the appearance decoder is a layer block sequence, and a Dropout layer is added before the ReLU activation function of each block as a regularization method for reducing the risk of over-fitting in the training phase.
In the step A21:
for input image I and its reconstructed imageForcing the generation of an image with similar intensity for each pixel, the intensity loss being estimated as
Adding a constraint to preserve the original gradient, i.e. sharpness, in the reconstructed image, the gradient loss being defined as the difference between the absolute gradients along two spatial dimensions
Wherein x, y represent the horizontal and vertical directions of the image space, respectively, gdRepresenting the image gradient in both the horizontal and vertical directions, the final loss function of the appearance transformation is the sum of the intensity and gradient losses:
in the step A22:
The motion decoder learns motion information and predicts the probability of different abnormal behavior categories; the motion decoder is a sequence of layer blocks, with a Dropout layer added before the ReLU activation function of each block as a regularization method to reduce the risk of over-fitting during training; the network used by the motion decoder also contains skip connections that carry low-level features from the original image;
wherein a pre-trained FlowNet2 is employed to estimate optical flow;
wherein a U-Net subnetwork is employed to learn the associations between appearance patterns and the corresponding movements;
the distance-based loss between the output optical flow and the ground-truth optical flow is
\[ L_{flow}(F_t,\hat{F}_t) = \|F_t-\hat{F}_t\|_1 \]
where \(F_t\) is the ground-truth optical flow estimated from two successive frames \(I_t\) and \(I_{t+1}\), and \(\hat{F}_t\) is the output of the U-Net given \(I_t\);
given an input video frame I and its associated optical flow F obtained by FlowNet2, the network in the model graph produces a reconstructed frameAnd predicted optical flowDiscriminator D estimates the probability that the optical flow associated with I is the ground truth F, and the GAN objective function consists of two loss functions:
where x, y and c represent the spatial positions and corresponding channels, respectively, of the cells in the feature map output from discriminator D, and the λ value is the weight associated with the partial loss in the model; GAN is optimized by alternately minimizing the two GAN losses to indicate the efficiency of motion prediction.
The step a3 specifically includes:
A score estimation scheme is used in which only a small region is considered instead of the entire frame;
wherein partial scores are defined, estimated respectively on the two model streams over a patch at the same position:
\[ S_I(P) = \frac{1}{|P|}\sum_{(i,j)\in P}\big(I_{i,j}-\hat{I}_{i,j}\big)^2, \qquad S_F(P) = \frac{1}{|P|}\sum_{(i,j)\in P}\big|F_{i,j}-\hat{F}_{i,j}\big| \]
where P denotes an image patch, |P| is its number of pixels, i and j denote pixel indices in the horizontal and vertical directions of the image, \(I_{i,j}\) is the value of the input image at (i, j), \(\hat{I}_{i,j}\) the value of its reconstruction at (i, j), \(F_{i,j}\) the ground-truth optical flow at (i, j), and \(\hat{F}_{i,j}\) the U-Net output at (i, j); \(S_I\) and \(S_F\) denote the appearance score and the optical-flow score, respectively. The frame-level score is then computed as a weighted combination of the two partial scores:
\[ S = w_F\, S_F(\tilde{P}) + \lambda_S\, w_I\, S_I(\tilde{P}) \]
where \(w_F\) and \(w_I\) are weights computed from the training data, \(\lambda_S\) controls the contribution of the appearance partial score to the sum, and \(\tilde{P}\) is the patch with the highest \(S_F\) value in the frame, namely:
\[ \tilde{P} = \arg\max_{P}\, S_F(P) \]
The weights \(w_F\) and \(w_I\) are estimated as the inverse of the average score over the training data of n images:
\[ w_F = \Big(\frac{1}{n}\sum_{i=1}^{n} S_F^{(i)}(\tilde{P}^{(i)})\Big)^{-1}, \qquad w_I = \Big(\frac{1}{n}\sum_{i=1}^{n} S_I^{(i)}(\tilde{P}^{(i)})\Big)^{-1} \]
where i denotes an image index, \(S_F^{(i)}\) the optical-flow score of the i-th image, and \(\tilde{P}^{(i)}\) the patch with the highest \(S_F\) value in the i-th frame.
The frame-level scores of each evaluation video are normalized, the final frame-level score being
\[ \tilde{S}_t = \frac{S_t}{\max(S_{1\ldots m})} \]
where t is the frame index in a video containing m frames, \(S_t\) denotes the score of the t-th frame, \(\max(S_{1\ldots m})\) is the maximum score over all frames, and \(\tilde{S}_t\) is the normalized frame score.
A computer readable storage medium storing computer instructions which, when executed by a processor, implement the method.
The invention has the following beneficial effects:
the invention provides an abnormal behavior detection method based on a deep convolutional neural network. The method makes full use of the structure information and the motion information extracted from the video frame, and can accurately and efficiently finish the intelligent detection of abnormal behaviors. In a preferred embodiment, the deep convolutional neural network combines a convolutional auto-encoder (Conv-AE) and U-Net, so that each stream contributes to the task of detecting outlier frames. Usually the network depth is a carefully selected hyper-parameter, and in order to mitigate the influence of the network depth on the accuracy, it is preferable that the method integrates a tuned inclusion module after the input layer. The method further provides a patch-based approach for evaluating the framework-level normalization score that reduces the effects of model output noise. Compared with other high-level methods, the method has obvious competitive advantages in the operation effect of the reference data set.
Drawings
FIG. 1 is a flow chart of an abnormal behavior detection method based on a deep convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a block diagram of the model, annotated with the spatial resolution of the feature maps, according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
The embodiment of the invention provides an abnormal behavior detection method based on a deep convolutional neural network, which mainly comprises the following steps: after the input video passes through an encoder formed by a series of sub-modules, an appearance decoder and a motion decoder are used to obtain an appearance stream and a motion stream respectively, and finally an anomaly detection module judges whether the input video contains abnormal behavior. The invention can be used to detect abnormal behaviors such as littering. Referring to FIG. 1 and FIG. 2, the method includes the following steps:
a1: an input video frame is encoded. The encoder comprises an inclusion module, a convolution module, a batch standardization module and an activation module;
a2: decoding the coded stream, and obtaining an appearance stream through an appearance decoder; and obtaining the motion stream through a motion decoder.
A3: and scoring the frame through an abnormality detection module, comparing the frame with a threshold value, and judging abnormal behaviors.
In particular embodiments, when performing the above steps, the following may be followed. It should be noted that the specific methods employed in the practice are merely illustrative, and the scope of the present invention includes, but is not limited to, the following methods.
A1: an input video frame is encoded.
The encoder in the preferred embodiment includes Inception, convolution, batch normalization and activation modules.
The network proposed in the embodiments comprises an encoding-decoding architecture, which creates a bottleneck. A deep structure may omit features critical to decoding; conversely, a shallow network may lose high-level abstract information. The Inception module was originally developed to let a convolutional neural network determine filter sizes automatically. Preferably, the method uses an Inception module so that the model automatically selects suitable convolution operations.
Some embodiments are mainly applied to surveillance video shot from a fixed angle. If a convolutional layer with a predefined kernel size is placed directly after the input layer, the information extracted from a target changes with its distance from the camera, and this effect propagates to subsequent layers; the method therefore adds an Inception module after the input layer to determine low-level features as early as possible. The Inception module also significantly reduces the amount of computation compared with other approaches.
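As a rough sketch of the idea, an Inception-style block applies several kernel sizes in parallel to the same input and stacks the results, letting later layers weight whichever receptive field suits each target scale. The averaging kernels and single-channel input below are simplifying assumptions, not the patented filters:

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive single-channel 2-D convolution with zero 'same' padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def inception_block(x, kernel_sizes=(1, 3, 5)):
    """Run parallel convolutions of different sizes and stack them channel-wise."""
    branches = [conv2d_same(x, np.ones((k, k)) / (k * k)) for k in kernel_sizes]
    return np.stack(branches)  # shape: (len(kernel_sizes), H, W)
```

Because every branch sees the same input, the block need not commit to one kernel size; near and distant targets are each captured by some branch.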
The convolutional auto-encoder (Conv-AE) used in some embodiments may learn the method of detecting abnormal objects from templates of normal performance. The convolutional self-encoder includes an encoder and a decoder.
The encoder consists of a series of blocks, each comprising three layers: convolution, batch normalization and a Leaky-ReLU activation function. Some embodiments apply strided convolution directly, rather than pooling layers, to reduce the resolution of the feature maps. This parameterization lets the network learn an informative way to reduce the spatial resolution of the feature maps, with the corresponding up-sampling learned in the decoding phase.
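A minimal sketch of such a layer block follows: a stride-2 convolution halves the spatial resolution in place of pooling, followed by a simplified per-map batch normalization and Leaky-ReLU. The kernel values, stride and negative slope are illustrative assumptions:

```python
import numpy as np

def strided_conv(x, kernel, stride=2):
    """Valid convolution with stride; replaces pooling for learned downsampling."""
    kh, kw = kernel.shape
    h = (x.shape[0] - kh) // stride + 1
    w = (x.shape[1] - kw) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i * stride:i * stride + kh,
                                 j * stride:j * stride + kw] * kernel)
    return out

def batch_norm(x, eps=1e-5):
    """Simplified normalization over a single feature map."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def leaky_relu(x, alpha=0.2):
    """Leaky-ReLU keeps a weak negative response instead of zeroing it."""
    return np.where(x > 0, x, alpha * x)

def encoder_block(x, kernel):
    """Convolution -> batch normalization -> Leaky-ReLU, as in the encoder blocks."""
    return leaky_relu(batch_norm(strided_conv(x, kernel)))
```

Unlike max pooling, the downsampling here has trainable weights, so the network itself decides what to keep when halving the resolution.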
A2: decoding the coded stream, and obtaining an appearance stream through an appearance decoder; and obtaining the motion stream through a motion decoder.
The decoder is a sequence of layer blocks, each block having a Dropout layer added before the ReLU activation function, as a regularization method to reduce the risk of over-fitting during the training phase.
The appearance decoder can effectively learn appearance information such as textures, outlines, interest points and the like from the static image and output probability distribution of different abnormal behavior categories. The motion decoder can effectively learn motion information and predict the probability of different abnormal behavior categories.
The Conv-AE used in some embodiments supports detecting anomalous objects within an input frame by learning common appearance templates of normal events. Since the Conv-AE learns common appearance patterns of normal events, we consider the \(\ell_2\) distance between the input image I and its reconstructed image \(\hat{I}\); the model thus forces the generation of images with similar intensity at each pixel. The intensity loss is estimated as
\[ L_{int}(I,\hat{I}) = \|I-\hat{I}\|_2^2 \]
One drawback of using only the \(\ell_2\) loss is blur in the output, so we add a constraint to preserve the original gradients (i.e. sharpness) in the reconstructed image. The gradient loss is defined as the difference between the absolute gradients along the two spatial dimensions:
\[ L_{grad}(I,\hat{I}) = \sum_{d\in\{x,y\}} \big\|\,|g_d(I)| - |g_d(\hat{I})|\,\big\|_1 \]
where x, y denote the horizontal and vertical directions of the image space and \(g_d\) denotes the image gradient along direction d. The final loss function of the appearance stream is the sum of the intensity and gradient losses:
\[ L_{appe}(I,\hat{I}) = L_{int}(I,\hat{I}) + L_{grad}(I,\hat{I}) \]
this combination of losses provides good performance for the video prediction task.
The motion decoder can effectively learn motion information and predict the probability of different abnormal behavior categories. It differs from the appearance decoder in that its network contains skip connections, which carry low-level features (edges, small patches, etc.) from the original image.
In addition to abnormal object structure, abnormal motion of typical objects is also suitable for evaluating video frames. Each module in the encoder is to enhance the level of spatial abstraction of common objects in the training frame. Thus, the method employs an association between a U-Net sub-network learning mode and a corresponding motion.
Some embodiments employ a pre-trained FlowNet2 to estimate optical flow. Compared with other models, the optical flow output by FlowNet2 is not only much smoother but also preserves motion discontinuities with sharp boundaries. Using Leaky-ReLU activations in the encoder also retains weak responses, which helps provide useful information to the decoder.
The U-Net subnetwork focuses on learning the associations between these patterns and the corresponding motions; the ground-truth optical flow used in the method is estimated by a pre-trained FlowNet2. To reduce the effect of outliers when learning the motion correspondence, the loss between the output optical flow and its ground truth is measured by the \(\ell_1\) distance:
\[ L_{flow}(F_t,\hat{F}_t) = \|F_t-\hat{F}_t\|_1 \]
where \(F_t\) is the ground-truth optical flow estimated from two successive frames \(I_t\) and \(I_{t+1}\), and \(\hat{F}_t\) is the output of the U-Net given \(I_t\). This stream can predict the instantaneous motion of objects appearing in the video.
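The flow loss above is a plain l1 distance over the flow field and can be sketched as:

```python
import numpy as np

def flow_loss(flow_gt, flow_pred):
    """l1 distance between ground-truth and predicted optical flow fields."""
    return float(np.sum(np.abs(flow_gt - flow_pred)))
```

Choosing l1 over l2 here grows the penalty linearly rather than quadratically with the error, which reduces the influence of outlier flow vectors, matching the stated goal.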
In addition to the distance-based loss \(L_{flow}\), another loss is added to make the distribution of the predicted optical flow similar to that of the ground-truth optical flow.
Given an input video frame I and its associated optical flow F obtained by FlowNet2, the network proposed in the model diagram (G denotes the generator) produces a reconstructed frame \(\hat{I}\) and a predicted optical flow \(\hat{F}\), and the discriminator D estimates the probability that the optical flow associated with I is the ground truth F. The GAN objective consists of two loss functions:
\[ L_D(I,F,\hat{F}) = \sum_{x,y,c} \Big[-\log D(I,F)_{x,y,c} - \log\big(1 - D(I,\hat{F})_{x,y,c}\big)\Big] \]
\[ L_G(I,\hat{F}) = \lambda_{adv} \sum_{x,y,c} -\log D(I,\hat{F})_{x,y,c} \]
where x, y and c denote the spatial position and corresponding channel of a cell in the feature map output by D, and the λ values are the weights associated with the partial losses in the proposed model. The GAN is optimized by alternately minimizing the two losses; it is used to indicate the efficiency of motion prediction.
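A sketch of the two adversarial objectives, written over a patch-wise probability map output by D; the clipping epsilon is a numerical-stability assumption, and the λ-weighted appearance/flow terms are omitted for brevity:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """D should assign high probability to ground-truth flow, low to predicted flow."""
    return float(np.mean(-np.log(d_real + eps) - np.log(1.0 - d_fake + eps)))

def generator_adv_loss(d_fake, eps=1e-8):
    """G is rewarded when D believes its predicted flow is the ground truth."""
    return float(np.mean(-np.log(d_fake + eps)))
```

Training alternates between the two: a step on D with G fixed, then a step on G (with the appearance and flow losses added, weighted by the λ values) while D is fixed.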
A3: and scoring the frame through an abnormality detection module, comparing the frame with a threshold value, and judging abnormal behaviors.
The anomaly detection model in some embodiments aims to provide a normalized score for each frame. In related approaches, the score is typically a quantity measuring the similarity between the ground truth and the reconstructed or predicted output, and the normality of each video frame is determined by comparing its score with a threshold. However, because of the summing or averaging over all pixel positions, anomalous events occurring within small image regions may be missed. The method therefore proposes an alternative score estimation scheme that considers only a small region rather than the entire frame.
Partial scores are defined, estimated respectively on the two model streams over a patch at the same position:
\[ S_I(P) = \frac{1}{|P|}\sum_{(i,j)\in P}\big(I_{i,j}-\hat{I}_{i,j}\big)^2, \qquad S_F(P) = \frac{1}{|P|}\sum_{(i,j)\in P}\big|F_{i,j}-\hat{F}_{i,j}\big| \]
where P denotes an image patch, |P| is its number of pixels, i and j denote pixel indices in the horizontal and vertical directions of the image, \(I_{i,j}\) is the value of the input image at (i, j), \(\hat{I}_{i,j}\) the value of its reconstruction at (i, j), \(F_{i,j}\) the ground-truth optical flow at (i, j), and \(\hat{F}_{i,j}\) the U-Net output at (i, j); \(S_I\) and \(S_F\) denote the appearance score and the optical-flow score, respectively. Then, our frame-level score is computed as a weighted combination of the two partial scores:
\[ S = w_F\, S_F(\tilde{P}) + \lambda_S\, w_I\, S_I(\tilde{P}) \]
where \(w_F\) and \(w_I\) are weights computed from the training data, \(\lambda_S\) controls the contribution of the appearance partial score to the sum, and \(\tilde{P}\) is the patch with the highest \(S_F\) value in the frame, namely:
\[ \tilde{P} = \arg\max_{P}\, S_F(P) \]
The weights \(w_F\) and \(w_I\) are estimated as the inverse of the average score over the training data of n images:
\[ w_F = \Big(\frac{1}{n}\sum_{i=1}^{n} S_F^{(i)}(\tilde{P}^{(i)})\Big)^{-1}, \qquad w_I = \Big(\frac{1}{n}\sum_{i=1}^{n} S_I^{(i)}(\tilde{P}^{(i)})\Big)^{-1} \]
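The patch-based scoring described above can be sketched as follows; the exhaustive sliding-window search for the highest-\(S_F\) patch mirrors the description, while the patch size and λ value are illustrative assumptions:

```python
import numpy as np

def patch_scores(img, recon, flow, flow_pred, top, left, size):
    """Partial scores S_I and S_F over one square patch at the same position."""
    win = (slice(top, top + size), slice(left, left + size))
    n = size * size
    s_i = float(np.sum((img[win] - recon[win]) ** 2)) / n       # appearance score
    s_f = float(np.sum(np.abs(flow[win] - flow_pred[win]))) / n  # motion score
    return s_i, s_f

def frame_score(img, recon, flow, flow_pred, w_i, w_f, size=2, lam=0.2):
    """Find the patch with the highest S_F, then combine the two partial scores."""
    best = None
    for top in range(img.shape[0] - size + 1):
        for left in range(img.shape[1] - size + 1):
            s_i, s_f = patch_scores(img, recon, flow, flow_pred, top, left, size)
            if best is None or s_f > best[1]:
                best = (s_i, s_f)
    s_i, s_f = best
    return w_f * s_f + lam * w_i * s_i
```

With a motion error confined to one 2×2 patch of a 4×4 frame, the patch score is 0.5, whereas averaging over the whole frame would dilute the same error to 0.125, which is the motivation for the patch-based scheme.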
finally, the frame-level scores of each evaluation video were normalized according to the recommendations of the relevant study.
The final frame-level score is
Where t is the frame index in a video containing m frames, StDenotes the fraction of the t-th frame, max (S)1...m) Represents the maximum value of the scores of all the frames,i.e. the normalized frame fraction.
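The per-video normalization can be sketched as dividing by the video's maximum score, as described above:

```python
def normalize_scores(scores):
    """Scale the frame-level scores of one video by that video's maximum score."""
    peak = max(scores)
    return [s / peak for s in scores]
```

Normalizing per video puts all scores into (0, 1], so a single threshold can be applied across videos whose raw score ranges differ.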
The background of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe the prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments and is not intended to limit the practice of the invention to those descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and these substitutions and modifications should be considered to fall within the scope of the invention. In the description herein, references to the description of the term "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the claims.
Claims (10)
1. A method for detecting abnormal behaviors based on a deep convolutional neural network is characterized by comprising the following steps:
a1: encoding an input video frame;
a2: decoding the encoded stream to obtain an appearance stream and a motion stream;
A3: scoring each frame with an anomaly detection module, comparing the score against a threshold, and judging abnormal behavior.
2. The method according to claim 1, wherein the step a1 specifically comprises:
A11: adding an Inception module after the input layer to determine low-level features;
a12: video is encoded using a convolutional auto-encoder.
3. The method of claim 2, wherein in step a11:
An Inception module is added after the input layer to determine low-level features as early as possible, allowing the model to select suitable convolution operations automatically; this is preferably applied to processing surveillance video shot from a fixed angle.
4. The method of claim 2, wherein in step a 12:
The encoder uses a convolutional auto-encoder (Conv-AE) that learns to detect abnormal targets from templates of normal appearance; the encoder is a sequence of layer blocks, each comprising three layers: convolution, batch normalization, and a Leaky-ReLU activation function, applying strided convolution directly rather than pooling layers to reduce the resolution of the feature maps;
wherein reducing the spatial resolution of the feature maps through learned parameters allows the network to find an informative way to downsample, with the corresponding up-sampling learned in the decoding phase.
5. The method according to claim 1, wherein step A2 specifically comprises:
A21: decoding the coded stream by an appearance decoder to obtain the appearance stream;
A22: decoding the coded stream by a motion decoder to obtain the motion stream.
6. The method of claim 5, wherein in step A21:
the appearance decoder learns appearance information from static images, the appearance information comprising textures, contours, and interest points, and outputs a probability distribution over the different abnormal-behavior categories; the appearance decoder is a sequence of layer blocks, with a Dropout layer added before the ReLU activation function of each block as a regularization method to reduce the risk of overfitting during the training phase.
7. The method of claim 5, wherein in step A21:
for an input image $I$ and its reconstructed image $\hat{I}$, the model is forced to generate an image with similar intensity at each pixel; the intensity loss is estimated as

$$L_{int}(I,\hat{I}) = \big\|I - \hat{I}\big\|_2^2$$

a constraint is added to preserve the original gradients, i.e. sharpness, in the reconstructed image; the gradient loss is defined as the difference between the absolute gradients along the two spatial dimensions:

$$L_{grad}(I,\hat{I}) = \sum_{d \in \{x,y\}} \Big\| \, \big|g_d(I)\big| - \big|g_d(\hat{I})\big| \, \Big\|_1$$

where $x$ and $y$ respectively represent the horizontal and vertical directions of image space, and $g_d$ represents the image gradient along direction $d$; the final loss function of the appearance stream is the sum of the intensity and gradient losses:

$$L_{appe}(I,\hat{I}) = L_{int}(I,\hat{I}) + L_{grad}(I,\hat{I})$$
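The intensity and gradient losses described in this claim can be sketched in NumPy. This is a minimal illustration, not the patent's implementation; the toy 2×2 arrays are assumptions, and finite differences (`np.diff`) stand in for the image gradient operator:

```python
import numpy as np

# Sketch of the appearance losses: squared-intensity error plus an L1
# penalty on the difference of absolute gradients along each spatial axis.
# Toy arrays are illustrative; np.diff is used as the gradient operator.

def intensity_loss(img, recon):
    # Sum of squared per-pixel intensity differences.
    return np.sum((img - recon) ** 2)

def gradient_loss(img, recon):
    # L1 distance between absolute finite-difference gradients
    # along the vertical (axis 0) and horizontal (axis 1) directions.
    loss = 0.0
    for axis in (0, 1):
        g_img = np.abs(np.diff(img, axis=axis))
        g_rec = np.abs(np.diff(recon, axis=axis))
        loss += np.sum(np.abs(g_img - g_rec))
    return loss

def appearance_loss(img, recon):
    # Final appearance loss: intensity term plus gradient term.
    return intensity_loss(img, recon) + gradient_loss(img, recon)

img = np.array([[0.0, 1.0], [1.0, 0.0]])
recon = np.array([[0.0, 1.0], [1.0, 1.0]])   # one pixel reconstructed wrong
print(appearance_loss(img, recon))
```

A single wrong pixel is penalized both for its intensity error and for the blur it introduces into the local gradients.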
8. the method of claim 5, wherein in step A22:
the motion decoder learns motion information and predicts the probabilities of the different abnormal-behavior categories; the motion decoder is a sequence of layer blocks, with a Dropout layer added before the ReLU activation function of each block as a regularization method to reduce the risk of overfitting during the training phase; and the network used by the motion decoder comprises skip connections that extract low-level features from the original image;
wherein a pre-trained FlowNet2 is employed to estimate optical flow;
wherein a U-Net sub-network is employed to learn the associations between appearance patterns and the corresponding motion;
the distance-based loss between the output optical flow and the ground-truth optical flow is

$$L_{flow}(F_t,\hat{F}_t) = \big\|F_t - \hat{F}_t\big\|_1$$

where $F_t$ is the ground-truth optical flow estimated from two successive frames $I_t$ and $I_{t+1}$, and $\hat{F}_t$ is the output of the U-Net given $I_t$;
given an input video frame $I$ and its associated optical flow $F$ obtained by FlowNet2, the network in the model produces a reconstructed frame $\hat{I}$ and a predicted optical flow $\hat{F}$; a discriminator $D$ is used to estimate the probability that the optical flow associated with $I$ is the ground truth $F$, and the GAN objective consists of two loss functions:

$$L_D(I,F,\hat{F}) = \frac{1}{2}\sum_{x,y,c}\Big[-\log D(I,F)_{x,y,c} - \log\big(1 - D(I,\hat{F})_{x,y,c}\big)\Big]$$

$$L_G(I,\hat{F}) = \sum_{x,y,c} -\log D(I,\hat{F})_{x,y,c}$$

where $x$, $y$, and $c$ represent the spatial position and the corresponding channel of the cells in the feature map output by discriminator $D$, and the $\lambda$ values are the weights associated with the partial losses in the model; the GAN is optimized by alternately minimizing the two GAN losses, improving the quality of the motion prediction.
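The alternating adversarial losses can be sketched on a patch discriminator's per-cell probability maps. The maps below are illustrative stand-ins for discriminator outputs, not real network activations:

```python
import numpy as np

# Sketch of the two adversarial losses over a patch discriminator's
# per-cell probability maps: d_real stands in for D(I, F) and d_fake
# for D(I, F_hat). The arrays are illustrative, not network outputs.

def discriminator_loss(d_real, d_fake):
    # Push real (I, F) cells toward probability 1, predicted-flow
    # cells toward 0 (averaged with the 1/2 factor from the objective).
    return 0.5 * np.sum(-np.log(d_real) - np.log(1.0 - d_fake))

def generator_adv_loss(d_fake):
    # The motion decoder tries to make D score its flow as real.
    return np.sum(-np.log(d_fake))

d_real = np.array([[0.9, 0.8], [0.85, 0.95]])   # D confident on real flow
d_fake = np.array([[0.2, 0.1], [0.15, 0.25]])   # D rejects predicted flow
ld = discriminator_loss(d_real, d_fake)
lg = generator_adv_loss(d_fake)
print(ld, lg)
```

Training alternates: one step lowers `ld` by sharpening the discriminator, the next lowers `lg` by making the predicted flow harder to reject.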
9. The method according to any one of claims 1 to 8, wherein step A3 specifically comprises:
a patch-based score estimation scheme is used, in which only a small region, rather than the entire frame, is considered;
wherein partial scores are defined that are estimated respectively on the two model streams, sharing the same patch position:

$$S_I(P) = \frac{1}{|P|}\sum_{(i,j)\in P}\big(I_{i,j}-\hat{I}_{i,j}\big)^2, \qquad S_F(P) = \frac{1}{|P|}\sum_{(i,j)\in P}\big|F_{i,j}-\hat{F}_{i,j}\big|$$

where $P$ represents an image patch, $|P|$ is its number of pixels, $i$ and $j$ represent the pixel indices along the horizontal and vertical directions of the image, $I_{i,j}$ is the value of the input image at $(i,j)$, $\hat{I}_{i,j}$ is the value of its reconstructed image at $(i,j)$, $F_{i,j}$ is the ground-truth optical flow at $(i,j)$, $\hat{F}_{i,j}$ is the output of the U-Net at $(i,j)$, and $S_I$ and $S_F$ respectively represent the score of the original image and the score of the optical flow; the frame-level score is then computed as a weighted combination of the two partial scores:

$$S = \log\big(w_F\,S_F(\tilde{P})\big) + \lambda_S \log\big(w_I\,S_I(\tilde{P})\big)$$

where $w_F$ and $w_I$ are weights calculated from the training data, $\lambda_S$ controls the contribution of the partial score to the sum, and $\tilde{P}$ is the patch with the highest $S_F$ value in the frame, namely:

$$\tilde{P} = \arg\max_{P}\, S_F(P)$$

the weights $w_F$ and $w_I$ are estimated as the inverses of the average scores over the training data of $n$ images:

$$w_F = \bigg(\frac{1}{n}\sum_{i=1}^{n} S_F^{(i)}\big(\tilde{P}_i\big)\bigg)^{-1}, \qquad w_I = \bigg(\frac{1}{n}\sum_{i=1}^{n} S_I^{(i)}\big(\tilde{P}_i\big)\bigg)^{-1}$$

where $i$ represents the image index, $S_F^{(i)}$ represents the optical-flow score of the $i$-th image, and $\tilde{P}_i$ is the patch with the highest $S_F$ value in the frame of the $i$-th image;
the frame-level scores of each evaluation video are normalized, the final frame-level score being

$$\tilde{S}_t = \frac{S_t - \min_t S_t}{\max_t S_t - \min_t S_t}$$
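The patch-based scoring of step A3 can be sketched end to end. This is a toy illustration under stated assumptions: the patch size, the $\lambda_S$ weight, unit $w_I$/$w_F$ weights, and the random toy arrays are all choices made here, not values from the patent:

```python
import numpy as np

# Sketch of patch-based frame scoring: compute S_I and S_F on every
# patch, keep the patch maximizing S_F, combine the two scores with
# weights, then min-max normalize scores over a video.
# Patch size, lambda_S, unit weights, and toy data are assumptions.

def patch_scores(img, recon, flow, flow_hat, size=2):
    h, w = img.shape
    best = None
    for i in range(h - size + 1):
        for j in range(w - size + 1):
            sl = (slice(i, i + size), slice(j, j + size))
            s_i = np.mean((img[sl] - recon[sl]) ** 2)    # S_I(P)
            s_f = np.mean(np.abs(flow[sl] - flow_hat[sl]))  # S_F(P)
            if best is None or s_f > best[1]:
                best = (s_i, s_f)
    return best  # (S_I, S_F) at the patch maximizing S_F

def frame_score(s_i, s_f, w_i, w_f, lambda_s=1.0):
    # Weighted log combination of the two partial scores.
    return np.log(w_f * s_f) + lambda_s * np.log(w_i * s_i)

def normalize(scores):
    # Min-max normalization of frame-level scores over one video.
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.min()) / (scores.max() - scores.min())

rng = np.random.default_rng(0)
img, recon = rng.random((4, 4)), rng.random((4, 4))
flow, flow_hat = rng.random((4, 4)), rng.random((4, 4))
s_i, s_f = patch_scores(img, recon, flow, flow_hat)
# w_I and w_F would be inverses of average training scores; 1.0 here.
scores = [frame_score(s_i, s_f, 1.0, 1.0), -1.0, 2.0]
print(normalize(scores))
```

Scoring a single highest-error patch rather than the whole frame keeps a small anomalous region from being averaged away by the many normal pixels around it.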
10. A computer-readable storage medium storing computer instructions, wherein the computer instructions, when executed by a processor, implement the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011408898.8A CN112418149A (en) | 2020-12-04 | 2020-12-04 | Abnormal behavior detection method based on deep convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112418149A true CN112418149A (en) | 2021-02-26 |
Family
ID=74830341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011408898.8A Pending CN112418149A (en) | 2020-12-04 | 2020-12-04 | Abnormal behavior detection method based on deep convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112418149A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113343757A (en) * | 2021-04-23 | 2021-09-03 | 重庆七腾科技有限公司 | Space-time anomaly detection method based on convolution sparse coding and optical flow |
CN115078894A (en) * | 2022-08-22 | 2022-09-20 | 广东电网有限责任公司肇庆供电局 | Method, device and equipment for detecting abnormity of electric power machine room and readable storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919032A (en) * | 2019-01-31 | 2019-06-21 | 华南理工大学 | A kind of video anomaly detection method based on action prediction |
US20200013148A1 (en) * | 2018-07-06 | 2020-01-09 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for Detecting Motion Anomalies in Video |
CN110705376A (en) * | 2019-09-11 | 2020-01-17 | 南京邮电大学 | Abnormal behavior detection method based on generative countermeasure network |
Non-Patent Citations (1)
Title |
---|
TRONG-NGUYEN NGUYEN 等: "Anomaly Detection in Video Sequence with Appearance-Motion Correspondence", 《2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION(ICCV)》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210226 |