Abnormal behavior detection method based on deep convolutional neural network (CN112418149A)

Info

Publication number: CN112418149A
Application number: CN202011408898.8A
Authority: CN (China)
Prior art keywords: frame, image, optical flow, motion, appearance
Other languages: Chinese (zh)
Inventors: 蔡畅奇, 金欣
Current Assignee: Shenzhen International Graduate School of Tsinghua University
Original Assignee: Shenzhen International Graduate School of Tsinghua University
Priority date: 2020-12-04
Filing date: 2020-12-04
Publication date: 2021-02-26
Application filed by Shenzhen International Graduate School of Tsinghua University
Legal status: Pending

Classifications

    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods


Abstract

A method for detecting abnormal behavior based on a deep convolutional neural network, the method comprising: A1: encoding an input video frame; A2: decoding the encoded stream to obtain an appearance stream and a motion stream; A3: scoring the frame with an anomaly detection module, comparing the score with a threshold, and determining whether abnormal behavior is present. The method makes full use of the structural and motion information extracted from video frames and can accomplish intelligent detection of abnormal behaviors accurately and efficiently.

Description

Abnormal behavior detection method based on deep convolutional neural network
Technical Field
The invention relates to the field of computer vision and video detection and analysis, in particular to an abnormal behavior detection method based on a deep convolutional neural network.
Background
A practical anomaly monitoring system aims to raise an alarm in time when an anomaly occurs and to identify its type. In general, anomaly detection can be viewed as a coarse form of video understanding that only distinguishes anomalous events from normal ones. Once an abnormal condition is detected, further classification techniques are used to identify and categorize the abnormal behavior.
Three difficulties must be overcome to achieve online detection of abnormal behavior in video surveillance: the algorithm must meet real-time requirements; the algorithm must effectively exploit long, untrimmed video datasets; and the algorithm must cope with the complexity of the environment in which the surveillance camera is located.
To date, image-based tasks such as image classification and object detection have been revolutionized by deep learning, especially convolutional neural networks. Compared with traditional methods, deep learning achieves higher recognition accuracy and stronger robustness. Progress in video analysis, however, has been less satisfactory, suggesting that learning representations of spatiotemporal data is very difficult. A main difficulty is that capturing the motion information present in video requires new network designs that have not yet been established and validated.
Previous research has learned features by performing convolutions simultaneously in the spatial and temporal dimensions. Optical flow features are widely and effectively used in video analysis: applying optical flow to video understanding tasks allows motion cues to be modeled explicitly and conveniently. However, this approach is inefficient, since computing and storing the estimated optical flow tends to be costly.
Abnormal behavior detection in video surveillance can be used, for example, to detect littering. Household garbage discarded at random releases large amounts of harmful gases such as ammonia and sulfides, pollutes water bodies, and breeds bacteria and pests; such indiscriminate dumping is a major cause of urban environmental pollution, which is why household garbage sorting measures are necessary. It is therefore desirable to provide an abnormal behavior detection method based on an intelligent computer vision algorithm that can accurately and efficiently detect abnormal behaviors such as littering.
Disclosure of Invention
The main purpose of the present invention is to overcome the problems in the background art, and to provide an abnormal behavior detection method based on a deep convolutional neural network, so as to achieve accurate and efficient intelligent detection of abnormal behaviors.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for detecting abnormal behavior based on a deep convolutional neural network, the method comprising:
A1: encoding an input video frame;
A2: decoding the encoded stream to obtain an appearance stream and a motion stream;
A3: scoring the frame with an anomaly detection module, comparing the score with a threshold, and determining whether abnormal behavior is present.
Further:
Step A1 specifically includes:
A11: adding an Inception module after the input layer to determine low-level features;
A12: encoding the video using a convolutional auto-encoder.
In step A11:
An Inception module is added after the input layer to determine low-level features as early as possible, so that the model automatically selects suitable convolution operations; this is preferably applied to processing surveillance video shot from a fixed angle.
In step A12:
A convolutional auto-encoder (Conv-AE) learns to detect abnormal targets from templates of normal appearance; the encoder is a sequence of layer blocks, each comprising three layers: convolution, batch normalization and a Leaky ReLU activation function, with convolution applied directly, rather than pooling layers, to reduce the resolution of the feature maps;
wherein this parameterization lets the network find an informative way to reduce the spatial resolution of the feature maps, with the corresponding up-sampling learned in the decoding phase.
The step a2 specifically includes:
a21: decoding the coded stream by an appearance decoder to obtain an appearance stream;
a22: and decoding the coded stream by a motion decoder to obtain the motion stream.
In step A21:
The appearance decoder learns appearance information from static images, including textures, contours and interest points, and outputs a probability distribution over different abnormal behavior categories; the appearance decoder is a sequence of layer blocks, and a Dropout layer is added before the ReLU activation function of each block as a regularization method to reduce the risk of over-fitting during training.
In step A21:
For an input image I and its reconstructed image $\hat{I}$, the model is forced to generate an image with a similar intensity at every pixel; the intensity loss is estimated as

$$L_{int}(I, \hat{I}) = \|I - \hat{I}\|_2^2$$

A constraint is added to preserve the original gradients, i.e. sharpness, in the reconstructed image; the gradient loss is defined as the difference between the absolute gradients along the two spatial dimensions

$$L_{grad}(I, \hat{I}) = \sum_{d \in \{x, y\}} \big\| \, |g_d(I)| - |g_d(\hat{I})| \, \big\|_1$$

where x, y denote the horizontal and vertical directions of the image space, respectively, and $g_d$ denotes the image gradient along direction d. The final loss function of the appearance transformation is the sum of the intensity and gradient losses:

$$L_{appe}(I, \hat{I}) = L_{int}(I, \hat{I}) + L_{grad}(I, \hat{I})$$
In step A22:
The motion decoder learns motion information and predicts the probabilities of different abnormal behavior categories; the motion decoder is a sequence of layer blocks, with a Dropout layer added before the ReLU activation function of each block as a regularization method to reduce the risk of over-fitting during training; the network used by the motion decoder includes skip connections that carry low-level features from the original image;
wherein a pre-trained FlowNet2 is employed to estimate optical flow;
wherein a U-Net sub-network is employed to learn the associations between appearance patterns and the corresponding motion;
the distance-based loss between the output optical flow and the ground-truth optical flow is

$$L_{flow}(F_t, \hat{F}_t) = \|F_t - \hat{F}_t\|_1$$

where $F_t$ is the ground-truth optical flow estimated from two successive frames $I_t$ and $I_{t+1}$, and $\hat{F}_t$ is the output of the U-Net given $I_t$.
Given an input video frame I and its associated optical flow F obtained by FlowNet2, the network in the model diagram produces a reconstructed frame $\hat{I}$ and a predicted optical flow $\hat{F}$. The discriminator D estimates the probability that the optical flow associated with I is the ground truth F, and the GAN objective consists of two loss functions:

$$L_D(I, F, \hat{F}) = \tfrac{1}{2} \sum_{x,y,c} \Big[ -\log D(I, F)_{x,y,c} - \log\big(1 - D(I, \hat{F})_{x,y,c}\big) \Big]$$

$$L_G = \lambda_a L_{appe}(I, \hat{I}) + \lambda_f L_{flow}(F, \hat{F}) + \lambda_d \sum_{x,y,c} -\log D(I, \hat{F})_{x,y,c}$$

where x, y and c denote the spatial position and channel, respectively, of a cell in the feature map output by the discriminator D, and the λ values are the weights associated with the partial losses in the model; the GAN is optimized by alternately minimizing the two losses, which indicates the efficiency of motion prediction.
Step A3 specifically includes:
A score estimation scheme is used in which only a small region, rather than the entire frame, is considered;
wherein partial scores are defined that are estimated on the two model streams sharing the same patch position:

$$S_I^P = \frac{1}{|P|} \sum_{(i,j) \in P} \big(I_{i,j} - \hat{I}_{i,j}\big)^2, \qquad S_F^P = \frac{1}{|P|} \sum_{(i,j) \in P} \big(F_{i,j} - \hat{F}_{i,j}\big)^2$$

where P denotes an image patch, |P| is its number of pixels, i and j are the pixel indices in the horizontal and vertical directions of the image, $I_{i,j}$ is the value of the input image at (i, j), $\hat{I}_{i,j}$ is the value of its reconstructed image at (i, j), $F_{i,j}$ is the ground-truth optical flow at (i, j), $\hat{F}_{i,j}$ is the U-Net output at (i, j), and $S_I$ and $S_F$ denote the score of the original image and the score of the optical flow, respectively. The frame-level score is then computed as a weighted combination of the two partial scores:

$$S = w_F\, S_F^{\tilde{P}} + \lambda_S\, w_I\, S_I^{\tilde{P}}$$

where $w_F$ and $w_I$ are weights computed from the training data, $\lambda_S$ controls the contribution of the partial score to the sum, and $\tilde{P}$ is the patch with the highest $S_F$ value in the frame, i.e.

$$\tilde{P} = \arg\max_{P} S_F^P$$

The weights $w_F$ and $w_I$ are estimated as the inverse of the average score over the training data of n images:

$$w_F = \Big(\frac{1}{n} \sum_{i=1}^{n} S_F^{\tilde{P}_i}\Big)^{-1}$$

where i denotes the image index, $S_F^{\tilde{P}_i}$ is the optical flow score of the i-th image, and $\tilde{P}_i$ is the patch with the highest $S_F$ value in the i-th frame; $w_I$ is estimated analogously from the image scores.
The frame-level score of each evaluation video is normalized; the final frame-level score is

$$\tilde{S}_t = \frac{S_t}{\max(S_{1 \ldots m})}$$

where t is the frame index in a video containing m frames, $S_t$ denotes the score of the t-th frame, $\max(S_{1 \ldots m})$ is the maximum score over all frames, and $\tilde{S}_t$ is the normalized frame score.
A computer readable storage medium storing computer instructions which, when executed by a processor, implement the method.
The invention has the following beneficial effects:
The invention provides an abnormal behavior detection method based on a deep convolutional neural network. The method makes full use of the structural and motion information extracted from video frames and can accomplish intelligent detection of abnormal behaviors accurately and efficiently. In a preferred embodiment, the deep convolutional neural network combines a convolutional auto-encoder (Conv-AE) and a U-Net, so that each stream contributes to the task of detecting anomalous frames. The network depth is usually a carefully selected hyper-parameter; to mitigate the influence of network depth on accuracy, the method preferably integrates a tuned Inception module after the input layer. The method further provides a patch-based approach for evaluating the frame-level normalized score, which reduces the effect of noise in the model output. Compared with other state-of-the-art methods, the method is clearly competitive on benchmark datasets.
Drawings
FIG. 1 is a flow chart of an abnormal behavior detection method based on a deep convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a block diagram of the model, including the spatial resolutions of the feature maps, according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
The embodiment of the invention provides an abnormal behavior detection method based on a deep convolutional neural network, which mainly comprises the following steps: an input video passes through an encoder formed by a series of sub-modules, an appearance decoder and a motion decoder are then used to obtain an appearance stream and a motion stream respectively, and finally an anomaly detection module is used to determine whether the input video contains abnormal behavior. The invention can be used for detecting abnormal behaviors such as littering. Referring to FIGs. 1 and 2, the method includes the following steps:
a1: an input video frame is encoded. The encoder comprises an inclusion module, a convolution module, a batch standardization module and an activation module;
a2: decoding the coded stream, and obtaining an appearance stream through an appearance decoder; and obtaining the motion stream through a motion decoder.
A3: and scoring the frame through an abnormality detection module, comparing the frame with a threshold value, and judging abnormal behaviors.
In particular embodiments, the above steps may be carried out as follows. It should be noted that the specific methods described are merely illustrative, and the scope of the present invention includes, but is not limited to, them.
A1: an input video frame is encoded.
The encoder in the preferred embodiment includes an Inception module, a convolution module, a batch normalization module and an activation module.
The network proposed in the embodiments uses an encoding-decoding architecture, which creates a bottleneck. A very deep structure may discard features critical for decoding; conversely, a shallow network may lose high-level abstract information. The Inception module was originally developed to allow a convolutional neural network to determine the filter size automatically. Preferably, the method uses an Inception module so that the model automatically selects suitable convolution operations.
Some embodiments are mainly applied to processing surveillance videos shot from a fixed angle. If a convolutional layer with a predefined kernel size is placed directly after the input layer, the information extracted from a target changes with its distance from the camera, and this effect propagates to the following layers; the method therefore adds an Inception module after the input layer to determine low-level features as early as possible. Using the Inception module also significantly reduces the amount of computation compared with other approaches.
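For illustration, a minimal Inception-style input block might look as follows, assuming the standard multi-branch design (parallel 1x1, 3x3 and 5x5 convolutions plus a pooled branch, concatenated along the channel axis); the branch widths are not specified in the description and are placeholders.

import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 4  # out_ch assumed divisible by 4
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=5, padding=2))
        self.pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, branch_ch, kernel_size=1))

    def forward(self, x):
        # Every branch sees the same input; concatenation lets later layers weight
        # whichever receptive field is most informative for the fixed-view scene.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)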
The convolutional auto-encoder (Conv-AE) used in some embodiments can learn to detect abnormal objects from templates of normal appearance. The convolutional auto-encoder includes an encoder and a decoder.
The encoder consists of a series of blocks, each comprising three layers: convolution, batch normalization and a Leaky ReLU activation function. Some embodiments apply convolution directly, rather than using pooling layers, to reduce the resolution of the feature map. This parameterization lets the network find an informative way to reduce the spatial resolution of the feature map, with the corresponding up-sampling learned in the decoding phase.
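One such encoder stage may be sketched as follows, assuming a stride-2 convolution performs the learned down-sampling; the kernel sizes, channel widths and number of stages are not given in the description and are illustrative.

import torch.nn as nn

def encoder_block(in_ch, out_ch):
    # convolution + batch normalization + Leaky ReLU; stride-2 replaces pooling
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

# example stack of three stages (channel widths are placeholders)
encoder = nn.Sequential(
    encoder_block(64, 128),
    encoder_block(128, 256),
    encoder_block(256, 512),
)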
A2: decoding the coded stream, and obtaining an appearance stream through an appearance decoder; and obtaining the motion stream through a motion decoder.
The decoder is a sequence of layer blocks, each block having a Dropout layer added before the ReLU activation function, as a regularization method to reduce the risk of over-fitting during the training phase.
The appearance decoder effectively learns appearance information such as textures, contours and interest points from static images and outputs a probability distribution over different abnormal behavior categories. The motion decoder effectively learns motion information and predicts the probabilities of different abnormal behavior categories.
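A corresponding decoder stage might be sketched as follows, with the Dropout layer placed before the ReLU as described; the transposed-convolution parameters and the dropout rate are assumptions. In the motion decoder, skip features from the encoder would additionally be concatenated to the input of each stage.

import torch.nn as nn

def decoder_block(in_ch, out_ch, p_drop=0.3):
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),  # learned upsampling
        nn.BatchNorm2d(out_ch),
        nn.Dropout2d(p_drop),   # regularization applied before the activation
        nn.ReLU(inplace=True),
    )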
The Conv-AE used in some embodiments supports detecting anomalous objects within an input frame by learning common appearance templates of normal events. Since Conv-AE learns common appearance patterns of normal events, we consider the $\ell_2$ distance between the input image I and its reconstructed image $\hat{I}$; the model thus forces the generation of an image with a similar intensity at every pixel. The intensity loss is estimated as

$$L_{int}(I, \hat{I}) = \|I - \hat{I}\|_2^2$$

One disadvantage of using only the $\ell_2$ loss is blurring in the output, so we add a constraint to preserve the original gradients (i.e. sharpness) in the reconstructed image. The gradient loss is defined as the difference between the absolute gradients along the two spatial dimensions

$$L_{grad}(I, \hat{I}) = \sum_{d \in \{x, y\}} \big\| \, |g_d(I)| - |g_d(\hat{I})| \, \big\|_1$$

where x, y denote the horizontal and vertical directions of the image space, respectively, and $g_d$ denotes the image gradient along direction d. The final loss function of the appearance transformation is the sum of the intensity and gradient losses:

$$L_{appe}(I, \hat{I}) = L_{int}(I, \hat{I}) + L_{grad}(I, \hat{I})$$
this combination of losses provides good performance for the video prediction task.
The motion decoder effectively learns motion information and predicts the probabilities of different abnormal behavior categories. It differs from the appearance decoder in that its network contains skip connections that carry low-level features (edges, small patches, etc.) from the original image.
In addition to abnormal object structure, abnormal motion of otherwise typical objects is also useful for evaluating video frames. Each module in the encoder raises the level of spatial abstraction of common objects in the training frames. The method therefore employs a U-Net sub-network to learn the association between appearance patterns and the corresponding motion.
Some embodiments employ a pre-trained FlowNet2 to estimate optical flow. Compared with other models, the optical flow output by FlowNet2 is not only much smoother but also preserves motion discontinuities with sharp boundaries. The Leaky ReLU activation in the encoder also maintains weak responses, which helps to provide useful information to the decoder.
The U-Net sub-network focuses on learning the associations between these patterns and the corresponding motion, and the ground-truth optical flow used in the method is estimated by a pre-trained FlowNet2. To reduce the effect of outliers when learning the motion correspondence, the loss between the output optical flow and its ground truth is measured by the $\ell_1$ distance:

$$L_{flow}(F_t, \hat{F}_t) = \|F_t - \hat{F}_t\|_1$$

where $F_t$ is the ground-truth optical flow estimated from two successive frames $I_t$ and $I_{t+1}$, and $\hat{F}_t$ is the output of the U-Net given $I_t$. This stream can predict the instantaneous motion of objects appearing in the video.
In addition to the distance-based loss $L_{flow}$, another penalty is added that makes the distribution of the predicted optical flow similar to that of the ground-truth optical flow.
Given an input video frame I and its associated optical flow F obtained by FlowNet2, the network proposed in the model diagram (G stands for Generator) produces a reconstructed frame $\hat{I}$ and a predicted optical flow $\hat{F}$, and the discriminator D estimates the probability that the optical flow associated with I is the ground truth F. The GAN objective consists of two loss functions:

$$L_D(I, F, \hat{F}) = \tfrac{1}{2} \sum_{x,y,c} \Big[ -\log D(I, F)_{x,y,c} - \log\big(1 - D(I, \hat{F})_{x,y,c}\big) \Big]$$

$$L_G = \lambda_a L_{appe}(I, \hat{I}) + \lambda_f L_{flow}(F, \hat{F}) + \lambda_d \sum_{x,y,c} -\log D(I, \hat{F})_{x,y,c}$$

where x, y and c denote the spatial position and channel, respectively, of a cell in the feature map output by D, and the λ values are the weights associated with the partial losses in the proposed model. The GAN is optimized by alternately minimizing the two losses; the GAN is used to indicate the efficiency of motion prediction.
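A minimal sketch of the motion-stream objectives: the $\ell_1$ flow loss and the two adversarial losses evaluated per cell of the discriminator's output map. The discriminator is assumed to output per-cell probabilities (e.g. via a sigmoid), and the lambda values are placeholders for the λ weights mentioned above, not values taken from the description.

import torch
import torch.nn.functional as F

def flow_loss(flow_pred, flow_gt):
    # L1 distance between the predicted flow and the FlowNet2 ground truth
    return torch.mean(torch.abs(flow_pred - flow_gt))

def discriminator_loss(d_real, d_fake):
    # d_real = D(I, F), d_fake = D(I, F_hat): per-cell probability maps in [0, 1]
    real = F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    fake = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    return 0.5 * (real + fake)

def generator_adv_loss(d_fake):
    # the generator is rewarded when D believes the predicted flow is ground truth
    return F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))

def generator_loss(appe_loss, flow_l1, d_fake, lambda_a=1.0, lambda_f=2.0, lambda_d=0.25):
    # weighted sum of the appearance, flow and adversarial terms
    return lambda_a * appe_loss + lambda_f * flow_l1 + lambda_d * generator_adv_loss(d_fake)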
A3: and scoring the frame through an abnormality detection module, comparing the frame with a threshold value, and judging abnormal behaviors.
The anomaly detection module in some embodiments aims to provide a normalized score for each frame. In related approaches, the score is typically a quantity measuring the similarity between the ground truth and the reconstructed or predicted output, and the normality of each video frame is determined by comparing its score to a threshold. Clearly, because of the summation or averaging over all pixel positions, anomalous events occurring within small image regions may be missed. The method therefore proposes a different score estimation scheme that considers only a small region rather than the entire frame.
Partial scores are defined that are estimated on the two model streams sharing the same patch position:

$$S_I^P = \frac{1}{|P|} \sum_{(i,j) \in P} \big(I_{i,j} - \hat{I}_{i,j}\big)^2, \qquad S_F^P = \frac{1}{|P|} \sum_{(i,j) \in P} \big(F_{i,j} - \hat{F}_{i,j}\big)^2$$

where P denotes an image patch, |P| is its number of pixels, i and j are the pixel indices in the horizontal and vertical directions of the image, $I_{i,j}$ is the value of the input image at (i, j), $\hat{I}_{i,j}$ is the value of its reconstructed image at (i, j), $F_{i,j}$ is the ground-truth optical flow at (i, j), $\hat{F}_{i,j}$ is the U-Net output at (i, j), and $S_I$ and $S_F$ denote the score of the original image and the score of the optical flow, respectively. Our frame-level score is then computed as a weighted combination of the two partial scores:

$$S = w_F\, S_F^{\tilde{P}} + \lambda_S\, w_I\, S_I^{\tilde{P}}$$

where $w_F$ and $w_I$ are weights computed from the training data, $\lambda_S$ controls the contribution of the partial score to the sum, and $\tilde{P}$ is the patch with the highest $S_F$ value in the frame, i.e.

$$\tilde{P} = \arg\max_{P} S_F^P$$

The weights $w_F$ and $w_I$ are estimated as the inverse of the average score over the training data of n images:

$$w_F = \Big(\frac{1}{n} \sum_{i=1}^{n} S_F^{\tilde{P}_i}\Big)^{-1}$$

where $\tilde{P}_i$ is the patch with the highest $S_F$ value in the i-th training image; $w_I$ is estimated analogously from the image scores.
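A sketch of this patch-based scoring: per-pixel squared errors are averaged over sliding patches on each stream, the patch with the highest flow score is selected, and the frame score combines the two partial scores with the training-set weights w_F, w_I and λ_S. The patch size, stride and default λ_S are assumptions, as is the use of squared error.

import torch
import torch.nn.functional as F

def patch_scores(err_map, patch=16, stride=4):
    # err_map: per-pixel squared error of shape (H, W); returns the mean error of
    # every sliding patch of size patch x patch
    return F.avg_pool2d(err_map[None, None], kernel_size=patch, stride=stride)[0, 0]

def frame_score(recon, frame, flow_pred, flow_gt, w_I, w_F,
                lambda_s=0.2, patch=16, stride=4):
    # recon/frame: (C, H, W) images; flow_pred/flow_gt: (2, H, W) optical flow
    err_I = ((recon - frame) ** 2).mean(dim=0)
    err_F = ((flow_pred - flow_gt) ** 2).mean(dim=0)
    s_I = patch_scores(err_I, patch, stride)
    s_F = patch_scores(err_F, patch, stride)
    idx = torch.argmax(s_F)                 # patch P~ with the highest flow score
    return (w_F * s_F.flatten()[idx] + lambda_s * w_I * s_I.flatten()[idx]).item()

def estimate_weight(best_patch_scores):
    # inverse of the average best-patch score over the n training images
    return 1.0 / torch.as_tensor(best_patch_scores, dtype=torch.float32).mean().item()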
finally, the frame-level scores of each evaluation video were normalized according to the recommendations of the relevant study.
The final frame-level score is

$$\tilde{S}_t = \frac{S_t}{\max(S_{1 \ldots m})}$$

where t is the frame index in a video containing m frames, $S_t$ denotes the score of the t-th frame, $\max(S_{1 \ldots m})$ is the maximum score over all frames, and $\tilde{S}_t$ is the normalized frame score.
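A one-line sketch of this per-video normalization, assuming the division-by-maximum form given above:

import torch

def normalize_video_scores(scores: torch.Tensor) -> torch.Tensor:
    # scores: shape (m,), the frame-level scores S_t of one evaluation video
    return scores / scores.max()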
The background of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe the prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a more detailed description of the invention in connection with specific/preferred embodiments and is not intended to limit the practice of the invention to those descriptions. It will be apparent to those skilled in the art that various substitutions and modifications can be made to the described embodiments without departing from the spirit of the invention, and such substitutions and modifications should be considered to fall within the scope of the invention. In the description herein, reference to terms such as "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The various embodiments or examples described in this specification, and their features, can be combined by one skilled in the art without contradiction. Although embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the claims.

Claims (10)

1. A method for detecting abnormal behavior based on a deep convolutional neural network, characterized by comprising the following steps:
A1: encoding an input video frame;
A2: decoding the encoded stream to obtain an appearance stream and a motion stream;
A3: scoring the frame with an anomaly detection module, comparing the score with a threshold, and determining whether abnormal behavior is present.
2. The method according to claim 1, wherein step A1 specifically comprises:
A11: adding an Inception module after the input layer to determine low-level features;
A12: encoding the video using a convolutional auto-encoder.
3. The method of claim 2, wherein in step A11:
an Inception module is added after the input layer to determine low-level features as early as possible, so that the model automatically selects suitable convolution operations; this is preferably applied to processing surveillance video shot from a fixed angle.
4. The method of claim 2, wherein in step A12:
a convolutional auto-encoder (Conv-AE) learns to detect abnormal targets from templates of normal appearance; the encoder is a sequence of layer blocks, each comprising three layers: convolution, batch normalization and a Leaky ReLU activation function, with convolution applied directly, rather than pooling layers, to reduce the resolution of the feature maps;
wherein this parameterization lets the network find an informative way to reduce the spatial resolution of the feature maps, with the corresponding up-sampling learned in the decoding phase.
5. The method according to claim 1, wherein step A2 specifically comprises:
A21: decoding the encoded stream with an appearance decoder to obtain an appearance stream;
A22: decoding the encoded stream with a motion decoder to obtain a motion stream.
6. The method of claim 5, wherein in step A21:
the appearance decoder learns appearance information from static images, including textures, contours and interest points, and outputs a probability distribution over different abnormal behavior categories; wherein the appearance decoder is a sequence of layer blocks, and a Dropout layer is added before the ReLU activation function of each block as a regularization method to reduce the risk of over-fitting during training.
7. The method of claim 5, wherein in step A21:
for an input image I and its reconstructed image $\hat{I}$, the model is forced to generate an image with a similar intensity at every pixel; the intensity loss is estimated as

$$L_{int}(I, \hat{I}) = \|I - \hat{I}\|_2^2$$

a constraint is added to preserve the original gradients, i.e. sharpness, in the reconstructed image; the gradient loss is defined as the difference between the absolute gradients along the two spatial dimensions

$$L_{grad}(I, \hat{I}) = \sum_{d \in \{x, y\}} \big\| \, |g_d(I)| - |g_d(\hat{I})| \, \big\|_1$$

wherein x, y denote the horizontal and vertical directions of the image space, respectively, and $g_d$ denotes the image gradient along direction d; the final loss function of the appearance transformation is the sum of the intensity and gradient losses:

$$L_{appe}(I, \hat{I}) = L_{int}(I, \hat{I}) + L_{grad}(I, \hat{I})$$
8. The method of claim 5, wherein in step A22:
the motion decoder learns motion information and predicts the probabilities of different abnormal behavior categories; wherein the motion decoder is a sequence of layer blocks, with a Dropout layer added before the ReLU activation function of each block as a regularization method to reduce the risk of over-fitting during training; and the network used by the motion decoder comprises skip connections which carry low-level features from the original image;
wherein a pre-trained FlowNet2 is employed to estimate optical flow;
wherein a U-Net sub-network is employed to learn the associations between appearance patterns and the corresponding motion;
the distance-based loss between the output optical flow and the ground-truth optical flow is

$$L_{flow}(F_t, \hat{F}_t) = \|F_t - \hat{F}_t\|_1$$

where $F_t$ is the ground-truth optical flow estimated from two successive frames $I_t$ and $I_{t+1}$, and $\hat{F}_t$ is the output of the U-Net given $I_t$;
given an input video frame I and its associated optical flow F obtained by FlowNet2, the network in the model diagram produces a reconstructed frame $\hat{I}$ and a predicted optical flow $\hat{F}$; a discriminator D is used to estimate the probability that the optical flow associated with I is the ground truth F, and the GAN objective consists of two loss functions:

$$L_D(I, F, \hat{F}) = \tfrac{1}{2} \sum_{x,y,c} \Big[ -\log D(I, F)_{x,y,c} - \log\big(1 - D(I, \hat{F})_{x,y,c}\big) \Big]$$

$$L_G = \lambda_a L_{appe}(I, \hat{I}) + \lambda_f L_{flow}(F, \hat{F}) + \lambda_d \sum_{x,y,c} -\log D(I, \hat{F})_{x,y,c}$$

where x, y and c denote the spatial position and channel, respectively, of a cell in the feature map output by the discriminator D, and the λ values are the weights associated with the partial losses in the model; the GAN is optimized by alternately minimizing the two losses to indicate the efficiency of motion prediction.
9. The method according to any one of claims 1 to 8, wherein step A3 specifically comprises:
using a score estimation scheme in which only a small region, rather than the entire frame, is considered;
wherein partial scores are defined that are estimated on the two model streams sharing the same patch position:

$$S_I^P = \frac{1}{|P|} \sum_{(i,j) \in P} \big(I_{i,j} - \hat{I}_{i,j}\big)^2, \qquad S_F^P = \frac{1}{|P|} \sum_{(i,j) \in P} \big(F_{i,j} - \hat{F}_{i,j}\big)^2$$

where P denotes an image patch, |P| is its number of pixels, i and j are the pixel indices in the horizontal and vertical directions of the image, $I_{i,j}$ is the value of the input image at (i, j), $\hat{I}_{i,j}$ is the value of its reconstructed image at (i, j), $F_{i,j}$ is the ground-truth optical flow at (i, j), $\hat{F}_{i,j}$ is the U-Net output at (i, j), and $S_I$ and $S_F$ denote the score of the original image and the score of the optical flow, respectively; the frame-level score is then computed as a weighted combination of the two partial scores:

$$S = w_F\, S_F^{\tilde{P}} + \lambda_S\, w_I\, S_I^{\tilde{P}}$$

where $w_F$ and $w_I$ are weights computed from the training data, $\lambda_S$ controls the contribution of the partial score to the sum, and $\tilde{P}$ is the patch with the highest $S_F$ value in the frame, i.e.

$$\tilde{P} = \arg\max_{P} S_F^P$$

the weights $w_F$ and $w_I$ are estimated as the inverse of the average score over the training data of n images:

$$w_F = \Big(\frac{1}{n} \sum_{i=1}^{n} S_F^{\tilde{P}_i}\Big)^{-1}$$

where i denotes the image index, $S_F^{\tilde{P}_i}$ is the optical flow score of the i-th image, and $\tilde{P}_i$ is the patch with the highest $S_F$ value in the i-th frame, $w_I$ being estimated analogously from the image scores;
the frame-level score of each evaluation video is normalized, the final frame-level score being

$$\tilde{S}_t = \frac{S_t}{\max(S_{1 \ldots m})}$$

where t is the frame index in a video containing m frames, $S_t$ denotes the score of the t-th frame, $\max(S_{1 \ldots m})$ is the maximum score over all frames, and $\tilde{S}_t$ is the normalized frame score.
10. A computer-readable storage medium storing computer instructions, wherein the computer instructions, when executed by a processor, implement the method of any one of claims 1 to 9.

Priority Applications (1)

Application Number: CN202011408898.8A — Priority Date: 2020-12-04 — Filing Date: 2020-12-04 — Title: Abnormal behavior detection method based on deep convolutional neural network

Publications (1)

Publication Number: CN112418149A — Publication Date: 2021-02-26

Family

ID=74830341

Family Applications (1)

Application Number: CN202011408898.8A — Title: Abnormal behavior detection method based on deep convolutional neural network

Country Status (1)

CN (1): CN112418149A

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number — Priority date — Publication date — Assignee — Title
US20200013148A1 * — 2018-07-06 — 2020-01-09 — Mitsubishi Electric Research Laboratories, Inc. — System and Method for Detecting Motion Anomalies in Video
CN109919032A * — 2019-01-31 — 2019-06-21 — 华南理工大学 — A kind of video anomaly detection method based on action prediction
CN110705376A * — 2019-09-11 — 2020-01-17 — 南京邮电大学 — Abnormal behavior detection method based on generative countermeasure network

Non-Patent Citations (1)

Trong-Nguyen Nguyen et al.: "Anomaly Detection in Video Sequence with Appearance-Motion Correspondence", 2019 IEEE/CVF International Conference on Computer Vision (ICCV) *

Cited By (2)

Publication number — Priority date — Publication date — Assignee — Title
CN113343757A * — 2021-04-23 — 2021-09-03 — 重庆七腾科技有限公司 — Space-time anomaly detection method based on convolution sparse coding and optical flow
CN115078894A * — 2022-08-22 — 2022-09-20 — 广东电网有限责任公司肇庆供电局 — Method, device and equipment for detecting abnormity of electric power machine room and readable storage medium


Legal Events

Code — Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination
RJ01 — Rejection of invention patent application after publication (application publication date: 2021-02-26)