CN112149596A - Abnormal behavior detection method, terminal device and storage medium - Google Patents
- Publication number
- CN112149596A (application CN202011049302.XA)
- Authority
- CN
- China
- Prior art keywords
- video
- image
- neural network
- abnormal behavior
- behavior detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to an abnormal behavior detection method, a terminal device and a storage medium. The method comprises the following steps: collecting video image sequences, each formed by t+1 consecutive video frames, and forming a training set from all the sequences; constructing a U-Net neural network whose input is the sequence of frames 1 to t and whose output is a reconstructed image, and training the network on the training set so that the difference between the reconstructed image and the real (t+1)-th frame is minimized; and continuously reconstructing images for consecutive frames of a video with the trained U-Net, judging whether abnormal behavior exists from the relation between a threshold and the difference between each reconstructed image and the corresponding real image. The invention learns to classify behavior categories autonomously from the regularities of a large amount of video data, saving the workload of manual labeling.
Description
Technical Field
The present invention relates to the field of computer vision, and in particular, to an abnormal behavior detection method, a terminal device, and a storage medium.
Background
Abnormal behavior detection has broad application prospects in security systems. At present, real-time monitoring of abnormal behavior mostly relies on personnel performing manual inspection through a surveillance system; because operators tire and cannot stay focused for long periods, abnormal behavior is often missed or falsely reported. A system that can automatically identify abnormal behavior in video would therefore improve the efficiency of fighting crime.
The existing abnormal behavior detection methods are mainly divided into two types:
(1) Abnormal behavior detection methods based on traditional hand-crafted features. The traditional pipeline comprises the following steps: 1. sampling the video and extracting features; 2. encoding the features; 3. normalizing the encoded vectors; 4. training a classifier. However, such a method can detect only a few types of abnormal behavior, while real scenes contain many kinds, so it is poorly suited to complex scenes.
(2) Abnormal behavior detection methods based on deep learning. Neural networks used for abnormal behavior detection mainly include convolutional neural networks (CNNs) and recurrent neural networks (RNNs). A classical approach is the two-stream CNN proposed by Simonyan et al. for behavior recognition, which treats video as a sequence of images: a spatial stream computes CNN features of an image frame, a temporal stream computes optical-flow CNN features between several image frames, and the two feature streams are finally merged. Ji et al. proposed a 3D CNN that adds a temporal dimension to a two-dimensional CNN so that the network can learn spatial and temporal information from video. However, these supervised methods rely only on a powerful behavior detection classifier and do not fully exploit prior knowledge; they are computationally expensive, which affects detection speed. Moreover, creating labels for supervised learning requires substantial work, and the fewer labels that are created manually, the less data the algorithm can use for training.
Disclosure of Invention
In order to solve the above problems, the present invention provides an abnormal behavior detection method, a terminal device, and a storage medium.
The specific scheme is as follows:
an abnormal behavior detection method comprises the following steps:
s1: collecting video image sequences, each formed by t+1 consecutive video frames, and forming a training set from all the sequences;
s2: constructing a U-Net neural network whose input is a video image sequence consisting of frames 1 to t and whose output is a reconstructed image, and training the network on the training set so that the difference between the reconstructed image and the real (t+1)-th frame is minimized; the training process of the U-Net neural network also includes an appearance constraint, a motion constraint and an information gain constraint;
s3: continuously reconstructing images for consecutive frames in the video with the trained U-Net neural network, and judging whether abnormal behavior exists in the video from the relation between a threshold and the difference between each reconstructed image and the corresponding real image.
Further, the autoencoder formed by the U-Net neural network is divided into an encoder and a decoder; the encoder extracts image features and gradually reduces the spatial size through pooling layers, and the decoder reconstructs the image and restores its details and spatial size.
Further, the appearance constraints include intensity constraints and gradient constraints; the intensity constraint is to compute the difference of all pixel values between the reconstructed image and the real image, and the gradient constraint is to compute the gradient between the reconstructed image and the real image.
Further, the motion constraint includes optical flow loss, which is used to calculate a difference between the optical flow of the reconstructed image and the optical flow of the real image.
Further, the information gain constraint is used for calculating a difference value between the information entropy of the reconstructed image and the information entropy of the real image.
Further, the difference is evaluated by the peak signal-to-noise ratio.
An abnormal behavior detection terminal device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the steps of the method described above in the embodiment of the present invention.
A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above for an embodiment of the invention.
The invention adopts the technical scheme and has the beneficial effects that:
1. Compared with supervised learning, which requires a large amount of manual labeling, the invention adopts an unsupervised learning method based on an autoencoder: the classifier learns to distinguish behavior categories autonomously from the regularities of a large amount of video data rather than from existing human annotations, saving the labeling workload.
2. Compared with supervised learning that does not fully exploit prior knowledge, the autoencoder-based unsupervised method adds an appearance constraint, a motion constraint and an information gain constraint, making full use of prior knowledge.
3. Compared with methods that train hand-crafted features for specific types of abnormal behavior, the invention trains the autoencoder on a large number of videos containing normal behavior: when normal behavior appears in a video, the difference between the reconstructed image generated by the autoencoder and the real image is small; when abnormal behavior appears, the difference is large. Abnormal behavior in complex scenes can therefore be judged.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of the method in this embodiment.
FIG. 3 is a diagram showing the structure of the U-Net neural network in this embodiment.
Fig. 4 shows the abnormal behavior detection result of the test video in section 7 in the CUHK Avenue data set in this embodiment.
FIG. 5 shows the abnormal behavior detection results of segment 7 of the test video in the UCSD Ped1 data set in this embodiment.
FIG. 6 shows the abnormal behavior detection results of segment 4 of the test video in the UCSD Ped2 data set in this embodiment.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the accompanying drawings and detailed description.
The first embodiment is as follows:
an embodiment of the present invention provides an abnormal behavior detection method, as shown in fig. 1 and fig. 2, the method includes the following steps:
s1: collecting video image sequences, each formed by t+1 consecutive video frames, and forming a training set from all the sequences.
The consecutive t+1 video frames are randomly extracted from the video, and the collected frames are normalized in size; in this embodiment they are resized to 256 × 256 pixels.
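As a rough sketch of this preprocessing step, the following example normalizes a clip of t+1 grayscale frames (assumed already resized to 256 × 256) and splits it into the t-frame network input and the (t+1)-th target frame. The choice t = 4 and the [-1, 1] value range are illustrative assumptions, not values given in the patent:

```python
import numpy as np

def preprocess_clip(frames, t=4):
    """Split a clip of t+1 grayscale frames into the t-frame input stack
    (frames 1..t) and the (t+1)-th target frame, scaling pixel values
    from [0, 255] to [-1, 1].  t=4 is an illustrative choice."""
    clip = np.asarray(frames, dtype=np.float32)   # shape (t+1, 256, 256)
    clip = clip / 127.5 - 1.0                     # [0, 255] -> [-1, 1]
    inputs = clip[:t]                             # network input: frames 1..t
    target = clip[t]                              # reconstruction target: frame t+1
    return inputs, target
```

Stacking the t input frames along the channel axis is a common way to feed them to a 2D U-Net.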
S2: constructing a U-Net neural network whose input is a video image sequence consisting of frames 1 to t and whose output is a reconstructed image, and training the network on the training set so that the difference between the reconstructed image and the real (t+1)-th frame is minimized.
The structure of the U-Net neural network in this embodiment is shown in fig. 3. U-Net is similar to a fully convolutional network (FCN) and is named for its symmetric U-shaped structure; its main body consists of an encoder on the left (four downsampling stages) and a decoder on the right (four upsampling stages). The encoder extracts image features and gradually reduces the spatial size through pooling layers, while the decoder reconstructs the image and restores its details and spatial size. Between the encoder and the decoder there are skip connections: copying the features of the encoder to the corresponding positions in the decoder helps the decoder recover the details of the target. These intermediate connections mitigate vanishing gradients and the loss of per-layer spatial information, so that the reconstructed image has the same size as the original image.
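A minimal sketch of such an encoder-decoder with a skip connection might look as follows (PyTorch, assumed available; the depth and channel widths are illustrative and much smaller than the four-stage network described above):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Miniature U-Net-style autoencoder sketch: one pooling stage in the
    encoder, one upsampling stage in the decoder, and a skip connection
    that copies encoder features to the decoder.  Illustrative only, not
    the patent's architecture."""
    def __init__(self, in_ch, out_ch=1):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                  # halves the spatial size
        self.up = nn.Upsample(scale_factor=2)        # restores the spatial size
        self.dec1 = nn.Sequential(nn.Conv2d(32 + 16, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(16, out_ch, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)                            # full-resolution features
        e2 = self.enc2(self.pool(e1))                # half-resolution features
        d1 = self.up(e2)                             # back to full resolution
        d1 = self.dec1(torch.cat([d1, e1], dim=1))   # skip connection: concat encoder features
        return torch.tanh(self.out(d1))              # reconstructed frame in [-1, 1]
```

With the t input frames stacked as channels, `TinyUNet(in_ch=t)` maps a `(B, t, H, W)` clip to a `(B, 1, H, W)` reconstruction of the same spatial size.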
An appearance constraint, a motion constraint and an information gain constraint are added during training so that the reconstructed image is closer to the real image. The U-Net takes the video frames $I_1, I_2, \dots, I_t$ as input and generates a reconstructed image $\hat{I}_{t+1}$, which corresponds to the (t+1)-th video frame; training drives the difference between $\hat{I}_{t+1}$ and the real image (the (t+1)-th video frame) to a minimum. All three added constraints serve to make the reconstruction generated by the U-Net as similar as possible to the real image.
Three constraints are described separately below.
1) The appearance constraints include an intensity constraint and a gradient constraint. The intensity constraint uses the $\ell_2$ distance to measure the similarity between the reconstructed image $\hat{I}$ and the real image $I$; it is computed as:

$$L_{int}(\hat{I}, I) = \lVert \hat{I} - I \rVert_2^2$$

The gradient constraint compares the gradients of the reconstructed image and the real image:

$$L_{gd}(\hat{I}, I) = \sum_{i,j} \Big| \, |\hat{I}_{i,j} - \hat{I}_{i-1,j}| - |I_{i,j} - I_{i-1,j}| \, \Big| + \Big| \, |\hat{I}_{i,j} - \hat{I}_{i,j-1}| - |I_{i,j} - I_{i,j-1}| \, \Big|$$

where i and j index the width and height of the video frame.
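Following the formulas above, the intensity and gradient constraints can be sketched in NumPy as follows (a hedged illustration, not the patent's implementation):

```python
import numpy as np

def intensity_loss(recon, real):
    # Squared l2 distance between all pixel values of the
    # reconstructed frame and the real frame.
    return float(np.sum((recon - real) ** 2))

def gradient_loss(recon, real):
    # Compare the absolute image gradients of the two frames along
    # both axes, as in the gradient-constraint formula above.
    def grads(img):
        gx = np.abs(np.diff(img, axis=1))   # gradient along the width
        gy = np.abs(np.diff(img, axis=0))   # gradient along the height
        return gx, gy
    rx, ry = grads(recon)
    tx, ty = grads(real)
    return float(np.sum(np.abs(rx - tx)) + np.sum(np.abs(ry - ty)))
```

Note that a constant brightness shift changes the intensity loss but leaves the gradient loss at zero, which is why the two constraints complement each other.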
2) The motion constraint comprises an optical flow loss:

$$L_{op} = \lVert f(\hat{I}_{t+1}, I_t) - f(I_{t+1}, I_t) \rVert_1$$

where f denotes a function that computes the optical flow between two image frames.
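Assuming the two flow fields have already been produced by some external estimator f (the patent does not specify one; pretrained flow networks such as FlowNet are typical in this line of work), the optical flow loss reduces to an l1 difference between them:

```python
import numpy as np

def optical_flow_loss(flow_recon, flow_real):
    """l1 difference between the flow computed with the reconstructed
    frame, f(recon, prev), and the flow computed with the real frame,
    f(real, prev).  Both inputs are (H, W, 2) flow fields from an
    external, unspecified estimator f."""
    return float(np.sum(np.abs(flow_recon - flow_real)))
```

The loss is zero exactly when the reconstructed frame induces the same motion field as the real frame.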
3) And (4) information gain constraint.
The information gain constraint computes the difference between the information entropy of the reconstructed image and that of the real image. The concept of entropy originated in thermodynamics; Shannon introduced it into information theory, where entropy measures the degree of uncertainty of a system. The information content of an event is inversely proportional to the probability of the event occurring. Abnormal behavior can be defined as the occurrence of unexpected, non-normal events in the video, so abnormal behavior, as a low-probability event, generally yields a larger information gain. The information entropy is computed as:

$$H = -\sum_{i=1}^{n} p_i \log p_i$$

where the sample space contains n elementary events $w_1, w_2, \dots, w_n$, $p_i$ is the probability of $w_i$ occurring, $0 \le p_i \le 1$, and $\sum_{i=1}^{n} p_i = 1$.
The greater the disorder of the system, the greater the information entropy. Since abnormal behavior is defined as an unexpected event, information entropy can be used to detect it: when a video frame contains only normal behavior, the uncertainty of the event is small and the information entropy is small; when a video frame contains abnormal behavior, the uncertainty is large and the information entropy is correspondingly large.
After computing the information entropy $H_r$ of the reconstructed image and the information entropy $H_a$ of the actual frame, the information gain $H_m$ is their difference:

$$H_m = H_r - H_a$$
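The entropy and information gain computations can be sketched as follows, estimating $p_i$ from a 256-bin gray-level histogram (the histogram-based estimate is an assumption; the patent does not state how $p_i$ is obtained):

```python
import numpy as np

def image_entropy(img, bins=256):
    # Shannon entropy H = -sum_i p_i * log2(p_i) of the image's
    # gray-level distribution, with p_i estimated from a histogram.
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                       # treat 0 * log 0 as 0
    return float(-np.sum(p * np.log2(p)))

def information_gain(recon, real):
    # H_m = H_r - H_a: entropy of the reconstruction minus
    # entropy of the actual frame.
    return image_entropy(recon) - image_entropy(real)
```

A constant image has zero entropy, while an image split evenly between two gray levels carries exactly one bit per pixel.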
s3: and continuously reconstructing images of continuous frames in the video through the trained U-Net neural network, and judging whether abnormal behaviors exist in the video or not through the relation between the difference between the reconstructed images and the corresponding real images and a threshold value.
In practical application, for a given surveillance video, t consecutive video frames are acquired and fed into the trained U-Net neural network. If the difference between the reconstructed image generated by the network and the real (t+1)-th frame following those t frames exceeds a threshold, abnormal behavior is judged to exist in the surveillance video.
The difference in this embodiment is evaluated by the peak signal-to-noise ratio (PSNR), which is calculated as:

$$PSNR(I, \hat{I}) = 10 \log_{10} \frac{\left[\max_{\hat{I}}\right]^2}{\frac{1}{N} \sum_{i=1}^{N} (I_i - \hat{I}_i)^2}$$

where $N$ is the number of pixels and $\max_{\hat{I}}$ is the maximum possible pixel value; a higher PSNR indicates a smaller difference between the reconstructed and real frames. The PSNR of each frame t, denoted $P_t$, is then normalized to [0, 1] to obtain the anomaly score $A_S$:

$$A_S(t) = 1 - \frac{P_t - \min_t P_t}{\max_t P_t - \min_t P_t}$$
The anomaly score $A_S$ indicates the likelihood of abnormal behavior in the video sequence: the higher the score, the more likely abnormal behavior is; the lower the score, the more likely the behavior is normal. A threshold is therefore set to judge whether abnormal behavior occurs in the video sequence: when the anomaly score is greater than the threshold, abnormal behavior is considered to occur in the video frame; when it is smaller, no abnormal behavior is considered to occur.
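A hedged sketch of this scoring step: per-frame PSNR, min-max normalization flipped so that high scores mean likely anomalies, and thresholding. The exact normalization formula is not visible in the text, so this follows the standard choice consistent with the description:

```python
import numpy as np

def psnr(real, recon, peak=255.0):
    # Peak signal-to-noise ratio; higher PSNR = reconstruction closer to real.
    mse = np.mean((np.asarray(real, dtype=np.float64)
                   - np.asarray(recon, dtype=np.float64)) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(peak ** 2 / mse)

def anomaly_scores(psnr_per_frame):
    """Min-max normalize per-frame PSNR over a video and flip it so a
    HIGH score marks a LIKELY ANOMALY (low PSNR = large reconstruction
    error).  Assumes the video has at least two distinct PSNR values."""
    p = np.asarray(psnr_per_frame, dtype=np.float64)
    norm = (p - p.min()) / (p.max() - p.min())
    return 1.0 - norm

def detect(psnr_per_frame, threshold=0.5):
    # Flag a frame as abnormal when its anomaly score exceeds the threshold.
    return anomaly_scores(psnr_per_frame) > threshold
```

The threshold value is application-dependent; the patent only requires that scores above it are treated as abnormal.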
Simulation experiment
This embodiment performed experiments on the CUHK Avenue dataset and the UCSD Pedestrian dataset:
(1) The UCSD Pedestrian dataset contains two subsets (Ped1 and Ped2), each corresponding to a different scene. Ped1 shows pedestrians moving towards or away from the camera and comprises 34 training and 36 test video sequences; each sequence has 200 frames at a resolution of 238 × 158 pixels. Ped2 shows pedestrians moving parallel to the camera plane and comprises 16 training and 12 test video sequences; each sequence has 180 frames at a resolution of 360 × 240 pixels. Abnormal events in the dataset arise from two situations: vehicles on sidewalks (e.g., trucks, bicycles) and abnormal pedestrian movement patterns (e.g., walking on lawns, running).
(2) The CUHK Avenue dataset contains 16 training and 21 test video sequences with 47 abnormal events, including throwing objects, loitering and running. The resolution of each video frame is 360 × 640 pixels.
In this embodiment, the Area Under the Curve (AUC) is used as the frame-level evaluation index. AUC evaluates whether video frames are correctly classified; a higher AUC indicates better abnormal behavior detection performance. Table 1 shows the AUC results of different methods:
TABLE 1
As shown in fig. 4, the abnormal behavior detection result for segment 7 of the test video in the CUHK Avenue dataset: a jumping child appears in the video, and the method detects this abnormal behavior.
As shown in fig. 5, the abnormal behavior detection result for segment 7 of the test video in the UCSD Ped1 dataset: the method of this embodiment detects the abnormal behavior in the video.
As shown in fig. 6, the abnormal behavior detection result for segment 4 of the test video in the UCSD Ped2 dataset: the method detects the abnormal behavior in the video.
Example two:
the invention further provides an abnormal behavior detection terminal device, which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor executes the computer program to realize the steps of the method embodiment of the first embodiment of the invention.
Further, as an executable scheme, the abnormal behavior detection terminal device may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing device. The abnormal behavior detection terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the above-mentioned structure is only an example and does not constitute a limitation on the terminal device, which may include more or fewer components than those listed, combine some components, or use different components; for example, the terminal device may further include input/output devices, network access devices, a bus, and the like, which is not limited in this embodiment of the present invention.
Further, as an executable solution, the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or the processor may be any conventional processor, and the processor is a control center of the abnormal behavior detection terminal device, and various interfaces and lines are used to connect various parts of the entire abnormal behavior detection terminal device.
The memory may be configured to store the computer program and/or the modules, and the processor implements the various functions of the abnormal behavior detection terminal device by running or executing the computer program and/or the modules stored in the memory and calling data stored in the memory. The memory may mainly comprise a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created according to use of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The invention also provides a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method of an embodiment of the invention.
The abnormal behavior detection terminal device integrated module/unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as an independent product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. An abnormal behavior detection method is characterized by comprising the following steps:
s1: collecting video image sequences, each formed by t+1 consecutive video frames, and forming a training set from all the sequences;
s2: constructing a U-Net neural network whose input is a video image sequence consisting of frames 1 to t and whose output is a reconstructed image, and training the network on the training set so that the difference between the reconstructed image and the real (t+1)-th frame is minimized; the training process of the U-Net neural network also includes an appearance constraint, a motion constraint and an information gain constraint;
s3: continuously reconstructing images for consecutive frames in the video with the trained U-Net neural network, and judging whether abnormal behavior exists in the video from the relation between a threshold and the difference between each reconstructed image and the corresponding real image.
2. The abnormal behavior detection method according to claim 1, characterized in that: the autoencoder formed by the U-Net neural network is divided into an encoder and a decoder; the encoder extracts image features and gradually reduces the spatial size through pooling layers, and the decoder reconstructs the image and restores its details and spatial size.
3. The abnormal behavior detection method according to claim 1, characterized in that: appearance constraints include intensity constraints and gradient constraints; the intensity constraint is to compute the difference of all pixel values between the reconstructed image and the real image, and the gradient constraint is to compute the gradient between the reconstructed image and the real image.
4. The abnormal behavior detection method according to claim 1, characterized in that: the motion constraint includes optical flow losses, which are used to compute the difference between the optical flow of the reconstructed image and the optical flow of the real image.
5. The abnormal behavior detection method according to claim 1, characterized in that: the information gain constraint is used to calculate the difference between the information entropy of the reconstructed image and the information entropy of the real image.
6. The abnormal behavior detection method according to claim 1, characterized in that: the difference is evaluated by the peak signal-to-noise ratio.
7. An abnormal behavior detection terminal device, characterized in that: comprising a processor, a memory and a computer program stored in the memory and running on the processor, the processor implementing the steps of the method according to any one of claims 1 to 6 when executing the computer program.
8. A computer-readable storage medium storing a computer program, characterized in that: the computer program when executed by a processor implementing the steps of the method as claimed in any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011049302.XA CN112149596A (en) | 2020-09-29 | 2020-09-29 | Abnormal behavior detection method, terminal device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011049302.XA CN112149596A (en) | 2020-09-29 | 2020-09-29 | Abnormal behavior detection method, terminal device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112149596A true CN112149596A (en) | 2020-12-29 |
Family
ID=73894984
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011049302.XA Withdrawn CN112149596A (en) | 2020-09-29 | 2020-09-29 | Abnormal behavior detection method, terminal device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112149596A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113032780A (en) * | 2021-03-01 | 2021-06-25 | 厦门服云信息科技有限公司 | Webshell detection method based on image analysis, terminal device and storage medium |
CN113516075A (en) * | 2021-07-09 | 2021-10-19 | 厦门恩格节能科技有限公司 | Method, terminal equipment and medium for weighing by utilizing variable frequency signals of garbage crane |
CN116450880A (en) * | 2023-05-11 | 2023-07-18 | 湖南承希科技有限公司 | Intelligent processing method for vehicle-mounted video of semantic detection |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831447A (en) * | 2012-08-30 | 2012-12-19 | 北京理工大学 | Method for identifying multi-class facial expressions at high precision |
CN102968779A (en) * | 2012-12-03 | 2013-03-13 | 北方民族大学 | Non-subsample Contourlet domain type MRI (Magnetic Resonance Imaging) image enhancing method based on FCM (Fuzzy C-means) cluster |
CN108898173A (en) * | 2018-06-25 | 2018-11-27 | 重庆知遨科技有限公司 | A kind of the electrocardiogram Medical image fusion and classification method of multiple dimensioned multiple features |
US20190228313A1 (en) * | 2018-01-23 | 2019-07-25 | Insurance Services Office, Inc. | Computer Vision Systems and Methods for Unsupervised Representation Learning by Sorting Sequences |
CN110705376A (en) * | 2019-09-11 | 2020-01-17 | 南京邮电大学 | Abnormal behavior detection method based on generative countermeasure network |
CN111292323A (en) * | 2020-03-16 | 2020-06-16 | 清华大学深圳国际研究生院 | Partial-reference image quality evaluation method, control device, and computer-readable storage medium |
CN111401239A (en) * | 2020-03-16 | 2020-07-10 | 科大讯飞(苏州)科技有限公司 | Video analysis method, device, system, equipment and storage medium |
-
2020
- 2020-09-29 CN CN202011049302.XA patent/CN112149596A/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
Cong Runmin (丛润民) et al.: "Research Progress in Video Saliency Detection" * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113032780A (en) * | 2021-03-01 | 2021-06-25 | 厦门服云信息科技有限公司 | Webshell detection method based on image analysis, terminal device and storage medium |
CN113516075A (en) * | 2021-07-09 | 2021-10-19 | 厦门恩格节能科技有限公司 | Method, terminal equipment and medium for weighing by utilizing variable frequency signals of garbage crane |
CN116450880A (en) * | 2023-05-11 | 2023-07-18 | 湖南承希科技有限公司 | Intelligent processing method for semantic detection in vehicle-mounted video |
CN116450880B (en) * | 2023-05-11 | 2023-09-01 | 湖南承希科技有限公司 | Intelligent processing method for semantic detection in vehicle-mounted video |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ravanbakhsh et al. | Training adversarial discriminators for cross-channel abnormal event detection in crowds | |
CN108805015B (en) | Crowd abnormality detection method using a weighted convolutional self-encoding long short-term memory network | |
Zhao et al. | Spatio-temporal autoencoder for video anomaly detection | |
Kim et al. | Fully deep blind image quality predictor | |
Han et al. | Density-based multifeature background subtraction with support vector machine | |
US8811663B2 (en) | Object detection in crowded scenes | |
CN112329685A (en) | Method for detecting crowd abnormal behaviors through fusion type convolutional neural network | |
CN112149596A (en) | Abnormal behavior detection method, terminal device and storage medium | |
Wang et al. | A cognitive memory-augmented network for visual anomaly detection | |
Nasrollahi et al. | Deep learning based super-resolution for improved action recognition | |
Chen et al. | A vision based traffic accident detection method using extreme learning machine | |
CN111860414A (en) | Method for detecting Deepfake video based on multi-feature fusion | |
CN111259919B (en) | Video classification method, device and equipment and storage medium | |
US12106541B2 (en) | Systems and methods for contrastive pretraining with video tracking supervision | |
CN113569756B (en) | Abnormal behavior detection and positioning method, system, terminal equipment and readable storage medium | |
Luo et al. | Traffic analytics with low-frame-rate videos | |
CN111079539A (en) | Video abnormal behavior detection method based on abnormal tracking | |
Hu et al. | Parallel spatial-temporal convolutional neural networks for anomaly detection and location in crowded scenes | |
Gnanavel et al. | Abnormal event detection in crowded video scenes | |
Prawiro et al. | Abnormal event detection in surveillance videos using two-stream decoder | |
Lamba et al. | A texture-based manifold approach for crowd density estimation using Gaussian Markov Random Field | |
Schneider et al. | Unsupervised anomaly detection from time-of-flight depth images | |
Yang et al. | Video anomaly detection for surveillance based on effective frame area | |
Panigrahi et al. | A ResNet-101 deep learning framework induced transfer learning strategy for moving object detection | |
Tsai et al. | Joint detection, re-identification, and LSTM in multi-object tracking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 2020-12-29 ||