CN111325073A - Monitoring video abnormal behavior detection method based on motion information clustering - Google Patents

Monitoring video abnormal behavior detection method based on motion information clustering

Info

Publication number
CN111325073A
Authority
CN
China
Prior art keywords
target
area
behavior
abnormal behavior
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811541700.6A
Other languages
Chinese (zh)
Other versions
CN111325073B (en)
Inventor
林巍峣
许奇超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201811541700.6A priority Critical patent/CN111325073B/en
Publication of CN111325073A publication Critical patent/CN111325073A/en
Application granted granted Critical
Publication of CN111325073B publication Critical patent/CN111325073B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47Detecting features for summarising video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Image Analysis (AREA)

Abstract

A surveillance video abnormal behavior detection method based on motion information clustering: non-overlapping consecutive frames are extracted from the video each time; optical flow amplitude images are calculated for the consecutive frames and preprocessed; the effective connected regions in the preprocessed binary image are calculated, corrected and denoised; and behavior recognition is performed on the obtained target detection result to finally obtain the abnormal behavior detection result. The invention uses the optical flow amplitude image to obtain the motion information in the video and perform preliminary behavior localization on the image, and uses a target detector to eliminate noise in the motion regions obtained from the optical flow amplitude image and to ensure that the obtained motion regions contain people, so that the recognition objects of the behavior recognition network are more targeted and high detection accuracy with a low false detection rate can be achieved on surveillance videos of different scenes.

Description

Monitoring video abnormal behavior detection method based on motion information clustering
Technical Field
The invention relates to a technology in the field of abnormal behavior detection in a surveillance video, in particular to a method for detecting abnormal behavior of a surveillance video based on motion information clustering.
Background
Abnormal behavior detection in surveillance video scenes plays an important role in the security field. The prior art combines two techniques, a target detector and a behavior recognition network, and this approach has two defects: first, existing target detectors do not perform ideally on videos of monitored scenes; second, only the behavior of single individuals is considered, so group behaviors such as fighting cannot be detected.
Existing monitoring schemes have also applied C3D-based networks, but targets obtained solely from C3D network tracking cannot be used directly for behavior detection, especially for complex interactive behaviors, which often involve multiple individual targets.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a surveillance video abnormal behavior detection method based on motion information clustering, which can detect, through region localization and behavior recognition, whether a video contains abnormal behavior and where that behavior occurs.
The invention is realized by the following technical scheme:
the method comprises the steps of extracting non-overlapping continuous frames from a video each time, calculating an optical flow amplitude image aiming at the continuous frames, preprocessing the optical flow amplitude image, calculating an effective connected region in a preprocessed binary image, correcting and removing noise, and performing behavior recognition on an obtained target detection result to finally obtain an abnormal behavior detection result.
The optical flow amplitude image is computed from the motion displacement of each pixel between every two adjacent frames of the consecutive frames in the x and y directions according to the formula A = sqrt(u^2 + v^2), where u and v are the per-pixel displacements in the x and y directions.
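The computation can be sketched as follows (a minimal illustration in Python, using OpenCV's Farneback dense optical flow as a stand-in; the patent does not fix a particular flow estimator, and its non-patent citations mention FlowNet 2.0):

```python
import cv2
import numpy as np

def flow_amplitude(prev_gray, next_gray):
    # Dense optical flow between two adjacent 8-bit grayscale frames.
    # Farneback is only a stand-in for whichever flow estimator is used.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]   # per-pixel displacement in x and y
    return np.sqrt(u ** 2 + v ** 2)     # amplitude image A = sqrt(u^2 + v^2)
```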
The preprocessing is to calculate the average of the optical flow amplitude images and binarize this average image, setting pixels above the gray threshold to 1 and pixels below it to 0.
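A minimal sketch of this preprocessing step, assuming the amplitude images are NumPy arrays; the gray threshold of 0.8 is the value used in the embodiment below and is scene-dependent:

```python
import numpy as np

def binarize_mean_amplitude(amplitude_images, gray_threshold=0.8):
    # Average the per-pair amplitude images over the clip, then binarize:
    # pixels above the gray threshold become 1, the rest become 0.
    mean_img = np.mean(np.stack(amplitude_images), axis=0)
    return (mean_img > gray_threshold).astype(np.uint8)
```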
The effective connected regions are obtained by computing the connected regions of the binarized average image and removing those whose area is smaller than a target threshold; the retained regions are the effective connected regions, each represented by its upper-left and lower-right coordinates, e.g. Bi = (xi1, yi1, xi2, yi2).
The target threshold is set empirically from the actual size of targets in the scene; for example, if the pixel area of a person in a given monitored scene is between 150 and 400, the threshold is set to 150.
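The connected-region extraction with the area threshold can be sketched with OpenCV's connected-components routine (the area threshold of 150 is only the example value given above):

```python
import cv2

def effective_regions(binary_img, area_threshold=150):
    # Connected components on the binary motion mask; keep regions whose
    # pixel area is at least the scene-dependent target threshold.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary_img, connectivity=8)
    boxes = []
    for i in range(1, n):                       # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= area_threshold:
            boxes.append((x, y, x + w, y + h))  # (x_i1, y_i1, x_i2, y_i2)
    return boxes
```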
The correction and noise removal means: the positions of all persons in the intermediate frame of the consecutive frames are detected with the target detector, where each person's position is represented by its upper-left and lower-right coordinates, e.g. Pi = (xi1, yi1, xi2, yi2); when a person lies inside an effective connected region, the area containing that person is also merged into the effective connected region.
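A sketch of this correction step: the containment test implements the Area(Bi ∩ Pi)/Area(Bi) > 0.6 criterion given in the embodiment below, and the merge step is one reading of "the area containing the person is also merged into the effective connected region"; the 0.6 ratio and the (x1, y1, x2, y2) box format are taken from that embodiment.

```python
def contains_target(region_box, person_box, ratio=0.6):
    # region_box and person_box are (x1, y1, x2, y2) rectangles.
    bx1, by1, bx2, by2 = region_box
    px1, py1, px2, py2 = person_box
    iw = max(0, min(bx2, px2) - max(bx1, px1))   # width of the intersection
    ih = max(0, min(by2, py2) - max(by1, py1))   # height of the intersection
    region_area = (bx2 - bx1) * (by2 - by1)
    return region_area > 0 and (iw * ih) / region_area > ratio

def correct_region(region_box, person_boxes, ratio=0.6):
    # Enlarge the region to cover the bounding box of every contained person.
    x1, y1, x2, y2 = region_box
    for p in person_boxes:
        if contains_target((x1, y1, x2, y2), p, ratio):
            x1, y1 = min(x1, p[0]), min(y1, p[1])
            x2, y2 = max(x2, p[2]), max(y2, p[3])
    return (x1, y1, x2, y2)
```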
The recognition is to recognize the behavior in each effective connected region with a behavior recognition network and to compute the probability that the region contains abnormal behavior; when this probability exceeds the abnormal-behavior threshold, abnormal behavior is judged to occur in that region.
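A sketch of this decision step, treating the behavior recognition network as an assumed callable `recognizer` that maps a cropped clip to an abnormal-behavior probability (its exact interface is not specified by the patent; 0.75 is the threshold used in the embodiment below):

```python
def detect_abnormal(region_boxes, clip_frames, recognizer, prob_threshold=0.75):
    # For every effective connected region, crop the clip to that region,
    # score it with the behavior recognition network, and threshold.
    results = []
    for (x1, y1, x2, y2) in region_boxes:
        crop = [frame[y1:y2, x1:x2] for frame in clip_frames]
        prob = recognizer(crop)
        results.append(((x1, y1, x2, y2), prob, prob > prob_threshold))
    return results
```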
The invention also relates to a system for realizing the method, comprising a preprocessing module, a target detection module and a behavior recognition module, wherein: the preprocessing module is connected with the target detection module and transmits the optical flow motion information; the target detection module is connected with the behavior recognition module and transmits the information of the detected target regions; and the behavior recognition module outputs the detected and recognized abnormal behavior information.
Technical effects
Compared with the prior art, the invention uses the optical flow amplitude image to obtain the motion information in the video and perform preliminary behavior localization on the image, and uses a target detector to eliminate noise in the motion regions obtained from the optical flow amplitude image and to ensure that the obtained motion regions contain people; the recognition objects of the behavior recognition network are therefore more targeted, and high detection accuracy with a low false detection rate is achieved on surveillance videos of different scenes.
Drawings
FIG. 1 is a schematic view of the present invention;
FIG. 2 is an image of an intermediate frame of successive frames;
FIG. 3 is a schematic view of the effective connected regions;
FIG. 4 is a schematic diagram of the detection results of the target detector;
FIG. 5 is a schematic illustration of the results of effective connected component correction;
fig. 6 is a diagram illustrating the result of noise removal.
Detailed Description
As shown in fig. 1, in this embodiment abnormal behavior detection is performed on video shot by a monitoring camera in a prison; through region localization and behavior recognition, it can be detected whether the video contains abnormal behavior and where the abnormal behavior occurs. The specific steps are as follows:
1) detecting a target;
1.1) T = 16 non-overlapping frames are taken from the surveillance video each time; FIG. 2 shows the image of the intermediate frame T/2 = 8;
1.2) the optical flow of every two adjacent frames is calculated; each optical flow field contains two channels with the motion displacement (u, v) of each pixel in the x and y directions, from which the optical flow amplitude image is calculated as A = sqrt(u^2 + v^2);
1.3) the average of the 15 optical flow amplitude images is calculated and binarized: pixels whose value on the average image is higher than 0.8 are set to 1, otherwise to 0; as shown in FIG. 3, the mosaic part is the area whose pixels equal 1;
1.4) the connected regions of the binarized average image are calculated, connected regions whose area is less than 200 are eliminated, and the effective connected regions are retained; as shown in fig. 4, the parts framed by boxes are the two obtained effective connected regions B = {Bi | i = 1, 2}, where each connected region is represented by its upper-left and lower-right coordinates: Bi = (xi1, yi1, xi2, yi2);
1.5) for the intermediate frame (T/2 = 8) of the T = 16 consecutive frames, detection is performed with a Single Shot MultiBox Detector (SSD) network to obtain the positions of all targets; as shown in fig. 5, the boxes frame the positions of the four detected persons P = {Pi | i = 1, 2, 3, 4}, where each person's position is represented by its top-left and bottom-right coordinates: Pi = (xi1, yi1, xi2, yi2);
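As an illustration only, a person detector for this step could be sketched with torchvision's COCO-pretrained SSD300-VGG16; the patent merely specifies an SSD network, so the particular model, weights and score threshold here are assumptions:

```python
import cv2
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Stand-in SSD person detector (COCO-pretrained SSD300-VGG16 from torchvision).
ssd = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT").eval()

def detect_persons(frame_bgr, score_threshold=0.5):
    # frame_bgr: the intermediate frame (T/2) of the clip as an OpenCV BGR image.
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        out = ssd([to_tensor(rgb)])[0]
    persons = []
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if label.item() == 1 and score.item() > score_threshold:  # COCO class 1 = person
            persons.append(tuple(int(c) for c in box.tolist()))   # (x_i1, y_i1, x_i2, y_i2)
    return persons
```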
1.6) noise removal: from the detected target positions it is judged whether a person lies inside the effective connected regions B1 and B2; if so, the area containing the person is also merged into the effective connected region; if not, the connected region remains unchanged. As shown in fig. 6, the parts framed by boxes are the finally obtained effective connected regions;
the judgment means that: for effective connected region BiAnd the position P of the targetiWhen Area (B)i^Pi)/Area(Bi)>0.6, it indicates that the object is contained in the effective communication area, wherein: area is the Area and symbol ^ is the intersection of two rectangular areas, i.e., the common Area of the two rectangles.
2) Behavior recognition;
2.1) behavior recognition is performed on the two effective connected regions B1 and B2 obtained in step 1.6 with the behavior recognition network C3D, which is based on 3D convolution operations, and the probability that each region contains abnormal behavior is calculated: Prob(B1) = 0.24, Prob(B2) = 0.91;
2.2) the abnormal-behavior probability is thresholded: when it exceeds 0.75, abnormal behavior is judged to occur in the region; hence abnormal behavior occurs in the effective region B2, shown by the box in the lower right of FIG. 6.
Compared with the prior art, the method detects targets more accurately: for human targets that a common target detector cannot detect, for example bodies that are small, occluded or deformed, the method compensates through the target's motion information and therefore obtains more accurate target detection.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (10)

1. A monitoring video abnormal behavior detection method based on motion information clustering is characterized in that non-overlapping continuous frames are extracted from a video each time, an optical flow amplitude image is calculated and preprocessed aiming at the continuous frames, an effective connected region in a preprocessed binary image is calculated and corrected and noise is removed, behavior recognition is carried out on an obtained target detection result, and an abnormal behavior detection result is finally obtained.
2. The method as claimed in claim 1, wherein the optical flow amplitude image is formed from the motion displacement of the pixel points between every two adjacent frames of the continuous frames in the x and y directions according to the formula A = sqrt(u^2 + v^2), where u and v are the displacements in the x and y directions.
3. The method of claim 1 wherein the preprocessing is to compute an average image of the optical flow amplitude image and binarize the average image to set pixels above the gray scale threshold to 1 and pixels below the threshold to 0.
4. The method according to claim 1, wherein the valid connected regions are connected regions of the average image after binarization, the connected regions with the area smaller than the target threshold are removed, and the remaining regions are valid connected regions, wherein each valid connected region is represented by the coordinates of the upper left and the lower right of the valid connected region.
5. The method of claim 1, wherein said rectifying and denoising comprises: and detecting the positions of all targets in the intermediate frames of the continuous frames by using a target detector, wherein the position of each target is represented by coordinates of the upper left and the lower right of the target, the coordinate position corresponds to the coordinate position of the effective connected region, and when the target exists in the effective connected region, the region containing the target is also included in the effective connected region.
6. The method of claim 5, wherein said target detector is implemented with a Single Shot MultiBox Detector (SSD) network.
7. The method of claim 5, wherein said removing noise comprises: for a valid connected region Bi and a target position Pi, when Area(Bi ∩ Pi)/Area(Bi) > 0.6 the target is considered to be contained in the valid connected region, wherein Area denotes the area of a rectangle and ∩ denotes the intersection of two rectangular regions, i.e., their common part.
8. The method of claim 1, wherein the identifying is performed by identifying each valid connected region by using a behavior recognition network and calculating the probability of possible abnormal behavior, and when any one of the probabilities is greater than an abnormal behavior threshold value, determining that abnormal behavior occurs in the region.
9. The method according to claim 1 or 8, characterized in that said identification is carried out by means of a behavior recognition network C3D based on a 3D convolution operation.
10. A system for implementing the method of any preceding claim, comprising: a preprocessing module, a target detection module and a behavior recognition module, wherein: the preprocessing module is connected with the target detection module and transmits the optical flow motion information, the target detection module is connected with the behavior recognition module and transmits the information of the detected target regions, and the behavior recognition module outputs the detected and recognized abnormal behavior information.
CN201811541700.6A 2018-12-17 2018-12-17 Monitoring video abnormal behavior detection method based on motion information clustering Active CN111325073B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811541700.6A CN111325073B (en) 2018-12-17 2018-12-17 Monitoring video abnormal behavior detection method based on motion information clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811541700.6A CN111325073B (en) 2018-12-17 2018-12-17 Monitoring video abnormal behavior detection method based on motion information clustering

Publications (2)

Publication Number Publication Date
CN111325073A (en) 2020-06-23
CN111325073B CN111325073B (en) 2024-02-20

Family

ID=71172607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811541700.6A Active CN111325073B (en) 2018-12-17 2018-12-17 Monitoring video abnormal behavior detection method based on motion information clustering

Country Status (1)

Country Link
CN (1) CN111325073B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215150A (en) * 2020-10-13 2021-01-12 中国银行股份有限公司 Customer behavior identification method and device
CN112381072A (en) * 2021-01-11 2021-02-19 西南交通大学 Human body abnormal behavior detection method based on time-space information and human-object interaction

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101699469A (en) * 2009-11-09 2010-04-28 南京邮电大学 Method for automatically identifying action of writing on blackboard of teacher in class video recording
US20150139484A1 (en) * 2013-11-19 2015-05-21 Xerox Corporation Time scale adaptive motion detection
CN106327461A (en) * 2015-06-16 2017-01-11 浙江大华技术股份有限公司 Image processing method and device used for monitoring
CN105678811A (en) * 2016-02-25 2016-06-15 上海大学 Motion-detection-based human body abnormal behavior detection method
CN105930786A (en) * 2016-04-18 2016-09-07 西北工业大学 Abnormal behavior detection method for bank self-service hall
CN107330372A (en) * 2017-06-05 2017-11-07 四川大学 A kind of crowd density based on video and the analysis method of unusual checking system
CN108052859A (en) * 2017-10-31 2018-05-18 深圳大学 A kind of anomaly detection method, system and device based on cluster Optical-flow Feature
CN108647649A (en) * 2018-05-14 2018-10-12 中国科学技术大学 The detection method of abnormal behaviour in a kind of video
CN108648746A (en) * 2018-05-15 2018-10-12 南京航空航天大学 A kind of open field video natural language description generation method based on multi-modal Fusion Features
CN108830204A (en) * 2018-06-01 2018-11-16 中国科学技术大学 The method for detecting abnormality in the monitor video of target

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DU TRAN et al.: "Learning Spatiotemporal Features with 3D Convolutional Networks" *
EDDY ILG et al.: "FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks" *
季一锦 et al.: "Detection of violent passenger behavior based on elevator video" *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112215150A (en) * 2020-10-13 2021-01-12 中国银行股份有限公司 Customer behavior identification method and device
CN112215150B (en) * 2020-10-13 2023-10-24 中国银行股份有限公司 Customer behavior recognition method and device
CN112381072A (en) * 2021-01-11 2021-02-19 西南交通大学 Human body abnormal behavior detection method based on time-space information and human-object interaction
CN112381072B (en) * 2021-01-11 2021-05-25 西南交通大学 Human body abnormal behavior detection method based on time-space information and human-object interaction

Also Published As

Publication number Publication date
CN111325073B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
US11102417B2 (en) Target object capturing method and device, and video monitoring device
US10127448B2 (en) Method and system for dismount detection in low-resolution UAV imagery
US7982774B2 (en) Image processing apparatus and image processing method
KR101735365B1 (en) The robust object tracking method for environment change and detecting an object of interest in images based on learning
US20160321507A1 (en) Person counting method and device for same
US20100310122A1 (en) Method and Device for Detecting Stationary Targets
CN111145223A (en) Multi-camera personnel behavior track identification analysis method
CN107005655A (en) Image processing method
Tripathi et al. A framework for abandoned object detection from video surveillance
KR101737430B1 (en) A method of detecting objects in the image with moving background
CN111091098A (en) Training method and detection method of detection model and related device
EP3282387A1 (en) Fire detection method, fire detection apparatus and electronic equipment
JP2016162075A (en) Object track method, device and program
Liang et al. Aviation video moving-target detection with inter-frame difference
JP2010057105A (en) Three-dimensional object tracking method and system
CN111325073B (en) Monitoring video abnormal behavior detection method based on motion information clustering
TWI729587B (en) Object localization system and method thereof
KR101243294B1 (en) Method and apparatus for extracting and tracking moving objects
KR101125936B1 (en) Motion Monitoring Apparatus for Elevator Security and Method thereof
US10115028B2 (en) Method and device for classifying an object in an image
Miller et al. Person tracking in UAV video
CN107316318A (en) Aerial target automatic testing method based on multiple subarea domain Background fitting
US20200394802A1 (en) Real-time object detection method for multiple camera images using frame segmentation and intelligent detection pool
Chowdhury et al. Robust human detection and localization in security applications
JP2007219603A (en) Person tracking device, person tracking method and person tracking program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant