CN111160117A - Abnormal behavior detection method based on multi-example learning modeling - Google Patents
Abnormal behavior detection method based on multi-example learning modeling
- Publication number
- CN111160117A (application number CN201911262679.0A)
- Authority
- CN
- China
- Prior art keywords
- video
- abnormal
- score
- time
- constructing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an abnormal behavior detection method based on multi-example learning modeling, which comprises the following steps: step 1, labeling the original surveillance video based on a multi-example learning method, wherein the labeling target is a video sequence and each video segment is an example; step 2, extracting spatio-temporal sequence features; step 3, calculating the abnormal score of each video segment; step 4, constructing a multi-example highest abnormal score solving function; step 5, constructing a ranking loss function; step 6, constructing an objective function; and step 7, connecting the trained video deep abnormal-score ranking model to a real-time video stream, calculating the abnormal score of the real-time video through the model, and judging whether the video is abnormal. The method detects video anomalies with a multi-example deep learning approach, so the anomaly types do not need to be subdivided, the abnormal video frames do not need to be precisely located, and the sample-labeling workload before model training is greatly reduced.
Description
Technical Field
The invention belongs to the technical field of video abnormal behavior detection, and particularly relates to an abnormal behavior detection method based on multi-example learning modeling.
Background
Traditional monitoring systems realize the safety management of public places mainly through manual monitoring, which lacks real-time performance and initiative. In many cases video monitoring does not actually play a supervisory role, because unattended systems serve only as video backup. Meanwhile, with the wide deployment of monitoring cameras and the continuous development of video monitoring technology and information science, automatic monitoring of abnormal behaviors in fields such as video surveillance, human-computer interaction, and video search has gradually become a technical area with broad application prospects. In recent years, researchers have proposed various abnormal behavior detection methods, such as the pyramid optical flow method, the 3D-SIFT descriptor, and multi-attribute fusion. These methods train a model by processing every frame of the video, which requires a large number of training samples and heavy labeling work; because the types of abnormal behavior are numerous while samples of each type are few, the detection accuracy of the resulting models is difficult to bring to an ideal level.
Disclosure of Invention
To overcome the above problems, the invention provides an abnormal behavior detection method based on multi-example learning modeling, in which the anomaly types do not need to be subdivided, the abnormal video frames do not need to be precisely located, and the sample-labeling workload before model training is greatly reduced. The technical scheme comprises the following steps.
an abnormal behavior detection method based on multi-example learning modeling comprises the following steps:
step 1, marking an original monitoring video based on a multi-example learning method, wherein a marking target is a video sequence, and a video segment is an example;
step 2, extracting space-time sequence characteristics;
step 3, calculating the abnormal score s_i of each video segment;
Step 4, constructing a multi-example highest abnormal score solving function f;
step 5, constructing a ranking loss function l(V_m, V_n), which is used to correct the multi-example highest abnormal score solving function f so that, during model training, an abnormal video obtains a higher score than a normal video; wherein V_m and V_n respectively represent an abnormal video m and a normal video n;
step 6, constructing an objective function:
L(W) = l(V_m, V_n) + ||W||
wherein W is the model weight;
and step 7, connecting the trained video deep abnormal-score ranking model to a real-time video stream, calculating the abnormal score of the real-time video through the model, and judging whether the video is abnormal.
Preferably, in step 2, the spatio-temporal sequence features are extracted as follows: in the labeled video data set, a video V_j is cut into n video segments, where each video segment v_i comprises 16 consecutive, non-overlapping frames; each video segment is fed into a C3D convolutional neural network and, after 8 3D convolutions and 5 3D pooling operations, enters a fully connected layer to obtain the spatio-temporal feature vector x_i of segment v_i; the spatio-temporal feature vectors of all segments of video V_j are concatenated in temporal order to obtain the spatio-temporal feature matrix X_j of video V_j.
Preferably, in step 3, from the spatio-temporal feature matrix X_j = (x_1, x_2, …, x_n) of video V_j obtained in step 2, the spatio-temporal feature vector x_i of any video segment is input into three fully connected layers to obtain the abnormal score s_i of that segment; the abnormal scores of all segments of video V_j are then S_j = (s_1, s_2, …, s_n). The abnormal score s_i is calculated as:

s_i = φ(x_i; t, b)

where t is the weight set (t_1, t_2, t_3) of the three fully connected layers, b is the bias set (b_1, b_2, b_3) of the three fully connected layers, and φ is the three-layer fully connected neural network.
Preferably, in step 4, the multi-example highest abnormal score solving function f takes, for each video, the example with the highest abnormal score among its segments, i.e.

f(V_j) = max_{1 ≤ i ≤ n} s_i

where z represents the number of videos in the given video data set and each video V_j has a corresponding label.
Preferably, in step 5: in multi-example learning, a video labeled positive is guaranteed to contain at least one abnormal video segment, so the ranking loss function l(V_m, V_n) is constructed from the example with the highest abnormal score in each video, where V_m and V_n respectively represent an abnormal video m and a normal video n; the loss contains a temporal smoothing term and a sparsity term, where λ1 = 0.00008 is the coefficient of the temporal smoothing term and λ2 = 0.00008 is the coefficient of the sparsity term.
Preferably, in step 7, a plurality of abnormal videos in a plurality of scenes are used as positive samples based on the weak labeling mode of the video.
Advantageous effects
Video anomalies are detected with a multi-example deep learning method; the anomaly types do not need to be subdivided, the abnormal video frames do not need to be precisely located, and the sample-labeling workload before model training is greatly reduced.
Detailed Description
An abnormal behavior detection method based on multi-example learning modeling comprises the following steps:
Step 1, labeling the original surveillance video based on a multi-example learning method, wherein the labeling target is a video sequence and each video segment is an example. When a video sequence is labeled negative, the labels of all sample data in the sequence are negative, i.e., it is a normal video; when a video sequence is labeled positive, at least one sample in the sequence is positive, i.e., the video is marked as containing an anomaly.
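The bag-labeling rule of step 1 amounts to the following (a minimal sketch; `bag_label` is a hypothetical helper name, not from the patent):

```python
def bag_label(segment_labels):
    # Multi-example rule: a video (bag) is positive if any of its
    # segments (examples) is abnormal; a negative bag guarantees
    # that every segment is normal.
    return int(any(segment_labels))

normal_video = [0, 0, 0, 0]    # labeled negative: all segments normal
abnormal_video = [0, 0, 1, 0]  # labeled positive: at least one abnormal segment
```

Note that a positive label says only that some segment is abnormal, not which one; this is exactly the weak supervision the method exploits.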
Step 2, extracting spatio-temporal sequence features. In the given video data set, a video V_j is cut into n video segments, i.e., V_j = (v_1, v_2, …, v_n), where each video segment v_i comprises 16 consecutive, non-overlapping frames. Each video segment is fed into a C3D convolutional neural network and, after 8 3D convolutions and 5 3D pooling operations, enters a fully connected layer to obtain the spatio-temporal feature vector x_i of segment v_i. The spatio-temporal feature vectors of all segments of video V_j are concatenated in temporal order to obtain the spatio-temporal feature matrix X_j of video V_j.
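The segmentation and feature-matrix construction of step 2 can be sketched as follows. The C3D network itself is replaced by a stand-in extractor; the 4096-dimensional output is an assumption based on C3D's usual fc6 feature size, and all function names are hypothetical:

```python
import numpy as np

def split_into_segments(frames, seg_len=16):
    # Cut a video of shape (T, H, W, C) into non-overlapping
    # seg_len-frame segments v_1..v_n; leftover frames are dropped.
    n = frames.shape[0] // seg_len
    return [frames[i * seg_len:(i + 1) * seg_len] for i in range(n)]

def c3d_features(segment, dim=4096):
    # Stand-in for the C3D network (8 3D convolutions and 5 3D
    # poolings followed by a fully connected layer). A real system
    # would run pretrained C3D; here we return a deterministic
    # placeholder vector so the pipeline shapes can be checked.
    rng = np.random.default_rng(int(abs(segment.sum())) % (2 ** 32))
    return rng.standard_normal(dim)

def feature_matrix(frames, seg_len=16, dim=4096):
    # Concatenate the per-segment feature vectors x_i in temporal
    # order to form the video's spatio-temporal feature matrix X_j.
    segments = split_into_segments(frames, seg_len)
    return np.stack([c3d_features(s, dim) for s in segments])

frames = np.zeros((80, 112, 112, 3), dtype=np.float32)  # 80-frame dummy video
X_j = feature_matrix(frames)  # shape (5, 4096): n = 80 // 16 segments
```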
Step 3, calculating the abnormal score s_i of each video segment. From the spatio-temporal feature matrix X_j = (x_1, x_2, …, x_n) of video V_j obtained in step 2, the spatio-temporal feature vector x_i of any video segment is input into three fully connected layers to obtain the abnormal score s_i of that segment; the abnormal scores of all segments of video V_j are then S_j = (s_1, s_2, …, s_n). The abnormal score s_i is calculated as:

s_i = φ(x_i; t, b)

where t is the weight set (t_1, t_2, t_3) of the three fully connected layers, b is the bias set (b_1, b_2, b_3) of the three fully connected layers, and φ is the three-layer fully connected neural network.
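A three-layer fully connected scoring network of this shape can be sketched as below; the ReLU/sigmoid activations and the 4096 → 512 → 32 → 1 layer sizes are assumptions, since the patent only specifies three fully connected layers with weights t and biases b:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def anomaly_score(x_i, t, b):
    # Three fully connected layers phi mapping a segment feature
    # vector x_i to an abnormal score s_i in [0, 1].
    # t = (t1, t2, t3) are the layer weights, b = (b1, b2, b3) the biases.
    h1 = relu(t[0] @ x_i + b[0])
    h2 = relu(t[1] @ h1 + b[1])
    return float(sigmoid(t[2] @ h2 + b[2])[0])

# Hypothetical layer sizes 4096 -> 512 -> 32 -> 1:
rng = np.random.default_rng(0)
t = (0.01 * rng.standard_normal((512, 4096)),
     0.01 * rng.standard_normal((32, 512)),
     0.01 * rng.standard_normal((1, 32)))
b = (np.zeros(512), np.zeros(32), np.zeros(1))
s_i = anomaly_score(rng.standard_normal(4096), t, b)  # a score in [0, 1]
```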
Step 4: in multi-example learning, a video labeled positive is guaranteed to contain at least one abnormal video segment, so the multi-example highest abnormal score solving function f is constructed by taking the example with the highest abnormal score in each video:

f(V_j) = max_{1 ≤ i ≤ n} s_i

where z represents the number of videos in the given video data set and each video V_j has a corresponding label.
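Per video, the highest-score rule of step 4 reduces to a maximum over segment scores (a sketch; how f aggregates across the z labeled videos is not recoverable from the text):

```python
def bag_max_score(segment_scores):
    # f for a single video: the bag score is the highest abnormal
    # score among its segments (the most anomalous example).
    return max(segment_scores)

S_abnormal = [0.10, 0.85, 0.20]  # abnormal video: one segment scores high
S_normal = [0.10, 0.15, 0.12]    # normal video: all segments score low
```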
In step 5, the abnormal score of an abnormal video should be higher than that of a normal video, so a ranking loss function is constructed to help abnormal videos obtain higher scores than normal videos during model training. In real life, an anomaly generally occupies only a very short time; taking the video as a multi-example object, the abnormal score should vary smoothly between examples, and temporal smoothness is enforced by minimizing the deviation of abnormal scores between adjacent examples. The ranking loss function l(V_m, V_n) so constructed is used to correct the multi-example highest abnormal score solving function f of step 4, helping abnormal videos obtain higher scores than normal videos during training. Here V_m and V_n respectively represent an abnormal video m and a normal video n; the loss contains a temporal smoothing term and a sparsity term, where λ1 = 0.00008 is the coefficient of the temporal smoothing term and λ2 = 0.00008 is the coefficient of the sparsity term.
Step 6, in order to ensure that the model obtained by constructing the network training can enable abnormal video segments in the positive sample to be predicted to obtain high scores, the invention constructs an objective function:
L(W) = l(V_m, V_n) + ||W||
w is a model weight set and comprises weights t of the convolutional neural network to be trained and deviations b of the convolutional neural network to be trained.
Step 7: by extracting the spatio-temporal sequence feature representation of videos and using video-level weak labeling, multiple kinds of abnormal videos in multiple scenes (including physical altercations, fires, explosions, thefts, vandalism of public property, abandoned objects, and dangerous driving) are used as positive samples, and normal videos in multiple scenes are used as negative samples; a video deep abnormal-score ranking model is trained with the multi-example learning method and connected to a real-time video stream, and the abnormal score of the real-time video is computed by the model to judge whether the video is abnormal.
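Step 7's online use of the trained model can be sketched as a sliding 16-frame scoring loop. The 0.5 decision threshold and all names here are assumptions; `feature_fn` and `score_fn` stand for the trained C3D extractor and scoring network, replaced below by trivial stand-ins:

```python
import numpy as np

def detect_stream(frames, feature_fn, score_fn, seg_len=16, threshold=0.5):
    # Score each complete seg_len-frame segment of the incoming
    # stream with the trained model and flag the video as abnormal
    # when any segment's score exceeds the threshold.
    scores = []
    for start in range(0, len(frames) - seg_len + 1, seg_len):
        x = feature_fn(frames[start:start + seg_len])
        scores.append(float(score_fn(x)))
    is_abnormal = bool(scores) and max(scores) > threshold
    return is_abnormal, scores

# Dummy stand-ins: mean "brightness" as the feature, identity as the score.
calm = np.full(32, 0.1)
event = np.concatenate([np.full(16, 0.1), np.full(16, 0.9)])
flag_calm, _ = detect_stream(calm, np.mean, float)
flag_event, _ = detect_stream(event, np.mean, float)
```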
It is understood that the above description is not intended to limit the present application, and the present application is not limited to the above examples, and those skilled in the art can make variations, modifications, additions and substitutions within the spirit and scope of the present application.
Claims (6)
1. An abnormal behavior detection method based on multi-example learning modeling is characterized by comprising the following steps:
step 1, marking an original monitoring video based on a multi-example learning method, wherein a marking target is a video sequence, and a video segment is an example;
step 2, extracting space-time sequence characteristics;
step 3, calculating the abnormal score s_i of each video segment;
Step 4, constructing a multi-example highest abnormal score solving function f;
step 5, constructing a ranking loss function l(V_m, V_n), which is used to correct the multi-example highest abnormal score solving function f of step 4 so that, during model training, an abnormal video obtains a higher score than a normal video; wherein V_m and V_n respectively represent an abnormal video m and a normal video n;
step 6, constructing an objective function:
L(W) = l(V_m, V_n) + ||W||
wherein W is the model weight;
and step 7, connecting the trained video deep abnormal-score ranking model to a real-time video stream, calculating the abnormal score of the real-time video through the model, and judging whether the video is abnormal.
2. The method according to claim 1, wherein in step 2 the spatio-temporal sequence features are extracted as follows: in the labeled video data set, a video V_j is cut into n video segments, where each video segment v_i comprises 16 consecutive, non-overlapping frames; each video segment is fed into a C3D convolutional neural network and, after 8 3D convolutions and 5 3D pooling operations, enters a fully connected layer to obtain the spatio-temporal feature vector x_i of segment v_i; the spatio-temporal feature vectors of all segments of video V_j are concatenated in temporal order to obtain the spatio-temporal feature matrix X_j of video V_j.
3. The method for detecting abnormal behaviors based on multi-example learning modeling as claimed in claim 2, wherein in step 3, from the spatio-temporal feature matrix X_j = (x_1, x_2, …, x_n) of video V_j obtained in step 2, the spatio-temporal feature vector x_i of any video segment is input into three fully connected layers to obtain the abnormal score s_i of that segment; the abnormal scores of all segments of video V_j are then S_j = (s_1, s_2, …, s_n); the abnormal score s_i is calculated as s_i = φ(x_i; t, b), where t is the weight set (t_1, t_2, t_3) of the three fully connected layers, b is the bias set (b_1, b_2, b_3) of the three fully connected layers, and φ is the three-layer fully connected neural network.
4. The method for detecting abnormal behavior based on multi-example learning modeling as claimed in claim 3, wherein in step 4 the multi-example highest abnormal score solving function f takes, for each video, the example with the highest abnormal score among its segments.
5. The method according to claim 4, wherein in step 5, in multi-example learning a video labeled positive is guaranteed to contain at least one abnormal video segment, and the ranking loss function l(V_m, V_n) is constructed by taking the example with the highest abnormal score in each video.
6. The method for detecting abnormal behavior based on multi-instance learning modeling as claimed in claim 1, wherein in step 7, multiple abnormal videos in multiple scenes are used as positive samples based on the weak labeling mode of the videos.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911262679.0A CN111160117A (en) | 2019-12-11 | 2019-12-11 | Abnormal behavior detection method based on multi-example learning modeling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911262679.0A CN111160117A (en) | 2019-12-11 | 2019-12-11 | Abnormal behavior detection method based on multi-example learning modeling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111160117A true CN111160117A (en) | 2020-05-15 |
Family
ID=70556796
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911262679.0A Pending CN111160117A (en) | 2019-12-11 | 2019-12-11 | Abnormal behavior detection method based on multi-example learning modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111160117A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111885349A (en) * | 2020-06-08 | 2020-11-03 | 北京市基础设施投资有限公司(原北京地铁集团有限责任公司) | Pipe rack abnormity detection system and method |
CN113011322A (en) * | 2021-03-17 | 2021-06-22 | 南京工业大学 | Detection model training method and detection method for specific abnormal behaviors of monitoring video |
CN113037783A (en) * | 2021-05-24 | 2021-06-25 | 中南大学 | Abnormal behavior detection method and system |
CN113312968A (en) * | 2021-04-23 | 2021-08-27 | 上海海事大学 | Real anomaly detection method in surveillance video |
CN116485041A (en) * | 2023-06-14 | 2023-07-25 | 天津生联智慧科技发展有限公司 | Abnormality detection method and device for gas data |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105930792A (en) * | 2016-04-19 | 2016-09-07 | 武汉大学 | Human action classification method based on video local feature dictionary |
US20170132528A1 (en) * | 2015-11-06 | 2017-05-11 | Microsoft Technology Licensing, Llc | Joint model training |
CN106980826A (en) * | 2017-03-16 | 2017-07-25 | 天津大学 | A kind of action identification method based on neutral net |
CN108846852A (en) * | 2018-04-11 | 2018-11-20 | 杭州电子科技大学 | Monitor video accident detection method based on more examples and time series |
CN109271876A (en) * | 2018-08-24 | 2019-01-25 | 南京理工大学 | Video actions detection method based on temporal evolution modeling and multi-instance learning |
CN110084151A (en) * | 2019-04-10 | 2019-08-02 | 东南大学 | Video abnormal behaviour method of discrimination based on non-local network's deep learning |
CN110263728A (en) * | 2019-06-24 | 2019-09-20 | 南京邮电大学 | Anomaly detection method based on improved pseudo- three-dimensional residual error neural network |
CN110378233A (en) * | 2019-06-20 | 2019-10-25 | 上海交通大学 | A kind of double branch's method for detecting abnormality based on crowd behaviour priori knowledge |
CN110502988A (en) * | 2019-07-15 | 2019-11-26 | 武汉大学 | Group positioning and anomaly detection method in video |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170132528A1 (en) * | 2015-11-06 | 2017-05-11 | Microsoft Technology Licensing, Llc | Joint model training |
CN105930792A (en) * | 2016-04-19 | 2016-09-07 | 武汉大学 | Human action classification method based on video local feature dictionary |
CN106980826A (en) * | 2017-03-16 | 2017-07-25 | 天津大学 | A kind of action identification method based on neutral net |
CN108846852A (en) * | 2018-04-11 | 2018-11-20 | 杭州电子科技大学 | Monitor video accident detection method based on more examples and time series |
CN109271876A (en) * | 2018-08-24 | 2019-01-25 | 南京理工大学 | Video actions detection method based on temporal evolution modeling and multi-instance learning |
CN110084151A (en) * | 2019-04-10 | 2019-08-02 | 东南大学 | Video abnormal behaviour method of discrimination based on non-local network's deep learning |
CN110378233A (en) * | 2019-06-20 | 2019-10-25 | 上海交通大学 | A kind of double branch's method for detecting abnormality based on crowd behaviour priori knowledge |
CN110263728A (en) * | 2019-06-24 | 2019-09-20 | 南京邮电大学 | Anomaly detection method based on improved pseudo- three-dimensional residual error neural network |
CN110502988A (en) * | 2019-07-15 | 2019-11-26 | 武汉大学 | Group positioning and anomaly detection method in video |
Non-Patent Citations (2)
Title |
---|
胡正平 (Hu Zhengping); 张乐 (Zhang Le); 尹艳华 (Yin Yanhua): "Sparse representation video anomaly detection algorithm based on AP clustering of spatio-temporal deep features" *
胡正平 (Hu Zhengping); 张乐 (Zhang Le); 尹艳华 (Yin Yanhua): "Sparse representation video anomaly detection algorithm based on AP clustering of spatio-temporal deep features", Journal of Signal Processing, no. 03 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111885349A (en) * | 2020-06-08 | 2020-11-03 | 北京市基础设施投资有限公司(原北京地铁集团有限责任公司) | Pipe rack abnormity detection system and method |
CN111885349B (en) * | 2020-06-08 | 2023-05-09 | 北京市基础设施投资有限公司 | Pipe gallery abnormality detection system and method |
CN113011322A (en) * | 2021-03-17 | 2021-06-22 | 南京工业大学 | Detection model training method and detection method for specific abnormal behaviors of monitoring video |
CN113011322B (en) * | 2021-03-17 | 2023-09-05 | 贵州安防工程技术研究中心有限公司 | Detection model training method and detection method for monitoring specific abnormal behavior of video |
CN113312968A (en) * | 2021-04-23 | 2021-08-27 | 上海海事大学 | Real anomaly detection method in surveillance video |
CN113312968B (en) * | 2021-04-23 | 2024-03-12 | 上海海事大学 | Real abnormality detection method in monitoring video |
CN113037783A (en) * | 2021-05-24 | 2021-06-25 | 中南大学 | Abnormal behavior detection method and system |
CN116485041A (en) * | 2023-06-14 | 2023-07-25 | 天津生联智慧科技发展有限公司 | Abnormality detection method and device for gas data |
CN116485041B (en) * | 2023-06-14 | 2023-09-01 | 天津生联智慧科技发展有限公司 | Abnormality detection method and device for gas data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111160117A (en) | Abnormal behavior detection method based on multi-example learning modeling | |
Hu et al. | Real-time video fire smoke detection by utilizing spatial-temporal ConvNet features | |
CN110298404B (en) | Target tracking method based on triple twin Hash network learning | |
CN102521340A (en) | Method for analyzing TV video based on role | |
CN110675421B (en) | Depth image collaborative segmentation method based on few labeling frames | |
CN113129284B (en) | Appearance detection method based on 5G cloud edge cooperation and implementation system | |
CN113221710A (en) | Neural network-based drainage pipeline defect identification method, device, equipment and medium | |
CN108038515A (en) | Unsupervised multi-target detection tracking and its storage device and camera device | |
CN111368634A (en) | Human head detection method, system and storage medium based on neural network | |
CN113723558A (en) | Remote sensing image small sample ship detection method based on attention mechanism | |
CN103279581A (en) | Method for performing video retrieval by compact video theme descriptors | |
CN113469938A (en) | Pipe gallery video analysis method and system based on embedded front-end processing server | |
CN113052073A (en) | Meta learning-based few-sample behavior identification method | |
CN116977859A (en) | Weak supervision target detection method based on multi-scale image cutting and instance difficulty | |
CN111275025A (en) | Parking space detection method based on deep learning | |
Ben-Ahmed et al. | Eurecom@ mediaeval 2017: Media genre inference for predicting media interestingnes | |
Geng et al. | Shelf Product Detection Based on Deep Neural Network | |
CN115240647A (en) | Sound event detection method and device, electronic equipment and storage medium | |
CN114581769A (en) | Method for identifying houses under construction based on unsupervised clustering | |
CN111401519B (en) | Deep neural network unsupervised learning method based on similarity distance in object and between objects | |
CN113658216A (en) | Remote sensing target tracking method based on multi-stage self-adaptive KCF and electronic equipment | |
Caselles-Dupré et al. | Are standard object segmentation models sufficient for learning affordance segmentation? | |
Vasudevan et al. | ETH-CVL@ MediaEval 2016: Textual-Visual Embeddings and Video2GIF for Video Interestingness. | |
Gao et al. | Intelligent appearance quality detection of air conditioner external unit and dataset construction | |
Chen et al. | Recognition and localization of freshwater fish heads and tails based on lightweight neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||