CN111008601A - Fighting detection method based on video - Google Patents

Fighting detection method based on video

Info

Publication number
CN111008601A
CN111008601A (application CN201911244078.7A)
Authority
CN
China
Prior art keywords
video
fighting
skeleton
detection method
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911244078.7A
Other languages
Chinese (zh)
Inventor
吴斌
贠周会
谢吉朋
王欣欣
应艳丽
叶超
王旭
黄江林
贾楠
赖泽玮
Current Assignee
Jiangxi Hongdu Aviation Industry Group Co Ltd
Original Assignee
Jiangxi Hongdu Aviation Industry Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Jiangxi Hongdu Aviation Industry Group Co Ltd filed Critical Jiangxi Hongdu Aviation Industry Group Co Ltd
Priority to CN201911244078.7A priority Critical patent/CN111008601A/en
Publication of CN111008601A publication Critical patent/CN111008601A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/44Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A video-based fighting detection method: human targets in the video are first detected with a target detection method; a skeleton extraction algorithm then extracts human key-point information, namely the 2D skeleton key-point coordinates of each person over consecutive frames, from which a skeleton sequence is constructed; a spatial-temporal graph is built on the skeleton sequence and input into a trained multi-layer spatial-temporal graph convolutional network (ST-GCN) for action recognition. The method accurately identifies actions such as fighting, can be widely applied in important public places such as stations, airports, supermarkets, commercial districts, and sports grounds, and realizes real-time early warning.

Description

Fighting detection method based on video
Technical Field
The invention relates to the technical field of intelligent video monitoring, in particular to a fighting detection method based on video.
Background
Fighting is unlawful behavior that harms social stability and people's daily life. At present, fighting incidents are mainly reported and alarmed after the fact; an alarm can rarely be raised at the first moment, which makes subsequent investigation and handling difficult. With the comprehensive deployment and gradual improvement of networked surveillance engineering, fighting detection techniques based on video analysis have gradually developed.
Currently adopted technique 1: a two-dimensional image registration method computes the pixel motion vectors between adjacent video frames; statistical analysis, or conversion of the human motion vectors into an energy form, extracts pixels with disordered directions and fast motion, and the resulting pixel set forms a violent-motion region, a basic feature of multi-person fighting. By analyzing the spatial-temporal distribution of violent-motion regions, a motion-field-based fighting rule is established to judge whether such regions arise from a fighting event, realizing detection of abnormal human behavior. Patents based on this technique include: a method and device for detecting violent crowd movement, Patent No. 200910242555.6, and a fighting detection system with combined audio-video analysis, Patent No. 200920291779.1. Drawback of such techniques: the pixel displacement on the two-dimensional image is obtained with a differential optical-flow method based on grey-level features for computing the motion vectors of adjacent frames, and this optical-flow method has several defects that limit the technique's adoption. It is strongly affected by changes in the illumination environment: scene lighting variation causes errors in the computed motion field of real objects, and the motion field cannot be computed when an object is occluded. Moreover, because a target far from the camera and one near it in the three-dimensional scene project different motion vectors onto the two-dimensional plane, this traditional method is not robust in judging violent motion.
Technique 2: a method combining stereoscopic vision with motion-field extraction reduces the optical-flow calculation errors caused by illumination changes, chaotic target motion, and mutual occlusion, improving the robustness of human fighting detection in complex environments. A patent based on this technique is: a fighting detection method based on stereoscopic-vision motion-field analysis, Patent No. 201210304084.9. Drawback of such techniques: with stereoscopic vision combined with motion-field extraction, the human target position cannot be accurately detected in complex scenes, and strenuous activities such as running can produce many false alarms.
Disclosure of Invention
The invention aims to provide a fighting detection method based on video to solve the problems in the background technology.
The technical problem solved by the invention is realized by adopting the following technical scheme:
a fighting detection method based on videos comprises the following specific steps:
(1) fighting model for training
1) Collecting a sufficient number of fighting video clip samples and daily behavior samples;
2) extracting 2D coordinates of each human body skeleton key point of continuous multiple frames from the fighting video clip sample and the daily behavior sample in the step 1) by using an Alphapos or Openpos as a skeleton key point extraction algorithm, constructing a fighting skeleton sequence and a daily behavior skeleton sequence, and storing the fighting skeleton sequence and the daily behavior skeleton sequence in a local file according to an Openpos format;
3) constructing a space-time diagram by the fighting skeleton sequence and the daily behavior skeleton sequence file data in the step 2), inputting the space-time diagram into a multi-layer space-time convolution network, performing classification training by adopting a st-gcn algorithm training method, and gradually generating a high-level feature diagram;
4) generating a model file after the training in the step 3) is iterated for a plurality of times, and constructing a fighting model;
(2) detecting and recognizing fighting video
a) Acquiring a video stream: decoding and image conversion are carried out on the real-time video stream/local video file to obtain RGB image data which can be calculated;
b) acquiring a real-time video through the step a), temporarily storing the real-time video in a memory, then acquiring video data from the memory, and inputting the video into an alpha or Openpos to extract 2D coordinate information of skeleton key points of each human body of continuous multiple frames to construct a skeleton sequence, and temporarily storing the skeleton sequence in a system memory;
c) constructing the skeleton sequence obtained in the step b) into a space-time diagram, inputting the space-time diagram into the fighting model constructed in the step (1) to obtain an output result, and simultaneously performing motion recognition on the video content obtained in the step b) to judge whether fighting behaviors occur or not;
d) and d), continuously repeating the steps b) to c), detecting the fighting behavior of the real-time video, giving an alarm in real time if the fighting behavior occurs, and otherwise, entering the next round of detection.
In the invention, in step 1), the required ratio of fighting video clip samples to daily behavior samples is 1:3.
In the invention, in step 1), the number of fighting video clip samples is required to be no fewer than 200.
In the invention, in step 1), each clip in the video clip samples is 10 s long, the video format is AVI, and the video resolution is 320 × 240.
In the invention, in step 3), a standard Softmax classifier is used for classification, and the number of classes is 2.
In the invention, in step a), the resolution of the RGB images is 320 × 240.
In the invention, in step b), the length of the video data retrieved from memory is 10 s.
In the invention, in step b), after the skeleton sequence is constructed, it is temporarily stored in system memory in OpenPose format.
Advantageous effects: the method effectively integrates target detection, human skeleton key-point recognition, a multi-layer spatial-temporal graph convolutional network, and deep learning technology to accurately identify fighting and similar behaviors; it greatly improves recognition accuracy, can be widely applied in important public places such as stations, airports, supermarkets, commercial districts, and sports grounds, and can create definite economic benefit and practical value.
Drawings
Fig. 1 is a schematic diagram of the fighting-model training process in the preferred embodiment of the present invention.
FIG. 2 is a flow chart illustrating a preferred embodiment of the present invention.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the invention easy to understand, the invention is further explained below with reference to the specific drawings.
Referring to fig. 1-2, the fighting detection method based on video comprises the following specific steps:
(1) Training the fighting model
1) Collecting fighting video clip samples as one class (the videos should cover a wide range of situations with rich scenes; no fewer than 200 samples), and daily behaviors such as walking, standing, and running as the other class (no fewer than 600 samples), with the two classes in a ratio close to 1:3; each clip is 10 s long, in AVI format, at 320 × 240 resolution;
2) using AlphaPose or OpenPose as the skeleton key-point extraction algorithm, extracting the 2D coordinates of each person's skeleton key points over consecutive frames from the two classes of samples in step 1) to construct a fighting skeleton sequence and a daily-behavior skeleton sequence, and saving both to local files in OpenPose format;
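As an illustrative sketch (not part of the original disclosure), the per-frame key-point output of step 2) can be assembled into a skeleton sequence as follows. The OpenPose per-frame JSON layout (`people` → `pose_keypoints_2d` as flat x, y, score triples) is real; the 18-joint COCO skeleton, the two-person cap, and the (C, T, V, M) tensor layout follow the public ST-GCN code and are assumptions here:

```python
import json
import numpy as np

NUM_JOINTS = 18   # COCO layout used by the public ST-GCN code (assumption)
MAX_PEOPLE = 2    # ST-GCN's default keeps at most two skeletons per frame

def frames_to_sequence(frame_jsons):
    """Stack per-frame OpenPose-style JSON into a (C=3, T, V, M) array.

    Each element of `frame_jsons` is one frame's JSON string in the
    OpenPose output layout:
        {"people": [{"pose_keypoints_2d": [x0, y0, s0, x1, y1, s1, ...]}]}
    Channels are (x, y, confidence); missing people stay zero-filled.
    """
    T = len(frame_jsons)
    seq = np.zeros((3, T, NUM_JOINTS, MAX_PEOPLE), dtype=np.float32)
    for t, raw in enumerate(frame_jsons):
        people = json.loads(raw).get("people", [])[:MAX_PEOPLE]
        for m, person in enumerate(people):
            kp = np.asarray(person["pose_keypoints_2d"],
                            dtype=np.float32).reshape(NUM_JOINTS, 3)
            seq[:, t, :, m] = kp.T  # place (x, y, score) into the channel axis
    return seq
```

The same array can then be serialized back to a local file per step 2), one JSON record per frame.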
3) constructing a spatial-temporal graph from the fighting and daily-behavior skeleton-sequence file data of step 2) (the local files holding the key-point information extracted in step 2)), inputting it into the multi-layer spatial-temporal graph convolutional network (the ST-GCN algorithm), and performing classification training with the ST-GCN training method, gradually generating high-level feature maps; a standard Softmax classifier performs the classification, with 2 classes;
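The spatial-temporal graph convolution of step 3) can be sketched minimally as follows. This is an illustrative NumPy reduction, not the actual ST-GCN implementation: the learned temporal convolution is replaced by a fixed 3-frame average, the weights are plain matrices, and the joint adjacency is supplied by the caller:

```python
import numpy as np

def normalize_adjacency(A):
    # A_hat = D^{-1}(A + I): add self-loops, then row-normalize
    A = A + np.eye(A.shape[0])
    return A / A.sum(axis=1, keepdims=True)

def st_gcn_layer(X, A_hat, W):
    """One simplified spatial-temporal block on X of shape (C, T, V).

    Spatial step:  aggregate each joint's neighbours through A_hat,
                   then mix channels with W (shape C_out x C_in).
    Temporal step: average each joint's feature over a 3-frame window,
                   standing in for a learned 1-D temporal convolution.
    """
    spatial = np.einsum("oc,ctv,vw->otw", W, X, A_hat)  # (C_out, T, V)
    pad = np.pad(spatial, ((0, 0), (1, 1), (0, 0)), mode="edge")
    return (pad[:, :-2] + pad[:, 1:-1] + pad[:, 2:]) / 3.0

def classify(X, A, W, W_cls):
    """Global-average-pool the block output and softmax over 2 classes,
    mirroring the patent's two-class (fighting / daily behavior) setup."""
    feat = st_gcn_layer(X, normalize_adjacency(A), W).mean(axis=(1, 2))
    logits = W_cls @ feat
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

A real model stacks many such blocks with trained weights; the sketch only shows how graph aggregation, temporal smoothing, and the 2-class Softmax fit together.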
4) after the training of step 3) has iterated a number of times, generating a model file and thereby constructing the fighting model;
The training process requires a workstation running the Ubuntu 16.04 LTS system with an NVIDIA Titan X graphics card; the Caffe, PyTorch, and OpenPose environments must be configured, and the training configuration file of the ST-GCN algorithm must be downloaded;
(2) Detecting and recognizing fighting in video
a) Acquiring the video stream: the real-time video stream or local video file is decoded and image-converted with FFmpeg or OpenCV to obtain computable RGB image data, scaled to 320 × 240 resolution;
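Step a)'s conversion can be sketched as below. The sketch assumes decoded frames arrive as H × W × 3 uint8 BGR arrays (as OpenCV's `VideoCapture.read` yields) and uses dependency-free nearest-neighbour resampling where a real pipeline would call `cv2.resize` and `cv2.cvtColor`:

```python
import numpy as np

TARGET_W, TARGET_H = 320, 240  # the resolution the description specifies

def to_rgb_320x240(frame_bgr):
    """Convert one decoded BGR frame (H, W, 3, uint8) to 320x240 RGB.

    Nearest-neighbour resampling via index arrays keeps the sketch
    free of external dependencies.
    """
    h, w = frame_bgr.shape[:2]
    rows = np.arange(TARGET_H) * h // TARGET_H   # source row for each target row
    cols = np.arange(TARGET_W) * w // TARGET_W   # source column for each target column
    resized = frame_bgr[rows[:, None], cols[None, :]]
    return resized[:, :, ::-1]  # BGR -> RGB channel flip
```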
b) the real-time video acquired in step a) is temporarily stored in memory; video data of 10 s duration is then fetched from memory (the video from the current moment back 10 s, since the fighting model needs the information of consecutive frames) and input into AlphaPose or OpenPose to extract the 2D coordinates of each person's skeleton key points over the consecutive frames, construct a skeleton sequence, and temporarily store it in system memory in OpenPose format;
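The "temporarily store in memory, then fetch the last 10 s" behaviour of step b) amounts to a bounded frame buffer. A sketch, assuming a 25 fps stream (the description does not state a frame rate):

```python
from collections import deque

FPS = 25             # assumed camera frame rate
WINDOW_SECONDS = 10  # the 10 s analysis window from the description

class FrameBuffer:
    """Keeps the most recent 10 s of frames; older frames fall off the front."""

    def __init__(self, fps=FPS, seconds=WINDOW_SECONDS):
        self.frames = deque(maxlen=fps * seconds)

    def push(self, frame):
        self.frames.append(frame)

    def window(self):
        """Return the current 10 s window, or None until the buffer fills."""
        if len(self.frames) < self.frames.maxlen:
            return None
        return list(self.frames)
```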
c) the skeleton sequence obtained in step b) is built into a spatial-temporal graph and input into the fighting model constructed in (1) to obtain an output; action recognition is thereby performed on the 10 s of video content to judge whether fighting behavior occurs;
d) steps b) to c) are repeated continuously to detect fighting behavior in the real-time video; if fighting occurs, an alarm is raised in real time, otherwise the next round of detection begins;
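Steps b) through d) form a sliding-window loop. The sketch below stubs out the pose extractor and the trained model with injected callables (both are assumptions standing in for AlphaPose/OpenPose and the ST-GCN model); the 250-frame window again assumes 25 fps:

```python
from collections import deque

def detection_loop(frames, extract_skeletons, model, alarm,
                   window=250, threshold=0.5):
    """Slide a 10 s window (250 frames at an assumed 25 fps) over the
    stream, classify each full window, and raise a real-time alarm when
    the fighting probability crosses `threshold`.

    `extract_skeletons(frames) -> sequence` and `model(sequence) -> prob`
    are injected so the loop itself stays self-contained and testable.
    Returns the number of alarms raised.
    """
    buf = deque(maxlen=window)
    alarms = 0
    for frame in frames:
        buf.append(frame)
        if len(buf) < window:
            continue  # keep filling until a full 10 s window is buffered
        seq = extract_skeletons(list(buf))
        if model(seq) >= threshold:
            alarm(seq)
            alarms += 1
            buf.clear()  # start a fresh window after an alarm
    return alarms
```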
Detection environment requirements: this embodiment uses a workstation running the Ubuntu 16.04 LTS system with an NVIDIA Titan X graphics card; the Caffe, PyTorch, OpenPose, and OpenCV environments must be configured.
In this embodiment, the alarm module sends an alarm signal to the background server through a TCP/IP protocol.
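The alarm transmission can be sketched as a plain TCP client. The newline-delimited JSON payload and its field names are assumptions, since the embodiment only specifies that the alarm module uses the TCP/IP protocol:

```python
import json
import socket

def send_alarm(host, port, camera_id, timestamp):
    """Push one alarm record to the background server over TCP/IP.

    The payload shape (newline-terminated JSON with event/camera/time
    fields) is illustrative, not specified by the patent.
    """
    payload = json.dumps({"event": "fighting",
                          "camera": camera_id,
                          "time": timestamp}) + "\n"
    with socket.create_connection((host, port), timeout=5.0) as conn:
        conn.sendall(payload.encode("utf-8"))
```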
The foregoing shows and describes the basic principles, main features, and advantages of the present invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate its principles, and various changes and improvements may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A video-based fighting detection method, characterized by comprising the following specific steps:
(1) Training the fighting model
1) collecting a sufficient number of fighting video clip samples and daily behavior samples;
2) using AlphaPose or OpenPose as the skeleton key-point extraction algorithm, extracting the 2D coordinates of each person's skeleton key points over consecutive frames from the fighting video clip samples and daily behavior samples of step 1), constructing a fighting skeleton sequence and a daily-behavior skeleton sequence, and saving both to local files in OpenPose format;
3) constructing a spatial-temporal graph from the fighting and daily-behavior skeleton-sequence file data of step 2), inputting it into a multi-layer spatial-temporal graph convolutional network, and performing classification training with the ST-GCN training method, gradually generating high-level feature maps;
4) after the training of step 3) has iterated a number of times, generating a model file and thereby constructing the fighting model;
(2) Detecting and recognizing fighting in video
a) acquiring the video stream: decoding and image-converting the real-time video stream or a local video file to obtain computable RGB image data;
b) temporarily storing the real-time video acquired in step a) in memory, then fetching video data from memory and inputting it into AlphaPose or OpenPose to extract the 2D coordinates of each person's skeleton key points over consecutive frames, constructing a skeleton sequence, and temporarily storing it in system memory;
c) building the skeleton sequence obtained in step b) into a spatial-temporal graph and inputting it into the fighting model constructed in (1) to obtain an output, thereby performing action recognition on the video content of step b) and judging whether fighting behavior occurs;
d) continuously repeating steps b) to c) to detect fighting behavior in the real-time video; if fighting occurs, an alarm is raised in real time, otherwise the next round of detection begins.
2. The video-based fighting detection method according to claim 1, characterized in that in step 1), the required ratio of fighting video clip samples to daily behavior samples is 1:3.
3. The video-based fighting detection method according to claim 2, characterized in that in step 1), the number of fighting video clip samples is required to be no fewer than 200.
4. The video-based fighting detection method according to claim 1, characterized in that in step 1), each clip in the video clip samples is 10 s long, the video format is AVI, and the video resolution is 320 × 240.
5. The video-based fighting detection method according to claim 1, wherein in the step 3), a standard Softmax classifier is used for classification, and the number of classes is 2.
6. The video-based fighting detection method according to claim 1, wherein in step a), the RGB image has a resolution of 320 x 240.
7. The video-based fighting detection method according to claim 1, wherein in step b), the length of the video data retrieved from the memory is 10 s.
8. The video-based fighting detection method according to claim 1, characterized in that in step b), after the skeleton sequence is constructed, it is temporarily stored in system memory in OpenPose format.
CN201911244078.7A 2019-12-06 2019-12-06 Fighting detection method based on video Pending CN111008601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911244078.7A CN111008601A (en) 2019-12-06 2019-12-06 Fighting detection method based on video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911244078.7A CN111008601A (en) 2019-12-06 2019-12-06 Fighting detection method based on video

Publications (1)

Publication Number Publication Date
CN111008601A true CN111008601A (en) 2020-04-14

Family

ID=70115131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911244078.7A Pending CN111008601A (en) 2019-12-06 2019-12-06 Fighting detection method based on video

Country Status (1)

Country Link
CN (1) CN111008601A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860457A (en) * 2020-08-04 2020-10-30 广州市微智联科技有限公司 Fighting behavior recognition early warning method and recognition early warning system thereof
CN114220165A (en) * 2021-11-25 2022-03-22 慧之安信息技术股份有限公司 Automatic alarm method and system based on motion recognition

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298353A (en) * 2014-10-08 2015-01-21 宁波熵联信息技术有限公司 Inverse kinematics based vehicle monitoring and burglary preventing method and system
CN109492581A (en) * 2018-11-09 2019-03-19 中国石油大学(华东) A kind of human motion recognition method based on TP-STG frame
CN109919122A (en) * 2019-03-18 2019-06-21 中国石油大学(华东) A kind of timing behavioral value method based on 3D human body key point
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN110188599A (en) * 2019-04-12 2019-08-30 哈工大机器人义乌人工智能研究院 A kind of human body attitude behavior intellectual analysis recognition methods
CN110363131A (en) * 2019-07-08 2019-10-22 上海交通大学 Anomaly detection method, system and medium based on human skeleton


Similar Documents

Publication Publication Date Title
Liu et al. Future frame prediction for anomaly detection–a new baseline
US10880524B2 (en) System and method for activity monitoring using video data
Avgerinakis et al. Recognition of activities of daily living for smart home environments
CN105574506A (en) Intelligent face tracking system and method based on depth learning and large-scale clustering
CN106991370B (en) Pedestrian retrieval method based on color and depth
CN113536972B (en) Self-supervision cross-domain crowd counting method based on target domain pseudo label
Bouma et al. Real-time tracking and fast retrieval of persons in multiple surveillance cameras of a shopping mall
Gong et al. Local distinguishability aggrandizing network for human anomaly detection
CN103093198A (en) Crowd density monitoring method and device
CN111008574A (en) Key person track analysis method based on body shape recognition technology
CN113111838A (en) Behavior recognition method and device, equipment and storage medium
Hu et al. Parallel spatial-temporal convolutional neural networks for anomaly detection and location in crowded scenes
CN111008601A (en) Fighting detection method based on video
CN111860457A (en) Fighting behavior recognition early warning method and recognition early warning system thereof
CN113920585A (en) Behavior recognition method and device, equipment and storage medium
CN104200455B (en) A kind of key poses extracting method based on movement statistics signature analysis
Kroneman et al. Accurate pedestrian localization in overhead depth images via Height-Augmented HOG
Alghyaline A real-time street actions detection
Jaiswal et al. Survey paper on various techniques of recognition and tracking
CN116824641A (en) Gesture classification method, device, equipment and computer storage medium
CN115188081B (en) Complex scene-oriented detection and tracking integrated method
Supangkat et al. Moving Image Interpretation Models to Support City Analysis
Vidhya et al. Violence detection in videos using Conv2D VGG-19 architecture and LSTM network
Alkanat et al. Towards Scalable Abnormal Behavior Detection in Automated Surveillance
CN110738692A (en) spark cluster-based intelligent video identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200414