CN114694080A - Detection method, system and device for monitoring violent behavior and readable storage medium

Info

Publication number: CN114694080A
Application number: CN202210415750.XA
Authority: CN
Inventors: 徐映千; 何欣楠; 唐雪萍
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2022-04-20
Filing date: 2022-04-20
Publication date: 2022-07-01

Abstract

The invention discloses a detection method, system, and device for monitoring violent behavior, and a readable storage medium, belonging to the technical field of computer vision. The method comprises the following steps: step 1, constructing a violent behavior video data set; step 2, constructing a three-dimensional convolutional neural network and extracting features from the violent behavior video data; step 3, classifying the feature data with a multilayer perceptron. The method, system, device, and readable storage medium replace time-consuming and labor-intensive manual detection: a dense connection network combined with 3D convolution extracts features from the violent behavior video data, and a multilayer perceptron classifies the features extracted by the network. Replacing the 2D convolution kernels in the dense connection network with 3D convolution kernels enables the convolutional neural network to extract video features. Because violent behavior videos are classified by the algorithm and the video content captured by the camera is analyzed by computer, labor is saved and violent events can be prevented.

Description

Detection method, system and device for monitoring violent behavior, and readable storage medium

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to a detection method, a system and a device for monitoring violent behaviors and a readable storage medium.

Background

With the development of neural network algorithms and the improvement of computer performance, neural network algorithms have been widely applied in many fields. Surveillance cameras now reach every corner of a city, which deters aggressive behavior in society and helps maintain public safety. However, the volume of data produced by existing surveillance cameras is so large that detecting violent behavior quickly and in time cannot be accomplished manually. A method for detecting violent behavior in surveillance video therefore needs to be developed to solve this problem.

Disclosure of Invention

The invention aims to provide a method for detecting violent behavior in surveillance camera video, so as to solve the problem of low efficiency in detecting such behavior.

In order to achieve the purpose, the invention provides the following technical scheme: a detection method for monitoring violent behaviors comprises the following steps:

step 1, constructing a violent behavior video data set;

step 2, constructing a three-dimensional convolutional neural network, and extracting violent behavior video data characteristics;

step 3, classifying the feature data by using a multilayer perceptron.

Preferably, the three-dimensional convolutional neural network is constructed by combining dense connection with 3D convolution.

Preferably, the construction of the dense connection and the 3D convolution comprises the following steps:

step 21, connecting each convolution layer in the network to construct a dense connection network; modifying the convolution kernels of the dense connection network, and replacing all 2D convolution kernels in the dense connection network with convolution kernels of the 3D convolution;

and step 22, training the neural network through the data set constructed in the step 1, so that the three-dimensional convolution neural network can extract picture features.
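The dense connection described in step 21 can be illustrated with a sketch that tracks only channel counts. It assumes the usual dense-network convention that every layer in a dense block receives the concatenation of all earlier feature maps and adds a fixed number of new channels. The function name `dense_block_channels`, the parameter `growth_rate`, and the values 64 and 32 are illustrative assumptions and do not come from the source.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Return the input channel count seen by each layer of a dense block,
    plus the channel count leaving the block.

    Assumption (not stated in the source): each layer emits `growth_rate`
    new channels, and its input is the concatenation of the block input
    with the outputs of all earlier layers in the block.
    """
    inputs_per_layer = []
    channels = in_channels
    for _ in range(num_layers):
        inputs_per_layer.append(channels)  # concatenated input to this layer
        channels += growth_rate            # this layer's output is appended
    return inputs_per_layer, channels


# Hypothetical numbers: 64 input channels, 6 layers, growth rate 32.
per_layer, out_channels = dense_block_channels(64, 6, 32)
print(per_layer)     # [64, 96, 128, 160, 192, 224]
print(out_channels)  # 256
```

Because each layer sees every earlier feature map, low-level and high-level features reach the later layers together, which is the property the dense connection is used for here.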

Replacing all 2D convolution kernels in the dense connection network with 3D convolution kernels comprises generalizing the 2D convolution kernel to three dimensions. The value $v_{ij}^{xyz}$ at coordinate (x, y, z) in the j-th Feature Map of the i-th 3D convolution layer is shown in formula 1:

$$v_{ij}^{xyz} = g\left(b_{ij} + \sum_{k \in m} \sum_{p=0}^{P_i-1} \sum_{q=0}^{Q_i-1} \sum_{r=0}^{R_i-1} \omega_{ijk}^{pqr}\, v_{(i-1)k}^{(x+p)(y+q)(z+r)}\right) \tag{1}$$

where x, y, z are the input sampling points; $\omega_{ijk}^{pqr}$ is the value at coordinate (p, q, r) of the current 3D convolution kernel connected to the k-th Feature Map; g(·) is the activation function; $b_{ij}$ is the bias; m is the set of Feature Map indices of layer (i−1); $P_i$ and $Q_i$ are the length and width of the convolution kernel; $R_i$ is the size of the convolution kernel in the temporal direction; p, q, and r are the sampling points obtained from the input sampling points x, y, z according to the definition of convolution; and ω is the sampling-point weight.
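Formula 1 can be checked numerically with a short sketch. The function below evaluates the weighted sum at a single output position in plain Python. The name `conv3d_point`, the nested-list data layout, and the default tanh activation are illustrative assumptions, since the source names the activation only as g(·).

```python
import math


def conv3d_point(prev_maps, kernels, bias, x, y, z, activation=math.tanh):
    """Evaluate the 3D convolution at one output position (x, y, z).

    prev_maps[k][a][b][c]: value at (a, b, c) in the k-th feature map of layer i-1.
    kernels[k][p][q][r]:   weight at (p, q, r) of the kernel tied to feature map k.
    The tanh default is an assumption; the source only calls it g(.).
    """
    total = bias
    for k, kernel in enumerate(kernels):          # sum over the feature maps in m
        for p, plane in enumerate(kernel):        # kernel length
            for q, row in enumerate(plane):       # kernel width
                for r, weight in enumerate(row):  # kernel size along time
                    total += weight * prev_maps[k][x + p][y + q][z + r]
    return activation(total)


# One 2x2x2 input map of ones and one 2x2x2 kernel of 0.125: the weighted sum is 1.0.
ones = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
kernel = [[[0.125, 0.125], [0.125, 0.125]], [[0.125, 0.125], [0.125, 0.125]]]
value = conv3d_point([ones], [kernel], bias=0.0, x=0, y=0, z=0,
                     activation=lambda s: s)
print(value)  # 1.0
```

The third index runs over consecutive frames, which is what lets a 3D kernel respond to motion rather than to a single image.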

Preferably, constructing the violent behavior video data set comprises collecting online data sets, retrieving surveillance videos, and clipping the violent behavior segments from the collected videos.
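One way to picture the clipping step is as a conversion from annotated time intervals to fixed-length frame windows. The sketch below is hypothetical: the source does not say how segments are cut, and the function name `violent_clip_ranges`, the 25 fps frame rate, and the 16-frame clip length are assumptions chosen for illustration.

```python
def violent_clip_ranges(segments_sec, fps, clip_len):
    """Turn annotated violent segments into fixed-length frame ranges.

    segments_sec: list of (start, end) times in seconds marking violent behavior.
    Returns (first_frame, last_frame_exclusive) windows of `clip_len` frames
    that fit entirely inside a segment; leftover frames are dropped.
    """
    clips = []
    for start, end in segments_sec:
        first = int(round(start * fps))
        last = int(round(end * fps))
        for begin in range(first, last - clip_len + 1, clip_len):
            clips.append((begin, begin + clip_len))
    return clips


# Hypothetical annotation: violence from 2.0 s to 4.0 s in a 25 fps video.
print(violent_clip_ranges([(2.0, 4.0)], fps=25, clip_len=16))
# [(50, 66), (66, 82), (82, 98)]
```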

Preferably, in step 3, the multilayer perceptron is trained on the output of the trained network so that it acquires a classification function, wherein the classification performed by the multilayer perceptron is binary classification.
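The binary classification step can be sketched as the forward pass of a small perceptron. This is a minimal illustration under stated assumptions: one hidden layer, ReLU and sigmoid activations, and hand-picked weights, none of which are specified in the source. The name `mlp_predict` is hypothetical.

```python
import math


def mlp_predict(features, hidden_w, hidden_b, out_w, out_b, threshold=0.5):
    """Forward pass of a one-hidden-layer perceptron for binary classification.

    Returns (probability_of_violence, label). ReLU and sigmoid are assumed
    choices; the source does not name the perceptron's activations.
    """
    hidden = []
    for weights, bias in zip(hidden_w, hidden_b):
        pre = sum(w * f for w, f in zip(weights, features)) + bias
        hidden.append(max(0.0, pre))                 # ReLU
    logit = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    prob = 1.0 / (1.0 + math.exp(-logit))            # sigmoid
    return prob, int(prob >= threshold)


# Hypothetical weights, chosen by hand only to show the data flow.
prob, label = mlp_predict(
    features=[1.0, 2.0],
    hidden_w=[[1.0, 0.0], [0.0, -1.0]],
    hidden_b=[0.0, 0.0],
    out_w=[2.0, 1.0],
    out_b=-1.0,
)
print(label)  # 1
```

In the method described here, `features` would be the output of the trained three-dimensional network, and the weights would come from training rather than being set by hand.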

Preferably, the three-dimensional convolutional neural network comprises, in order:

dense connection layer 1, consisting of 6 sets of 1 × 1 × 1 and 3 × 3 × 3 three-dimensional convolution kernels;

conversion layer 1, consisting of a 1 × 1 × 1 three-dimensional convolution kernel and 3 × 3 × 3 three-dimensional average pooling;

dense connection layer 2, consisting of 12 sets of 1 × 1 × 1 and 3 × 3 × 3 three-dimensional convolution kernels;

conversion layer 2, consisting of a 1 × 1 × 1 three-dimensional convolution kernel and 3 × 3 × 3 three-dimensional average pooling;

dense connection layer 3, consisting of 24 sets of 1 × 1 × 1 and 3 × 3 × 3 three-dimensional convolution kernels;

conversion layer 3, consisting of 1 × 7 × 7 three-dimensional global maximum pooling; and

a fully connected layer;

wherein, after conversion layer 3, the activation function is as shown in formula 2:

$$S_i = \frac{\exp(e_i)}{\sum_{j=1}^{n} \exp(e_j)} \tag{2}$$

where n is the dimension of the input data, $e_i$ is the input value of dimension i, and $S_i$ is the output probability of dimension i;

and the output of the activation function is sent to the multilayer perceptron for classification.
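The activation function described above, which maps n input values e_i to output probabilities S_i, behaves like a softmax. A minimal sketch, assuming it is the standard softmax (the source defines only n, e_i, and S_i); the function name and the sample scores are illustrative.

```python
import math


def softmax(values):
    """Map n input values e_1..e_n to probabilities S_1..S_n that sum to 1."""
    shift = max(values)                           # subtracting the max avoids overflow
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([0.0, 0.0])
print(probs)  # [0.5, 0.5]
```

Subtracting the maximum before exponentiating leaves the result unchanged mathematically and only guards against overflow for large inputs.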

The invention also provides a detection system for monitoring violent behavior, which comprises:

a violent behavior video data set construction module, used for constructing a violent behavior video data set;

a three-dimensional convolutional neural network construction module, used for constructing the three-dimensional convolutional neural network;

a data feature extraction module, used for extracting features from the violent behavior video data;

a data classification module, used for classifying the feature data with a multilayer perceptron; and

a dense connection and 3D convolution construction module, used for connecting each convolution layer in the network to construct a dense connection network, modifying the convolution kernels of the dense connection network by replacing all 2D convolution kernels with 3D convolution kernels, and training the neural network on the Hockey Fights data set and the Violent-Flows data set so that it extracts picture features.

The invention also provides a detection device for monitoring violent behaviors, which comprises:

a memory for storing non-transitory computer readable instructions; and

a processor for executing the computer readable instructions such that the computer readable instructions, when executed by the processor, implement the method for monitoring violent behavior.

The present invention further provides a computer-readable storage medium for storing non-transitory computer-readable instructions which, when executed by a computer, cause the computer to perform the method for monitoring violent behavior.

The technical effects and advantages of the invention are as follows. The method, system, device, and readable storage medium for monitoring violent behavior replace time-consuming and labor-intensive manual detection: a dense connection network combined with 3D convolution extracts features from the violent behavior video data, and a multilayer perceptron classifies the features extracted by the network.

Replacing the 2D convolution kernels in the dense connection network with 3D convolution kernels enables the convolutional neural network to extract video features.

Because violent behavior videos are classified by the algorithm and the video content captured by the camera is analyzed by computer, labor is saved, labor cost is reduced, and violent events can be prevented.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flow chart of the construction of the dense connection and 3D convolution of the present invention;

FIG. 3 is a block diagram of the framework flow of the present invention;

FIG. 4 is a diagram of a dense connection layer structure according to the present invention;

FIG. 5 is a schematic diagram of the 3D convolution according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a detection method for monitoring violent behavior, which comprises the following steps:

step 1, constructing a violent behavior video data set; in this embodiment, videos of violent behavior are collected from several sources, such as online data sets and actual surveillance footage, and the violent behavior segments are clipped from the collected videos;

step 2, constructing an improved CNN through a dense connection and 3D convolution method, and performing feature extraction on violent behavior video data; in this embodiment, the CNN is a convolutional neural network;

as shown in FIG. 2, extracting the features of the violent behavior video data specifically comprises the following steps:

step 21, constructing the feature-extraction CNN as a dense connection network and modifying the network structure of the dense connection network; as shown in FIG. 4, the feature maps in each dense module are connected so that high-level and low-level features are fused, allowing the network model to better extract the high-level and low-level semantic features of the video;

as shown in FIG. 5, for the 3D convolution, the value at coordinate (x, y, z) in the j-th Feature Map of the i-th 3D convolution layer is calculated as shown in formula 2:

where x is the input sampling point, g(·) is the activation function, $b_{ij}$ is the bias, m is the set of Feature Map indices of layer (i−1), $P_i$ and $Q_i$ are the length and width of the convolution kernel, $p_n$ is the sampling point at position n relative to the first sampling point $p_0$, $\Delta p_n$ is the offset, and ω is the sampling-point weight;

step 22, training the modified dense connection network with the data set constructed in step 1, and extracting picture features;

step 3, training the multi-layer perceptron by using the trained network output result to enable the multi-layer perceptron to have a classification function;

in this embodiment, the 3D-CNN is based on a dense connection network and comprises, in order:

dense connection layer 1, consisting of 6 sets of 1 × 1 × 1 and 3 × 3 × 3 three-dimensional convolution kernels;

conversion layer 1, consisting of a 1 × 1 × 1 three-dimensional convolution kernel and 3 × 3 × 3 three-dimensional average pooling;

dense connection layer 2, consisting of 12 sets of 1 × 1 × 1 and 3 × 3 × 3 three-dimensional convolution kernels;

conversion layer 2, consisting of a 1 × 1 × 1 three-dimensional convolution kernel and 3 × 3 × 3 three-dimensional average pooling;

dense connection layer 3, consisting of 24 sets of 1 × 1 × 1 and 3 × 3 × 3 three-dimensional convolution kernels;

conversion layer 3, consisting of 1 × 7 × 7 three-dimensional global maximum pooling; and

a fully connected layer, that is, the multilayer perceptron;

wherein, after conversion layer 3, the activation function is applied, and the resulting data can then be sent to the multilayer perceptron for classification.
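The layer stack of this embodiment can be written down as a small table and queried. The sketch below transcribes the layers as data; it assumes that the counts 6, 12, and 24 denote how many times the pair of a 1 × 1 × 1 and a 3 × 3 × 3 convolution is repeated in each dense connection layer, which is one reading of the text. The tuple layout and the helper `count_ops` are illustrative.

```python
# Layer table transcribed from the embodiment: (name, operations, repeats).
# Reading 6 / 12 / 24 as repeat counts of the convolution pair is an assumption.
ARCHITECTURE = [
    ("dense connection layer 1", [("conv", (1, 1, 1)), ("conv", (3, 3, 3))], 6),
    ("conversion layer 1", [("conv", (1, 1, 1)), ("avg_pool", (3, 3, 3))], 1),
    ("dense connection layer 2", [("conv", (1, 1, 1)), ("conv", (3, 3, 3))], 12),
    ("conversion layer 2", [("conv", (1, 1, 1)), ("avg_pool", (3, 3, 3))], 1),
    ("dense connection layer 3", [("conv", (1, 1, 1)), ("conv", (3, 3, 3))], 24),
    ("conversion layer 3", [("global_max_pool", (1, 7, 7))], 1),
]


def count_ops(architecture, op_name):
    """Count how many times an operation type appears, honoring repeats."""
    return sum(
        repeats * sum(1 for name, _ in ops if name == op_name)
        for _, ops, repeats in architecture
    )


print(count_ops(ARCHITECTURE, "conv"))      # 86
print(count_ops(ARCHITECTURE, "avg_pool"))  # 2
```

Under that reading the network has 86 three-dimensional convolutions ahead of the fully connected layer, 84 of them inside the three dense connection layers.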

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.

Claims

1. A detection method for monitoring violent behavior, characterized by comprising the following steps:

step 1, constructing a violent behavior video data set;

step 2, constructing a three-dimensional convolutional neural network, and extracting violent behavior video data features;

step 3, classifying the feature data by using a multilayer perceptron.

2. The detection method for monitoring violent behavior according to claim 1, characterized in that the three-dimensional convolutional neural network is constructed by combining dense connection with 3D convolution.

3. The detection method for monitoring violent behavior according to claim 1, characterized in that the construction of the dense connection and the 3D convolution comprises the following steps:

step 21, connecting each convolution layer in the network to construct a dense connection network; modifying the convolution kernels of the dense connection network, and replacing all 2D convolution kernels in the dense connection network with 3D convolution kernels;

step 22, training the neural network with the data set constructed in step 1, so that the three-dimensional convolutional neural network can extract picture features.

4. The detection method for monitoring violent behavior according to claim 1, characterized in that replacing all 2D convolution kernels in the dense connection network with 3D convolution kernels comprises generalizing the 2D convolution kernel to three dimensions, wherein the value $v_{ij}^{xyz}$ at coordinate (x, y, z) in the j-th Feature Map of the i-th 3D convolution layer is shown in formula 1:

$$v_{ij}^{xyz} = g\left(b_{ij} + \sum_{k \in m} \sum_{p=0}^{P_i-1} \sum_{q=0}^{Q_i-1} \sum_{r=0}^{R_i-1} \omega_{ijk}^{pqr}\, v_{(i-1)k}^{(x+p)(y+q)(z+r)}\right) \tag{1}$$

where x, y, z are the input sampling points; $\omega_{ijk}^{pqr}$ is the value at coordinate (p, q, r) of the current 3D convolution kernel connected to the k-th Feature Map; g(·) is the activation function; $b_{ij}$ is the bias; m is the set of Feature Map indices of layer (i−1); $P_i$ and $Q_i$ are the length and width of the convolution kernel; $R_i$ is the size of the convolution kernel in the temporal direction; p, q, and r are the sampling points obtained from the input sampling points x, y, z according to the definition of convolution; and ω is the sampling-point weight.

5. The detection method for monitoring violent behavior according to claim 1, characterized in that constructing the violent behavior video data set comprises collecting online data sets, retrieving surveillance videos, and clipping the violent behavior segments from the collected videos.

6. The detection method for monitoring violent behavior according to claim 1, characterized in that, in step 3, the multilayer perceptron is trained on the output of the trained network so that it acquires a classification function, wherein the classification performed by the multilayer perceptron is binary classification.

7. The detection method for monitoring violent behavior according to claim 1, characterized in that the three-dimensional convolutional neural network comprises, in order:

dense connection layer 1, consisting of 6 sets of 1 × 1 × 1 and 3 × 3 × 3 three-dimensional convolution kernels;

conversion layer 1, consisting of a 1 × 1 × 1 three-dimensional convolution kernel and 3 × 3 × 3 three-dimensional average pooling;

dense connection layer 2, consisting of 12 sets of 1 × 1 × 1 and 3 × 3 × 3 three-dimensional convolution kernels;

conversion layer 2, consisting of a 1 × 1 × 1 three-dimensional convolution kernel and 3 × 3 × 3 three-dimensional average pooling;

dense connection layer 3, consisting of 24 sets of 1 × 1 × 1 and 3 × 3 × 3 three-dimensional convolution kernels;

conversion layer 3, consisting of 1 × 7 × 7 three-dimensional global maximum pooling; and

a fully connected layer;

wherein, after conversion layer 3, the activation function is as shown in formula 2:

$$S_i = \frac{\exp(e_i)}{\sum_{j=1}^{n} \exp(e_j)} \tag{2}$$

where n is the dimension of the input data, $e_i$ is the input value of dimension i, and $S_i$ is the output probability of dimension i.

8. A detection system for monitoring violent behavior, comprising:

a dense connection and 3D convolution construction module, used for connecting each convolution layer in the network to construct a dense connection network, modifying the convolution kernels of the dense connection network by replacing all 2D convolution kernels with 3D convolution kernels, and training the neural network on the constructed data set so that the three-dimensional convolutional neural network can extract picture features.

9. A detection device for monitoring violent behavior, comprising:

a memory for storing non-transitory computer readable instructions; and

a processor for executing the computer readable instructions such that the computer readable instructions, when executed by the processor, implement the detection method for monitoring violent behavior according to any one of claims 1 to 7.

10. A computer-readable storage medium storing non-transitory computer-readable instructions which, when executed by a computer, cause the computer to perform the detection method for monitoring violent behavior according to any one of claims 1 to 7.