CN109145822A - A deep learning violence detection system - Google Patents
A deep learning violence detection system
- Publication number
- CN109145822A CN109145822A CN201810960914.0A CN201810960914A CN109145822A CN 109145822 A CN109145822 A CN 109145822A CN 201810960914 A CN201810960914 A CN 201810960914A CN 109145822 A CN109145822 A CN 109145822A
- Authority
- CN
- China
- Prior art keywords
- image
- network model
- layer
- module
- violence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/44—Event detection
Abstract
The present invention provides a deep learning violence detection system comprising an image input module, an image global feature module, a deep network model module, a 3D network model module, a D3D network model module, and an image output module. The image input module receives the image to be detected; the image global feature module extracts the global features of the image; the deep network model module fuses the extracted global image features within a deep network model; the 3D network model module determines the violence detection result based on the deep network model module; the D3D network model module optimizes the detection result of the 3D network model module; and the image output module outputs the optimized detection result. The benefit of the invention is a deep learning violence detection system that effectively improves the accuracy of violence detection.
Description
Technical field
The present invention relates to the field of violence detection technology, and in particular to a deep learning violence detection system.
Background technique
As the construction of safe cities continues to advance, video surveillance systems have been deployed on a large scale, and using intelligent video analysis technology to detect and give early warning of violent acts has become an urgent need.
Existing violence detection systems can be divided, according to the signal they analyze, into audio-based methods, audio-visual methods, and video-based methods. In real surveillance deployments, most monitoring systems have no audio capture equipment installed; in such cases audio-based methods cannot work, which makes violence detection based on image and video data the more valuable research direction. In addition, behaviors such as explosions, bleeding, and car chases are usually effective cues for detecting violent scenes in film clips, but in daily life such behaviors are very rare. By contrast, violent brawls and group fist-fights occur most frequently in daily life and cause the widest range of damage.
Summary of the invention
In view of the above problems, the present invention provides a deep learning violence detection system.
The object of the present invention is achieved by the following technical scheme:
A deep learning violence detection system is provided, comprising an image input module, an image global feature module, a deep network model module, a 3D network model module, a D3D network model module, and an image output module. The image input module receives the image to be detected; the image global feature module extracts the global features of the image; the deep network model module fuses the extracted global image features within a deep network model; the 3D network model module determines the violence detection result based on the deep network model module; the D3D network model module optimizes the detection result of the 3D network model module; and the image output module outputs the optimized detection result.
The benefit of the invention is a deep learning violence detection system that effectively improves the accuracy of violence detection.
Brief description of the drawings
The present invention will be further described with reference to the accompanying drawings. The embodiments in the drawings do not limit the invention in any way; for those of ordinary skill in the art, other drawings can be obtained from the following drawings without creative effort.
Fig. 1 is a structural schematic diagram of the invention.
Reference numerals:
image input module 1, image global feature module 2, deep network model module 3, 3D network model module 4, D3D network model module 5, image output module 6.
Specific embodiment
The invention will be further described with the following examples.
Referring to Fig. 1, the deep learning violence detection system of this embodiment comprises an image input module 1, an image global feature module 2, a deep network model module 3, a 3D network model module 4, a D3D network model module 5, and an image output module 6. The image input module 1 receives the image to be detected; the image global feature module 2 extracts the global features of the image; the deep network model module 3 fuses the extracted global image features within a deep network model; the 3D network model module 4 determines the violence detection result based on the deep network model module 3; the D3D network model module 5 optimizes the detection result of the 3D network model module 4; and the image output module 6 outputs the optimized detection result.
This embodiment provides a deep learning violence detection system that effectively improves the accuracy of violence detection.
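The six-module flow of this embodiment can be sketched as a simple pipeline. The class name and the stand-in callables below are illustrative assumptions for exposition, not the patent's implementation:

```python
import numpy as np

class ViolencePipeline:
    """Image input -> global features -> deep-model fusion -> 3D result -> D3D optimization -> output."""

    def __init__(self, extract, fuse, detect, refine):
        self.extract = extract  # image global feature module (2)
        self.fuse = fuse        # deep network model module (3)
        self.detect = detect    # 3D network model module (4)
        self.refine = refine    # D3D network model module (5)

    def run(self, clip):
        feats = self.extract(clip)   # global features of the input images
        fused = self.fuse(feats)     # features fused in the deep network model
        raw = self.detect(fused)     # preliminary violence detection result
        return self.refine(raw)      # optimized result: 1 = violent, 0 = not

# Stand-in modules: average the 40-frame clip, then threshold the mean intensity.
pipeline = ViolencePipeline(
    extract=lambda clip: clip.mean(axis=0),
    fuse=lambda feats: feats,
    detect=lambda feats: float(feats.mean() > 0.5),
    refine=lambda result: int(result),
)
```

Each stage only passes its output to the next, which mirrors how the six modules are chained in Fig. 1.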
Preferably, the image global feature module 2 comprises a data input layer, a convolutional computation layer, an excitation layer, and a pooling layer. The data input layer pre-processes the input image; the convolutional computation layer filters the image and performs the convolution operation; the excitation layer applies a non-linear mapping to the output of the convolutional computation layer; and the pooling layer compresses the non-linearly mapped image. In the convolutional computation layer, local neighborhood features are extracted from the pre-processed image by the convolution operation, and through multi-level iteration the global features of the image are extracted by two-dimensional convolution:

v_ij^(xy) = f( Σ_d Σ_{h=0}^{H-1} Σ_{w=0}^{W-1} w_ijd^(hw) · v_(i-1)d^((x+h)(y+w)) + b_ij )

In the formula above, i denotes the convolutional layer the image currently passes through, j denotes the number of feature maps in that layer, and v_ij^(xy) denotes the activation value at position (x, y) in the j-th feature map of the i-th layer; this activation value is the two-dimensional global feature of the image. f(·) denotes the activation function; H and W denote the height and width of the two-dimensional convolution kernel; w_ijd^(hw) denotes the weight of the convolution kernel; v_(i-1)d^((x+h)(y+w)) denotes the activation value of the d-th feature map of the (i-1)-th layer at (x+h, y+w); and b_ij denotes the bias term.
In this preferred embodiment, two-dimensional convolution can conveniently abstract the spatial information of an image, which is simple, convenient, and widely applicable; but these appearance features alone are insufficient to fully represent a video, since temporal information is lost.
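As an illustrative sketch (not the patented implementation), the two-dimensional convolution described above can be written directly in NumPy; the function name and the restriction to a single output feature map are assumptions made for brevity:

```python
import numpy as np

def conv2d_activation(prev_maps, kernels, bias, f=np.tanh):
    """Compute one output feature map v_ij at every valid position (x, y).

    prev_maps: (D, H_in, W_in) activations of layer i-1
    kernels:   (D, H, W) weights w_ijd^hw for this output map
    bias:      scalar b_ij
    f:         activation function applied to the summed response
    """
    D, H, W = kernels.shape
    _, H_in, W_in = prev_maps.shape
    out = np.zeros((H_in - H + 1, W_in - W + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            # sum over all previous feature maps d and kernel offsets (h, w)
            out[x, y] = np.sum(kernels * prev_maps[:, x:x + H, y:y + W]) + bias
    return f(out)
```

With an all-ones 4 × 4 input map, an all-ones 2 × 2 kernel, zero bias, and an identity activation, every output value is 4 and the output map is 3 × 3.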
Preferably, the deep network model module 3 generates a three-dimensional convolution kernel by spatially extending the two-dimensional convolution kernel of the image global feature module 2; the three-dimensional convolution at pixel (x, y, z) is defined as:

v_ij^(xyz) = f( Σ_d Σ_{h=0}^{H-1} Σ_{w=0}^{W-1} Σ_{t=0}^{T-1} w_ijd^(hwt) · v_(i-1)d^((x+h)(y+w)(z+t)) + b_ij )

In the formula above, i denotes the convolutional layer the image currently passes through, j denotes the number of feature maps in that layer, and v_ij^(xyz) denotes the activation value at position (x, y, z) in the j-th feature map of the i-th layer; this activation value is the three-dimensional global feature of the image. f(·) denotes the activation function; H, W, and T denote the sizes of the three-dimensional convolution kernel along the height, width, and time dimensions respectively; w_ijd^(hwt) denotes the weight of the convolution kernel; v_(i-1)d^((x+h)(y+w)(z+t)) denotes the activation value of the d-th feature map of the (i-1)-th layer at (x+h, y+w, z+t); and b_ij denotes the bias term.
Compared with the two-dimensional convolution formula, this preferred embodiment adds the time dimension to both the convolution kernel and the pixel representation. Once the kernel is extended to three-dimensional space, convolution over an image sequence proceeds in space and time simultaneously, so after the convolution and pooling operations the output feature maps remain image sequences, effectively preserving the spatio-temporal information in the video. Through the feature extraction of multiple three-dimensional convolutions, the global spatio-temporal features of the video can be extracted.
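Under the same caveat (an illustrative sketch, not the patent's code), the spatial extension to a three-dimensional kernel adds one more summation over the temporal offset t; NumPy's `sliding_window_view` (NumPy ≥ 1.20) keeps a loop-free version short:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_activation(prev_maps, kernels, bias, f=np.tanh):
    """Compute one output feature map v_ij at every valid position (z, x, y).

    prev_maps: (D, Z_in, X_in, Y_in) activations of layer i-1
    kernels:   (D, T, H, W) weights w_ijd^hwt, T being the temporal extent
    bias:      scalar b_ij
    """
    D, T, H, W = kernels.shape
    # windows: (D, Z', X', Y', T, H, W) — every T x H x W neighborhood
    windows = sliding_window_view(prev_maps, (T, H, W), axis=(1, 2, 3))
    # sum over previous maps d and offsets (t, h, w), as in the formula
    out = np.einsum('dzxythw,dthw->zxy', windows, kernels) + bias
    return f(out)
```

With all-ones inputs of shape (2, 4, 5, 5), an all-ones 2 × 2 × 2 kernel per input map, zero bias, and an identity activation, every output value is 2·2·2·2 = 16.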
Preferably, the 3D network model module 4 uses three convolutional computation layers C1, C2, and C3 based on the deep network model module 3; the three-dimensional kernel sizes used by C1, C2, and C3 are 7 × 7 × 5, 5 × 5 × 5, and 3 × 3 × 3 pixels respectively. The input of the 3D network model module 4 is an image segment X composed of 40 consecutive frames; after pre-processing, the frames are normalized to 60 × 90 pixels and converted to grayscale. The output is a scalar Y indicating the model's detection result for the input images: for a trained model, Y is 1 if the test images contain a violent scene, and 0 otherwise.
The 3D network model module 4 applies a pooling operation to the feature maps computed by the first two convolutional computation layers; the pooling is calculated by the following formula:

a_y^x = f( θ_y^x · δ_T(a_y^(x-1)) + B_y^x )

In the formula, δ_T is the sampling function, δ_T(t) = Σ_n δ(t − nT), where t is the time, T is the sampling period, and n ∈ [0, +∞) is a non-negative integer; a_y^x denotes the y-th feature map of the x-th layer and a_y^(x-1) the y-th feature map of the (x−1)-th layer; θ and B are the multiplicative and additive biases respectively, θ_y^x being the y-th multiplicative bias of the x-th layer and B_y^x the y-th additive bias of the x-th layer.
The pooling operation uses two-dimensional pooling, i.e., the input feature map sequence is not down-sampled in the time dimension; the pooling factors are set to 3 × 3 and 2 × 2 pixels respectively.
During model training, the 3D network model module 4 uses the mean squared error as its cost function, expressed as follows:

H1(X, θ) = (1/N) Σ_{k=1}^{N} ( G(X_k, θ) − Ŷ_k )²

In the formula, H1(X, θ) denotes the 3D network model cost function, G is the model function, θ is the model parameter, X_k is the k-th training sample, N is the number of samples, and Ŷ_k is the true label of the k-th sample; k ∈ [1, N], N ∈ [1, +∞]. The smaller the cost function value, the better the model fits the training set.
On the one hand, this preferred embodiment further reduces the number of network parameters; on the other hand, it gives the feature maps properties such as translation and rotation invariance, making the learned features more robust.
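A minimal sketch of the spatial-only pooling and the mean-squared-error cost described above. Mean pooling and the function names are assumptions (the patent does not specify the pooling operator):

```python
import numpy as np

def pool2d_frames(frames, ph, pw):
    """Non-overlapping spatial pooling of a frame sequence.

    The time dimension is left untouched — no temporal down-sampling,
    matching the 3D network model module's two-dimensional pooling.
    frames: (T, H, W); (ph, pw): pooling factor, e.g. (3, 3) or (2, 2).
    """
    T, H, W = frames.shape
    Hp, Wp = H // ph, W // pw
    blocks = frames[:, :Hp * ph, :Wp * pw].reshape(T, Hp, ph, Wp, pw)
    return blocks.mean(axis=(2, 4))

def mse_cost(preds, labels):
    """H1: mean squared error between model outputs and true labels."""
    return np.mean((np.asarray(preds) - np.asarray(labels)) ** 2)
```

For example, pooling a 40 × 60 × 90 feature sequence with the 3 × 3 factor yields 40 × 20 × 30: 40 frames remain, only the spatial resolution shrinks.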
Preferably, the D3D network model module 5 is based on the 3D network model; its input is 40 consecutive frames of 128 × 128 pixels, and the consecutive frames are three-channel color images.
The three-dimensional convolution kernels are uniformly set to 3 × 3 × 3 pixels. During the convolution operation, the D3D network model module 5 pads the feature maps so that the feature maps obtained after convolution keep the same size as before the computation. Three-dimensional pooling is used during pooling, i.e., the input feature map sequence is also down-sampled in the time dimension; the pooling factor is set to 2 × 2 × 2 pixels.
As its cost function during model training, the D3D network model module 5 chooses the negative log-likelihood function, expressed as follows:

H2(X, θ) = −(1/N) Σ_{k=1}^{N} Σ_{l=1}^{m} Ŷ_k^l · log G_l(X_k, θ)

In the formula, H2(X, θ) denotes the D3D network model cost function, G is the model function, θ is the model parameter, X_k is the k-th training sample, m is the number of classes, N is the number of samples per class, and Ŷ_k is the true label of the k-th sample; k ∈ [1, N], N ∈ [1, +∞], l ∈ [1, m], m ∈ [1, +∞].
This preferred embodiment uses a more complex structure, so it can handle higher-dimensional image data; this accelerates the extraction of temporal information from images and removes much of its redundancy.
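The D3D variant's spatio-temporal pooling and negative log-likelihood cost can be sketched the same way; mean pooling and the function names are again assumptions for illustration:

```python
import numpy as np

def pool3d(volume, p=2):
    """Non-overlapping pooling over time AND space with factor p x p x p,
    as in the D3D network model module. volume: (T, H, W)."""
    T, H, W = volume.shape
    Tp, Hp, Wp = T // p, H // p, W // p
    blocks = volume[:Tp * p, :Hp * p, :Wp * p].reshape(Tp, p, Hp, p, Wp, p)
    return blocks.mean(axis=(1, 3, 5))

def nll_cost(probs, onehot):
    """H2: negative log-likelihood averaged over N samples and m classes.

    probs:  (N, m) predicted class probabilities G_l(X_k, theta)
    onehot: (N, m) one-hot true labels Y_k
    """
    return -np.mean(np.sum(onehot * np.log(probs), axis=1))
```

Unlike the 2D pooling of the 3D network model module, `pool3d` halves the temporal length as well, which is what discards the bulk of the redundant frame-to-frame information.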
Violence detection was carried out using the deep learning violence detection system of the present invention. Five detection scenes were selected for testing, namely detection scenes 1 through 5. Violence detection accuracy and violence detection speed were measured and compared against an existing violence detection system, producing the beneficial effects shown in the following table:
| | Violence detection accuracy improvement | Violence detection speed improvement |
|---|---|---|
| Detection scene 1 | 29% | 27% |
| Detection scene 2 | 27% | 26% |
| Detection scene 3 | 26% | 26% |
| Detection scene 4 | 25% | 24% |
| Detection scene 5 | 24% | 22% |
Through the above description of the embodiments, those skilled in the art will understand that the embodiments described herein can be realized with hardware, software, firmware, middleware, code, or any suitable combination thereof. For a hardware implementation, the processor can be realized in one or more of the following units: an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, other electronic units designed to realize the functions described herein, or a combination thereof. For a software implementation, some or all of the processes of the embodiments can be completed by a computer program instructing the relevant hardware. When realized, the above program can be stored in a computer-readable medium, or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium can be any usable medium that a computer can access. Computer-readable media can include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention and do not limit its scope of protection. Although the present invention has been explained in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent replacements can be made to the technical solutions of the present invention without departing from the substance and scope of those solutions.
Claims (8)
1. A deep learning violence detection system, characterized by comprising an image input module, an image global feature module, a deep network model module, a 3D network model module, a D3D network model module, and an image output module, wherein the image input module is used to input the image to be detected; the image global feature module is used to extract the global features of the image; the deep network model module is used to fuse the extracted global image features within a deep network model; the 3D network model module determines the violence detection result based on the deep network model module; the D3D network model module is used to optimize the violence detection result of the 3D network model module; and the image output module is used to output the optimized violence detection result.
2. The deep learning violence detection system according to claim 1, characterized in that the image global feature module comprises a data input layer, a convolutional computation layer, an excitation layer, and a pooling layer; the data input layer pre-processes the input image; the convolutional computation layer filters the image and performs the convolution operation; the excitation layer applies a non-linear mapping to the output of the convolutional computation layer; and the pooling layer compresses the non-linearly mapped image.
3. The deep learning violence detection system according to claim 2, characterized in that, in the convolutional computation layer, local neighborhood features are extracted from the pre-processed image by the convolution operation, and through multi-level iteration the global features of the image are extracted by two-dimensional convolution:

v_ij^(xy) = f( Σ_d Σ_{h=0}^{H-1} Σ_{w=0}^{W-1} w_ijd^(hw) · v_(i-1)d^((x+h)(y+w)) + b_ij )

In the formula above, i denotes the convolutional layer the image currently passes through, j denotes the number of feature maps in that layer, and v_ij^(xy) denotes the activation value at position (x, y) in the j-th feature map of the i-th layer; this activation value is the two-dimensional global feature of the image; f(·) denotes the activation function; H and W denote the height and width of the two-dimensional convolution kernel; w_ijd^(hw) denotes the weight of the convolution kernel; v_(i-1)d^((x+h)(y+w)) denotes the activation value of the d-th feature map of the (i-1)-th layer at (x+h, y+w); and b_ij denotes the bias term.
4. The deep learning violence detection system according to claim 3, characterized in that the deep network model module generates a three-dimensional convolution kernel by spatially extending the two-dimensional convolution kernel of the image global feature module, and the three-dimensional convolution at pixel (x, y, z) is defined as:

v_ij^(xyz) = f( Σ_d Σ_{h=0}^{H-1} Σ_{w=0}^{W-1} Σ_{t=0}^{T-1} w_ijd^(hwt) · v_(i-1)d^((x+h)(y+w)(z+t)) + b_ij )

In the formula above, i denotes the convolutional layer the image currently passes through, j denotes the number of feature maps in that layer, and v_ij^(xyz) denotes the activation value at position (x, y, z) in the j-th feature map of the i-th layer; this activation value is the three-dimensional global feature of the image; f(·) denotes the activation function; H, W, and T denote the sizes of the three-dimensional convolution kernel along the height, width, and time dimensions respectively; w_ijd^(hwt) denotes the weight of the convolution kernel; v_(i-1)d^((x+h)(y+w)(z+t)) denotes the activation value of the d-th feature map of the (i-1)-th layer at (x+h, y+w, z+t); and b_ij denotes the bias term.
5. The deep learning violence detection system according to claim 4, characterized in that the 3D network model module uses three convolutional computation layers C1, C2, and C3 based on the deep network model module, the three-dimensional kernel sizes used by C1, C2, and C3 being 7 × 7 × 5, 5 × 5 × 5, and 3 × 3 × 3 pixels respectively.
6. The deep learning violence detection system according to claim 5, characterized in that the input of the 3D network model module is an image segment X composed of 40 consecutive frames; after pre-processing, the frames are normalized to 60 × 90 pixels and converted to grayscale; the output is a scalar Y indicating the model's detection result for the input images: for a trained model, Y is 1 if the test images contain a violent scene, and 0 otherwise.
7. The deep learning violence detection system according to claim 6, characterized in that the 3D network model module applies a pooling operation to the feature maps computed by the first two convolutional computation layers, the pooling being calculated by the following formula:

a_y^x = f( θ_y^x · δ_T(a_y^(x-1)) + B_y^x )

In the formula, δ_T is the sampling function, δ_T(t) = Σ_n δ(t − nT), where t is the time, T is the sampling period, and n ∈ [0, +∞) is a non-negative integer; a_y^x denotes the y-th feature map of the x-th layer and a_y^(x-1) the y-th feature map of the (x−1)-th layer; θ and B are the multiplicative and additive biases respectively, θ_y^x being the y-th multiplicative bias of the x-th layer and B_y^x the y-th additive bias of the x-th layer.
8. The deep learning violence detection system according to claim 7, characterized in that the pooling operation uses two-dimensional pooling, i.e., the input feature map sequence is not down-sampled in the time dimension, and the pooling factors are set to 3 × 3 and 2 × 2 pixels respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810960914.0A CN109145822A (en) | 2018-08-22 | 2018-08-22 | A deep learning violence detection system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109145822A true CN109145822A (en) | 2019-01-04 |
Family
ID=64790766
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810960914.0A Withdrawn CN109145822A (en) | 2018-08-22 | 2018-08-22 | A kind of violence detection system of deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109145822A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111860064A (en) * | 2019-04-30 | 2020-10-30 | 杭州海康威视数字技术股份有限公司 | Target detection method, device and equipment based on video and storage medium |
CN111860064B (en) * | 2019-04-30 | 2023-10-20 | 杭州海康威视数字技术股份有限公司 | Video-based target detection method, device, equipment and storage medium |
CN111091060A (en) * | 2019-11-20 | 2020-05-01 | 吉林大学 | Deep learning-based fall and violence detection method |
CN111091060B (en) * | 2019-11-20 | 2022-11-04 | 吉林大学 | Fall and violence detection method based on deep learning |
CN111191528A (en) * | 2019-12-16 | 2020-05-22 | 江苏理工学院 | Campus violent behavior detection system and method based on deep learning |
CN111191528B (en) * | 2019-12-16 | 2024-02-23 | 江苏理工学院 | Campus violence behavior detection system and method based on deep learning |
CN112287754A (en) * | 2020-09-23 | 2021-01-29 | 济南浪潮高新科技投资发展有限公司 | Violence detection method, device, equipment and medium based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20190104 |