CN109145822A - A deep-learning violence detection system - Google Patents

A deep-learning violence detection system Download PDF

Info

Publication number
CN109145822A
CN109145822A
Authority
CN
China
Prior art keywords
image
network model
layer
module
violence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810960914.0A
Other languages
Chinese (zh)
Inventor
覃群英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Zheng Rong Technology Co Ltd
Original Assignee
Foshan Zheng Rong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Zheng Rong Technology Co Ltd filed Critical Foshan Zheng Rong Technology Co Ltd
Priority to CN201810960914.0A priority Critical patent/CN109145822A/en
Publication of CN109145822A publication Critical patent/CN109145822A/en
Withdrawn legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/44 - Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a deep-learning violence detection system comprising an image input module, an image global feature module, a deep network model module, a 3D network model module, a D3D network model module and an image output module. The image input module is used for inputting the image to be detected; the image global feature module is used for extracting the global features of the image; the deep network model module is used for fusing the extracted global image features in a deep network model; the 3D network model module determines a violence detection result based on the deep network model module; the D3D network model module is used for optimizing the violence detection result of the 3D network model module; and the image output module is used for outputting the optimized violence detection result. The invention has the beneficial effect of providing a deep-learning violence detection system that effectively improves the accuracy of violence detection.

Description

A deep-learning violence detection system
Technical field
The present invention relates to the technical field of violence detection, and in particular to a deep-learning violence detection system.
Background art
With the continuous advance of safe-city construction, video surveillance systems have been deployed on a large scale, and using intelligent video analysis technology to detect and give early warning of violent acts has become an urgent need.
Existing violence detection systems can be subdivided, according to the signal analyzed, into audio-based methods, audio-video-based methods and video-based methods. In actual surveillance systems, most installations have no audio acquisition equipment; in that case audio-based methods cannot work, and violence detection based on image/video data becomes the more valuable research direction. In addition, explosions, bleeding and behaviors such as car chases are usually effective clues for detecting violent scenes in film clips, but in daily life such behavior is very rare. By contrast, violent fights and group brawls occur most frequently in daily life and cause the widest range of damage.
Summary of the invention
In view of the above problems, the present invention provides a deep-learning violence detection system.
The object of the present invention is achieved by the following technical solution:
A deep-learning violence detection system is provided, comprising an image input module, an image global feature module, a deep network model module, a 3D network model module, a D3D network model module and an image output module. The image input module is used for inputting the image to be detected; the image global feature module is used for extracting the global features of the image; the deep network model module is used for fusing the extracted global image features in a deep network model; the 3D network model module determines a violence detection result based on the deep network model module; the D3D network model module is used for optimizing the violence detection result of the 3D network model module; and the image output module is used for outputting the optimized violence detection result.
The invention has the beneficial effect of providing a deep-learning violence detection system that effectively improves the accuracy of violence detection.
Description of the drawings
The present invention is further described below with reference to the accompanying drawings, but the embodiments in the drawings do not constitute any limitation of the invention; those of ordinary skill in the art can obtain other drawings from the following drawings without creative effort.
Fig. 1 is a structural schematic diagram of the invention.
Reference numerals:
Image input module 1, image global feature module 2, deep network model module 3, 3D network model module 4, D3D network model module 5, image output module 6.
Specific embodiment
The invention is further described with reference to the following examples.
Referring to Fig. 1, the deep-learning violence detection system of this embodiment comprises an image input module 1, an image global feature module 2, a deep network model module 3, a 3D network model module 4, a D3D network model module 5 and an image output module 6. The image input module 1 is used for inputting the image to be detected; the image global feature module 2 is used for extracting the global features of the image; the deep network model module 3 is used for fusing the extracted global image features in a deep network model; the 3D network model module 4 determines a violence detection result based on the deep network model module 3; the D3D network model module 5 is used for optimizing the violence detection result of the 3D network model module 4; and the image output module 6 is used for outputting the optimized violence detection result.
This embodiment provides a deep-learning violence detection system that effectively improves the accuracy of violence detection.
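The six modules above form a simple linear pipeline. As an illustrative sketch only (the patent specifies the modules' roles, not their implementation, so every name below is hypothetical), the data flow can be expressed as:

```python
class ViolenceDetectionPipeline:
    """Hypothetical wiring of the six modules described in this embodiment."""

    def __init__(self, extract_global_features, deep_fusion, detect_3d, refine_d3d):
        # Each stage is supplied as a callable; in the patent these correspond to
        # the image global feature module, the deep network model module,
        # the 3D network model module and the D3D network model module.
        self.extract_global_features = extract_global_features
        self.deep_fusion = deep_fusion
        self.detect_3d = detect_3d
        self.refine_d3d = refine_d3d

    def run(self, images):
        features = self.extract_global_features(images)  # global features of the input
        fused = self.deep_fusion(features)               # fuse into the deep network model
        result = self.detect_3d(fused)                   # preliminary violence detection result
        return self.refine_d3d(result)                   # optimized result, handed to the output module
```

With identity stages, `ViolenceDetectionPipeline(f, f, f, f).run(x)` simply returns `x`; in a real system each stage would be a trained network.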
Preferably, the image global feature module 2 comprises a data input layer, a convolution layer, an activation layer and a pooling layer. The data input layer pre-processes the input image; the convolution layer filters the image and performs the convolution operation; the activation layer applies a non-linear mapping to the output of the convolution layer; and the pooling layer compresses the non-linearly mapped image. In the convolution layer, local neighborhood features are extracted from the pre-processed image by the convolution operation, and through multi-level iteration the global features of the image are extracted by two-dimensional convolution:

v_{ij}^{xy} = f\left(b_{ij} + \sum_{d}\sum_{h=0}^{H-1}\sum_{w=0}^{W-1} w_{ijd}^{hw}\, v_{(i-1)d}^{(x+h)(y+w)}\right)

In the above formula, i denotes the convolutional layer in which the image currently lies and j indexes the feature maps of that layer; v_{ij}^{xy} denotes the activation value at position (x, y) in the j-th feature map of the i-th layer, this activation value being the two-dimensional global feature of the image. f(\cdot) denotes the activation function; H and W denote the height and width of the two-dimensional convolution kernel; w_{ijd}^{hw} denotes the weight of the convolution kernel; v_{(i-1)d}^{(x+h)(y+w)} denotes the activation value of the d-th feature map of layer i-1 at (x+h, y+w); and b_{ij} denotes the bias.
Through two-dimensional convolution, this preferred embodiment can conveniently abstract the spatial information of an image; it is simple, convenient and very widely applicable. However, these appearance features alone are not sufficient for a complete representation of a video, and video information may be lost.
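A direct NumPy transcription of the two-dimensional convolution formula above may make the index bookkeeping concrete. This is an illustrative sketch, not the patent's implementation; the function name and the choice of tanh as the default activation are assumptions:

```python
import numpy as np

def conv2d_activation(prev_maps, weights, bias, f=np.tanh):
    """Compute one output feature map:
    v_ij^{xy} = f(b_ij + sum_d sum_h sum_w w_ijd^{hw} * v_(i-1)d^{(x+h)(y+w)}).

    prev_maps: (D, X, Y) array, the D feature maps of layer i-1
    weights:   (D, H, W) array, kernel of the j-th feature map of layer i
    bias:      scalar b_ij
    f:         activation function (the patent leaves f unspecified)
    """
    D, H, W = weights.shape
    _, X, Y = prev_maps.shape
    out = np.full((X - H + 1, Y - W + 1), float(bias))
    for d in range(D):          # sum over feature maps of the previous layer
        for h in range(H):      # kernel height offset
            for w in range(W):  # kernel width offset
                out += weights[d, h, w] * prev_maps[d, h:h + X - H + 1, w:w + Y - W + 1]
    return f(out)
```

For example, with all-ones inputs, a (2, 3, 3) all-ones kernel, zero bias and f the identity, every output activation is 2 * 3 * 3 = 18.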
Preferably, the deep network model module 3 spatially extends the two-dimensional convolution kernel of the image global feature module 2 to generate a three-dimensional convolution kernel; the three-dimensional convolution at pixel (x, y, z) is defined as:

v_{ij}^{xyz} = f\left(b_{ij} + \sum_{d}\sum_{h=0}^{H-1}\sum_{w=0}^{W-1}\sum_{t=0}^{T-1} w_{ijd}^{hwt}\, v_{(i-1)d}^{(x+h)(y+w)(z+t)}\right)

In the above formula, i denotes the convolutional layer in which the image currently lies and j indexes the feature maps of that layer; v_{ij}^{xyz} denotes the activation value at position (x, y, z) in the j-th feature map of the i-th layer, this activation value being the three-dimensional global feature of the image. f(\cdot) denotes the activation function; H, W and T denote the sizes of the three-dimensional convolution kernel in height, width and the time dimension respectively; w_{ijd}^{hwt} denotes the weight of the convolution kernel; v_{(i-1)d}^{(x+h)(y+w)(z+t)} denotes the activation value of the d-th feature map of layer i-1 at (x+h, y+w, z+t); and b_{ij} denotes the bias.
Compared with the two-dimensional convolution formula, this preferred embodiment adds a time dimension to both the representation of the convolution kernel and that of the pixel. After the convolution kernel is extended to three-dimensional space, convolution over an image sequence is carried out in space and time simultaneously; in this way, after the convolution and pooling operations, the output feature maps remain image sequences, so the spatio-temporal information in the video is effectively preserved. Through the feature extraction of multiple three-dimensional convolutions, the global spatio-temporal features of the video can be extracted.
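The same sketch extends to the three-dimensional case by adding the temporal index t. Again this is only an illustrative transcription of the formula, with hypothetical names:

```python
import numpy as np

def conv3d_activation(prev_maps, weights, bias, f=np.tanh):
    """v_ij^{xyz} = f(b_ij + sum_d sum_h sum_w sum_t
                      w_ijd^{hwt} * v_(i-1)d^{(x+h)(y+w)(z+t)}).

    prev_maps: (D, X, Y, Z) array, D spatio-temporal feature maps of layer i-1
    weights:   (D, H, W, T) array, the kernel extended over the time dimension T
    """
    D, H, W, T = weights.shape
    _, X, Y, Z = prev_maps.shape
    oX, oY, oZ = X - H + 1, Y - W + 1, Z - T + 1
    out = np.full((oX, oY, oZ), float(bias))
    for d in range(D):
        for h in range(H):
            for w in range(W):
                for t in range(T):  # the extra temporal sum
                    out += weights[d, h, w, t] * prev_maps[d, h:h + oX, w:w + oY, t:t + oZ]
    return f(out)
```

The output of shape (oX, oY, oZ) is itself an image sequence, which is why stacking such layers preserves the spatio-temporal information of the video.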
Preferably, the 3D network model module 4 performs its computation with three convolution layers C1, C2 and C3 based on the deep network model module 3; the three-dimensional kernel sizes used by C1, C2 and C3 are 7 × 7 × 5, 5 × 5 × 5 and 3 × 3 × 3 pixels respectively. The input of the 3D network model module 4 is an image segment X composed of 40 consecutive frames; after pre-processing, the frames are normalized to a size of 60 × 90 pixels and converted to grayscale. The output is a scalar Y that indicates the model's detection result for the input image: for a trained model, Y is 1 if the test images contain a violent scene and 0 otherwise.
The 3D network model module 4 applies a pooling operation to the feature maps computed by the first two convolution layers; the pooling is computed by the following formula:

a_x^y = \theta_x^y\, \delta_T\!\left(a_{x-1}^y\right) + B_x^y, \qquad \delta_T(t) = \sum_{n=0}^{\infty} \delta(t - nT)

In the formula, \delta_T is the sampling function, where t is time, T is the sampling period and n is a non-negative integer, n \in [0, +\infty); a_x^y denotes the y-th feature map of layer x and a_{x-1}^y the y-th feature map of layer x-1; \theta and B are the multiplicative bias and the additive bias respectively, \theta_x^y being the y-th multiplicative bias of layer x and B_x^y the y-th additive bias of layer x.
The pooling operation uses two-dimensional pooling, i.e. the input feature-map sequence is not downsampled in the time dimension; the pooling factors are set to 3 × 3 and 2 × 2 pixels respectively.
During model training, the 3D network model module 4 uses the mean squared error as its cost function, whose expression is:

H_1(X, \theta) = \frac{1}{N} \sum_{k=1}^{N} \left( G(X_k, \theta) - \hat{Y}_k \right)^2

In the formula, H_1(X, \theta) denotes the cost function of the 3D network model, G is the model function, \theta the model parameters, X the training samples, N the sample size, and \hat{Y}_k the true label of the k-th sample, k \in [1, N], N \in [1, +\infty). The smaller the value of the cost function, the better the model fits the training set.
On the one hand, this preferred embodiment can further reduce the number of network parameters; on the other hand, it also gives the feature maps properties such as translation invariance and rotation invariance, so that the learned features are more robust.
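Two details of the 3D model, the spatial-only pooling and the mean-squared-error cost, can be sketched as follows. This is a simplified illustration; max pooling is an assumption, since the patent does not state which pooling function is used:

```python
import numpy as np

def spatial_max_pool(seq, ph, pw):
    """Pool each frame spatially by a (ph, pw) factor while leaving the
    time dimension untouched, as the 3D network model module does."""
    T, H, W = seq.shape
    H2, W2 = H // ph, W // pw
    return seq[:, :H2 * ph, :W2 * pw].reshape(T, H2, ph, W2, pw).max(axis=(2, 4))

def mse_cost(model, theta, X, Y):
    """H1(X, theta) = (1/N) * sum_k (G(X_k, theta) - Y_k)^2."""
    preds = np.array([model(x, theta) for x in X], dtype=float)
    return float(np.mean((preds - np.asarray(Y, dtype=float)) ** 2))
```

Applied with the factors 3 × 3 and 2 × 2 from the embodiment, a 40-frame sequence keeps all 40 frames while its spatial size shrinks; the smaller `mse_cost` becomes, the better the model fits the training set.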
Preferably, the D3D network model module 5 is based on the 3D network model; its input is 40 consecutive frames of 128 × 128 pixels, the frames being three-channel color images.
The three-dimensional convolution kernels are uniformly set to 3 × 3 × 3 pixels. In the convolution operation, the D3D network model module 5 pads the feature maps so that the feature maps obtained after convolution keep the same size as before the computation. Three-dimensional pooling is used in the pooling stage, i.e. the input feature-map sequence is also downsampled in the time dimension; the pooling factor is set to 2 × 2 × 2 pixels.
During model training, the cost function chosen by the D3D network model module 5 is the negative log-likelihood function, whose expression is:

H_2(X, \theta) = -\sum_{k=1}^{N} \sum_{l=1}^{m} \mathbf{1}\{\hat{Y}_k = l\} \log G_l(X_k, \theta)

In the formula, H_2(X, \theta) denotes the cost function of the D3D network model, G is the model function with G_l its output probability for class l, \theta the model parameters, X_k the k-th training sample, m the number of classes, N the number of samples per class, and \hat{Y}_k the true label of the k-th sample; k \in [1, N], N \in [1, +\infty), l \in [1, m], m \in [1, +\infty).
This preferred embodiment uses a more complex structure, so the image data it processes can be of higher dimensionality; in this way the extraction of the temporal information of the images is accelerated and a large amount of redundancy is removed.
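The two points where the D3D model differs, pooling that also downsamples time and the negative log-likelihood cost, can be sketched in the same spirit. This is illustrative only; max pooling and softmax-normalized model outputs are assumptions:

```python
import numpy as np

def temporal_max_pool3d(seq, p=2):
    """3D pooling with a p x p x p factor (2 x 2 x 2 in the embodiment):
    unlike the 3D model's spatial-only pooling, the time dimension is
    downsampled as well."""
    T, H, W = seq.shape
    T2, H2, W2 = T // p, H // p, W // p
    return seq[:T2 * p, :H2 * p, :W2 * p].reshape(T2, p, H2, p, W2, p).max(axis=(1, 3, 5))

def negative_log_likelihood(probs, labels):
    """H2 = -(1/N) * sum_k log p_k[y_k], where p_k is the predicted class
    distribution for sample k (assumed already softmax-normalized)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))
```

A 40-frame 128 × 128 input thus halves in all three dimensions at each pooling stage, which is what accelerates the extraction of temporal information and removes redundancy.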
Violence detection was carried out with the deep-learning violence detection system of the present invention. Five detection scenes, detection scene 1 to detection scene 5, were selected for the experiment; the violence detection accuracy and the violence detection speed were measured and compared with an existing violence detection system. The beneficial effects produced are shown in the table below:

                    Improvement in violence    Improvement in violence
                    detection accuracy         detection speed
Detection scene 1   29%                        27%
Detection scene 2   27%                        26%
Detection scene 3   26%                        26%
Detection scene 4   25%                        24%
Detection scene 5   24%                        22%
Through the above description of the embodiments, those skilled in the art can clearly understand that the embodiments described herein can be realized with hardware, software, firmware, middleware, code or any appropriate combination thereof. For a hardware implementation, the processor can be realized in one or more of the following units: an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, other electronic units designed to realize the functions described herein, or a combination thereof. For a software implementation, some or all of the processes of the embodiments can be completed by a computer program instructing the relevant hardware. When realized, the above program can be stored in a computer-readable medium or transmitted as one or more instructions or codes on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium can be any available medium that a computer can access. Computer-readable media can include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention and do not limit the scope of protection of the invention. Although the present invention has been explained in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent replacements can be made to the technical solution of the invention without departing from the essence and scope of the technical solution of the invention.

Claims (8)

1. A deep-learning violence detection system, characterized by comprising an image input module, an image global feature module, a deep network model module, a 3D network model module, a D3D network model module and an image output module, wherein the image input module is used for inputting the image to be detected, the image global feature module is used for extracting the global features of the image, the deep network model module is used for fusing the extracted global image features in a deep network model, the 3D network model module determines a violence detection result based on the deep network model module, the D3D network model module is used for optimizing the violence detection result of the 3D network model module, and the image output module is used for outputting the optimized violence detection result.
2. The deep-learning violence detection system according to claim 1, characterized in that the image global feature module comprises a data input layer, a convolution layer, an activation layer and a pooling layer; the data input layer pre-processes the input image; the convolution layer filters the image and performs the convolution operation; the activation layer applies a non-linear mapping to the output of the convolution layer; and the pooling layer compresses the non-linearly mapped image.
3. The deep-learning violence detection system according to claim 2, characterized in that, in the convolution layer, local neighborhood features are extracted from the pre-processed image by the convolution operation, and through multi-level iteration the global features of the image are extracted by two-dimensional convolution:

v_{ij}^{xy} = f\left(b_{ij} + \sum_{d}\sum_{h=0}^{H-1}\sum_{w=0}^{W-1} w_{ijd}^{hw}\, v_{(i-1)d}^{(x+h)(y+w)}\right)

In the above formula, i denotes the convolutional layer in which the image currently lies and j indexes the feature maps of that layer; v_{ij}^{xy} denotes the activation value at position (x, y) in the j-th feature map of the i-th layer, this activation value being the two-dimensional global feature of the image; f(\cdot) denotes the activation function; H and W denote the height and width of the two-dimensional convolution kernel; w_{ijd}^{hw} denotes the weight of the convolution kernel; v_{(i-1)d}^{(x+h)(y+w)} denotes the activation value of the d-th feature map of layer i-1 at (x+h, y+w); and b_{ij} denotes the bias.
4. The deep-learning violence detection system according to claim 3, characterized in that the deep network model module spatially extends the two-dimensional convolution kernel of the image global feature module to generate a three-dimensional convolution kernel, the three-dimensional convolution at pixel (x, y, z) being defined as:

v_{ij}^{xyz} = f\left(b_{ij} + \sum_{d}\sum_{h=0}^{H-1}\sum_{w=0}^{W-1}\sum_{t=0}^{T-1} w_{ijd}^{hwt}\, v_{(i-1)d}^{(x+h)(y+w)(z+t)}\right)

In the above formula, i denotes the convolutional layer in which the image currently lies and j indexes the feature maps of that layer; v_{ij}^{xyz} denotes the activation value at position (x, y, z) in the j-th feature map of the i-th layer, this activation value being the three-dimensional global feature of the image; f(\cdot) denotes the activation function; H, W and T denote the sizes of the three-dimensional convolution kernel in height, width and the time dimension respectively; w_{ijd}^{hwt} denotes the weight of the convolution kernel; v_{(i-1)d}^{(x+h)(y+w)(z+t)} denotes the activation value of the d-th feature map of layer i-1 at (x+h, y+w, z+t); and b_{ij} denotes the bias.
5. The deep-learning violence detection system according to claim 4, characterized in that the 3D network model module performs its computation with three convolution layers C1, C2 and C3 based on the deep network model module, the three-dimensional kernel sizes used by C1, C2 and C3 being 7 × 7 × 5, 5 × 5 × 5 and 3 × 3 × 3 pixels respectively.
6. The deep-learning violence detection system according to claim 5, characterized in that the input of the 3D network model module is an image segment X composed of 40 consecutive frames; after pre-processing, the frames are normalized to a size of 60 × 90 pixels and converted to grayscale; the output is a scalar Y that indicates the model's detection result for the input image: for a trained model, Y is 1 if the test images contain a violent scene and 0 otherwise.
7. The deep-learning violence detection system according to claim 6, characterized in that the 3D network model module applies a pooling operation to the feature maps computed by the first two convolution layers, the pooling being computed by the following formula:

a_x^y = \theta_x^y\, \delta_T\!\left(a_{x-1}^y\right) + B_x^y, \qquad \delta_T(t) = \sum_{n=0}^{\infty} \delta(t - nT)

In the formula, \delta_T is the sampling function, where t is time, T is the sampling period and n is a non-negative integer, n \in [0, +\infty); a_x^y denotes the y-th feature map of layer x and a_{x-1}^y the y-th feature map of layer x-1; \theta and B are the multiplicative bias and the additive bias respectively, \theta_x^y being the y-th multiplicative bias of layer x and B_x^y the y-th additive bias of layer x.
8. The deep-learning violence detection system according to claim 7, characterized in that the pooling operation uses two-dimensional pooling, i.e. the input feature-map sequence is not downsampled in the time dimension; the pooling factors are set to 3 × 3 and 2 × 2 pixels respectively.
CN201810960914.0A 2018-08-22 2018-08-22 A kind of violence detection system of deep learning Withdrawn CN109145822A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810960914.0A CN109145822A (en) 2018-08-22 2018-08-22 A kind of violence detection system of deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810960914.0A CN109145822A (en) 2018-08-22 2018-08-22 A kind of violence detection system of deep learning

Publications (1)

Publication Number Publication Date
CN109145822A true CN109145822A (en) 2019-01-04

Family

ID=64790766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810960914.0A Withdrawn CN109145822A (en) 2018-08-22 2018-08-22 A kind of violence detection system of deep learning

Country Status (1)

Country Link
CN (1) CN109145822A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860064A (en) * 2019-04-30 2020-10-30 杭州海康威视数字技术股份有限公司 Target detection method, device and equipment based on video and storage medium
CN111860064B (en) * 2019-04-30 2023-10-20 杭州海康威视数字技术股份有限公司 Video-based target detection method, device, equipment and storage medium
CN111091060A (en) * 2019-11-20 2020-05-01 吉林大学 Deep learning-based fall and violence detection method
CN111091060B (en) * 2019-11-20 2022-11-04 吉林大学 Fall and violence detection method based on deep learning
CN111191528A (en) * 2019-12-16 2020-05-22 江苏理工学院 Campus violent behavior detection system and method based on deep learning
CN111191528B (en) * 2019-12-16 2024-02-23 江苏理工学院 Campus violence behavior detection system and method based on deep learning
CN112287754A (en) * 2020-09-23 2021-01-29 济南浪潮高新科技投资发展有限公司 Violence detection method, device, equipment and medium based on neural network

Similar Documents

Publication Publication Date Title
CN109145822A (en) A kind of violence detection system of deep learning
CN110120020A (en) A kind of SAR image denoising method based on multiple dimensioned empty residual error attention network
CN106600557B (en) PSF estimation method based on mixed Gauss model and sparse constraint
CN106133788A (en) Process the image processing apparatus of digital picture
CN110222760A (en) A kind of fast image processing method based on winograd algorithm
CN112801158A (en) Deep learning small target detection method and device based on cascade fusion and attention mechanism
CN110222607A (en) The method, apparatus and system of face critical point detection
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN110503651A (en) A kind of significant object segmentation methods of image and device
GB2579262A (en) Space-time memory network for locating target object in video content
CN109800713A (en) The remote sensing images cloud detection method of optic increased based on region
CN116503399B (en) Insulator pollution flashover detection method based on YOLO-AFPS
CN114677596A (en) Remote sensing image ship detection method and device based on attention model
CN111339917A (en) Method for detecting glass in real scene
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN109299660A (en) A kind of Intelligent campus early warning platform
CN109978855A (en) A kind of method for detecting change of remote sensing image and device
CN116805387A (en) Model training method, quality inspection method and related equipment based on knowledge distillation
CN108960326A (en) A kind of point cloud fast partition method and its system based on deep learning frame
CN112966815A (en) Target detection method, system and equipment based on impulse neural network
CN115205793B (en) Electric power machine room smoke detection method and device based on deep learning secondary confirmation
CN112464725A (en) First arrival picking method and device based on deep learning network
Chen et al. Alfpn: adaptive learning feature pyramid network for small object detection
CN114494999B (en) Double-branch combined target intensive prediction method and system
CN105913427A (en) Machine learning-based noise image saliency detecting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190104