CN115331172A - Workshop dangerous behavior recognition alarm method and system based on monitoring video - Google Patents

Workshop dangerous behavior recognition alarm method and system based on monitoring video

Info

Publication number
CN115331172A
CN115331172A (Application CN202210993747.6A)
Authority
CN
China
Prior art keywords
dangerous behavior
dangerous
workshop
behavior recognition
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210993747.6A
Other languages
Chinese (zh)
Inventor
谢俊
王子贤
赵宇凡
刘军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202210993747.6A priority Critical patent/CN115331172A/en
Publication of CN115331172A publication Critical patent/CN115331172A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18Status alarms
    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18Status alarms
    • G08B21/24Reminder alarms, e.g. anti-loss alarms


Abstract

The invention provides a workshop dangerous behavior identification and alarm method and system based on a monitoring video, which comprises the following steps: collecting a dangerous behavior image dataset; constructing a plurality of dangerous behavior identification modules by utilizing an improved YOLOv4-MobileNet V3 deep learning network architecture; inputting a training set into a plurality of constructed dangerous behavior recognition modules, and training by using a loss function to obtain a plurality of trained dangerous behavior recognition modules; inputting the test set into a plurality of dangerous behavior recognition modules which are trained to carry out convolution processing, wherein the classification result output by the model comprises the category to which the target belongs and the corresponding confidence coefficient; setting a threshold according to the confidence of the target, and removing the class to which the target with the confidence lower than the threshold belongs; and judging whether dangerous behaviors exist in all image frames in the monitoring video by the aid of a plurality of dangerous behavior identification modules based on convolution processing, and triggering the alarm module when the dangerous behaviors are determined to exist. The invention can monitor and detect the behavior in the workshop in real time.

Description

Workshop dangerous behavior recognition alarm method and system based on monitoring video
Technical Field
The invention relates to the technical field of workshop safety behavior identification, in particular to a workshop dangerous behavior identification and alarm method and system based on a monitoring video.
Background
With the development of industry, the safety problems existing in enterprises have not been effectively controlled. Workers' safety awareness has not been correspondingly strengthened, and safety accidents in workshops occur frequently. At present, most enterprises have numerous monitoring cameras but rely on manual real-time review of the monitoring video. On one hand, this consumes considerable labor cost; on the other hand, manual monitoring suffers from high rates of missed reports and missed identifications. Therefore, workshops also need an auxiliary safety recognition and alarm device.
In a workshop, smoking by workers can cause fire accidents, and eating or playing with mobile phones may cause safety accidents by distracting workers. Dangerous behavior detection technology is therefore an important technical component of a recognition and alarm system. Conventional object detection methods include traditional machine learning methods and deep learning methods, but problems such as the complex workshop environment, personnel walking about and illumination changes easily lead to missed detection and misjudgment of dangerous behaviors.
With machine vision, dangerous behaviors can be detected and identified without manual effort, provided the monitoring video and computer hardware facilities are integrated, thereby reducing labor cost and improving detection efficiency.
Most existing dangerous behavior detection and recognition methods are deep learning based, detecting and recognizing various dangerous behaviors simply by replacing the training set of a classical target detection algorithm. For example, the two detection approaches most commonly used in target detection at present are the two-stage Fast R-CNN series and the single-stage YOLO series. However, in practical application scenarios, due to the complex workshop environment, lighting conditions, and changes in the size and angle of the target to be detected, existing target detection algorithms cannot meet the detection accuracy (mAP) and detection speed (FPS) requirements of real-time detection. In addition, the hardware facilities in a workshop are usually insufficient for the computational demands of complex models, so the training parameters of the model need to be reduced while balancing detection accuracy and detection speed, making full use of the parallel computing capability provided by the GPU.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a workshop dangerous behavior recognition alarm method and system based on a monitoring video. Based on detection with an improved YOLOv4-MobileNet V3 network model and fully considering the limited capability of basic workshop hardware facilities, the method can monitor and detect behaviors in a workshop in real time, improve the speed and accuracy of dangerous behavior detection, and trigger the alarm module to notify safety personnel in time. The cost of workshop equipment is greatly reduced, the occurrence of dangerous behaviors in the workshop is reduced, safety in the workshop is improved, and good order and production work in the workshop can be effectively maintained.
The present invention achieves the above-described object by the following technical means.
A workshop dangerous behavior recognition alarm method based on a monitoring video comprises the following steps:
s1: collecting dangerous behavior image data sets, and performing quantity supplement on the obtained images through an image augmentation technology;
s2: preprocessing dangerous behavior images, and determining a training set and a testing set;
s3: constructing a plurality of dangerous behavior identification modules by utilizing an improved YOLOv4-MobileNet V3 deep learning network architecture;
s4: inputting a training set into a plurality of constructed dangerous behavior recognition modules, and training by using a loss function to obtain a plurality of trained dangerous behavior recognition modules;
s5: inputting the test set into a plurality of dangerous behavior recognition modules which are trained to carry out convolution processing, wherein the classification result output by the model comprises the category to which the target belongs and the corresponding confidence coefficient; setting a threshold according to the confidence of the target, and removing the class to which the target with the confidence lower than the threshold belongs;
s6: and judging whether dangerous behaviors exist in all image frames in the monitoring video by the aid of a plurality of dangerous behavior identification modules based on convolution processing, and triggering the alarm module when the dangerous behaviors are determined to exist.
Further, the dangerous behavior image dataset comprises smoking images, eating images and mobile phone playing images.
Further, image augmentation techniques include flipping, cropping, changing colors (brightness, contrast, saturation, and hue), superimposing multiple images.
Further, dangerous behavior images are processed through cutting mixing, mosaic data enhancement and class label smoothing; the sample ratio of the training set to the test set in the data set was 10: 1.
Further, the method for constructing a plurality of dangerous behavior recognition modules by using the improved classification network architecture of YOLOv4-MobileNet V3 deep learning specifically comprises the following steps:
s3.1: the used YOLOv4-MobileNet V3 deep learning network architecture comprises a Backbone network for Backbone feature extraction, a Neck reinforced feature extraction network and a Head prediction network:
in a Backbone network for backhaul feature extraction, replacing CSPDarknet53 in an original YOLOv4 network with a MobileNet V3, and replacing a channel attention SENet mechanism module in the MobileNet V3 with a position attention CA mechanism module;
in a Neck enhanced feature extraction network, replacing a common convolution in a PANet module in a YOLOv4 network with a deep separable convolution; the Head prediction network is a Head prediction network in YOLOv 4;
s3.2: and (5) performing anchor frame dimension clustering by using a K-means + + algorithm.
Further, the position attention CA mechanism module decomposes the channel attention into two 1-dimensional feature codes, and the two 1-dimensional feature codes are aggregated along 2 spatial directions respectively, so that the remote dependency relationship is captured along one spatial direction, and the accurate position information is kept along the other spatial direction; respectively encoding the generated characteristic graphs into a stack of direction perception and position sensitive attention information, and specifically comprising the following steps of:
Coordinate information embedding: given an input x_c, pooling kernels of size (H, 1) and (1, W) are used to encode each channel along the horizontal and vertical coordinates respectively. The output of the c-th channel at height h can be expressed as:

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le p < W} x_c(h, p)$$

The output of the c-th channel at width w can be expressed as:

$$z_c^w(w) = \frac{1}{H} \sum_{0 \le q < H} x_c(q, w)$$

where x_c(h, p) denotes the p-th element along the width at height h in the c-th channel, and x_c(q, w) denotes the q-th element along the height at width w in the c-th channel; z_c^h and z_c^w aggregate the features along the two spatial directions respectively to obtain the corresponding direction-aware feature maps; R^{C×H×W} denotes the feature set with C channels, height H and width W.

Coordinate attention generation: z^h and z^w are concatenated, and a 1 × 1 convolution transformation function F_1 is then applied:

$$f = \delta\left(F_1\left(\left[z^h, z^w\right]\right)\right)$$

where [·,·] denotes the concatenation operation along the spatial dimension, δ is a nonlinear activation function, and f ∈ R^{C/r×(H+W)} is the intermediate feature map encoding the spatial information in the horizontal and vertical directions; r controls the reduction rate of the block size as in the SE block. Along the spatial dimension, f is decomposed into two separate tensors f^h ∈ R^{C/r×H} and f^w ∈ R^{C/r×W}.

Two further 1 × 1 convolution transformation functions F_h and F_w transform the feature maps f^h and f^w in the c-th channel respectively, and the outputs g^h and g^w are used as attention weights:

$$g^h = \sigma\left(F_h\left(f^h\right)\right)$$

$$g^w = \sigma\left(F_w\left(f^w\right)\right)$$

where σ is the sigmoid activation function, i is the horizontal coordinate variable in the c-th channel, and j is the vertical coordinate variable in the c-th channel.

The output of the position attention CA mechanism module is:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

where x_c(i, j) is the input feature in the c-th channel.
Further, in the Neck enhanced feature extraction network, the ordinary convolution in the PANet module of the YOLOv4 network is replaced by a depthwise separable convolution, whose parameter number relative to that of the ordinary convolution is

$$\frac{D_K \times D_K \times M + M \times N}{D_K \times D_K \times M \times N} = \frac{1}{N} + \frac{1}{D_K^2}$$

where D_K is the convolution kernel size, M is the number of input channels, and N is the number of output channels.
Further, the confidence is defined as:

$$C_m^n = P_r(Object) \times IOU_{pred}^{truth}$$

where C_m^n represents the confidence of the n-th bounding box of the m-th grid cell, P_r(Object) represents the probability that the current bounding box contains an object, and IOU_{pred}^{truth} is the intersection-over-union between the predicted bounding box and the real bounding box of the object when the current bounding box contains an object.
A system of the workshop dangerous behavior recognition alarm method based on the monitoring video comprises monitoring equipment, computing equipment, control equipment, display equipment and alarm equipment;
the monitoring equipment is used for acquiring real-time video data shot by the workshop monitoring camera and transmitting the workshop real-time video to the computing equipment; the computing equipment comprises a plurality of trained dangerous behavior recognition modules, which are used for detecting and recognizing the image frames in the video stream and transmitting real-time video with target positions, categories and confidence levels to the display equipment and the control equipment.
The control equipment determines whether dangerous behavior exists by comparison against the set confidence threshold; when the confidence of a key image frame is detected to be greater than the set threshold, the control equipment controls the display equipment to display the monitoring picture with a rectangular box in real time. When dangerous behavior is determined to exist, the control equipment combines all the relevant image frames into a video stream containing the dangerous behavior, uploads it to the alarm equipment, and triggers an alarm.
The invention has the beneficial effects that:
1. The workshop dangerous behavior recognition alarm method and system based on the monitoring video are based on the original YOLOv4 network and perform anchor box dimension clustering on the dataset with the K-means++ algorithm, which improves the algorithm precision. MobileNet V3, a network with stronger feature extraction capability and a smaller number of parameters, is used as the replacement backbone network, and the SE attention (channel attention) mechanism module in it is replaced by a CA attention (position attention) mechanism module, giving the feature map stronger feature expression capability. The ordinary convolution modules in PANet are replaced with depthwise separable convolutions and CBAM attention is added, which improves precision and reduces the number of parameters to lighten the computational load and better suit the available hardware.
2. The workshop dangerous behavior recognition alarm method and system based on the monitoring video are based on detection with the YOLOv4-MobileNet V3 network model and fully consider the limited capability of basic workshop hardware facilities; they can monitor and detect behaviors in a workshop in real time, improve the speed and accuracy of dangerous behavior detection, and trigger the alarm module to notify safety personnel in time. The cost of workshop equipment is greatly reduced, the occurrence of dangerous behaviors in the workshop is reduced, safety in the workshop is improved, and good order and production work in the workshop can be effectively maintained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a workshop dangerous behavior identification and alarm method based on a surveillance video according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a YOLOv4 network model according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an overall network structure of MobileNet V3 according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of bneck in a MobileNet V3 network according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a CA attention mechanism structure according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of an improved PANet according to an embodiment of the present invention.
Fig. 7 is a block diagram of a workshop dangerous behavior recognition alarm system based on a surveillance video according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the following figures and specific examples, but the scope of the invention is not limited thereto.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "axial," "radial," "vertical," "horizontal," "inner," "outer," and the like indicate orientations and positional relationships based on those shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation or be constructed and operated in a particular orientation; they are therefore not to be construed as limiting the invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
As shown in fig. 1, the method for identifying and alarming dangerous behaviors of a workshop based on a surveillance video includes the following steps:
s1: the method comprises the following steps of collecting dangerous behavior image data sets, and supplementing the quantity of the obtained images through an image augmentation technology, wherein the method specifically comprises the following steps:
S1.1: select several segments of enterprise workshop monitoring video recorded at different times, extract key image frames containing dangerous behaviors, and uniformly crop the key image frames to a size of 608 × 608 to obtain monitoring pictures of workshop personnel smoking, eating and playing with mobile phones; the dangerous behavior image dataset comprises smoking images, eating images and mobile phone playing images.
S1.2: apply a series of image augmentation operations to the extracted key frame images, including flipping, cropping, and changing the brightness, contrast, saturation and hue of the colors.
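For illustration only, the augmentation operations listed in step S1.2 can be sketched with the torchvision library as follows; the probability and jitter values, as well as the file name, are illustrative assumptions rather than values fixed by this disclosure:

```python
# Minimal augmentation sketch using torchvision; parameter values are illustrative
# assumptions, not the values used in this patent.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),               # flipping
    transforms.RandomResizedCrop(608, scale=(0.8, 1.0)),  # cropping back to 608x608
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.05),     # color changes
])

frame = Image.open("frame_0001.jpg")   # a key frame extracted from the video
augmented = augment(frame)             # each call yields a new random variant
augmented.save("frame_0001_aug.jpg")
```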
S2: preprocessing the dangerous behavior image, and determining a training set and a testing set, wherein the method specifically comprises the following steps:
s2.1: the data enhancement operations used include:
S2.1.1: cut-and-mix (CutMix): images are combined by cutting a region from one image and pasting it onto the image being augmented;
S2.1.2: Mosaic data enhancement (Mosaic): four training images are combined into one at random ratios;
S2.1.3: class label smoothing (Label Smoothing): class labels are encoded softly to account, to some extent, for uncertainty, i.e. possible labelling errors, overfitting and over-confidence in the predictions. Typically a value of 0.9 is selected to represent the correct class, i.e. a one-hot label such as [0, 1, 0] is softened so that the correct class takes the value 0.9 (a minimal sketch is given after step S2.2).
S2.2: manually annotate the extracted images with class names and bounding-box positions, define them respectively with the labels smoking, eating and playing, and divide them into a training set and a test set, where the sample ratio of the training set to the test set is 10:1.
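As a minimal sketch of the class-label smoothing of step S2.1.3 for the three classes used here, the following assumes that the remaining probability mass is spread evenly over the wrong classes; the 0.9 value follows the text above, the rest is an illustrative assumption:

```python
import numpy as np

def smooth_labels(one_hot: np.ndarray, correct_value: float = 0.9) -> np.ndarray:
    """Distribute 1 - correct_value uniformly over the wrong classes."""
    num_classes = one_hot.shape[-1]
    off_value = (1.0 - correct_value) / (num_classes - 1)
    return one_hot * (correct_value - off_value) + off_value

# Example: label "eating" among (smoking, eating, playing)
print(smooth_labels(np.array([0.0, 1.0, 0.0])))  # -> [0.05 0.9  0.05]
```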
S3: the method comprises the following steps of constructing a plurality of dangerous behavior identification modules by utilizing an improved YOLOv4-MobileNet V3 deep learning network architecture, and specifically comprises the following steps:
s3.1: as shown in fig. 2, the YOLOv4-MobileNet V3 deep learning network architecture used in the present invention includes a Backbone network for backhaul feature extraction, a Neck enhanced feature extraction network, and a Head prediction network:
in a Backbone network for backhaul feature extraction, replacing CSPDarknet53 in an original YOLOv4 network with a MobileNet V3, and replacing a channel attention SENet mechanism module in the MobileNet V3 with a position attention CA mechanism module;
In the Neck reinforced feature extraction network, the ordinary convolution in the PANet module of the YOLOv4 network is replaced with a depthwise separable convolution; the Head prediction network is the Head prediction network of YOLOv4. The advantage of this design is that it integrates four characteristics (depthwise separable convolution, an inverted residual structure with a linear bottleneck, the improved CA attention mechanism and the h-swish activation function), which reduces the parameter quantity of the model while ensuring detection precision, improves the feature extraction capability of the model, raises the detection rate (FPS), and lowers the requirements on infrastructure hardware;
as shown in fig. 3 and 4, the MobileNet V3 backbone network and bneck (bottleneck layer) in yollov 4-MobileNet V3 are described in more detail:
the MobileNet V3 network structure and Input represent the dimension of a feature matrix Input into the current layer. Operator stands for block operation each feature layer goes through. expsize represents the dimension of the first up-dimension 1 × 1 convolution output in bneck. # out indicates the number of channels in the feature layer at the input to bneck. NBN indicates no normalization operation is used. CA indicates whether or not the attention mechanism is used. NL denotes the nonlinear activation function currently used, HS denotes h-swish and RE denotes ReLU. s is the step size stride used for each block structure. The method comprises the steps of firstly convolving the input image by 1 multiplied by 1 to increase the number of channels, then using deep convolution in a high-dimensional space, then optimizing feature map data through a CA attention mechanism, and finally reducing the number of channels through 1 multiplied by 1 convolution (using an activation function). Connecting the input and output using a residual when the step size =1 and the dimensions of the input and output feature maps are the same; when the step =2 (down-sampling stage), the feature map after dimensionality reduction is directly output.
As shown in fig. 5, the improved attention mechanism module in the MobileNet V3 backbone network is explained:
the original attention mechanism SEnet (SqueezeExceptionNet) is a channel attention network, global average pooling is carried out on input feature maps, and corresponding weights are output through two full-connection layers by a Sigmoid activation function. It mainly measures the channel relationship, neglects the position information, i.e. neglects the space selection. Position Attention CA (coding Attention), which decomposes channel Attention into two 1-dimensional feature coding processes, each aggregated along 2 spatial directions, retains accurate position information along one spatial direction while capturing remote dependencies along the other spatial direction. Respectively encoding the generated feature maps into a stack of attention information attribute maps with direction perception and position sensitivity, complementarily applying the encoded feature maps to the input feature maps, and enhancing the representation of the attention object, specifically comprising the following steps:
Coordinate information embedding: given an input x_c, pooling kernels of size (H, 1) and (1, W) are used to encode each channel along the horizontal and vertical coordinates respectively. The output of the c-th channel at height h can be expressed as:

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le p < W} x_c(h, p)$$

The output of the c-th channel at width w can be expressed as:

$$z_c^w(w) = \frac{1}{H} \sum_{0 \le q < H} x_c(q, w)$$

where x_c(h, p) denotes the p-th element along the width at height h in the c-th channel, and x_c(q, w) denotes the q-th element along the height at width w in the c-th channel; z_c^h and z_c^w aggregate the features along the two spatial directions respectively to obtain the corresponding direction-aware feature maps; R^{C×H×W} denotes the feature set with C channels, height H and width W.

Coordinate attention generation: z^h and z^w are concatenated, and a 1 × 1 convolution transformation function F_1 is then applied:

$$f = \delta\left(F_1\left(\left[z^h, z^w\right]\right)\right)$$

where [·,·] denotes the concatenation operation along the spatial dimension, δ is a nonlinear activation function, and f ∈ R^{C/r×(H+W)} is the intermediate feature map encoding the spatial information in the horizontal and vertical directions; r controls the reduction rate of the block size as in the SE block. Along the spatial dimension, f is decomposed into two separate tensors f^h ∈ R^{C/r×H} and f^w ∈ R^{C/r×W}.

Two further 1 × 1 convolution transformation functions F_h and F_w transform the feature maps f^h and f^w in the c-th channel respectively, and the outputs g^h and g^w are used as attention weights:

$$g^h = \sigma\left(F_h\left(f^h\right)\right)$$

$$g^w = \sigma\left(F_w\left(f^w\right)\right)$$

where σ is the sigmoid activation function, i is the horizontal coordinate variable in the c-th channel, and j is the vertical coordinate variable in the c-th channel.

The output of the position attention CA mechanism module is:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

where x_c(i, j) is the input feature in the c-th channel.
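For illustration, a PyTorch sketch of the coordinate attention (CA) module defined by the formulas above; the reduction ratio and the use of h-swish for the nonlinear activation δ are assumptions:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention sketch: pool along H and W, encode jointly, re-weight."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # average over width at each height -> z^h
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # average over height at each width -> z^w
        self.f1 = nn.Sequential(                        # shared 1x1 transform F1 followed by delta
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.Hardswish())
        self.f_h = nn.Conv2d(mid, channels, 1)          # F_h
        self.f_w = nn.Conv2d(mid, channels, 1)          # F_w

    def forward(self, x):
        n, c, h, w = x.shape
        z_h = self.pool_h(x)                            # (n, c, h, 1)
        z_w = self.pool_w(x).permute(0, 1, 3, 2)        # (n, c, w, 1)
        f = self.f1(torch.cat([z_h, z_w], dim=2))       # concatenate along the spatial dimension
        f_h, f_w = torch.split(f, [h, w], dim=2)        # split back into two separate tensors
        g_h = torch.sigmoid(self.f_h(f_h))              # (n, c, h, 1) attention over height
        g_w = torch.sigmoid(self.f_w(f_w.permute(0, 1, 3, 2)))  # (n, c, 1, w) attention over width
        return x * g_h * g_w                            # y_c(i, j) = x_c(i, j) * g^h(i) * g^w(j)

# Quick shape check
print(CoordinateAttention(64)(torch.randn(2, 64, 38, 38)).shape)  # -> [2, 64, 38, 38]
```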
As shown in fig. 6, the specific steps for improving the PANet module in the Neck feature extraction layer of the original YOLOv4 network are as follows:
The CBAM attention mechanism is applied to the output feature maps after up-sampling and down-sampling. CBAM is a lightweight general-purpose module, so its overhead is negligible and it can be integrated seamlessly into the PANet framework. The CBAM module takes the output of the convolutional layer as the input feature map, first obtains a weighting result through a channel attention module, then weights the intermediate feature map processed by the channel attention module through a spatial attention module, and finally multiplies the attention weights with the input feature map.
Replace the original ordinary convolution module Conv in PANet with a depthwise separable convolution (i.e. the DWConv, Depthwise Separable Convolution, shown in fig. 4):
Let the input feature dimension be D_F × D_F × M, where D_F is the feature map size, M is the number of input channels, D_K is the convolution kernel size, and N is the number of output channels.
The standard convolution kernel has D_K × D_K × M × N parameters.
The depthwise separable convolution consists of a depthwise convolution plus a pointwise convolution, computed as follows:
Depthwise convolution: the convolution kernel has D_K × D_K × 1 × M parameters, and the output feature dimension is D_F × D_F × M. Each channel corresponds to only one convolution kernel (the kernel depth is 1), so the FLOPs (floating point operations) are M × D_F × D_F × D_K × D_K.
Pointwise convolution: the input is the feature after the depthwise convolution with dimension D_F × D_F × M, the convolution kernel parameters are 1 × 1 × M × N, and the output dimension is D_F × D_F × N; a 1 × 1 standard convolution is performed at each feature position, so the FLOPs are N × D_F × D_F × M.
Adding the above convolution kernel parameters gives D_K × D_K × M + M × N. The ratio of the depthwise separable convolution parameters to those of the standard convolution is therefore:

$$\frac{D_K \times D_K \times M + M \times N}{D_K \times D_K \times M \times N} = \frac{1}{N} + \frac{1}{D_K^2}$$

Hence, the larger the number of output channels or the larger the convolution kernel, the greater the saving in parameters and computation achieved by the depthwise separable convolution.
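The parameter comparison derived above can be checked with a minimal sketch; the channel numbers M = 256, N = 512 and kernel size D_K = 3 are illustrative assumptions:

```python
import torch.nn as nn

def depthwise_separable(in_ch: int, out_ch: int, k: int = 3) -> nn.Sequential:
    """Depthwise conv (one kernel per channel) followed by a 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch, bias=False),  # depthwise
        nn.Conv2d(in_ch, out_ch, 1, bias=False),                               # pointwise
    )

def count(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

M, N, Dk = 256, 512, 3                              # illustrative channel/kernel sizes
standard = nn.Conv2d(M, N, Dk, padding=1, bias=False)
separable = depthwise_separable(M, N, Dk)
print(count(standard), count(separable))            # 1,179,648 vs 133,376 parameters
print(count(separable) / count(standard))           # ~0.113 = 1/N + 1/Dk^2
```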
S3.2: performing anchor frame dimension clustering by using a K-means + + algorithm, wherein the method comprises the following steps:
S3.2.1: randomly select one sample from the training set as the first initial cluster center;
S3.2.2: select the remaining cluster centers: compute the shortest distance D(x) between every sample in the training set and the already chosen cluster centers; the probability of each sample being selected as the next cluster center is then

$$P(x) = \frac{D(x)^2}{\sum_{x} D(x)^2}$$

and the next cluster center is selected by roulette-wheel selection;
S3.2.3: repeat the above process until k cluster centers are determined.
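A numpy sketch of the K-means++ initialization described in step S3.2 is given below; using Euclidean distance on (width, height) pairs and k = 9 anchors are simplifying assumptions (YOLO-style anchor clustering often uses 1 - IoU as the distance instead):

```python
import numpy as np

def kmeans_pp_centers(boxes: np.ndarray, k: int = 9, seed: int = 0) -> np.ndarray:
    """boxes: (n, 2) array of (width, height); returns k initial anchor centers."""
    rng = np.random.default_rng(seed)
    centers = [boxes[rng.integers(len(boxes))]]          # first center chosen at random
    while len(centers) < k:
        # D(x)^2: squared shortest distance of every sample to the chosen centers
        d2 = np.min([np.sum((boxes - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                            # P(x) = D(x)^2 / sum D(x)^2
        centers.append(boxes[rng.choice(len(boxes), p=probs)])  # roulette-wheel pick
    return np.array(centers)

# Illustrative ground-truth box sizes (pixels) from labelled 608x608 frames
boxes = np.abs(np.random.default_rng(1).normal(loc=80, scale=40, size=(500, 2)))
print(kmeans_pp_centers(boxes, k=9))
```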
S4: inputting a training set into a plurality of constructed dangerous behavior recognition modules, and training by utilizing a loss function to obtain a plurality of trained dangerous behavior recognition modules, wherein the method specifically comprises the following steps:
s4.1: respectively setting training labels into smoking, eating and playing, and respectively detecting smoking, eating and playing mobile phones;
s4.2: inputting a training set into a plurality of constructed dangerous behavior recognition modules, modifying parameters required by training, and specifically:
S4.2.1: the parameters include the number of pictures per training iteration (batch), the number of subdivisions of each batch (subdivisions), the iteration numbers at which the learning rate changes (steps), the input picture width, the input picture height, the number of input picture channels, the number of classes, the picture rotation angle (angle), and so on; specifically:
S4.2.2: batch = 96; subdivisions = 32; steps = 14000, 16000; width = 608; height = 608; channels = 3; classes = 3; angle = 0;
s4.3: inputting the training set with the three types of labels into the plurality of dangerous behavior recognition modules in batches to obtain corresponding output results, calculating the loss value according to the loss function, performing back propagation, continuously updating the parameters of the model until the iteration times are larger than the threshold value, stopping training, selecting the parameter with the minimum loss as the final model parameter, and further obtaining a plurality of trained dangerous behavior recognition modules.
The loss function comprises three parts: the classification loss Class_Loss, the confidence loss Conf_Loss, and the bounding box regression loss CIOU_Loss:

$$Loss = Class\_Loss + Conf\_Loss + CIOU\_Loss$$

$$Class\_Loss = -\sum_{m=0}^{S^2} I_{m}^{obj} \sum_{c \in classes} \left[ \hat{P}_r(c)\log P_r(c) + \left(1-\hat{P}_r(c)\right)\log\left(1-P_r(c)\right) \right]$$

$$Conf\_Loss = -\sum_{m=0}^{S^2}\sum_{n=0}^{N} I_{mn}^{obj}\left[\hat{C}_m^n \log C_m^n + \left(1-\hat{C}_m^n\right)\log\left(1-C_m^n\right)\right] - \lambda_{noobj}\sum_{m=0}^{S^2}\sum_{n=0}^{N} I_{mn}^{noobj}\left[\hat{C}_m^n \log C_m^n + \left(1-\hat{C}_m^n\right)\log\left(1-C_m^n\right)\right]$$

$$CIOU\_Loss = 1 - IOU + \frac{\rho^2\left(b, b^{gt}\right)}{l^2} + \alpha\nu$$

In the formulas: S^2 is the number of grid cells; N is the number of prediction boxes in each grid cell; I_{mn}^{obj} and I_{mn}^{noobj} indicate whether the n-th prediction box of the m-th grid cell contains a target or does not contain a target; λ_{noobj} is a weight coefficient; C_m^n represents the prediction confidence of the n-th bounding box of the m-th grid cell and Ĉ_m^n represents its true confidence; P_r(c) and P̂_r(c) represent the predicted probability and the real probability of the object of class c in the current box; IOU is the intersection-over-union between the prediction box and the real box; b and b^{gt} represent the center points of the prediction box and of its ground-truth (GT) box respectively; ρ^2 denotes the squared distance between the two center points; l^2 is the squared diagonal length of the smallest box that can just contain both the prediction box and the real box; α is a penalty factor; ν measures the consistency of the aspect ratios of the real and predicted boxes. The calculation formulas of α and ν are:

$$\alpha = \frac{\nu}{(1 - IOU) + \nu}$$

$$\nu = \frac{4}{\pi^2}\left(\arctan\frac{\omega^{gt}}{h^{gt}} - \arctan\frac{\omega}{h}\right)^2$$

where ω^{gt}, h^{gt} and ω, h are the width and height of the real box and of the predicted box, respectively;
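For illustration, a minimal PyTorch sketch of the CIOU_Loss term defined above for boxes given as (center x, center y, width, height); this illustrates the formula only and is not the complete training loss:

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """pred, gt: (..., 4) boxes as (cx, cy, w, h). Returns 1 - IOU + rho^2/l^2 + alpha*v."""
    px1, py1 = pred[..., 0] - pred[..., 2] / 2, pred[..., 1] - pred[..., 3] / 2
    px2, py2 = pred[..., 0] + pred[..., 2] / 2, pred[..., 1] + pred[..., 3] / 2
    gx1, gy1 = gt[..., 0] - gt[..., 2] / 2, gt[..., 1] - gt[..., 3] / 2
    gx2, gy2 = gt[..., 0] + gt[..., 2] / 2, gt[..., 1] + gt[..., 3] / 2

    inter = (torch.min(px2, gx2) - torch.max(px1, gx1)).clamp(0) * \
            (torch.min(py2, gy2) - torch.max(py1, gy1)).clamp(0)
    union = pred[..., 2] * pred[..., 3] + gt[..., 2] * gt[..., 3] - inter
    iou = inter / (union + eps)

    rho2 = (pred[..., 0] - gt[..., 0]) ** 2 + (pred[..., 1] - gt[..., 1]) ** 2  # center distance^2
    cw = torch.max(px2, gx2) - torch.min(px1, gx1)        # enclosing box width
    ch = torch.max(py2, gy2) - torch.min(py1, gy1)        # enclosing box height
    l2 = cw ** 2 + ch ** 2 + eps                          # enclosing box diagonal^2

    v = (4 / math.pi ** 2) * (torch.atan(gt[..., 2] / (gt[..., 3] + eps)) -
                              torch.atan(pred[..., 2] / (pred[..., 3] + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / l2 + alpha * v

print(ciou_loss(torch.tensor([50., 50., 20., 40.]), torch.tensor([55., 52., 22., 38.])))
```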
s5: inputting the test set into a plurality of dangerous behavior recognition modules which are trained to carry out convolution processing, wherein the classification result output by the model comprises the category to which the target belongs and the corresponding confidence coefficient; setting a threshold according to the confidence of the target, and removing the class to which the target with the confidence lower than the threshold belongs;
The confidence is defined as:

$$C_m^n = P_r(Object) \times IOU_{pred}^{truth}$$

where C_m^n represents the confidence of the n-th bounding box of the m-th grid cell, P_r(Object) represents the probability that the current bounding box contains an object, and IOU_{pred}^{truth} is the intersection-over-union between the predicted bounding box and the real bounding box of the object when the current bounding box contains an object.

In the training process, Ĉ_m^n represents the actual value; its value is determined by whether a bounding box of the grid cell is responsible for predicting a certain object: if it is responsible, Ĉ_m^n = 1, otherwise Ĉ_m^n = 0.
The rectangular box represents the size and exact position of the target; the confidence value represents the reliability of the predicted rectangular box, and the larger the value, the higher the probability that the target exists within the box. The prediction boxes containing targets are screened according to the non-maximum suppression (NMS) algorithm, and repeated rectangular boxes corresponding to the same target are removed; the index corresponding to the maximum classification probability of the retained prediction box, i.e. the class index number of the target, then gives the category to which the target belongs.
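A minimal sketch of the post-processing described above: detections whose confidence is below the threshold are removed, and duplicate boxes for the same target are suppressed with non-maximum suppression; the threshold values are assumptions, and torchvision's nms is used for brevity:

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, labels, conf_thresh=0.5, iou_thresh=0.45):
    """boxes: (n, 4) xyxy; scores: (n,) confidences; labels: (n,) class indices."""
    keep = scores >= conf_thresh                      # remove low-confidence targets
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    kept = nms(boxes, scores, iou_thresh)             # drop duplicate boxes for the same target
    return boxes[kept], scores[kept], labels[kept]

classes = ("smoking", "eating", "playing")
boxes = torch.tensor([[100., 80., 180., 200.], [102., 82., 178., 198.], [300., 50., 360., 120.]])
scores = torch.tensor([0.91, 0.62, 0.35])
labels = torch.tensor([0, 0, 2])
for b, s, c in zip(*postprocess(boxes, scores, labels)):
    print(classes[int(c)], float(s), b.tolist())      # -> smoking 0.91 [...]
```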
S6: and judging whether dangerous behaviors exist in all image frames in the monitoring video by the aid of a plurality of dangerous behavior identification modules based on convolution processing, and triggering the alarm module when the dangerous behaviors are determined to exist.
As shown in fig. 7, a system of the workshop dangerous behavior recognition alarm method based on the monitoring video includes a monitoring device, a computing device, a control device, a display device and an alarm device;
the monitoring equipment is used for acquiring real-time video data shot by the workshop monitoring camera and transmitting the workshop real-time video to the computing equipment; the computing device comprises a plurality of trained dangerous behavior recognition modules which are used for detecting and recognizing image frames in video transmission and transmitting real-time videos with target positions, categories and confidence degrees to a display device and a control device.
The control equipment determines whether dangerous behavior exists by comparison against the set confidence threshold; when the confidence of a key image frame is detected to be greater than the set threshold, the control equipment controls the display equipment to display the monitoring picture with a rectangular box in real time. When dangerous behavior is determined to exist, the control equipment combines all the relevant image frames into a video stream containing the dangerous behavior, uploads it to the alarm equipment, and triggers an alarm.
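For illustration, a skeleton of the control-device logic described above: frames are read from the monitoring stream, passed through the recognition modules, displayed with rectangular boxes when the confidence exceeds the set threshold, and the flagged frames are handed to the alarm device. The detect() and trigger_alarm() interfaces are hypothetical placeholders for the trained recognition modules and the alarm device, and OpenCV is used only as an example I/O backend:

```python
# Skeleton of the control loop; detect() and trigger_alarm() are hypothetical
# placeholders for the trained recognition modules and the alarm device interface.
import cv2

CONF_THRESH = 0.5

def monitor(stream_url: str, detect, trigger_alarm):
    cap = cv2.VideoCapture(stream_url)
    flagged = []                                        # frames judged to contain dangerous behavior
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        detections = detect(frame)                      # [(label, confidence, (x1, y1, x2, y2)), ...]
        dangerous = [d for d in detections if d[1] > CONF_THRESH]
        for label, conf, (x1, y1, x2, y2) in dangerous:
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)
            cv2.putText(frame, f"{label} {conf:.2f}", (int(x1), int(y1) - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
        cv2.imshow("workshop monitor", frame)           # real-time display with rectangular boxes
        if dangerous:
            flagged.append(frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    if flagged:
        trigger_alarm(flagged)                          # upload the flagged frames as a video stream
```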
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted for clarity only, and those skilled in the art should take the specification as a whole, as the technical solutions in the embodiments may also be suitably combined to form other embodiments that can be understood by those skilled in the art.
The above-listed detailed description is only a specific description of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and equivalent embodiments or modifications made without departing from the technical spirit of the present invention should be included in the scope of the present invention.

Claims (8)

1. A workshop dangerous behavior recognition alarm method based on a monitoring video is characterized by comprising the following steps:
s1: acquiring a dangerous behavior image data set, and performing quantity supplement on the acquired images through an image augmentation technology;
s2: preprocessing dangerous behavior images, and determining a training set and a testing set;
s3: constructing a plurality of dangerous behavior identification modules by utilizing an improved YOLOv4-MobileNet V3 deep learning network architecture;
s4: inputting a training set into a plurality of constructed dangerous behavior recognition modules, and training by using a loss function to obtain a plurality of trained dangerous behavior recognition modules;
s5: inputting the test set into a plurality of trained dangerous behavior recognition modules for convolution processing, wherein classification results output by the models comprise categories to which targets belong and corresponding confidence coefficients; setting a threshold according to the confidence of the target, and removing the class to which the target with the confidence lower than the threshold belongs;
s6: and judging whether dangerous behaviors exist in all image frames in the monitoring video by the aid of a plurality of dangerous behavior identification modules based on convolution processing, and triggering the alarm module when the dangerous behaviors are determined to exist.
2. The monitoring video based workshop dangerous behavior recognition alarm method according to claim 1, wherein the dangerous behavior image dataset comprises smoking images, eating images and playing mobile phone images.
3. The monitoring video based workshop dangerous behavior recognition alarm method according to claim 1, characterized in that dangerous behavior images are processed through clipping mixing, mosaic data enhancement and class label smoothing; the sample ratio of the training set to the test set in the data set is 10: 1.
4. The monitoring video based workshop dangerous behavior recognition alarm method according to claim 1, wherein a plurality of dangerous behavior recognition modules are constructed by using the improved classification network architecture of YOLOv4-MobileNet V3 deep learning, and specifically the method comprises the following steps:
s3.1: the YOLOv4-MobileNet V3 deep learning network architecture comprises a Backbone network for Backbone feature extraction, a Neck enhanced feature extraction network and a Head prediction network:
in a Backbone network for backhaul feature extraction, replacing CSPDarknet53 in an original YOLOv4 network with a MobileNet V3, and replacing a channel attention SENet mechanism module in the MobileNet V3 with a position attention CA mechanism module;
in a Neck reinforced feature extraction network, replacing the common convolution in a PANET module in a YOLOv4 network with a deep separable convolution; the Head prediction network is a Head prediction network in YOLOv 4;
s3.2: and (5) carrying out anchor frame dimension clustering by using a K-means + + algorithm.
5. The monitoring video based workshop dangerous behavior recognition alarm method according to claim 4, wherein the position attention CA mechanism module decomposes the channel attention into two 1-dimensional feature codes, and the two 1-dimensional feature codes are respectively aggregated along 2 spatial directions, so that the remote dependency relationship is captured along one spatial direction, and meanwhile, accurate position information is kept along the other spatial direction; respectively encoding the generated characteristic graphs into a stack of direction perception and position sensitive attention information, and specifically comprising the following steps of:
coordinate information embedding: given an input x_c, pooling kernels of size (H, 1) and (1, W) are used to encode each channel along the horizontal and vertical coordinates respectively, and the output of the c-th channel at height h can be expressed as:

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le p < W} x_c(h, p)$$

the output of the c-th channel at width w can be expressed as:

$$z_c^w(w) = \frac{1}{H} \sum_{0 \le q < H} x_c(q, w)$$

wherein x_c(h, p) denotes the p-th element along the width at height h in the c-th channel, and x_c(q, w) denotes the q-th element along the height at width w in the c-th channel; z_c^h and z_c^w aggregate the features along the two spatial directions respectively to obtain the corresponding direction-aware feature maps; R^{C×H×W} denotes the feature set with C channels, height H and width W;

coordinate attention generation: z^h and z^w are concatenated, and a 1 × 1 convolution transformation function F_1 is then applied:

$$f = \delta\left(F_1\left(\left[z^h, z^w\right]\right)\right)$$

wherein [·,·] denotes the concatenation operation along the spatial dimension, δ is a nonlinear activation function, and f ∈ R^{C/r×(H+W)} is the intermediate feature map encoding the spatial information in the horizontal and vertical directions; r controls the reduction rate of the block size as in the SE block; along the spatial dimension, f is decomposed into two separate tensors f^h ∈ R^{C/r×H} and f^w ∈ R^{C/r×W};

two further 1 × 1 convolution transformation functions F_h and F_w transform the feature maps f^h and f^w in the c-th channel respectively, and the outputs g^h and g^w are used as attention weights:

$$g^h = \sigma\left(F_h\left(f^h\right)\right)$$

$$g^w = \sigma\left(F_w\left(f^w\right)\right)$$

wherein σ is the sigmoid activation function, i is the horizontal coordinate variable in the c-th channel, and j is the vertical coordinate variable in the c-th channel;

the output of the position attention CA mechanism module is:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

wherein x_c(i, j) is the input feature in the c-th channel.
6. The monitoring video based workshop dangerous behavior recognition alarm method according to claim 4, characterized in that in the Neck enhanced feature extraction network, the ordinary convolution in the PANet module in the YOLOv4 network is replaced by a depthwise separable convolution, wherein the parameter number of the depthwise separable convolution relative to that of the ordinary convolution is

$$\frac{D_K \times D_K \times M + M \times N}{D_K \times D_K \times M \times N} = \frac{1}{N} + \frac{1}{D_K^2}$$

where D_K is the convolution kernel size, M is the number of input channels, and N is the number of output channels.
7. The monitoring video based workshop dangerous behavior recognition alarm method according to claim 1, wherein the confidence level is defined as:
$$C_m^n = P_r(Object) \times IOU_{pred}^{truth}$$

wherein C_m^n represents the confidence of the n-th bounding box of the m-th grid cell; P_r(Object) represents the probability that the current bounding box contains an object; and IOU_{pred}^{truth} is the intersection-over-union between the predicted bounding box and the real bounding box of the object when the current bounding box contains an object.
8. The system for the workshop dangerous behavior recognition alarm method based on the monitoring video is characterized by comprising monitoring equipment, computing equipment, control equipment, display equipment and alarm equipment, wherein the monitoring equipment is used for monitoring workshop dangerous behaviors;
the monitoring equipment is used for acquiring real-time video data shot by the workshop monitoring camera and transmitting the workshop real-time video to the computing equipment; the computing device comprises a plurality of trained dangerous behavior recognition modules used for detecting and recognizing image frames in video transmission and transmitting real-time video with target positions, categories and confidence degrees to a display device and a control device,
the control equipment judges that dangerous behaviors exist by comparing set confidence threshold values, and when the confidence of the key image frame is detected to be greater than the set threshold values, the control equipment controls the display equipment to display a monitoring picture with a rectangular frame in real time; when dangerous behaviors are judged to exist, the control device combines all the judged image frames into a video stream with the dangerous behaviors, the video stream is uploaded to the alarm device, and an alarm is triggered.
CN202210993747.6A 2022-08-18 2022-08-18 Workshop dangerous behavior recognition alarm method and system based on monitoring video Pending CN115331172A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210993747.6A CN115331172A (en) 2022-08-18 2022-08-18 Workshop dangerous behavior recognition alarm method and system based on monitoring video

Publications (1)

Publication Number Publication Date
CN115331172A (en) 2022-11-11

Family

ID=83926569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210993747.6A Pending CN115331172A (en) 2022-08-18 2022-08-18 Workshop dangerous behavior recognition alarm method and system based on monitoring video

Country Status (1)

Country Link
CN (1) CN115331172A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311361A (en) * 2023-03-02 2023-06-23 北京化工大学 Dangerous source indoor staff positioning method based on pixel-level labeling
CN116311361B (en) * 2023-03-02 2023-09-15 北京化工大学 Dangerous source indoor staff positioning method based on pixel-level labeling
CN116740821A (en) * 2023-08-16 2023-09-12 南京迅集科技有限公司 Intelligent workshop control method and system based on edge calculation
CN116740821B (en) * 2023-08-16 2023-10-24 南京迅集科技有限公司 Intelligent workshop control method and system based on edge calculation
CN117011301A (en) * 2023-10-07 2023-11-07 广东三姆森科技股份有限公司 Defect detection method and device based on YOLO model
CN117237741A (en) * 2023-11-08 2023-12-15 烟台持久钟表有限公司 Campus dangerous behavior detection method, system, device and storage medium
CN117237741B (en) * 2023-11-08 2024-02-13 烟台持久钟表有限公司 Campus dangerous behavior detection method, system, device and storage medium
CN117671594A (en) * 2023-12-08 2024-03-08 中化现代农业有限公司 Security monitoring method, device, electronic equipment and storage medium
CN117496678A (en) * 2024-01-02 2024-02-02 广州市声讯电子科技股份有限公司 Emergency broadcast image-text alarm method, alarm system and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination