CN113642474A - Hazardous area personnel monitoring method based on YOLOV5 - Google Patents


Info

Publication number
CN113642474A
CN113642474A
Authority
CN
China
Prior art keywords
deep learning
network model
learning network
monitoring
yolov5
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110941137.7A
Other languages
Chinese (zh)
Inventor
窦涛
沈雪松
陆金波
吴昊翰
贺荣鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Aerospace Electro & Hydraulic Control Co ltd
Original Assignee
Sichuan Aerospace Electro & Hydraulic Control Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Aerospace Electro & Hydraulic Control Co ltd filed Critical Sichuan Aerospace Electro & Hydraulic Control Co ltd
Priority to CN202110941137.7A priority Critical patent/CN113642474A/en
Publication of CN113642474A publication Critical patent/CN113642474A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a YOLOV5-based method for monitoring personnel in a dangerous area, comprising the following steps. Step 1: construct a deep learning model training set by acquiring images from an underground monitoring video stream and annotating the samples. Step 2: train a YOLOV5-based deep learning network model with the samples labeled in step 1. Step 3: construct a test data set and feed it into the trained deep learning network model for testing; judge whether the model meets the engineering requirements — if so, training is finished; otherwise, modify the network parameters and train again. Step 4: use the trained deep learning network model to identify dangerous behaviors of personnel in the underground mine video monitoring and output the identification results. The invention introduces additional feature extraction and adaptive modules on top of the stable YOLOV5 framework, can be effectively applied to practical personnel behavior monitoring tasks, and greatly improves the capability to analyze monitoring data.

Description

Hazardous area personnel monitoring method based on YOLOV5
Technical Field
The invention belongs to the technical field of image detection and image segmentation, and particularly relates to a method for monitoring personnel in a dangerous area based on YOLOV5.
Background
Target detection is a fundamental algorithm in the field of general-purpose recognition, a core part of intelligent monitoring systems, and an important branch of image processing and computer vision; it plays a vital role in downstream tasks such as face recognition, gait recognition, crowd counting, and instance segmentation. With the wide application of deep learning, target detection algorithms have developed rapidly.
YOLO-based target detection algorithms reformulate object detection as a regression problem: a single convolutional neural network (CNN) is applied to the entire image, the image is divided into a grid, and class probabilities and bounding boxes are predicted for each grid cell.
To the detections of each class, the algorithm then applies a method called non-maximum suppression (NMS): bounding boxes whose confidence is below a threshold are filtered out, and overlapping duplicate boxes are suppressed, so that each object in the image is predicted only once.
Each bounding box is described by four descriptors, namely the center coordinates (x, y), the width, and the height, together with a value mapping the box to the class to which the detected object belongs.
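The confidence filtering and suppression just described can be sketched in a few lines. The following is a minimal NumPy version; the `[x1, y1, x2, y2]` box format and the threshold defaults are illustrative choices, not values taken from the patent:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Drop boxes below the confidence threshold, then suppress overlaps."""
    keep_mask = scores >= conf_thresh
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = scores.argsort()[::-1]          # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # keep only the remaining boxes that do not overlap box i too much
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return boxes[keep], scores[keep]
```

Production detectors apply this per class, as the text notes, so that boxes of different classes never suppress each other.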
Dark image enhancement algorithms enhance the useful information in an image with the aim of improving its visual quality. Common image enhancement algorithms include histogram equalization, histogram specification, enhancement based on physical models, enhancement based on partial differential equations and variational methods, and transform-domain enhancement; these algorithms continue to improve as the technology develops.
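Histogram equalization, the first of the algorithms listed, is simple enough to sketch directly. A minimal pure-NumPy version for 8-bit grayscale frames (real pipelines would typically use a library routine such as OpenCV's) might look like:

```python
import numpy as np

def histogram_equalization(img):
    """Equalize an 8-bit grayscale image: spread its intensity histogram
    so that dark, low-contrast frames use the full 0..255 range."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()            # first non-zero bin of the CDF
    n = img.size
    # classic equalization mapping: rescale the CDF to [0, 255]
    lut = np.round((cdf - cdf_min) / (n - cdf_min) * 255).astype(np.uint8)
    return lut[img]                         # apply the lookup table per pixel
```

After equalization, even a frame whose intensities were clustered in a narrow dark band spans the full dynamic range, which is the effect the brightness-enhancement step in step 11 relies on.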
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a YOLOV5-based method for monitoring personnel in a dangerous area. The method introduces additional feature extraction and adaptive modules on top of the stable YOLOV5 framework, can be effectively applied to practical personnel behavior monitoring tasks, and greatly improves the capability to analyze monitoring data.
The purpose of the invention is realized by the following technical scheme: a hazardous area personnel monitoring method based on YOLOV5 comprises the following steps:
step 1, constructing a deep learning model training set: acquiring images from an underground monitoring video stream, and carrying out sample annotation on the images;
step 2, training a deep learning network model based on YOLOV5 by using the samples labeled in step 1;
step 3, constructing a test data set and feeding it into the trained deep learning network model for testing; judging whether the deep learning network model meets the engineering requirements; if so, the training is finished, otherwise the network parameters are modified and training is repeated;
and step 4, identifying dangerous behaviors of personnel in the underground mine video monitoring by using the trained deep learning network model, and outputting the identification results.
Further, the step 1 comprises the following substeps:
step 11, obtaining images from the mine monitoring video stream and preprocessing them: enhancing images whose brightness is below a set threshold, then randomly flipping the images horizontally/vertically, rotating them and cropping them;
and step 12, labeling the processed images, pre-annotating the position information of the personnel in each image.
Further, the deep learning network model consists of three main components:
1) the Backbone component: a convolutional neural network that aggregates and forms image features at different image granularities;
2) the Neck component: a series of network layers that mix and combine image features and pass them on to the prediction layer;
3) the Prediction component: predicts on the image features, generating bounding boxes and predicting categories.
Further, the Backbone component comprises a Focus layer, a CBL module, a CSP_A module, an AC-Block module, a CBL module and an SPP module connected in sequence.
Further, in step 3, the specific method for judging whether the deep learning network model meets the engineering requirements is as follows: judging whether the model can accurately identify the workers in the video frame; judging whether it can accurately identify the workers' activity areas; judging whether it can accurately determine whether a worker has entered a dangerous area in the frame; and judging whether it can give timely, real-time early-warning feedback on violations.
The beneficial effects of the invention are as follows: the invention introduces additional feature extraction and adaptive modules on top of the stable YOLOV5 framework, can be effectively applied to practical personnel behavior monitoring tasks, and greatly improves the capability to analyze monitoring data. By utilizing a key-area monitoring strategy, the activity of personnel in key (dangerous) areas can be tracked efficiently and accurately, and early-warning feedback can be given.
Drawings
FIG. 1 is a flow chart of a hazardous area personnel monitoring method based on YOLOV5 of the present invention;
fig. 2 is a structural diagram of the deep learning network model based on YOLOV5 according to the present invention.
Detailed Description
The technical scheme of the invention is further explained below with reference to the accompanying drawings.
As shown in fig. 1, the hazardous area personnel monitoring method based on YOLOV5 of the present invention includes the following steps:
step 1, constructing a deep learning model training set: acquiring images from an underground monitoring video stream, and carrying out sample annotation on the images; the method specifically comprises the following substeps:
step 11, obtaining images from the mine monitoring video stream and preprocessing them: images whose brightness is below a set threshold are enhanced, so that the data set better meets the training requirements and accurate predictions can be made on video captured in dark environments; then random horizontal/vertical flipping, rotation and cropping operations are applied to the images, increasing the sample size of the training set and improving the generalization ability and robustness of the model;
and step 12, labeling the processed images: the position information of the personnel is pre-annotated in each image and used as label data for model training and parameter optimization.
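The data preparation of steps 11 and 12 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the flip probabilities and crop ratio are arbitrary choices, the label line follows the common YOLOv5-style normalized `class x_center y_center width height` convention, and a real pipeline must also transform the box labels whenever an image is flipped, rotated or cropped:

```python
import numpy as np

def augment(img, rng):
    """Random flip / 90-degree rotation / crop for an HxWxC image array
    (step 11). Probabilities and the 90% crop ratio are illustrative."""
    if rng.random() < 0.5:                      # horizontal flip
        img = img[:, ::-1]
    if rng.random() < 0.5:                      # vertical flip
        img = img[::-1, :]
    k = rng.integers(0, 4)                      # rotate by k * 90 degrees
    img = np.rot90(img, k)
    h, w = img.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)         # crop to 90% of each side
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return img[top:top + ch, left:left + cw]

def to_yolo_label(cls_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-coordinate person box (step 12) to a normalized
    'class x_center y_center width height' text line, one per person."""
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

One label file per image, containing one such line per annotated person, is then enough for the training and parameter optimization described above.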
Step 2, training a deep learning network model based on YOLOV5 by using the samples labeled in step 1: efficient and practical feature extraction modules are applied to the deep model, the processed image data are fed into the model for training, and the network optimization strategy is adjusted until the model's performance indicators are optimal. On top of the stable YOLOV5 framework, additional feature extraction and adaptive modules are introduced; the specific network model is shown in fig. 2. The deep learning network model of the invention consists of three main components:
1) the Backbone component: a convolutional neural network that aggregates and forms image features at different image granularities;
2) the Neck component: a series of network layers that mix and combine image features and pass them on to the prediction layer;
3) the Prediction component: predicts on the image features, generating bounding boxes and predicting categories.
The Backbone component comprises a Focus layer, a CBL module, a CSP_A module, an AC-Block module, a CBL module and an SPP module connected in sequence. In the invention, an AC-Block module (an asymmetric convolution structure) is added after each CSP_A module of the Backbone component. This strengthens the convolution's feature extraction at the central position, overcoming the prior art's low detection accuracy on poor-quality video (a result of weak feature extraction capability) and improving the accuracy of detection and evidence collection. The rest of the network structure is the same as the existing YOLOV5 network model.
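The patent does not give the AC-Block's internals beyond calling it an asymmetric convolution structure. A common asymmetric-convolution scheme (in the spirit of ACNet, assumed here for illustration) runs parallel 3x3, 1x3 and 3x1 kernels and sums their outputs, which weights the central cross of the receptive field more heavily. A single-channel NumPy sketch:

```python
import numpy as np

def conv2d(x, k):
    """'Same'-padded 2D cross-correlation of a single-channel map."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def ac_block(x, k3, k1x3, k3x1):
    """Asymmetric-convolution idea: parallel 3x3, 1x3 and 3x1 branches
    summed, emphasizing the center row/column of the receptive field."""
    return conv2d(x, k3) + conv2d(x, k1x3) + conv2d(x, k3x1)
```

A useful property of this design is that, by linearity, the three branches can be fused at inference time into a single 3x3 kernel whose center row and column carry the extra weight, so the emphasis costs nothing at deployment.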
The Focus layer copies the input image into four parts and then, through a slicing operation, cuts the four copies into four complementary slices;
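The standard YOLOv5 Focus slicing, which matches this description, can be expressed directly with strided indexing: the four slices are the four spatial phases of the image, stacked on the channel axis so that spatial resolution is halved with no information loss. A single-image (C, H, W) NumPy sketch:

```python
import numpy as np

def focus_slice(x):
    """YOLOv5-style Focus slicing: split a (C, H, W) image into four
    spatial phases and stack them on the channel axis, giving
    (4C, H/2, W/2) while preserving every input pixel exactly once."""
    return np.concatenate([
        x[:, ::2, ::2],    # even rows, even cols
        x[:, 1::2, ::2],   # odd rows,  even cols
        x[:, ::2, 1::2],   # even rows, odd cols
        x[:, 1::2, 1::2],  # odd rows,  odd cols
    ], axis=0)
```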
the CSP _ A module is firstly processed by a convolution structure (Conv + Batch _ norm + Leaky relu, CBL) of 1x1, then added with the initial input by a residual structure, and finally spliced with the initial input by the residual structure.
The Neck component adopts an FPN + PAN structure. The FPN layers convey strong semantic features top-down, transmitting and fusing high-level feature information by upsampling, and output feature maps of 76 x 76, 38 x 38 and 19 x 19. A PAN structure follows, conveying strong localization features bottom-up: feature maps of 76 x 76, 38 x 38 and 19 x 19 are produced by downsampling and concatenated with the outputs of the FPN structure, aggregating parameters from different backbone layers at different detection layers and further improving the feature extraction capability;
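The shape flow of the two paths can be traced with plain arrays. Only the 76/38/19 spatial sizes follow the text (they correspond to strides 8/16/32 on a 608x608 input); the channel counts below are illustrative assumptions, since the real network inserts convolutions that change channel widths between these steps:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map (FPN top-down step)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    """Stride-2 subsampling standing in for the PAN bottom-up conv."""
    return x[:, ::2, ::2]

# Backbone outputs for a 608x608 input (strides 8 / 16 / 32):
c76 = np.zeros((32, 76, 76))
c38 = np.zeros((64, 38, 38))
c19 = np.zeros((128, 19, 19))

# FPN: top-down — upsample the deeper map and concatenate along channels
f38 = np.concatenate([c38, upsample2x(c19)], axis=0)    # (192, 38, 38)
f76 = np.concatenate([c76, upsample2x(f38)], axis=0)    # (224, 76, 76)

# PAN: bottom-up — downsample and concatenate with the FPN maps
p38 = np.concatenate([f38, downsample2x(f76)], axis=0)  # (416, 38, 38)
p19 = np.concatenate([c19, downsample2x(p38)], axis=0)  # (544, 19, 19)
```

The concatenations are the "concate operations" the text refers to: each detection scale ends up seeing both semantic (top-down) and localization (bottom-up) information.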
the FA module is divided into a channel feature attention CFA and a pixel feature attention PFA, wherein the CFA is responsible for distributing different weights to feature maps of different channels; the pixel attention is then responsible for giving different weights of attention to different regions of the image.
The model of the invention can be effectively applied to the actual personnel behavior monitoring task.
Step 3, constructing a test data set and feeding it into the trained deep learning network model for testing: judge whether the deep learning network model meets the engineering requirements; if so, the training is finished, otherwise the network parameters are modified and training is repeated.
the specific method for judging whether the deep learning network model meets the engineering requirements comprises the following steps: judging whether the deep learning network model can accurately identify the staff in the video picture; judging whether the deep learning network model can accurately identify the activity area of the worker or not; judging whether the deep learning network model can accurately judge whether the worker enters a dangerous area in the picture or not; and judging whether the deep learning network model can give real-time early warning feedback to the violation in time.
Step 4, identifying dangerous behaviors of personnel in the underground mine video monitoring by using the trained deep learning network model, and outputting the identification results.
An overall monitoring scheme for the system is constructed and a dangerous area personnel monitoring strategy is formulated: the deep learning network model identifies dangerous personnel behavior in the mine video monitoring and feeds back the corresponding early-warning task, thereby realizing the dangerous area personnel monitoring task. The monitoring scheme and strategy of this embodiment are as follows: 1) carry out video monitoring of the locations where dangerous areas exist; 2) monitor the personnel activity areas in the returned video frames with the network model; 3) evaluate the accuracy of the results detected by the model; 4) judge algorithmically whether a person in the frame has entered a restricted activity area; 5) for violations, the system automatically implements early-warning measures.
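Item 4 of the strategy — deciding whether a detected person has entered the area — reduces to a geometric test between the detection box and the zone. One common convention (an assumption here, not something the patent specifies) tests the bottom-centre "ground point" of the person box, where the feet are, against a polygonal zone using ray casting:

```python
def in_danger_zone(box, zone):
    """Return True if the person's ground point lies inside the zone.

    box  : (x1, y1, x2, y2) detection in pixel coordinates
    zone : list of (x, y) polygon vertices in the same image frame

    Uses the bottom-centre of the box as the ground point (a common
    convention, assumed for illustration) and a ray-casting test.
    """
    px = (box[0] + box[2]) / 2.0
    py = box[3]
    inside = False
    n = len(zone)
    for i in range(n):
        x1, y1 = zone[i]
        x2, y2 = zone[(i + 1) % n]
        if (y1 > py) != (y2 > py):             # edge straddles the ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside            # toggle on each crossing
    return inside
```

A positive result on a frame would then trigger item 5, the automatic early-warning measures.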
Meanwhile, the system's danger feedback signals are delivered through multi-dimensional information, including feedback on the specific positions of personnel, red early-warning highlighting of abnormal frames in the system software interface, and audible warnings from the early-warning device. The invention can perform multiple violation analyses on dangerous personnel behavior in the video frames: the algorithm obtains system information such as the position of the violating person, the specific activity area, and the corresponding monitoring camera frame; this information is compared against a behavior database to correctly classify the violation, and the corresponding early-warning feedback is implemented.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, which is not limited to the specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (5)

1. A hazardous area personnel monitoring method based on YOLOV5 is characterized by comprising the following steps:
step 1, constructing a deep learning model training set: acquiring images from an underground monitoring video stream, and carrying out sample annotation on the images;
step 2, training a deep learning network model based on YOLOV5 by using the samples labeled in step 1;
step 3, constructing a test data set and feeding it into the trained deep learning network model for testing; judging whether the deep learning network model meets the engineering requirements; if so, the training is finished, otherwise the network parameters are modified and training is repeated;
and step 4, identifying dangerous behaviors of personnel in the underground mine video monitoring by using the trained deep learning network model, and outputting the identification results.
2. The YOLOV5-based hazardous area personnel monitoring method of claim 1, wherein step 1 comprises the following substeps:
step 11, obtaining images from the mine monitoring video stream and preprocessing them: enhancing images whose brightness is below a set threshold, then randomly flipping the images horizontally/vertically, rotating them and cropping them;
and step 12, labeling the processed images, pre-annotating the position information of the personnel in each image.
3. The method for hazardous area personnel monitoring based on YOLOV5 of claim 1, wherein the deep learning network model consists of three main components:
1) the Backbone component: a convolutional neural network that aggregates and forms image features at different image granularities;
2) the Neck component: a series of network layers that mix and combine image features and pass them on to the prediction layer;
3) the Prediction component: predicts on the image features, generating bounding boxes and predicting categories.
4. The YOLOV5-based dangerous area personnel monitoring method according to claim 1, wherein the Backbone component comprises a Focus layer, a CBL module, a CSP_A module, an AC-Block module, a CBL module and an SPP module connected in sequence.
5. The method for monitoring personnel in dangerous areas based on YOLOV5 of claim 1, wherein the specific method in step 3 for determining whether the deep learning network model meets the engineering requirements is: judging whether the model can accurately identify the workers in the video frame; judging whether it can accurately identify the workers' activity areas; judging whether it can accurately determine whether a worker has entered a dangerous area in the frame; and judging whether it can give timely, real-time early-warning feedback on violations.
CN202110941137.7A 2021-08-17 2021-08-17 Hazardous area personnel monitoring method based on YOLOV5 Pending CN113642474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110941137.7A CN113642474A (en) 2021-08-17 2021-08-17 Hazardous area personnel monitoring method based on YOLOV5


Publications (1)

Publication Number Publication Date
CN113642474A 2021-11-12

Family

ID=78422233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110941137.7A Pending CN113642474A (en) 2021-08-17 2021-08-17 Hazardous area personnel monitoring method based on YOLOV5

Country Status (1)

Country Link
CN (1) CN113642474A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457984A (en) * 2019-05-21 2019-11-15 电子科技大学 Pedestrian's attribute recognition approach under monitoring scene based on ResNet-50
CN112966589A (en) * 2021-03-03 2021-06-15 中润油联天下网络科技有限公司 Behavior identification method in dangerous area


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YNXDB2002: "YOLO V5 Model Structure and Transfer Learning" (in Chinese), CSDN *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114724246A (en) * 2022-04-11 2022-07-08 中国人民解放军东部战区总医院 Dangerous behavior identification method and device
CN114724246B (en) * 2022-04-11 2024-01-30 中国人民解放军东部战区总医院 Dangerous behavior identification method and device
CN115041487A (en) * 2022-05-18 2022-09-13 深圳能源环保股份有限公司 Fly ash landfill comprehensive management method and system based on high-precision outdoor positioning
CN115294661A (en) * 2022-10-10 2022-11-04 青岛浩海网络科技股份有限公司 Pedestrian dangerous behavior identification method based on deep learning
CN116206255A (en) * 2023-01-06 2023-06-02 广州纬纶信息科技有限公司 Dangerous area personnel monitoring method and device based on machine vision
CN116206255B (en) * 2023-01-06 2024-02-20 广州纬纶信息科技有限公司 Dangerous area personnel monitoring method and device based on machine vision
CN116740821A (en) * 2023-08-16 2023-09-12 南京迅集科技有限公司 Intelligent workshop control method and system based on edge calculation
CN116740821B (en) * 2023-08-16 2023-10-24 南京迅集科技有限公司 Intelligent workshop control method and system based on edge calculation
CN117456610A (en) * 2023-12-21 2024-01-26 浪潮软件科技有限公司 Climbing abnormal behavior detection method and system and electronic equipment
CN117456610B (en) * 2023-12-21 2024-04-12 浪潮软件科技有限公司 Climbing abnormal behavior detection method and system and electronic equipment

Similar Documents

Publication Publication Date Title
CN113642474A (en) Hazardous area personnel monitoring method based on YOLOV5
CN110689054B (en) Worker violation monitoring method
WO2021051601A1 (en) Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium
CN110969130B (en) Driver dangerous action identification method and system based on YOLOV3
CN111368690B (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN111222478A (en) Construction site safety protection detection method and system
CN110298297A (en) Flame identification method and device
CN114648714A (en) YOLO-based workshop normative behavior monitoring method
CN112131951A (en) System for automatically identifying behaviors of illegal ladder use in construction
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
CN109389105A (en) A kind of iris detection and viewpoint classification method based on multitask
CN114373162B (en) Dangerous area personnel intrusion detection method and system for transformer substation video monitoring
CN116977256A (en) Training method, device, equipment and storage medium for defect detection model
CN113642473A (en) Mining coal machine state identification method based on computer vision
CN114694090A (en) Campus abnormal behavior detection method based on improved PBAS algorithm and YOLOv5
CN114913233A (en) Image processing method, apparatus, device, medium, and product
CN115083229A (en) Intelligent recognition and warning system of flight training equipment based on AI visual recognition
CN114241401A (en) Abnormality determination method, apparatus, device, medium, and product
GB2605580A (en) Apparatus and method of image processing to detect a substance spill on a solid surface
CN112967335A (en) Bubble size monitoring method and device
Chatrasi et al. Pedestrian and Object Detection using Image Processing by YOLOv3 and YOLOv2
CN110728310A (en) Target detection model fusion method and system based on hyper-parameter optimization
CN116503406B (en) Hydraulic engineering information management system based on big data
CN117011628A (en) Image classification model training method, image classification method and device
Saini et al. Hybrid Deep Learning Model for The SmartManufacturing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination