CN116977904A - Yolov 5-based rapid large-scene-identification multi-man-made garment detection method - Google Patents


Info

Publication number
CN116977904A
Authority
CN
China
Prior art keywords
detection
data
model
worker
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311023456.5A
Other languages
Chinese (zh)
Inventor
陶茜茜
赵静
梁鸿
宋贞耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Dinghong Safety Technology Co ltd
Original Assignee
Shandong Dinghong Safety Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Dinghong Safety Technology Co ltd filed Critical Shandong Dinghong Safety Technology Co ltd
Priority to CN202311023456.5A priority Critical patent/CN116977904A/en
Publication of CN116977904A publication Critical patent/CN116977904A/en
Pending legal-status Critical Current


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V20/40 — Scenes; Scene-specific elements in video content
    • G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V20/40 — Scenes; Scene-specific elements in video content
    • G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention discloses a YOLOv5-based method for rapidly detecting, in large scenes, whether multiple workers are wearing work clothes, comprising the following steps: extracting image frames from a video stream captured by a fixed camera; in view of the complexity of the oil extraction operation site, annotating the foreground in each image with LabelImg software to construct a target detection dataset of the oil extraction operation site, and randomly dividing the dataset into a training set and a test set at a ratio of 8:2; adjusting the network parameters according to the training results and selecting the model with the best training result as the final model; feeding image frames extracted from the oilfield on-site video stream into the trained, knowledge-distilled YOLOv5 student network to identify foreground information and obtain detection data; performing secondary detection on regions detected as workers; and judging, against a preset label threshold, whether a worker is not wearing work clothes, storing the detection and judgment information, and raising an alarm when the scene image shows a worker not wearing work clothes. The invention compresses the network model with knowledge distillation and extracts multi-scale features of the image, overcoming the low efficiency of traditional manual methods, and can rapidly and accurately identify workers not wearing work clothes in on-site images.

Description

YOLOv5-based rapid large-scene multi-worker work-clothes detection method
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a YOLOv5-based method for rapidly detecting whether multiple workers in a large scene are wearing work clothes.
Background
Petroleum operation sites are complex scenes in which personnel face elevated operational risk. To ensure their personal safety, petroleum workers must wear work clothes correctly. Round-the-clock manual monitoring of surveillance feeds is costly in manpower and material resources and inefficient. It is therefore of great significance to use intelligent techniques to automatically analyze whether workers on the operation site are wearing work clothes incorrectly.
With continuous upgrades in computer hardware and breakthroughs in artificial intelligence, techniques such as machine learning and deep learning have become increasingly widespread. In the field of oilfield safety production, computer vision has achieved remarkable results: techniques such as object detection, object tracking, and pose estimation analyze on-site video in real time and automatically flag incorrect wearing of work clothes, effectively replacing the traditional manual monitoring mode and improving on-site safety and supervision efficiency.
Disclosure of Invention
The invention aims to automatically raise alarms for workers not wearing work clothes by means of object detection, while using knowledge distillation to improve model inference speed and the efficiency of work-clothes detection at existing oil extraction operation sites. A YOLOv5-based method for rapidly detecting work clothes on multiple workers in large scenes is provided, which extracts features automatically, improves detection efficiency, and reduces manpower and material costs.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
(1) Collecting video stream data from the oilfield site, extracting frames to obtain image data, and building a dataset of workers not wearing work clothes;
(2) Constructing a YOLOv5-based teacher model and student model for large-scene multi-worker work-clothes target detection under the PyTorch framework;
(3) Selecting training samples to train the work-clothes detection teacher model, and distilling the teacher model's backbone network and outputs into the student model;
(4) Extracting frames from the video stream to be detected and feeding them into the trained large-scene multi-worker work-clothes detection model to obtain detection data;
(5) Performing secondary detection on the data detected as workers, based on the obtained detection data of consecutive frames;
(6) Determining whether each worker is wearing work clothes according to the set label threshold, and storing the relevant detection and judgment information.
The invention is further improved in that the specific implementation steps of the step (1) are as follows:
(101) Extracting frames from the video stream data of the oilfield operation site to obtain image data;
(102) In view of the complexity of the oil extraction site, annotating the various work clothes, work shoes, and similar items in the image data with LabelImg software to obtain images and corresponding label files.
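The frame-extraction and dataset-split logic of step (1) can be sketched as follows. This is a minimal sketch under stated assumptions: the one-frame-per-second sampling interval, the fixed random seed, and the function names are illustrative and not taken from the patent; only the 8:2 split ratio comes from the text.

```python
# Sketch of step (1): choose which video frames to keep, then split the
# resulting annotated image set 8:2 into train/test (ratio from the text).
import random


def sample_frame_indices(total_frames, fps, seconds_per_sample=1.0):
    """Keep one frame every `seconds_per_sample` seconds of video."""
    step = max(1, int(fps * seconds_per_sample))
    return list(range(0, total_frames, step))


def split_dataset(items, train_ratio=0.8, seed=0):
    """Randomly split annotated images into train/test at the given ratio."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

In practice the frame indices would drive a video reader (e.g. OpenCV) and each kept frame would be annotated in LabelImg before splitting.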
the invention is further improved in that the specific implementation steps of the step (2) are as follows:
(201) A teacher network, which uses ResNet152 as its backbone network;
(202) A student network, which uses MobileNetV3 as its backbone network;
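The teacher/student pairing of steps (201)-(202) can be sketched as follows. Reproducing full ResNet-class and MobileNetV3 backbones is beyond a short example, so this sketch uses tiny stand-in CNNs (an assumption, not the patent's networks) that only illustrate the structural idea: a wide teacher and a narrow student, each emitting the three multi-scale feature maps a YOLOv5 neck consumes.

```python
# Hedged sketch of step (2): a wide teacher backbone and a narrow student
# backbone with matching multi-scale outputs, suitable for distillation.
import torch
import torch.nn as nn


def conv_block(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.SiLU(inplace=True),
    )


class Backbone(nn.Module):
    """Stand-in backbone that emits three multi-scale feature maps."""

    def __init__(self, widths):
        super().__init__()
        self.stem = conv_block(3, widths[0], stride=2)
        self.stages = nn.ModuleList(
            conv_block(widths[i], widths[i + 1], stride=2)
            for i in range(len(widths) - 1)
        )

    def forward(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats[-3:]  # three coarsest scales, as a YOLOv5 neck expects


# Wide teacher (ResNet-class capacity), narrow student (MobileNet-class).
teacher = Backbone([64, 128, 256, 512, 1024])
student = Backbone([16, 32, 64, 128, 256])
```

The width lists are placeholders; in the full method the teacher and student would be the named backbones, with 1x1 adapter layers matching student channels to teacher channels for feature distillation.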
the invention is further improved in that the specific implementation step of the step (3) comprises the following steps:
(301) Setting the training parameters of the network: the maximum number of iterations is set to 200; the learning rate is initialized to 0.001, reduced to 0.0001 at epoch 10 and to 0.00001 at epoch 50;
(302) Training the teacher model with the dataset of workers not wearing work clothes;
(303) Adjusting the network parameters according to the training results, and distilling the student model's backbone network against the teacher's backbone network and the student network's outputs against the teacher model's outputs.
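The schedule in step (301) and the two-part distillation of step (303) can be sketched as follows. The combined loss (MSE on backbone feature maps plus temperature-softened KL divergence on outputs) is one common reading of distilling both backbones and outputs; the temperature, weighting factor, and placeholder student module are assumptions, not values from the patent.

```python
# Sketch of step (3): distillation loss and the stated LR schedule.
import torch
import torch.nn as nn
import torch.nn.functional as F


def distillation_loss(s_feat, t_feat, s_logits, t_logits, T=4.0, alpha=0.5):
    """Feature distillation (MSE between backbone feature maps, assuming
    shapes already matched by an adapter) plus response distillation
    (KL divergence on temperature-softened outputs)."""
    feat_loss = F.mse_loss(s_feat, t_feat)
    kd_loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients weakened by the temperature
    return alpha * feat_loss + (1 - alpha) * kd_loss


# LR schedule from step (301): start at 1e-3, drop to 1e-4 at epoch 10
# and to 1e-5 at epoch 50, for at most 200 epochs.
student = nn.Linear(8, 4)  # placeholder for the YOLOv5 student network
opt = torch.optim.SGD(student.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[10, 50], gamma=0.1)
```

Each training epoch would compute `distillation_loss` (typically alongside the ordinary detection loss), back-propagate, step the optimizer, and then step the scheduler once.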
The invention is further improved in that the specific implementation steps of the step (5) are as follows:
(501) Performing secondary detection on the data detected as workers;
(502) Judging the brightness of the secondarily detected data; if the average brightness is below the threshold L=100, applying a gamma transform to the data to raise its brightness;
(503) Setting a threshold T=0.7 and judging that a worker is not wearing work clothes when the confidence of the secondarily detected "not wearing work clothes" label exceeds T.
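Steps (501)-(503) can be sketched as below. The brightness threshold L=100 and label threshold T=0.7 are taken from the text; the gamma value and function names are illustrative assumptions, and the second detector pass is stubbed out by a confidence argument.

```python
# Sketch of step (5): brightness-gated secondary check on worker crops.
import numpy as np


def gamma_correct(img, gamma=0.5):
    """Brighten an 8-bit image with a gamma transform (gamma < 1 brightens)."""
    norm = img.astype(np.float32) / 255.0
    return (np.power(norm, gamma) * 255.0).astype(np.uint8)


def secondary_check(crop, confidence, lum_threshold=100, conf_threshold=0.7):
    """Return True when the crop is judged 'not wearing work clothes'."""
    if crop.mean() < lum_threshold:  # step (502): low average brightness
        crop = gamma_correct(crop)   # brighten before re-detection
    # In the full pipeline the brightened crop would be re-run through the
    # detector; here `confidence` stands in for that second detection score.
    return confidence > conf_threshold  # step (503)
```

The gamma transform only changes what the second detection pass sees; the final decision rests on the re-detected label confidence against T.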
The YOLOv5-based method for rapidly detecting work clothes on multiple workers in large scenes has the following beneficial effects: a unified task, easy training, and convenient optimization; given the inefficiency of traditional manual monitoring, the method automatically judges in real time, using a computer and the cameras of the oilfield operation site, whether workers are not wearing work clothes, while knowledge distillation reduces the model size and improves inference speed on terminal devices. Compared with manually checking cameras to confirm and report violations, the method is faster and more accurate and saves substantial manpower and material costs.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic block diagram of the YOLOv5-based large-scene multi-worker work-clothes detection method in embodiment 1 of the present invention.
Fig. 2 is a flowchart of the YOLOv5-based large-scene multi-worker work-clothes detection method in embodiment 2 of the present invention.
Fig. 3 is a schematic diagram of the network structure of the YOLOv5-based rapid large-scene multi-worker work-clothes detection method according to embodiment 3 of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements throughout or elements having like or similar functionality. The embodiments described below by way of the drawings are exemplary only and should not be construed as limiting the invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or groups thereof.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
In order that the invention may be readily understood, a further description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings and are not to be construed as limiting embodiments of the invention.
It will be appreciated by those skilled in the art that the drawings are merely schematic representations of examples and that the elements of the drawings are not necessarily required to practice the invention.
Example 1
As shown in fig. 1, embodiment 1 of the present invention provides a YOLOv5-based framework for rapid large-scene multi-worker work-clothes detection, which includes:
the image processing module, used for extracting frames from the real-time video data of the oilfield site to obtain image data;
the network module, used for analyzing the obtained images with the work-clothes detection model, extracting the position coordinates of regions where work clothes are not worn, and outputting prediction information;
and the judgment and storage module, used for judging from the detection information whether a not-wearing-work-clothes violation has occurred and storing the detection and judgment information.
In this embodiment 1, the network model includes a backbone network, a neck network, a classification network, and a training optimization unit.
The backbone network is used for extracting multi-scale features of the input image with the pre-constructed backbone of the rapid large-scene multi-worker work-clothes detection model;
the neck network is used for fusing feature maps across different scales, improving the accuracy and robustness of the model;
the classification network is used for classifying and regressing, with a loss function, the image features and aggregated information of regions suspected of lacking work clothes, obtaining prediction information;
the training optimization unit adjusts parameters of the network according to training results, and selects a model with the optimal training results as a final model.
In this embodiment 1, after the large-scene multi-worker work-clothes detection model outputs detection information, the judgment and storage module determines whether a not-wearing-work-clothes violation has occurred, and stores the position information and judgment information of the violation.
Example 2
Fig. 2 shows the YOLOv5-based rapid large-scene multi-worker work-clothes detection method, whose specific operation steps are:
(1) Collecting video stream data from the oilfield site, extracting frames to obtain image data, and building a dataset of workers not wearing work clothes;
(2) Constructing YOLOv5-based teacher and student models for multi-worker work-clothes detection under the PyTorch framework;
(3) Training and optimizing the target detection model with the constructed dataset of workers not wearing work clothes, and distilling the student model's backbone network against the trained teacher backbone;
(4) Detecting the on-site video stream with the trained student target detection model;
(5) Performing secondary detection on workers according to the obtained detection data;
(6) Judging, against the set threshold, whether any worker violates the requirement to wear work clothes.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing describes only the preferred embodiments of the present disclosure and is not intended to limit it; those skilled in the art may make various modifications and changes to the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure shall fall within its protection scope.

Claims (7)

1. A YOLOv5-based method for rapid large-scene multi-worker work-clothes detection, characterized by comprising the following steps:
(1) Building a target detection dataset of the oil extraction operation site: first extracting frames from the video data to obtain image data, then annotating the image data with LabelImg software to build the oil extraction site operation dataset, and finally randomly dividing the dataset into a training set and a test set at a fixed ratio;
(2) Constructing a YOLOv5-based teacher model and student model for large-scene multi-worker work-clothes target detection under the PyTorch framework;
(3) Selecting training samples to train and optimize the large-scene multi-worker work-clothes target detection teacher model, and distilling the student model's backbone network against the teacher model's backbone network and the student network's outputs against the teacher model's outputs;
(4) Extracting frames from the video stream to be detected and feeding them into the trained large-scene multi-worker work-clothes detection model to obtain detection data;
(5) Performing secondary detection on the data detected as workers, based on the obtained detection data of consecutive frames;
(6) Determining whether each worker is wearing work clothes according to the set label threshold, and storing the relevant detection and judgment information.
2. The YOLOv5-based large-scene multi-worker work-clothes detection method of claim 1, wherein creating the oil extraction operation site target detection dataset comprises:
extracting frames from the video stream data of the oilfield operation site to obtain image data; because the foreground information of the oil extraction site is complex, annotating the workers, hydraulic tongs, tong frames, and similar items in the image data with LabelImg software to obtain images and corresponding label files, and randomly dividing them into a training set and a test set at a ratio of 8:2.
3. The YOLOv5-based large-scene multi-worker work-clothes detection method of claim 2, wherein the YOLOv5-based large-scene multi-worker work-clothes detection model comprises:
(a) a backbone network, used for extracting multi-scale features from the input image;
(b) a neck network, used for processing the extracted features and fusing features of different scales;
(c) and a classification network, used for outputting the detected category and position information.
4. The YOLOv5-based large-scene multi-worker work-clothes detection method according to claim 3, wherein selecting training samples to train and optimize the large-scene multi-worker work-clothes detection model comprises:
(a) setting the training parameters of the network;
(b) training the YOLOv5 large-scene multi-worker work-clothes target detection model with the oil extraction operation site dataset;
(c) and adjusting the network parameters according to the training results and selecting the model with the best training result as the final model.
5. The YOLOv5-based large-scene multi-worker work-clothes detection method according to claim 2, wherein the large-scene multi-worker work-clothes detection data are: the bounding boxes and coordinates of workers.
6. The YOLOv5-based large-scene multi-worker work-clothes detection method of claim 5, wherein the not-wearing-work-clothes violation data are the behavior categories of each video segment.
7. The YOLOv5-based large-scene multi-worker work-clothes detection method according to claim 6, wherein the specific steps of judging whether a worker is not wearing work clothes are:
(a) setting label thresholds T1 and T2 and a brightness threshold L;
(b) performing a brightness judgment on the acquired worker data; when L is greater than 100 and the work-clothes label confidence exceeds 0.5, judging that the worker is wearing work clothes; otherwise, judging that the worker is not wearing work clothes;
(c) and storing and visualizing the real-time analysis result of whether each worker is wearing work clothes, for supervisory personnel to review and handle.
CN202311023456.5A 2023-08-15 2023-08-15 Yolov 5-based rapid large-scene-identification multi-man-made garment detection method Pending CN116977904A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311023456.5A CN116977904A (en) 2023-08-15 2023-08-15 Yolov 5-based rapid large-scene-identification multi-man-made garment detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311023456.5A CN116977904A (en) 2023-08-15 2023-08-15 Yolov 5-based rapid large-scene-identification multi-man-made garment detection method

Publications (1)

Publication Number Publication Date
CN116977904A (en) 2023-10-31

Family

ID=88474944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311023456.5A Pending CN116977904A (en) 2023-08-15 2023-08-15 Yolov 5-based rapid large-scene-identification multi-man-made garment detection method

Country Status (1)

Country Link
CN (1) CN116977904A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710374A (en) * 2024-02-05 2024-03-15 中海油田服务股份有限公司 Method, device, equipment and medium for detecting running and leaking based on deep learning


Similar Documents

Publication Publication Date Title
CN111881730A (en) Wearing detection method for on-site safety helmet of thermal power plant
CN113553977B (en) Improved YOLO V5-based safety helmet detection method and system
CN113516076B (en) Attention mechanism improvement-based lightweight YOLO v4 safety protection detection method
CN113553979B (en) Safety clothing detection method and system based on improved YOLO V5
CN102332094B (en) Semi-supervised online study face detection method
CN103839065A (en) Extraction method for dynamic crowd gathering characteristics
CN101916365A (en) Intelligent video identifying method for cheat in test
CN113642474A (en) Hazardous area personnel monitoring method based on YOLOV5
CN113688709B (en) Intelligent detection method, system, terminal and medium for wearing safety helmet
CN116977904A (en) Yolov 5-based rapid large-scene-identification multi-man-made garment detection method
CN112541393A (en) Transformer substation personnel detection method and device based on deep learning
CN114662208B (en) Construction visualization system and method based on Bim technology
CN115410119A (en) Violent movement detection method and system based on adaptive generation of training samples
CN113837154B (en) Open set filtering system and method based on multitask assistance
CN113128412B (en) Fire trend prediction method based on deep learning and fire monitoring video
CN116311081B (en) Medical laboratory monitoring image analysis method and system based on image recognition
CN113191273A (en) Oil field well site video target detection and identification method and system based on neural network
CN104537392A (en) Object detection method based on distinguishing semantic component learning
CN112560880A (en) Object classification method, object classification apparatus, and computer-readable storage medium
CN114694090A (en) Campus abnormal behavior detection method based on improved PBAS algorithm and YOLOv5
CA3012927A1 (en) Counting objects in images based on approximate locations
CN115273009A (en) Road crack detection method and system based on deep learning
CN114005089A (en) Multi-scene construction safety helmet and reflective clothes detection method
CN114387564A (en) Head-knocking engine-off pumping-stopping detection method based on YOLOv5
CN113392927A (en) Animal target detection method based on single-order deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination