CN115471794A - Intelligent power plant small target detection method and system based on YOLOv5


Info

Publication number
CN115471794A
Authority
CN
China
Prior art keywords
data set
yolo
network
module
target detection
Prior art date
Legal status
Pending
Application number
CN202211274508.1A
Other languages
Chinese (zh)
Inventor
杨端
孙建永
薛江
韩志英
孙曼
王鑫
谭金鑫
谢国庆
石唯怡
徐代
Current Assignee
Xi'an Junneng Clean Energy Co ltd
Shaanxi Oula Mathematics Research Institute Co ltd
Original Assignee
Xi'an Junneng Clean Energy Co ltd
Shaanxi Oula Mathematics Research Institute Co ltd
Application filed by Xi'an Junneng Clean Energy Co ltd and Shaanxi Oula Mathematics Research Institute Co ltd
Priority to CN202211274508.1A
Publication of CN115471794A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention discloses a method and a system for detecting small targets in an intelligent power plant based on YOLOv5. An anchor-frame-adding mechanism is constructed and the output end of the YOLO v5 network training model is reconstructed; the YOLO v5 network training model is trained with a training data set, the trained model is verified with a verification data set, and the model with the optimal weights is saved; the test data set is then input into the model obtained in step S3 to realize small target detection. The method reasonably improves the mature deep learning video target detection algorithm YOLO v5 and applies it to real-time detection and early-warning analysis of 9 types of abnormal targets at a photovoltaic station, so as to improve the safety of power generation operations of the photovoltaic power station, empower the photovoltaic power station, and raise the intelligent management and control level of safe production.

Description

Intelligent power plant small target detection method and system based on YOLOv5
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to an intelligent power plant small target detection method and system based on YOLOv5.
Background
Intelligent safety refers to an ideal, intelligent and reasonable state or level of safety in production, life and survival, achieved by implementing intrinsic-safety strategies, comprehensive treatment strategies and systematic precaution engineering on the basis of intelligence theory, scientific laws and intelligent tools. In industrial production and construction, safety is always of first importance; besides popularizing and teaching the relevant safety theory, monitoring and avoiding dangerous factors in practice has always been a necessary means of guaranteeing worker safety. In actual production, failure to wear safety helmets and work clothes as required creates serious potential safety hazards during construction and directly threatens the lives of workers. In addition, early warning of smoke, open fire and inflammable materials is also necessary: the loss of life and property caused by fires every year is striking, and fire prevention work cannot be neglected.
Necessity of model selection: introducing target detection technology can give full play to the advantages of machine learning algorithms, realize early warning quickly and accurately, and minimize the loss of life and property caused by accidents.
Currently, at the data set level, the prior art lacks small-target smoke and fire data sets; most available data are smoke and fire images of large targets, which does not match the early warning we want to realize and has little practical significance. At the model level, existing target detection algorithms detect large-target objects well but perform poorly on small-target objects.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide an intelligent power plant small target detection method and system based on YOLOv5, which solve the technical problem of poor detection of small target objects.
The invention adopts the following technical scheme:
an intelligent power plant small target detection method based on YOLOv5 comprises the following steps:
s1, collecting a data set and labeling a training data set, a verification data set and a test data set;
s2, constructing an anchor-frame-adding mechanism and reconstructing the output end of the YOLO v5 network training model;
s3, training the YOLO v5 network training model obtained in the step S2 by using the training data set obtained in the step S1, verifying the trained YOLO v5 network training model by using the verification data set obtained in the step S1, and storing the YOLO v5 network training model with the optimal model weight;
and S4, inputting the test data set obtained in the step S1 into the YOLO v5 network training model with the optimal model weight obtained in the step S3, and realizing small target detection.
Specifically, step S1 includes:
s101, collecting a PASCAL VOC data set and an MS-COCO data set, and converting the PASCAL VOC data set into a COCO format;
s102, collecting a work clothes data set, a safety helmet data set, a smoke data set, a flame data set, a garbage bottle data set, a human body data set, a reflective clothes data set, a non-work clothes data set and a human head data set from the data set obtained in the step S101;
s103, dividing the work clothes data set, the safety helmet data set, the smoke data set, the flame data set, the garbage bottle data set and the human head data set obtained in the step S102 into a training data set, a verification data set and a test data set according to a proportion.
Further, in step S102, when the data set obtained in step S101 already carries labels, the label data are uniformly converted into COCO format and exported; when the data set obtained in step S101 contains only picture data, the pictures are labeled with the LabelImg tool and labels in COCO format are exported.
Further, in step S103, the training data set accounts for 76% of the total data set, the verification data set accounts for 15% of the total data set, and the testing data set accounts for 9% of the total data set.
Specifically, in step S2, the YOLO v5 network training model includes an input end, a backbone network, a neck network and an output end. The input end performs data enhancement and data preprocessing on the input data by means of mosaic data enhancement, adaptive picture scaling and adaptive anchor frame calculation; the backbone network uses a Focus module, a CSP module and an SPP module to extract features of the input image; the neck network adopts an FPN + PAN structure to aggregate features of different scales; and the output end uses the aggregated feature maps obtained from the neck network to predict bounding boxes and classes.
Furthermore, the backbone network uses CSP1. The input of CSP1 is split into two branches: one branch first passes through a CBL block, then through a residual unit module containing several residual structures, and is then convolved once more; the other branch is convolved directly. The two branches are then concatenated, passed through a BN (batch normalization) layer and a SiLU activation function, and finally through one more CBL block;
the neck network uses CSP2, in which the residual units of CSP1 are replaced by 2×CBL blocks. The SPP module of the backbone network first halves the input channels with a standard convolution module, then applies max pooling with kernel sizes of 5, 9 and 13; the padding is adapted to the different kernel sizes, the three max-pooling results are concatenated with the un-pooled data, and the combined output has twice the original number of channels.
Further, the output end adopts shape-rule matching: the aspect ratio between each candidate box bbox and the anchor boxes of the current layer is calculated, and when the ratio exceeds a set threshold the candidate box is filtered out as background. For each remaining candidate box bbox that can be matched with an anchor box, the grid cell in which it falls is computed and the two nearest grid cells are found according to a rounding rule; the cell containing the box and its two neighbouring cells are all associated with the corresponding prediction and made responsible for predicting that candidate box.
Furthermore, after the 17th layer the output end continues to up-sample the feature map; the 160×160 feature map obtained at the 20th layer is concatenated and fused with the 2nd-layer feature map of the backbone network; a small target detection layer is added at the 31st layer, and four layers [21, 24, 27, 30] in total are used for detection.
Further, the output end adopts a Bounding box loss function, which specifically comprises:
LOSS_CIoU = 1 − IoU + Distance_2² / Distance_C² + α · ν
where IOU is the ratio of the intersection A to the union B of the prediction box and the real box; Distance_2 is the Euclidean distance between the center points of the prediction box and the real box; C is the minimum circumscribed rectangle of the intersection A and the union B of the prediction box and the real box, and Distance_C is the diagonal length of C; ν is a parameter that measures the consistency of the aspect ratios, and α is its weighting coefficient.
In a second aspect, an embodiment of the present invention provides an intelligent power plant small target detection system based on YOLOv5, including:
the data module is used for acquiring the public data set and marking the training data set, the verification data set and the test data set;
the reconstruction module is used for constructing an anchor-frame-adding mechanism and reconstructing the output end of the YOLO v5 network training model;
the training module is used for training the YOLO v5 network training model obtained by the reconstruction module by using the training data set obtained by the data module, verifying the trained YOLO v5 network training model by using the verification data set obtained by the data module, and storing the YOLO v5 network training model with the optimal model weight;
and the detection module is used for inputting the test data set obtained by the data module into the YOLO v5 network training model with the optimal model weight obtained by the training module to realize small target detection.
Compared with the prior art, the invention has at least the following beneficial effects:
An intelligent power plant small target detection method based on YOLOv5 constructs an anchor-adding mechanism and reconstructs the output end of the YOLO v5 network training model; the YOLO v5 network training model is trained with the training data set, the trained model is verified with the verification data set, and the model with the optimal weights is saved; the test data set is then input into the saved model to realize small target detection. The multiple detection heads of the YOLO v5 network training model detect targets at multiple scales, and the data are preprocessed by mosaic data enhancement, adaptive anchor frame calculation and unification of picture sizes, so a good detection effect is achieved on small targets. Meanwhile, in the design of the model, the structure located between the Head and the Backbone is called the "Neck"; its goal is to aggregate as much of the information extracted by the Backbone as possible before feeding it to the Head. This structure plays an important role in transferring small object information and prevents that information from being lost: it increases the resolution of the feature maps again so that features from different layers of the Backbone can be aggregated, improving overall detection performance.
Further, the original image data and the labeled label data are obtained by collecting and labeling the data; since the model constructed by the invention is a supervised model, both the original images and the labels are necessary inputs for subsequent model training.
Furthermore, the data labels are unified into COCO format. On the one hand, a unified format facilitates processing by the subsequent model during data input and avoids training errors caused by inconsistent data formats; on the other hand, the COCO format is currently the most commonly used format for target detection algorithms such as YOLO, and it occupies little storage space.
Further, the training data set has the largest influence on model performance, so its proportion is the largest, 76%; the verification data set is used to adjust the model hyperparameters (e.g. the finally selected number of training epochs), so its proportion is the second largest, 15%; the test data set is only used to evaluate the trained model and has the smallest influence on the overall model, so its proportion is the smallest, 9%.
Further, at the input layer: mosaic data enhancement splices images by random scaling, random cropping and random arrangement, which improves the detection of small targets to a certain extent; adaptive picture scaling unifies the input images to a fixed size (640 x 640), avoiding the slow training caused by data of excessively large scale; adaptive anchor frame calculation sets anchor boxes with different initial widths and heights for different data sets, enhancing the overall generalization ability of the model. Backbone network: the Focus module, CSP module and SPP module combine convolution, BN normalization, slicing operations, residual connections and other methods to comprehensively extract features of multiple dimensions of the input image, and the design balances performance against computational complexity. Neck network: the FPN + PAN structure, combining down-sampling and up-sampling, aggregates the features of different scales extracted by the backbone network. Output layer: a Bounding box loss function based on CIoU is designed; compared with the earlier DIoU loss, CIoU takes the aspect ratio into account in the loss function, further improving regression accuracy.
Further, the CSP (cross-stage partial network) enhances the network's feature-fusion capability and alleviates the previously heavy inference computation; the SPP module (spatial pyramid pooling module) applies max pooling with kernel sizes 5, 9 and 13 and then performs concat fusion, which enlarges the overall receptive field of the model.
Furthermore, the output layer adopts shape-rule matching, which also considers positive samples in the neighbourhood and increases the number of positive samples, thereby accelerating the convergence of the model weights during training.
Further, at the 20th layer, the obtained 160 x 160 feature map is concatenated and fused with the 2nd-layer feature map of the backbone network; at the 31st layer a small target detection layer is added, so that four layers [21, 24, 27, 30] are used in total. Compared with the previous network, the additional small target detection layer improves the whole model's detection of small target objects.
Further, a Bounding box loss function based on CIoU is designed; compared with the earlier DIoU loss, CIoU takes the aspect ratio into account in the loss function, further improving regression accuracy.
It is to be understood that, the beneficial effects of the second aspect may refer to the relevant description in the first aspect, and are not described herein again.
In conclusion, through reasonable improvement, the mature deep learning video target detection algorithm YOLO v5 is applied to the real-time detection and early-warning analysis of 9 types of abnormal targets at the photovoltaic station, so as to improve the safety of power generation operations of the photovoltaic power station, empower the photovoltaic power station, and raise the intelligent management and control level of safe production.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a feature diagram of an IOT technology detection system for deep learning;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a schematic diagram of the recognition results of the method of the present invention, wherein (a) is the detection result for smoke and fire, (b) is the detection result for work clothes, and (c) is the detection result for garbage bottles;
FIG. 4 is a diagram of training results of various indicators;
FIG. 5 is a deployment system workflow.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be understood that the terms "comprises" and/or "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, including such combinations; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be understood that although the terms first, second, third, etc. may be used to describe preset ranges, etc. in embodiments of the present invention, these preset ranges should not be limited to these terms. These terms are only used to distinguish preset ranges from one another. For example, the first preset range may also be referred to as a second preset range, and similarly, the second preset range may also be referred to as the first preset range, without departing from the scope of the embodiments of the present invention.
The word "if" as used herein may be interpreted as "at 8230; \8230;" or "when 8230; \8230;" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (a stated condition or event)" may be interpreted as "upon determining" or "in response to determining" or "upon detecting (a stated condition or event)" or "in response to detecting (a stated condition or event)", depending on the context.
Various structural schematics according to the disclosed embodiments of the invention are shown in the drawings. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers and their relative sizes and positional relationships shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, according to actual needs.
The invention provides an intelligent power plant small target detection method based on YOLOv5. Using a deep learning technology framework, with 8 cameras carried for real-time monitoring and early warning, it addresses the two technical difficulties of data collection and labeling and of small target recognition, and establishes an intelligent safety detection and recognition system in combination with a stream processing request module, an inspection line initialization module, an event priority setting module, an abnormal-event ignore setting module and a self-processing response time setting module. The detection method comprises core links such as mosaic data enhancement, adaptive anchor frame calculation, Focus image slicing, cross-stage partial network, SPP spatial pyramid pooling and a small-target feature detection layer, and realizes detection and early warning of 6 types of abnormal events or objects: not wearing work clothes, not wearing safety helmets, smoke, open fire, garbage bottles and personnel intrusion. The average recognition accuracy of the various categories reaches more than 80 percent. Based on the improved YOLO v5 algorithm, the detection capability for small targets is improved; the modules are flexible and easy to maintain, and the method is suitable for multi-scene service requirements such as station inspection, low-altitude aerial photography by unmanned aerial vehicles and traffic road condition detection.
Referring to fig. 1, the invention uses a mature and complete deep learning network framework to perform real-time detection of video image targets, and designs a video-stream intelligent image recognition system specialized for dress-code compliance detection, detection of potentially hazardous objects and detection of abnormal personnel intrusion. The deep learning target detection algorithm YOLO v5 replaces manual inspection; the existing cameras on site are used without adding other hardware, abnormal targets are identified automatically, and 6 types of abnormal objects (unworn safety helmets, unworn work clothes, smoke, flames, garbage bottles and personnel intrusion) are identified and marked in real time, thereby serving actual production and, at the technical level, protecting the safety of workers and the orderly operation of production.
Referring to fig. 2, the method for detecting the small target of the intelligent power plant based on YOLOv5 includes the following steps:
s1, collecting a public data set, and marking a training data set, a verification data set and a test data set;
firstly, collecting an existing public data set for pre-training a YOLO v5 network training model and initializing parameters of the YOLO v5 network training model; then, the 9 types of data required by the invention are manually expanded and labeled, and a training data set, a verification data set and a test data set are obtained by division.
S101, collecting a data set commonly used by a target detection task;
common data sets include a PASCAL VOC data set and an MS-COCO data set, and the PASCAL VOC data set is converted into a data format corresponding to the COCO data set.
S102, collecting data sets of 9 different types of objects from the data sets obtained in the step S101;
the data set consists of two parts, images and labels, which correspond to each other. If labeled label data already exist, they are uniformly converted into COCO format; if only picture data exist, they are labeled with the LabelImg tool and labels in COCO format are exported. The collected data sets are a human body data set, a work clothes data set, a reflective clothes data set, a non-work clothes data set, a safety helmet data set, a smoke data set, a flame data set, a garbage bottle data set and a human head data set;
s103, the quantity and quality of the data set obtained in step S102 directly influence the training result of the YOLO v5 network training model; the invention therefore resamples the training data, including random under-sampling of the majority classes and SMOTE over-sampling of the minority classes, specifically using data enhancement techniques such as adding Gaussian noise, random cropping, and rotation and scaling.
And S104, dividing the data set obtained by resampling in the step S103 into a training data set, a verification data set and a test data set according to a proper proportion.
Wherein the training dataset accounts for 76% of the total dataset, the validation dataset accounts for 15% of the total dataset, and the test dataset accounts for 9% of the total dataset.
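As an illustration of the 76% / 15% / 9% split described above, the following minimal Python sketch divides an image directory into the three subsets; the directory path, file extension and random seed are assumptions for the example and are not taken from the invention.

```python
# Minimal sketch of the 76% / 15% / 9% split described above;
# file names and directory layout are assumptions.
import random
from pathlib import Path

def split_dataset(image_dir: str, seed: int = 0):
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train = int(n * 0.76)
    n_val = int(n * 0.15)
    train = images[:n_train]
    val = images[n_train:n_train + n_val]
    test = images[n_train + n_val:]          # remaining ~9%
    return train, val, test

if __name__ == "__main__":
    train, val, test = split_dataset("datasets/smart_plant/images")
    print(len(train), len(val), len(test))
```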
For training a small target detection-based improved YOLO v5 network training model, 9 data sets of different types of objects are needed, and in order to save the occupation of training video memory, a picture type data set is selected for training, so that the requirement of accurately detecting videos in industry can be met.
Firstly, pre-training is carried out on VOC2007 and COCO, and then the human body data set, work clothes data set, non-work clothes data set, reflective clothes data set, safety helmet data set, smoke data set, flame data set, garbage bottle data set and human head data set are collected in a targeted manner to fine-tune the parameters, thereby improving classification and recognition accuracy.
To ensure the diversity of the data, the collected pictures of each category cover multiple angles, multiple forms and objects at different distances, and it is ensured that no identical picture is repeatedly trained across all epochs. Considering the shooting-quality problems caused by camera resolution and environmental weather factors in actual detection, blurred pictures of poor resolution are deliberately introduced into training to enhance the stability of the detection algorithm. In addition, considering the density of personnel movement and of placed sundries in real scenes, and to avoid incomplete recognition, attention is paid to collecting samples in which several targets appear in the same picture, which enhances the sensitivity of the detection algorithm and avoids missed detections.
The data sets meeting the requirements are then labeled, which includes batch operations by code and manual label addition. Considering that the categories of the objects to be identified are not completely mutually exclusive, multi-label annotation is performed. Some open-source data sets already come in a data format recognizable by YOLO v5 and are easy to label directly; manually selected pictures can only be labeled by hand. The LabelImg tool is used to draw a box around the target region and attach a label, and the corresponding label information is saved in txt text format as input for model training.
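To illustrate the txt label format mentioned above, the following sketch converts one COCO-style box, given in pixels as [x_min, y_min, width, height] (an assumed input convention), into the normalized class / center / size line used by YOLO v5 training.

```python
# Illustrative sketch of writing one label line in the normalized txt format
# (class, x_center, y_center, width, height); the COCO-style pixel box
# [x_min, y_min, width, height] is an assumed input convention.
def coco_box_to_yolo_line(cls_id, box, img_w, img_h):
    x_min, y_min, w, h = box
    x_c = (x_min + w / 2) / img_w   # normalized box centre
    y_c = (y_min + h / 2) / img_h
    return f"{cls_id} {x_c:.6f} {y_c:.6f} {w / img_w:.6f} {h / img_h:.6f}"

print(coco_box_to_yolo_line(3, [100, 50, 40, 80], img_w=640, img_h=640))
# -> "3 0.187500 0.140625 0.062500 0.125000"
```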
On the other hand, in consideration of adjusting a model learning strategy, cost sensitive learning based on a cost matrix is tried, and learning bias is controlled from the aspect of function optimization.
The invention uses a semi-supervised method to carry out label propagation in a self-adaptive manner, improves the confidence coefficient of the data label, does not need a professional to guide the marking, and can generate a label with sufficient reliability, thereby saving the labor cost.
S2, designing a YOLO v5 network training model;
the YOLO v5 network flexibly configures models with different complexity by applying channel and layer control factors similar to EfficientNet, and adopts a cross-neighborhood grid matching strategy in a positive and negative sample definition stage, so that more positive samples are obtained, and convergence is accelerated. Considering that the YoLO v5 network has the advantages of high training speed, short reasoning time, small model size, flexible module deployment and the like, the method selectively establishes the main body framework on the YoLO v 5.
The YOLO v5 network training model comprises four parts, namely an input end, a Backbone (Backbone) network, a Neck (Neck) network and an output end (Head).
Input end
The main technologies used for data enhancement and data preprocessing include Mosaic (Mosaic) data enhancement, adaptive picture scaling and adaptive anchor frame calculation.
Mosaic data enhancement randomly selects 4 pictures and splices them with random scaling and random arrangement, which enriches the detection data set; in particular, random scaling adds many small targets, which helps to strengthen the detection of small targets and improves the network's generalization so that it adapts to detecting targets of multiple sizes in real scenes.
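A highly simplified sketch of the mosaic idea is given below; it only pastes four randomly scaled images onto one canvas and omits the remapping of label boxes that the real YOLO v5 implementation performs. OpenCV and NumPy are assumed to be available, and the image paths are placeholders.

```python
# Simplified mosaic sketch: four randomly scaled images placed on a 2x2 canvas.
# Label-box remapping, as done in the real YOLO v5 mosaic, is omitted here.
import random
import cv2
import numpy as np

def simple_mosaic(img_paths, out_size=640):
    half = out_size // 2
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # grey fill
    offsets = [(0, 0), (0, half), (half, 0), (half, half)]          # 2x2 grid
    for (top, left), path in zip(offsets, random.sample(img_paths, 4)):
        img = cv2.imread(path)                                      # paths assumed valid
        scale = random.uniform(0.5, 1.0)                            # random zoom
        img = cv2.resize(img, (int(half * scale), int(half * scale)))
        h, w = img.shape[:2]
        canvas[top:top + h, left:left + w] = img
    return canvas
```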
Backbone (Backbone) network
The key point of the backbone network is to extract the characteristics of the input image, and the network structure of the main body is a Focus structure and a CSP structure.
The pictures are sliced through the Focus structure. Specifically, a value is taken from every other pixel of the image, similar to neighbouring down-sampling, yielding four complementary sub-images with no information lost; the input channels are expanded 4-fold, so the spliced image has 12 channels compared with the original RGB three-channel image. A convolution operation on the new image finally yields a feature map down-sampled by a factor of two without information loss.
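The Focus slicing just described can be sketched in PyTorch as follows; the output channel count and kernel size are illustrative assumptions.

```python
# Sketch of the Focus slicing described above (PyTorch assumed): pixels are
# taken in four interleaved phases, turning 3 input channels into 12, and a
# convolution then produces a 2x down-sampled feature map without losing information.
import torch
import torch.nn as nn

class Focus(nn.Module):
    def __init__(self, c_in=3, c_out=64, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(c_in * 4, c_out, k, stride=1, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.SiLU(),
        )

    def forward(self, x):
        # four complementary sub-images, concatenated on the channel axis
        patches = [x[..., ::2, ::2], x[..., 1::2, ::2],
                   x[..., ::2, 1::2], x[..., 1::2, 1::2]]
        return self.conv(torch.cat(patches, dim=1))

print(Focus()(torch.randn(1, 3, 640, 640)).shape)  # torch.Size([1, 64, 320, 320])
```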
Neck network
The key point of the neck network is to aggregate features of different scales, and particularly, a structure of FPN + PAN is adopted.
Output end
The output end adopts a Bounding box loss function and performs non-maximum suppression (NMS) on the output results.
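Before the loss function is detailed below, the following minimal, pure-Python sketch illustrates the NMS step mentioned here; the corner box format [x1, y1, x2, y2] and the 0.45 IoU threshold are assumptions for the example, not values specified in the patent.

```python
# Minimal sketch of per-class non-maximum suppression:
# keep the highest-scoring box, drop boxes that overlap it too much, repeat.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.45):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```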
The Bounding box loss function adopts a CIOU-based loss function, and the specific steps are as follows:
the Bounding box loss uses the IOU value to evaluate the position loss of the prediction box and the real box, and specifically uses the CIoU function:
LOSS_CIoU = 1 − IoU + Distance_2² / Distance_C² + α · ν
where IOU is the ratio of the intersection A to the union B of the prediction box and the real box; Distance_2 is the Euclidean distance between the center points of the prediction box and the real box; C is the minimum circumscribed rectangle of A and B, and Distance_C is the diagonal length of C; ν is a parameter that measures the consistency of the aspect ratios, and α is its weighting coefficient.
ν is defined as follows:
ν = (4 / π²) · (arctan(w_gt / h_gt) − arctan(w_p / h_p))²
where w_gt and h_gt are the width and height of the real box, and w_p and h_p are the width and height of the prediction box, respectively.
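Putting the two formulas above together, a hedged PyTorch sketch of the CIoU position loss might look as follows; the corner-format boxes and the way α is computed follow the standard CIoU construction and are assumptions rather than wording taken from the patent.

```python
# Sketch of the CIoU loss defined above (PyTorch assumed); boxes are
# [x1, y1, x2, y2] tensors of shape (..., 4).
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    # intersection and union (IoU)
    inter_w = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = inter_w * inter_h
    w_p, h_p = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w_gt, h_gt = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    union = w_p * h_p + w_gt * h_gt - inter + eps
    iou = inter / union

    # squared centre distance (Distance_2^2) and squared diagonal of the
    # minimum enclosing box C (Distance_C^2)
    cx_p, cy_p = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx_g, cy_g = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio consistency term v and its weight alpha
    v = (4 / math.pi ** 2) * (torch.atan(w_gt / (h_gt + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```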
For the output end, the rule based on max-IoU matching is abandoned and shape-rule matching is adopted directly: the aspect ratio between the bbox and the anchors of the current layer is calculated, and if the ratio exceeds a set threshold the match between the bbox and the anchor is considered insufficient and the prediction of that layer is treated as background. For each remaining bbox, the grid cell into which it falls is calculated, and the two nearest grid cells are found according to a rounding rule; all three grid cells are regarded as responsible for predicting the bbox. This operation significantly increases the number of positive samples compared with the other YOLO-series frameworks.
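A simplified sketch of this shape-rule filter is shown below; the threshold value anchor_t = 4.0 is the value commonly used in YOLO v5 implementations and is an assumption here, not a figure taken from the patent.

```python
# Simplified shape-rule matching: a target box is kept for an anchor only if
# both its width ratio and height ratio stay within the threshold.
import torch

def match_by_shape(wh_targets, wh_anchors, anchor_t=4.0):
    # wh_targets: (N, 2) target widths/heights, wh_anchors: (A, 2) anchor widths/heights
    ratio = wh_targets[:, None, :] / wh_anchors[None, :, :]          # (N, A, 2)
    worst = torch.max(ratio, 1.0 / ratio).max(dim=2).values          # (N, A)
    return worst < anchor_t   # True where target i may be assigned to anchor j
```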
CSPDarknet is used as the Backbone network to extract rich information features from the input image. CSPNet (Cross Stage Partial Network) solves the problem of repeated gradient information during network optimization in the Backbone: the gradient changes are integrated into the feature map from beginning to end, which reduces the number of parameters and the FLOPS of the model, ensures inference accuracy and speed, and reduces the model size. CSPNet is in fact based on the idea of DenseNet: it copies the feature map of the base layer and sends the copy to the next stage through a dense block, thereby separating out the feature map of the base layer. Because it is difficult for a very deep network to back-propagate the lost signal, this design effectively alleviates the vanishing-gradient problem, supports feature propagation and shares features, thereby reducing the number of network parameters.
YOLO v5 uses two CSP structures, CSP1 and CSP2; CSP1 is used in the backbone network and CSP2 in the neck network. In CSP1 the input is divided into two branches: one branch first passes through a CBL block (Conv + BN + SiLU) and several residual structures and is then convolved again; the other branch is convolved directly. The two branches are then concatenated, and finally a CBL block follows a BN layer and a SiLU activation function. In CSP2 the main difference is that the residual structures are replaced by 2×CBL blocks.
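The following rough PyTorch sketch mirrors the CBL block and the two-branch CSP1 structure described above; the channel sizes and the number of residual units are illustrative assumptions.

```python
# Rough sketch of CBL (Conv + BN + SiLU) and the two-branch CSP1 structure:
# branch 1 goes CBL -> residual units -> conv, branch 2 is a direct conv,
# the branches are concatenated, passed through BN + SiLU, then one more CBL.
import torch
import torch.nn as nn

def CBL(c_in, c_out, k=1, s=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class Residual(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(CBL(c, c, 1), CBL(c, c, 3))

    def forward(self, x):
        return x + self.block(x)

class CSP1(nn.Module):
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_half = c_out // 2
        self.branch1 = nn.Sequential(CBL(c_in, c_half),
                                     *[Residual(c_half) for _ in range(n)],
                                     nn.Conv2d(c_half, c_half, 1, bias=False))
        self.branch2 = nn.Conv2d(c_in, c_half, 1, bias=False)   # direct convolution
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
        self.out = CBL(c_out, c_out)

    def forward(self, x):
        y = torch.cat((self.branch1(x), self.branch2(x)), dim=1)  # concatenate branches
        return self.out(self.act(self.bn(y)))
```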
The SPP module (Spatial Pyramid Pooling) in the Backbone network halves the input channels with a standard convolution module and then performs max pooling with kernel sizes of 5, 9 and 13. The padding is adapted to the different kernel sizes; the results of the three max-pooling operations are concatenated with the data that were not pooled, and the number of channels after the final combination is twice the original. Introducing SPP effectively avoids problems such as image distortion caused by cropping and scaling image regions, solves the problem of the convolutional neural network repeatedly extracting image features, speeds up candidate-box generation and reduces computational cost.
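A sketch of such an SPP module, following the description above, is given below; the channel numbers in the example are assumptions.

```python
# Sketch of the SPP module as described: halve the channels, run max pooling
# with kernel sizes 5, 9 and 13, and concatenate the pooled results with the
# un-pooled branch (giving twice the original channel count) before a final conv.
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, c_in, c_out, kernels=(5, 9, 13)):
        super().__init__()
        c_half = c_in // 2
        self.reduce = nn.Sequential(nn.Conv2d(c_in, c_half, 1, bias=False),
                                    nn.BatchNorm2d(c_half), nn.SiLU())
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels)
        self.fuse = nn.Sequential(nn.Conv2d(c_half * 4, c_out, 1, bias=False),
                                  nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):
        x = self.reduce(x)
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))

print(SPP(512, 512)(torch.randn(1, 512, 20, 20)).shape)  # torch.Size([1, 512, 20, 20])
```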
In practical detection applications, because the shooting distance is long, there are often many small targets in the video, and the unimproved YOLO v5 has a lower limit on detectable target size: the accuracy of detecting objects smaller than 8 x 8 pixels drops severely. Moreover, the down-sampling factor of YOLO v5 is large, so it is difficult for deep feature maps to learn the feature information of small targets. To improve the accuracy and stability of the algorithm on small targets, the invention adds an improvement strategy that includes adding a group of smaller-size detection layers so that four layers [4, 8, 16, 24] in total are used for detection, splicing a shallow feature map with a deep feature map by adding an anchor layer and reconstructing the output end, and performing a concat fusion operation between the 160 x 160 feature map obtained at the 20th (deep) layer of the network and the 2nd-layer (shallow) feature map of the backbone network.
The reconstructed output end is responsible for retrieving several aggregated feature maps from the neck network and predicting bounding boxes and classes. Apart from the received parameters, the output structure remains intact, since the output end is an essential part of the whole training model; in small target detection, in addition to the size of the input image, the depth and width of the model can be modified to change the main direction of processing, and the layer connections between the Neck and the Head can also be changed manually in order to focus on detecting specific feature maps. In this part, several layers of operations are mainly added. After the 17th layer, the feature map continues to be up-sampled and is thereby enlarged. At the 20th layer, the obtained 160 x 160 feature map is concatenated and fused with the 2nd-layer feature map of the backbone network, thereby obtaining a larger feature map for small target detection. At the 31st layer, the detection layer, a small target detection layer is added, and four layers [21, 24, 27, 30] in total are used for detection.
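The extra fusion step for the small-target branch can be illustrated as follows; the channel numbers and spatial sizes are assumptions chosen only to show the up-sample-and-concatenate pattern between a deep neck feature map and the shallow layer-2 backbone feature map.

```python
# Illustrative sketch of the added fusion for the small-target branch:
# a deep feature map is up-sampled to 160 x 160 and concatenated channel-wise
# with a shallow (layer-2) backbone feature map; shapes are assumed.
import torch
import torch.nn as nn

deep = torch.randn(1, 128, 80, 80)      # deep feature map from the neck
shallow = torch.randn(1, 64, 160, 160)  # shallow backbone feature map (layer 2)

upsample = nn.Upsample(scale_factor=2, mode="nearest")
fused = torch.cat((upsample(deep), shallow), dim=1)
print(fused.shape)  # torch.Size([1, 192, 160, 160]) -> fed to the added small-target detection layer
```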
S3, training the YOLO v5 network training model designed in the step S2 by using the training data set obtained in the step S1, and storing the optimal model weight by measuring the performance index of the verification data set obtained in the step S1 of the YOLO v5 network training model;
and S4, inputting the test data set obtained in the S1 into the YOLO v5 network training model obtained in the step S3, and realizing small target detection.
In another embodiment of the present invention, an intelligent power plant small target detection system based on YOLOv5 is provided, and the system can be used for implementing the above intelligent power plant small target detection method based on YOLOv5, and specifically, the intelligent power plant small target detection system based on YOLOv5 includes a data module, a reconstruction module, a training module, and a detection module.
The data module is used for acquiring a public data set and marking a training data set, a verification data set and a test data set;
the reconstruction module is used for constructing an anchor adding mechanism and reconstructing a YOLO v5 network training model of an output end;
the training module is used for training the YOLO v5 network training model obtained by the reconstruction module by using the training data set obtained by the data module, verifying the trained YOLO v5 network training model by using the verification data set obtained by the data module and storing the YOLO v5 network training model with the optimal model weight;
and the detection module is used for inputting the test data set obtained by the data module into the YOLO v5 network training model obtained by the training module to realize small target detection.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 3 and 4, screenshots of recognition results of various categories are given, and examples include a video type and a picture type. It can be seen that the algorithm can accurately divide and mark various targets, and is suitable for targets with different sizes and under different environments.
The inference speed (FPS) is far lower than that of the official version because only a CPU was used during testing; when a GPU is used, the FPS of YOLO v5 can reach 88.
The performance of the process before and after the improvement is shown in table 1:
TABLE 1 Algorithm index comparison before and after improvement
Model              Iterations   Average accuracy   Inference speed (FPS)   Parameters   Memory usage
YOLO v5            140          81.8               9                       2.1e+07      818 MB
Improved YOLO v5   140          83.1               6                       2.3e+07      818 MB
Compared with the prior art, the improved method for adding the small target detection layer obtains higher average accuracy under the same iteration number. Although the number of parameters is increased due to the increase of the network layer number, and the inference speed is slightly reduced, the memory occupation is almost unchanged, and the parameter quantity before and after the improvement is still in the same order of magnitude. Experiments show that the improved method achieves the expected effect in the aspect of accuracy, and does not pay large cost in the aspect of calculation complexity.
The invention can accurately detect three different types of input data samples:
pictures, video, and real-time video streams.
For real-time detection, only a camera needs to be accessed, and other complex hardware equipment is not needed.
In the result-checking stage, a detection result document is output: the real-time detection results are saved in txt form to a specified output directory, and historical records are kept so that the relevant personnel can carry out further detailed research and monitoring later. Concurrent detection is also realized: according to the priorities set in the algorithm, detection video is obtained from each camera at different detection frequencies and sent to the deep neural network for concurrent detection, and the network outputs the detection results of these pictures simultaneously. Multi-path parallel stream pushing is completed on the basis of concurrent detection: according to the detection results, a stream-pushing thread is established for any camera in which people or smoke and fire appear.
After detection and recognition are completed, the algorithm is connected to an alarm system. When an abnormal object is detected, a real-time alarm prompt is given, and based on the alarm information the staff use the scheduling system, combined with a preset inspection route, to quickly locate the target by its geographical coordinates and handle it. For the detection of personnel targets, human bodies appearing in the camera's monitoring picture can be identified and the positions of the faces in the picture marked. In addition, the identity of an intruder can be determined by a face detection algorithm and that person's inspection authority checked, so that unauthorized persons can be prevented from mistakenly entering the various dangerous areas of the power station.
The camera remote operation module, the stream processing request module, the inspection line initialization module, the event priority setting module, the abnormal event neglect setting module and the self-processing response time setting module all run normally in the edge service. After receiving the central request, response execution of all requests and return of response results are carried out, the whole process does not exceed 100 milliseconds, subsequent operation of part of requests relates to database persistence, and the operation is completed within 50 milliseconds.
Referring to fig. 5, the deployment system workflow is as follows:
(1) Anomaly identification system with YOLO v5 algorithm as core
Processing video information input by a camera, realizing automatic identification and accurate positioning of 9 types of targets, and outputting the position frame coordinates, the category serial numbers and the confidence level of the targets;
(2) Alarm handling system
Setting alarm threshold values aiming at various scenes based on the result fed back by the abnormality recognition system, and realizing alarms of different sources and different types; and visualizing the result of the anomaly identification, namely marking the position and the category of the target on the original video information.
In addition, in order to better link the recognition algorithm with the cameras, a monitoring and dispatching system and an inspection system are added: the former controls the positions and real-time angles of the cameras in the display area on a map, while the latter sets up the inspection route and completes the real-time interaction with the inspection personnel.
According to the work-clothes types of different customer units, the data set can be replaced to retrain the model parameters and serve specific customer targets in a personalized manner. Each time a designated data set is input again, training is carried out on servers with a graphics card of 3060 or above; training can generally be completed within 4 hours, and the average recognition precision exceeds 90 percent, which meets the requirement of continuous updating.
Table 2: VOC2007 test results
(The results of Table 2 are provided as an image in the original publication and are not reproduced here.)
The experimental results in Table 2 show that the model of this project is clearly superior to the comparison models on the VOC2007 data set, with a large leading margin. The model reaches 87.4 on the mAP@0.5 index, an improvement of 8.7 over the YOLOv2 model.
Table 3: COCO test results
(The results of Table 3 are provided as an image in the original publication and are not reproduced here.)
In the COCO test results of Table 3, the project model is also ahead of most of the comparison models: it achieves the best results on the mAP@0.5:0.95, mAP@0.75, mAP_S, mAP_M and mAP_L indexes, and only on mAP@0.5 is it slightly inferior to the YOLOv4 model. Compared with the relatively recent YOLOv4 algorithm (2020), the performance improvement lies mainly in the detection of small and medium targets: mAP_S and mAP_M are raised from 26.7 and 46.7 to 29.6 and 49.5, respectively.
The improved YOLO v5 method can not only realize real-time monitoring and early warning of 6 types of abnormal events in video streams (workers not wearing work clothes, not wearing safety helmets, smoke, open fire, garbage bottles and personnel intrusion), but also realize multi-class detection of small targets. Compared with the original YOLO v5 method, the accuracy is improved by about 2%, the inference speed decreases only slightly, and the comprehensive performance is better. Tests show that the designed system can already preliminarily complete the detection of actual scene objects in the plant station.
The modules are flexible, easy to modify and maintain later, and adaptable to different service requirements. The system is portable and updatable. On the one hand, the system is simple and easy to deploy and can be quickly transplanted between different power stations: the front end consists of the deployed cameras, so transplanting the project scheme to a different power plant only requires designing a different camera deployment scheme and installing servers for subsequent processing; the software in the system is universal, and only the hardware required for running the algorithm and the corresponding software dependency libraries need to be provided. On the other hand, the system provides hardware and software support for the construction of the intelligent power plant; subsequently, data sets such as the behavioural safety of power plant inspection personnel can be added on this basis to refine the parameter optimization of the model algorithm in the system, thereby further improving the safety management level of the power plant.
In summary, the method and system for detecting small targets in an intelligent power plant based on YOLOv5 realize, based on the YOLOv5 algorithm, detection of the 9 types of targets that may appear in an intelligent power plant (human body, head, safety helmet, work clothes, reflective clothes, other clothes, garbage bottle, smoke and flame), and improve the detection results for small targets. The algorithm's detection results are then combined with the subsequent alarm handling system, which on the one hand provides real-time reminders and early warnings so that potential dangers (such as fire) can be avoided, and on the other hand visualizes the targets appearing in the cameras and saves the visualized records.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above content only illustrates the technical idea of the present invention and is not intended to limit its protection scope; any modification made on the basis of the technical idea proposed by the present invention falls within the protection scope of the claims of the present invention.

Claims (10)

1. An intelligent power plant small target detection method based on YOLOv5 is characterized by comprising the following steps:
s1, collecting a data set and labeling a training data set, a verification data set and a test data set;
s2, constructing an anchor frame adding mechanism and reconstructing a YOLO v5 network training model of an output end;
s3, training the YOLO v5 network training model obtained in the step S2 by using the training data set obtained in the step S1, verifying the trained YOLO v5 network training model by using the verification data set obtained in the step S1, and storing the YOLO v5 network training model with the optimal model weight;
and S4, inputting the test data set obtained in the step S1 into the YOLO v5 network training model with the optimal model weight obtained in the step S3, and realizing small target detection.
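For orientation only, the following is a minimal, generic PyTorch sketch of the S1-S4 flow of claim 1 (train on the training data set, keep the weights that perform best on the verification data set, then run the saved model on the test data set). The model, data loaders and the `evaluate` placeholder are assumptions; this is not the YOLO v5 implementation itself.

```python
# Schematic of steps S3-S4: train, keep the best validation weights, then test.
import torch

def evaluate(model, loader):
    # placeholder for a real metric (e.g. mAP) computed on the verification data set
    model.eval()
    with torch.no_grad():
        return 0.0

def fit(model, train_loader, val_loader, epochs, lr=1e-3, ckpt="best.pt"):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.937)
    best_metric = float("-inf")
    for epoch in range(epochs):
        model.train()
        for images, targets in train_loader:          # S3: training
            loss = model(images, targets)             # assumes the model returns a scalar loss
            opt.zero_grad(); loss.backward(); opt.step()
        metric = evaluate(model, val_loader)          # S3: validation
        if metric > best_metric:                      # keep the optimal model weights
            best_metric = metric
            torch.save(model.state_dict(), ckpt)
    return ckpt

def test(model, ckpt, test_loader):
    model.load_state_dict(torch.load(ckpt))           # S4: detection on the test data set
    model.eval()
    with torch.no_grad():
        return [model(images) for images, _ in test_loader]
```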
2. The intelligent power plant small target detection method based on YOLOv5 as claimed in claim 1, wherein the step S1 is specifically as follows:
s101, collecting a PASCAL VOC data set and an MS-COCO data set, and converting the PASCAL VOC data set into a COCO format;
s102, collecting a work clothes data set, a safety helmet data set, a smoke data set, a flame data set, a garbage bottle data set, a human body data set, a reflective clothes data set, a non-work clothes data set and a human head data set from the data set obtained in the step S101;
s103, dividing the work clothes data set, the safety helmet data set, the smoke data set, the flame data set, the garbage bottle data set and the human head data set obtained in the step S102 into a training data set, a verification data set and a test data set according to a proportion.
3. The intelligent power plant small target detection method based on YOLOv5 as claimed in claim 2, wherein in step S102, when the data sets obtained in step S101 already carry annotation labels, the label data are uniformly converted into the COCO format and exported; when the data sets obtained in step S101 contain only picture data, the pictures are annotated with an image labeling tool and labels in the COCO format are exported.
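As an illustration of the label unification described in claim 3, the sketch below converts one PASCAL VOC XML annotation into COCO-style image and annotation entries; the category mapping and ID handling are illustrative assumptions.

```python
# Hypothetical sketch: convert a single PASCAL VOC XML annotation to COCO-format entries.
import xml.etree.ElementTree as ET

def voc_to_coco(xml_path, image_id, categories, ann_start_id=1):
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    image = {"id": image_id,
             "file_name": root.findtext("filename"),
             "width": int(size.findtext("width")),
             "height": int(size.findtext("height"))}
    annotations, ann_id = [], ann_start_id
    for obj in root.findall("object"):
        name = obj.findtext("name")
        b = obj.find("bndbox")
        x1, y1 = float(b.findtext("xmin")), float(b.findtext("ymin"))
        x2, y2 = float(b.findtext("xmax")), float(b.findtext("ymax"))
        annotations.append({"id": ann_id, "image_id": image_id,
                            "category_id": categories[name],        # e.g. {"safety helmet": 1, ...}
                            "bbox": [x1, y1, x2 - x1, y2 - y1],     # COCO uses [x, y, w, h]
                            "area": (x2 - x1) * (y2 - y1), "iscrowd": 0})
        ann_id += 1
    return image, annotations
```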
4. The YOLOv5-based intelligent power plant small target detection method as claimed in claim 2, wherein in step S103, the training data set accounts for 76% of the total data set, the verification data set accounts for 15% of the total data set, and the test data set accounts for 9% of the total data set.
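A minimal sketch of the 76% / 15% / 9% division of claim 4; the random shuffle and the plain file-list input are assumptions.

```python
# Hypothetical sketch: divide an image list into training / verification / test sets
# with the 76% / 15% / 9% proportions of claim 4.
import random

def split_dataset(image_paths, seed=0):
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = int(0.76 * n)
    n_val = int(0.15 * n)
    train, val = paths[:n_train], paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]          # the remaining ~9%
    return train, val, test
```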
5. The intelligent power plant small target detection method based on YOLO v5 as claimed in claim 1, wherein in step S2 the YOLO v5 network training model comprises an input end, a backbone network, a neck network and an output end; the input end performs data enhancement and preprocessing on the input data by means of mosaic data enhancement, adaptive picture scaling and adaptive anchor frame calculation; the backbone network uses a Focus module, a CSP module and an SPP module to extract features from the input image; the neck network uses an FPN + PAN structure to aggregate features of different scales; and the output end obtains the aggregated feature maps produced by the neck network and uses them to predict bounding boxes and classes.
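To make one of the backbone components of claim 5 concrete, here is a sketch of a Focus slice-and-concatenate layer in the style of the public YOLOv5 code: the input is sliced into four pixel-interleaved maps, concatenated along the channel axis and passed through a Conv + BN + SiLU block. Channel counts, kernel size and the activation are assumptions taken from that public implementation, not values fixed by the claim.

```python
# Sketch of a Focus module: slice the input into four pixel-interleaved maps,
# concatenate along the channel axis, then apply Conv + BN + SiLU (a CBL-style block).
import torch
import torch.nn as nn

class Conv(nn.Module):
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Focus(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.conv = Conv(c_in * 4, c_out, k)
    def forward(self, x):                      # x: (B, C, H, W) -> (B, c_out, H/2, W/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                                    x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1))

# e.g. Focus(3, 32)(torch.zeros(1, 3, 640, 640)).shape -> torch.Size([1, 32, 320, 320])
```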
6. The intelligent power plant small target detection method based on YOLOv5 as claimed in claim 5, wherein the backbone network uses CSP1; the input of CSP1 is split into two branches: one branch first passes through a CBL, then through a residual unit module (Res unit) containing several residual structures, and is then convolved once more; the other branch is convolved directly; the two branches are then concatenated, passed through a BN (Batch Normalization) layer and a SiLU activation function, and finally through one more CBL;
the neck network uses CSP2, in which the Res unit of CSP1 is replaced by 2 × CBL; the SPP module of the backbone network first halves the number of input channels with a standard convolution module and then applies max-pooling operations with pooling kernel sizes of 5, 9 and 13 respectively, with padding adapted to the different kernel sizes; the results of the three max-pooling operations are concatenated with the data that has not been pooled, so that after merging the number of channels is twice the original.
7. The intelligent power plant small target detection method based on YOLOv5 as claimed in claim 5, wherein the output end adopts shape-rule matching: the width and height ratios between each candidate frame bbox and the anchor frame anchor of the current layer are calculated, and candidate frames bbox whose ratio exceeds a set threshold are filtered out as background; for the remaining candidate frames bbox that can be matched to an anchor frame anchor, the grid in which each candidate frame lies is calculated and the two nearest grids are found according to a rounding rule; the grid containing the matched candidate frame bbox together with the two adjacent grids is associated with the corresponding predicted candidate frame bbox and serves as the grids responsible for predicting that candidate frame bbox.
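The following sketch illustrates the shape-rule matching of claim 7 in the spirit of YOLOv5's target assignment: candidate boxes whose width/height ratio to the current layer's anchors exceeds a threshold are discarded as background, and each remaining box is assigned to its own grid cell plus the nearest horizontal and vertical neighbouring cells. The threshold value, tensor layout and the simplified neighbour rule are assumptions.

```python
# Sketch of shape-rule matching: filter candidate boxes by the width/height ratio to the
# current layer's anchors, then assign each kept box to its own grid cell and the two
# nearest neighbouring cells (one horizontal, one vertical). Clamping to the grid bounds
# is omitted for brevity.
import torch

def match_targets(boxes_wh, boxes_xy, anchors, ratio_thr=4.0):
    """boxes_wh, boxes_xy: (N, 2) in grid units; anchors: (A, 2) in grid units."""
    r = boxes_wh[:, None, :] / anchors[None, :, :]              # (N, A, 2) width/height ratios
    keep = torch.max(r, 1.0 / r).max(dim=2).values < ratio_thr  # (N, A) shape-rule filter
    box_idx, anchor_idx = keep.nonzero(as_tuple=True)           # unmatched boxes act as background

    xy = boxes_xy[box_idx]                                      # centres of matched boxes
    cell = xy.long()                                            # grid cell containing the centre
    frac = xy - cell
    dx = (frac[:, 0] >= 0.5).long() * 2 - 1                     # nearest horizontal neighbour
    dy = (frac[:, 1] >= 0.5).long() * 2 - 1                     # nearest vertical neighbour
    cells = [cell,
             cell + torch.stack([dx, torch.zeros_like(dx)], dim=1),
             cell + torch.stack([torch.zeros_like(dy), dy], dim=1)]
    return box_idx, anchor_idx, torch.stack(cells, dim=1)       # (M,), (M,), (M, 3, 2)
```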
8. The YOLOv5-based intelligent power plant small target detection method as claimed in claim 7, wherein the output end continues to upsample the feature map after the 17th layer; the 160 × 160 feature map obtained at the 20th layer is concatenated and fused with the feature map of the 2nd layer of the backbone network; a small target detection layer is added at the 31st layer, and four layers [21, 24, 27, 30] are used for detection.
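As a short worked check of the feature-map sizes in claim 8: for a 640 × 640 input (YOLOv5's common default, assumed here), detection heads at strides 4, 8, 16 and 32 see grids of 160, 80, 40 and 20 cells per side, the stride-4 head being the added small target detection layer.

```python
# For a 640x640 input, an extra shallow detection head at stride 4 yields the
# 160x160 feature map used for small targets, alongside the standard 80/40/20 grids.
input_size = 640
strides = [4, 8, 16, 32]                 # stride 4 = the added small-target detection layer
grids = [input_size // s for s in strides]
print(grids)                             # [160, 80, 40, 20]
```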
9. The YOLOv5-based intelligent power plant small target detection method according to claim 5, characterized in that a Bounding box loss function is adopted at the output end, specifically:
$$\text{Loss} = 1 - IOU + \frac{\text{Distance\_2}^{2}}{\text{Distance\_C}^{2}} + \frac{\nu^{2}}{(1 - IOU) + \nu}$$
wherein IOU is the ratio of the intersection A to the union B of the prediction frame and the real frame; Distance_2 is the Euclidean distance between the two center points of the prediction frame and the real frame; C is the minimum circumscribed rectangle of the intersection A and the union B of the prediction frame and the real frame, and Distance_C is the diagonal distance of C; ν is a parameter that measures the consistency of the aspect ratio.
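For illustration, a sketch of a CIoU-style bounding-box loss using the quantities defined above (IOU, the centre-point distance, the diagonal of the minimum circumscribed rectangle C, and the aspect-ratio term ν); the [x1, y1, x2, y2] box layout and the weighting α = ν / ((1 − IOU) + ν), which reproduces the ν² / ((1 − IOU) + ν) term of the formula, are assumptions following the common CIoU formulation.

```python
# Sketch of a CIoU-style box loss: 1 - IOU + centre-distance penalty + aspect-ratio penalty.
# Boxes are (N, 4) tensors in [x1, y1, x2, y2] format (an assumed layout).
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    # intersection A and union B
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared Euclidean distance between the two centre points (Distance_2^2)
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    d2 = ((cp - ct) ** 2).sum(dim=1)

    # squared diagonal of the minimum circumscribed rectangle C (Distance_C^2)
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio consistency term v and its weight alpha
    wp = pred[:, 2] - pred[:, 0]; hp = (pred[:, 3] - pred[:, 1]).clamp(eps)
    wt = target[:, 2] - target[:, 0]; ht = (target[:, 3] - target[:, 1]).clamp(eps)
    v = (4 / math.pi ** 2) * (torch.atan(wt / ht) - torch.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + eps)

    return (1 - iou + d2 / c2 + alpha * v).mean()
```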
10. An intelligent power plant small target detection system based on YOLOv5 is characterized by comprising:
the data module is used for collecting public data sets and labeling them to obtain a training data set, a verification data set and a test data set;
the reconstruction module is used for constructing a YOLO v5 network training model with an added anchor frame mechanism and a reconstructed output end;
the training module is used for training the YOLO v5 network training model obtained by the reconstruction module by using the training data set obtained by the data module, verifying the trained YOLO v5 network training model by using the verification data set obtained by the data module, and storing the YOLO v5 network training model with the optimal model weight;
and the detection module is used for inputting the test data set obtained by the data module into the YOLO v5 network training model with the optimal model weight obtained by the training module to realize small target detection.
CN202211274508.1A 2022-10-18 2022-10-18 Intelligent power plant small target detection method and system based on YOLOv5 Pending CN115471794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211274508.1A CN115471794A (en) 2022-10-18 2022-10-18 Intelligent power plant small target detection method and system based on YOLOv5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211274508.1A CN115471794A (en) 2022-10-18 2022-10-18 Intelligent power plant small target detection method and system based on YOLOv5

Publications (1)

Publication Number Publication Date
CN115471794A true CN115471794A (en) 2022-12-13

Family

ID=84337249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211274508.1A Pending CN115471794A (en) 2022-10-18 2022-10-18 Intelligent power plant small target detection method and system based on YOLOv5

Country Status (1)

Country Link
CN (1) CN115471794A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117671458A (en) * 2023-12-20 2024-03-08 云南神火铝业有限公司 Construction method and application of block anode scrap detection model capable of automatically identifying block anode scrap
CN118365475A (en) * 2024-06-14 2024-07-19 长扬科技(北京)股份有限公司 Intelligent monitoring method and device for photovoltaic power plant
CN118365475B (en) * 2024-06-14 2024-09-10 长扬科技(北京)股份有限公司 Intelligent monitoring method and device for photovoltaic power plant

Similar Documents

Publication Publication Date Title
JP7014440B2 (en) Video surveillance system, video processing method and video processing program
CN115471794A (en) Intelligent power plant small target detection method and system based on YOLOv5
CN109977921B (en) Method for detecting hidden danger of power transmission line
CN104254873A (en) Alert volume normalization in a video surveillance system
CN112819068B (en) Ship operation violation behavior real-time detection method based on deep learning
Chen et al. Radar: Road obstacle identification for disaster response leveraging cross-domain urban data
CN112287827A (en) Complex environment pedestrian mask wearing detection method and system based on intelligent lamp pole
CN113642474A (en) Hazardous area personnel monitoring method based on YOLOV5
CN112686595A (en) Method, device, equipment and storage medium for detecting illegal behavior of logistics operation
CN115691044A (en) Dynamic risk assessment early warning method, system and device
CN115083112A (en) Intelligent early warning emergency management system and deployment method thereof
CN109002746A (en) 3D solid fire identification method and system
CN115294528A (en) Pedestrian safety monitoring method and device
CN115546742A (en) Rail foreign matter identification method and system based on monocular thermal infrared camera
US20150199636A1 (en) Method and system for performance metric anomaly detection in transportation systems
CN118038674A (en) Monitoring system real-time dynamic analysis warning system based on big data
CN117876966A (en) Intelligent traffic security monitoring system and method based on AI analysis
CN106022311A (en) City monitoring video identification-based emergency event discovery method and system
CN111696200A (en) Method, system, device and storage medium for displaying alarm situation
Ding et al. A novel deep learning framework for detecting seafarer’s unsafe behavior
CN115829324A (en) Personnel safety risk silent monitoring method
CN113064940A (en) Highway intelligence real-time charging analytic system based on big data
Fu Construction site safety helmet wearing detection method based on improved YOLOv5
CN112633163A (en) Detection method for realizing illegal operation vehicle detection based on machine learning algorithm
US20200089993A1 (en) Method for detection of temporal pattern anomalies in video streams

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination