CN112633308A - Detection method and detection system for whether power plant operating personnel wear safety belts - Google Patents


Info

Publication number
CN112633308A
CN112633308A (application number CN202010965975.3A)
Authority
CN
China
Prior art keywords
safety belt
detection
size
power plant
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010965975.3A
Other languages
Chinese (zh)
Inventor
李政谦
刘曙元
李志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huadian Tianren Power Controlling Technology Co Ltd
Original Assignee
Beijing Huadian Tianren Power Controlling Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huadian Tianren Power Controlling Technology Co Ltd
Priority to CN202010965975.3A
Publication of CN112633308A
Legal status: Pending

Classifications

    • G06F18/2415 — Pattern recognition: classification techniques based on parametric or probabilistic models, e.g. likelihood ratio or false acceptance rate versus false rejection rate
    • G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2321 — Pattern recognition: non-hierarchical clustering using statistics or function optimisation, e.g. modelling of probability density functions
    • G06N3/045 — Neural networks: combinations of networks
    • G06N3/084 — Neural network learning methods: backpropagation, e.g. using gradient descent
    • G06Q50/06 — ICT specially adapted for specific business sectors: energy or water supply
    • G06V20/41 — Video scenes: higher-level, semantic clustering, classification or understanding of video scenes
    • H04N7/18 — Television systems: closed-circuit television [CCTV] systems


Abstract

The invention discloses a safety belt detection method, which comprises: obtaining historical video pictures of a monitoring area to form a safety belt detection data set; labeling the safety belt detection data set and randomly dividing it into a training set and a test set; generating, from the obtained data set, anchor frames that correspond to it and suit the monitoring area; training a deep learning model with the training set to obtain a safety belt detection model; adjusting the model confidence of the safety belt detection model; and detecting the safety belt situation in the monitoring area in real time with the deep-learning-based safety belt detection model. The method realizes real-time detection of safety belts, adapts to complex real scenes, improves the mAP (mean average precision) of safety belt detection, helps raise the safety awareness of related personnel, and supports the orderly development of work.

Description

Detection method and detection system for whether power plant operating personnel wear safety belts
Technical Field
The invention belongs to the technical field of intelligent video monitoring safety, and particularly relates to a safety belt detection method.
Background
Safety belts are a measure to prevent brain injury. Research shows that in work accidents approximately 60% of brain injuries are caused by incorrectly worn safety belts; detecting whether related personnel wear safety belts can therefore effectively reduce the incidence of such accidents.
In the early stage, a dedicated safety supervisor usually checked the safety belts of operating personnel, but this mode makes all-round supervision difficult and cannot guarantee its effectiveness. There is therefore a need for a safety belt detection method that can supervise construction and inspection sites in real time while reducing supervision costs.
A general target detection method only needs to judge whether the detection target exists in a picture, count the targets, and mark their positions. A safety belt detection algorithm additionally requires real-time recognition and deep optimization for dynamic video, so as to reach higher recognition and tracking precision. It must adapt well to different environments such as varying light or cloudy days; it must not be affected by occlusions such as glasses, beards, hairstyles or expressions; and it must not be affected by different postures such as facing forward, facing backward, standing sideways, running or lowering the head.
In recent years, researchers have carried out many innovative studies on safety belt detection along two lines: sensor-based detection and image-processing-based detection. Detection algorithms based on convolutional neural networks, with their simple networks, high detection speed and high accuracy, outperform traditional detection algorithms and have become the mainstream approach to safety belt detection.
However, for the specific scenes of a site, a generic convolutional neural network model cannot reach a high detection effect, and a dedicated model and detection strategy need to be formulated for the scene.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a safety belt detection method that can detect whether a person in on-site video information is wearing a safety belt, and that effectively reduces missed and false detections caused by complex conditions such as fast-moving personnel, object occlusion and dim light.
The invention adopts the following technical scheme. A method for detecting whether a power plant operator wears a safety belt comprises the following steps:
step 1, acquiring historical video data from video monitoring equipment in a monitoring area, and converting the historical video data into pictures to form a safety belt detection data set;
step 2, marking whether each person in the safety belt detection data set pictures wears a safety belt, and randomly dividing the data set into a training set and a test set according to a set proportion;
step 3, generating an anchor frame which corresponds to the safety belt detection data set and is suitable for the monitoring area according to the safety belt detection data set obtained in the step 1;
step 4, training the convolutional neural network model with the training set, testing the model after each generation of training with the test set, and screening to obtain the safety belt detection model;
and 5, acquiring real-time video data from video monitoring equipment in the monitoring area, and detecting the wearing condition of the safety belt in the monitoring area in real time based on the safety belt detection model obtained in the step 4.
Preferably, step 2 specifically comprises:
step 2.1, marking whether people wear safety belts in the safety belt detection data set one by one, wherein the marked part is used as a positive sample, and the unmarked part is used as a negative sample;
step 2.2, obtaining the category and the detection frame of each marked object to form a marked object file;
step 2.3, generating a file path corresponding to the safety belt detection data set and storing the file path in a data set file;
and 2.4, randomly distributing the labeled object file stored in the data set file in the step 2.3 and storing the labeled content, and dividing the labeled object file into a training set and a test set according to a set proportion.
Preferably, in step 3, the anchor frames are generated by a clustering method whose distance measure is the overlap (IoU) distance; the distance D between a labeled box and a cluster center is expressed by the following formula,
D = 1 - IoU(box, clusters)
in the formula:
D is the distance between the labeled box and the cluster center,
IoU(·) represents the overlap (intersection-over-union) function,
box is a labeled box,
clusters denotes the cluster-center boxes.
Preferably, step 4 specifically includes:
step 4.1, uniformly adjusting the pictures in the training set to 640 x 640;
step 4.2, carrying out image enhancement processing on the picture subjected to size adjustment in the step 4.1;
step 4.3, setting iteration times, the number of each batch of images, an initial learning rate and a learning rate updating condition;
4.4, training a convolutional neural network model by using the image subjected to image enhancement processing;
and 4.5, detecting the mAP of the convolutional neural network model after each generation of training by using the test set, and selecting the convolutional neural network model with the highest mAP as a safety belt detection model.
Preferably, in step 4, the convolutional neural network model has a single class: a person not wearing a safety belt is labeled none. The input picture size of the model is 640 × 640. The two output layers of the model have 4 and 5 anchor frames respectively and are used to detect ordinary targets (size greater than 32 × 32) and small targets (size at most 32 × 32). The feature maps of the two output layers are of size 5 × 5 and 40 × 40 respectively.
Preferably, in step 4, the processing procedure of the convolutional neural network model on the picture includes:
i. the input size is 640 × 640; after several convolution kernels a 40 × 40 feature map is obtained. A combination of a 1 × 1 convolution (stride 1, padding 0, 256 feature maps), a 3 × 3 convolution (stride 2, padding 1, 512 feature maps) and a residual block then yields the first detection result, which is connected with the up-sampled output;
ii. from the first detection result, several convolution kernels yield a 5 × 5 feature map; a combination of a 1 × 1 convolution (stride 1, padding 0, 512 feature maps), a 3 × 3 convolution (stride 2, padding 1, 1024 feature maps) and a residual block yields the second detection result;
iii. the second detection result passes through three consecutive combinations of a 1 × 1 convolution (stride 1, padding 0) and a 3 × 3 convolution (stride 2, padding 1), followed by one 1 × 1 convolution (stride 1, padding 0), and is output as the third detection result of size 5 × 5;
iv. the second detection result passes through two consecutive combinations of a 1 × 1 convolution (stride 1, padding 0) and a 3 × 3 convolution (stride 2, padding 1) plus one 1 × 1 convolution (stride 1, padding 0); after up-sampling it is added to the first detection result, passes again through two such combinations and one 1 × 1 convolution (stride 1, padding 0), and is output as the fourth detection result of size 40 × 40.
Preferably, in step 4, FOCAL LOSS is used as the loss function, which is expressed by the following formula,
FL(p_t) = -α_t (1 - p_t)^γ log(p_t)
in the formula:
p_t represents the predicted probability,
α and γ are both hyperparameters.
In step 4, Swish is preferably used as the activation function, which is expressed as,
f(x) = x · sigmoid(x) = x / (1 + e^(-x))
in the formula:
x denotes the tensor of the input image,
e denotes the natural constant.
Preferably, step 5 specifically includes:
step 5.1, acquiring video data from video monitoring equipment in a monitoring area in real time, and converting the video data into pictures;
step 5.2, the safety belt detection model judges whether a target exists in the acquired video picture;
step 5.3, segmenting the person and the background in video pictures that contain an operator, acquiring the anchor frame positions, and judging whether the person wears a safety belt;
and 5.4, if a target without a safety belt is found, marking it with a red rectangular frame labeled with the text none, and storing the marked image.
The invention also provides a detection system that uses the above detection method to detect whether power plant operating personnel wear safety belts, comprising: video monitoring equipment, arranged in the power plant monitoring area and used to acquire video data; a monitoring host, comprising a video processing module and a safety belt detection module, used to convert the video data of the video monitoring equipment into pictures and to judge with the safety belt detection model whether the power plant operating personnel wear safety belts; a communication network, connecting the video monitoring equipment and the monitoring host and used to transmit video data between them; and a display and warning module, connected with the monitoring host, used to display the video data and the judgment result, and to give a warning if a safety belt is not worn.
The safety belt detection data set is obtained by collecting video pictures of the monitoring area and labeling them; a clustering algorithm yields anchor frames that correspond to the data set and suit the monitoring area; through several convolution, pooling and activation layers, two outputs of size 5 × 5 and 40 × 40 are obtained, which detect people at different distances more accurately. Using FOCAL LOSS as the loss function alleviates the class-imbalance problem, and using Swish as the activation function avoids the saturation problem for large values. With these improvements the model adapts to the complex on-site scene; the precision, recall and mAP (mean average precision) of the model are improved, which helps raise the safety awareness of related personnel and supports the orderly development of work.
Drawings
FIG. 1 is a flow chart of a seat belt detection method of the present invention;
FIG. 2 is a network architecture diagram of the convolutional neural network model of the present invention;
fig. 3 is a flow chart of real-time detection of seat belt conditions in a monitored area based on a seat belt detection model.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
It can be understood that the safety belt in the invention refers to a safety belt worn by a power plant worker during work.
As shown in fig. 1, the present invention provides a seat belt detection method, including the steps of:
step 1, historical video data are obtained from video monitoring equipment in a monitoring area and are converted into pictures, and a safety belt detection data set is formed.
In a preferred but non-limiting embodiment, the monitoring-area video is acquired through a working monitoring camera and converted into corresponding pictures with OpenCV to form the safety belt detection data set. The acquired historical video pictures are not restricted in scale, lighting, style, color, or the presence of occlusion.
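The video-to-picture conversion described above can be sketched as follows. This is an illustrative Python sketch assuming OpenCV; the function names, file names and sampling rate are hypothetical, since the patent does not specify an implementation.

```python
def sampled_indices(total_frames, every_n):
    # Frame indices kept when saving one frame out of every `every_n`
    # (sampling limits near-duplicate pictures in the data set).
    return list(range(0, total_frames, every_n))

def extract_frames(video_path, out_dir, every_n=25):
    # Convert a monitoring video into pictures with OpenCV; returns the
    # number of pictures written. Names and defaults are illustrative.
    import cv2  # local import keeps the pure helper above dependency-free
    cap = cv2.VideoCapture(video_path)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```

Sampling one frame every `every_n` frames is a common way to keep the data set from being dominated by near-identical consecutive frames.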
It is worth noting that the detection method of the invention detects several targets in one picture at the same time. Because people move slowly and the camera is usually far from the detected target in safety belt detection, a lightweight model of the kind used for high-speed moving objects, such as the lightweight model for belt-tearing detection in a power plant, is not needed; the model of the invention is therefore designed deeper, at the cost of longer inference time, and still processes 10 pictures per second with high detection accuracy.
Meanwhile, compared with belt tearing, which can only be observed from above, safety belt detection positions are diverse: the target can be observed from directly above, obliquely above, or horizontally, and the camera can even be placed below the target. This diversity of viewing directions yields more sample images, and the increased variety of samples effectively improves the generalization of the model, so safety belt patterns in more scenes can be identified and false or missed detections caused by insufficient light, occlusion and similar problems are avoided.
Step 2, marking whether each person in the safety belt detection data set pictures wears a safety belt, and randomly dividing the data set into a training set and a test set at an 8:2 ratio. Step 2 specifically comprises the following steps:
and 2.1, marking whether people wear the safety belts in the safety belt detection data set one by one, wherein the marked part is used as a positive sample, and the unmarked part is used as a negative sample, namely a background class, so as to obtain the xml format file. It is understood that any image annotation software may be used to manually individually annotate the seat belts of the people in the seat belt detection dataset, such as, but not limited to, Labelme, labelImg, or Yolo _ mark.
And 2.2, acquiring the type and the detection frame of each labeled object in the xml format file, namely, a bounding box and generating a corresponding txt format file.
And 2.3, generating all txt format file paths corresponding to the safety belt detection data set, and storing the txt format file paths in the dataset.
And 2.4, randomly distributing the txt file which is stored with the marked content and stored in the dataset. txt file in the step 2.3, and calculating the total number of the txt file with the following steps of 8: the ratio of 2 is divided into a training set and a test set.
It is understood that txt as the generated file format, the format of the file storing the paths, and the 8:2 division ratio are only a preferred but non-limiting embodiment; those skilled in the art can configure the file formats and the training/test ratio at will.
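The random 8:2 split of step 2.4 can be sketched as a small Python routine; the 8:2 ratio matches the text, while the function name and fixed seed are illustrative assumptions for reproducibility.

```python
import random

def split_dataset(paths, train_ratio=0.8, seed=0):
    # Randomly split annotation-file paths into a training set and a test
    # set (8:2 by default, as in step 2.4). The seed is illustrative.
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]
```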
And 3, generating a corresponding anchor frame suitable for the monitoring area according to the safety belt detection data set obtained in the step 1, so as to better adapt to the current detection environment and obtain higher detection accuracy. The step 3 specifically comprises the following steps:
in step 3, an anchor frame is generated by using a clustering method, wherein a distance measurement method in clustering is IoU distance (overlap degree), a distance D between a predicted frame and a real frame is expressed by the following formula,
D=1-IoU(box,clusters)
in the formula:
d is the distance of the prediction frame from the real frame,
IoU (-) represents the overlap function,
the box is a marked box,
clusters is the number of clusters.
In step 3, the number of anchor frames is set according to the actual scene on site; because the anchor frames are derived from the data set itself, their values fit the actual scene better. In this embodiment the number of anchor frames is set to 9; specifically, for the model with 640 × 640 input they are [4,5], [5,7], [8,11], [10,13], [17,19], [29,38], [39,47], [48,61], [53,71], where [4,5] represents a rectangular frame with a width of 4 and a length of 5, and so on.
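The IoU-distance clustering of step 3 can be sketched as a k-means variant in NumPy, treating each labeled box as a (width, height) pair and using D = 1 - IoU(box, cluster) as the distance, as in the formula above. The function names are illustrative, not from the patent.

```python
import numpy as np

def iou_wh(boxes, clusters):
    # IoU between (w, h) boxes and cluster centers, assuming a shared corner,
    # so only the width/height shape matters (the usual anchor-clustering trick).
    w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (clusters[:, 0] * clusters[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    # k-means with distance D = 1 - IoU(box, cluster); returns the cluster
    # centers (anchor frames) sorted by area, smallest first.
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        d = 1.0 - iou_wh(boxes, clusters)
        assign = d.argmin(axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else clusters[i] for i in range(k)])
        if np.allclose(new, clusters):
            break
        clusters = new
    return clusters[np.argsort(clusters[:, 0] * clusters[:, 1])]
```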
Step 4, train the convolutional neural network model with the training set, test the trained model of each generation with the test set, and screen to obtain the safety belt detection model. It is worth noting that in the safety belt detection method the target is far from the camera and detection is difficult; a deep model is therefore designed to better distinguish semantic information in the image, and a residual network is adopted to pass on more position information. At the same time, a branch is led out of the model to detect objects on a 40 × 40 feature map; this scale is mainly set for targets at normal distance.
The convolutional neural network model has a single class: a person not wearing a safety belt is labeled none. As shown in fig. 2, the model processes a picture as follows. First, the 640 × 640 input passes through a 3 × 3 convolution (stride 1, padding 1, 32 feature maps) and a 3 × 3 convolution (stride 2, padding 1, 64 feature maps); the feature map size becomes 320 × 320.
Then two consecutive combinations of a 1 × 1 convolution (stride 1, padding 0, 32 feature maps), a 3 × 3 convolution (stride 2, padding 1, 64 feature maps) and a residual block are applied. A 3 × 3 convolution (stride 2, padding 1, 128 feature maps) brings the feature map size to 160 × 160.
Then three consecutive combinations of a 1 × 1 convolution (stride 1, padding 0, 64 feature maps), a 3 × 3 convolution (stride 2, padding 1, 128 feature maps) and a residual block are applied. A 3 × 3 convolution (stride 2, padding 1, 256 feature maps) brings the feature map size to 80 × 80.
Then three consecutive combinations of a 1 × 1 convolution (stride 1, padding 0, 128 feature maps), a 3 × 3 convolution (stride 2, padding 1, 256 feature maps) and a residual block are applied. A 3 × 3 convolution (stride 2, padding 1, 512 feature maps) brings the feature map size to 40 × 40.
Then three consecutive combinations of a 1 × 1 convolution (stride 1, padding 0, 256 feature maps), a 3 × 3 convolution (stride 2, padding 1, 512 feature maps) and a residual block are applied. This result 1 is connected with the up-sampled output.
Result 1 then passes through a 3 × 3 convolution (stride 2, padding 1, 768 feature maps); the feature map size becomes 20 × 20.
Then three consecutive combinations of a 1 × 1 convolution (stride 1, padding 0, 512 feature maps), a 3 × 3 convolution (stride 2, padding 1, 768 feature maps) and a residual block are applied, followed by a 3 × 3 convolution (stride 2, padding 1, 1024 feature maps); the feature map size becomes 10 × 10.
Then four consecutive combinations of a 1 × 1 convolution (stride 1, padding 0, 512 feature maps), a 3 × 3 convolution (stride 2, padding 1, 768 feature maps) and a residual block are applied, followed by a 3 × 3 convolution (stride 2, padding 1, 1024 feature maps); the feature map size becomes 5 × 5.
Then four consecutive combinations of a 1 × 1 convolution (stride 1, padding 0, 512 feature maps), a 3 × 3 convolution (stride 2, padding 1, 1024 feature maps) and a residual block give result 2. Three consecutive combinations of a 1 × 1 convolution (stride 1, padding 0) and a 3 × 3 convolution (stride 2, padding 1), plus one 1 × 1 convolution (stride 1, padding 0), yield detection result 3 of size 5 × 5.
Result 2 then passes through two consecutive combinations of a 1 × 1 convolution (stride 1, padding 0) and a 3 × 3 convolution (stride 2, padding 1) plus one 1 × 1 convolution (stride 1, padding 0); after three up-sampling operations it is added to result 1, and after two further such combinations and one 1 × 1 convolution (stride 1, padding 0), detection result 4 of size 40 × 40 is obtained.
FOCAL LOSS is adopted as the loss function in training; the loss function is expressed by the following formula,
FL(p_t) = -α_t (1 - p_t)^γ log(p_t)
in the formula:
p_t represents the predicted probability,
α and γ are both hyperparameters.
Swish is adopted as the activation function, which is expressed as,
f(x) = x · sigmoid(x) = x / (1 + e^(-x))
in the formula:
x denotes the tensor of the input image,
e denotes the natural constant.
Using FOCAL LOSS as the loss function alleviates the class-imbalance problem, and using Swish as the activation function avoids the saturation problem for large values. With these improvements the model adapts to the complex on-site scene, and the precision, recall and mAP of the model are improved.
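Under the definitions above, the two functions can be sketched numerically as follows. This is a NumPy sketch; α = 0.25 and γ = 2 are common defaults for focal loss, not values stated in the patent.

```python
import numpy as np

def swish(x):
    # Swish activation: f(x) = x * sigmoid(x) = x / (1 + exp(-x)).
    # Unlike ReLU-style clipping, it stays smooth and avoids hard saturation.
    return x / (1.0 + np.exp(-x))

def focal_loss(p_t, alpha=0.25, gamma=2.0):
    # Focal loss: FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t).
    # The (1 - p_t)**gamma factor down-weights well-classified examples,
    # which counters class imbalance between belt/no-belt samples.
    return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)
```

For a well-classified example (p_t close to 1) the loss is sharply reduced relative to plain cross-entropy, which is why the hard, rare cases dominate training.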
The training process of the convolutional neural network model is as follows: an input picture is preprocessed to the specified size and fed into the model, which comprises convolution layers, pooling layers, BN regularization layers, activation functions and so on. The model outputs a number of coordinate center points, around which anchor frames are generated; NMS (Non-Maximum Suppression) then yields the corresponding results, i.e. the predicted categories and positions (prediction frames) of all targets in the picture. These are compared with the true categories and positions (labeled frames), the loss value is computed with the loss function, the direction of maximum gradient is found, and the model parameters are updated by back-propagation.
The step 4 specifically comprises the following steps:
and 4.1, uniformly adjusting the pictures in the training set to 640 x 640.
Step 4.2, carrying out image enhancement processing on the picture subjected to size adjustment in the step 4.1; it is understood that one skilled in the art can perform image enhancement processing using image flipping, cropping, changing color, modifying brightness, contrast, saturation, etc.
Step 4.3, set the number of iterations (epochs) to 10000, the number of images per batch (batch_size) to 32, and the initial learning rate to 0.001; if no better result is obtained for 200 consecutive evaluations, the learning rate is updated to 0.1 times its current value.
And 4.4, training a convolutional neural network model by using the image subjected to image enhancement processing.
And 4.5, detecting the mAP of the convolutional neural network model after each generation of training by using the test set, and selecting the convolutional neural network model with the highest mAP as a safety belt detection model.
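The learning-rate rule in step 4.3 (reduce to 0.1 times after 200 evaluations without improvement) behaves like a plateau-based scheduler. The sketch below is an illustrative pure-Python rendering of that rule; the class and method names are assumptions, not anything named in the text.

```python
class PlateauLRScheduler:
    """Reduce the learning rate by a factor when the monitored
    metric (e.g. the validation mAP of step 4.5) stops improving."""

    def __init__(self, lr=0.001, patience=200, factor=0.1):
        self.lr = lr
        self.patience = patience    # evaluations to wait on a plateau
        self.factor = factor        # multiply lr by this on a plateau
        self.best = float("-inf")
        self.bad_steps = 0

    def step(self, metric):
        if metric > self.best:
            self.best = metric
            self.bad_steps = 0
        else:
            self.bad_steps += 1
            if self.bad_steps >= self.patience:
                self.lr *= self.factor
                self.bad_steps = 0
        return self.lr
```

In practice one would call `step()` with the test-set mAP after each generation of training.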
Step 5, detect the safety belt condition in the monitoring area in real time based on the safety belt detection model obtained in step 4.
The real-time detection process comprises: acquiring video, preprocessing it, the model judging whether a target exists, obtaining the center points, obtaining results after applying the anchor boxes and NMS (Non-Maximum Suppression), the model classifying whether the target wears a safety belt and regressing the coordinate points of the prediction box, drawing the box, and displaying the drawn image on a monitor.
As shown in fig. 3, step 5 specifically comprises the following steps:
Step 5.1, acquire video data in real time from the video monitoring equipment in the monitoring area, and convert it into pictures.
Step 5.2, the safety belt detection model judges whether an operator, namely a target, is present in the acquired video picture.
Step 5.3, segment the person from the background in each video picture that contains an operator, obtain the anchor-box position, and judge whether the operator wears a safety belt; that is, the detection model performs classification and regression, where classification divides picture regions into positive samples and negative samples (the background class), and regression obtains the positions of the predicted points.
The feature map sizes are 5 x 5 and 40 x 40 respectively; whichever grid cell the predicted center coordinate falls in generates the corresponding prediction boxes according to the sizes of its anchor boxes, and the box with the highest confidence is selected as the prediction box.
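As an illustration of the grid-cell decoding just described, the sketch below picks the highest-confidence cell on one feature map and turns its anchor into a box. The anchor size, stride, and function name are hypothetical, chosen only to make the mechanics concrete (a 5 x 5 map over a 640 x 640 input implies a stride of 128 pixels per cell).

```python
import numpy as np

def decode_best_box(conf_map, anchor_w, anchor_h, stride):
    """Pick the grid cell with the highest confidence on a single
    feature map and generate a prediction box from its anchor.

    conf_map : (S, S) confidence per grid cell
    anchor_w, anchor_h : anchor-box size in input-image pixels
    stride   : input pixels covered by one grid cell
    Returns (x1, y1, x2, y2, confidence).
    """
    row, col = np.unravel_index(np.argmax(conf_map), conf_map.shape)
    # center of the winning cell, in input-image coordinates
    cx = (col + 0.5) * stride
    cy = (row + 0.5) * stride
    return (cx - anchor_w / 2, cy - anchor_h / 2,
            cx + anchor_w / 2, cy + anchor_h / 2,
            float(conf_map[row, col]))
```

A full detector would decode every cell and anchor and then apply NMS; this sketch shows only the single highest-confidence cell.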
Step 5.4, if a target without a safety belt is found, mark it with a red rectangular box labeled "none", and store the marked image.
In summary, the safety belt detection data set is obtained by collecting and annotating video pictures from the monitoring area; a clustering algorithm yields anchor boxes corresponding to the data set and suited to the monitoring area; a number of convolutional, pooling and activation layers produce two outputs of sizes 5 x 5 and 40 x 40, which detect people at different distances more accurately; the class imbalance problem is addressed by using focal loss as the loss function; and the saturation problem caused by excessively large values is avoided by using Swish as the activation function. Through these improvements, the model adapts to the complex on-site scene, the precision, recall rate and mAP of the deployed model are improved, the safety awareness of relevant personnel is strengthened, and work proceeds in an orderly manner.
The present applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings, but it should be understood by those skilled in the art that the above embodiments are merely preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not for limiting the scope of the present invention, and on the contrary, any improvement or modification made based on the spirit of the present invention should fall within the scope of the present invention.

Claims (10)

1. A method for detecting whether a power plant operator wears a safety belt is characterized by comprising the following steps:
step 1, acquiring historical video data from video monitoring equipment in a monitoring area, and converting the historical video data into pictures to form a safety belt detection data set;
step 2, marking whether each person in the safety belt detection data set pictures wears a safety belt, and randomly dividing the pictures into a training set and a test set according to a set proportion;
step 3, generating an anchor frame which corresponds to the safety belt detection data set and is suitable for the monitoring area according to the safety belt detection data set obtained in the step 1;
step 4, training the convolutional neural network model by using a training set, detecting the improved convolutional neural network model after each generation of training by using a test set, and screening to obtain a safety belt detection model;
and 5, acquiring real-time video data from video monitoring equipment in the monitoring area, and detecting the wearing condition of the safety belt in the monitoring area in real time based on the safety belt detection model obtained in the step 4.
2. The method for detecting whether a power plant operator wears a safety belt according to claim 1, wherein the method comprises the following steps:
the step 2 specifically comprises the following steps:
step 2.1, marking whether people wear safety belts in the safety belt detection data set one by one, wherein the marked part is used as a positive sample, and the unmarked part is used as a negative sample;
step 2.2, obtaining the category and the detection frame of each marked object to form a marked object file;
step 2.3, generating a file path corresponding to the safety belt detection data set and storing the file path in a data set file;
step 2.4, randomly shuffling the labeled object files stored in the data set file in step 2.3 together with the labeled content, and dividing them into a training set and a test set according to a set proportion.
3. The method for detecting whether a power plant operator wears a safety belt according to claim 1, wherein the method comprises the following steps:
in step 3, anchor boxes are generated by a clustering method; the distance metric used in the clustering is an overlap-based distance, and the distance D between a labeled box and a cluster center is expressed by the following formula,
D = 1 - IoU(box, clusters)
in the formula:
D is the distance between the labeled box and the cluster center,
IoU(·) denotes the overlap (intersection-over-union) function,
box is a labeled box,
clusters denotes the cluster centers.
4. A method for detecting whether a safety belt is worn by a power plant operator according to any one of claims 1 to 3, characterized in that:
the step 4 specifically comprises the following steps:
step 4.1, uniformly adjusting the pictures in the training set to 640 x 640;
step 4.2, carrying out image enhancement processing on the picture subjected to size adjustment in the step 4.1;
step 4.3, setting iteration times, the number of each batch of images, an initial learning rate and a learning rate updating condition;
step 4.4, training a convolutional neural network model by using the images subjected to image enhancement processing;
step 4.5, using the test set to measure the mAP of the convolutional neural network model after each generation of training, and selecting the convolutional neural network model with the highest mAP as the safety belt detection model.
5. The method for detecting whether a power plant operator wears a safety belt according to claim 4, wherein the method comprises the following steps:
in step 4, the convolutional neural network model has 1 category, and targets without safety belts are labeled none; the input picture size of the convolutional neural network model is 640 x 640; the numbers of anchor boxes of the two output layers of the convolutional neural network model are 4 and 5 respectively, used for detecting common targets and small targets, wherein a common target is larger than 32 x 32 in size and a small target is smaller than or equal to 32 x 32; the feature map sizes of the two output layers of the convolutional neural network model are 5 x 5 and 40 x 40 respectively.
6. The method for detecting whether a power plant operator wears a safety belt according to claim 5, wherein the method comprises the following steps:
in step 4, the process of processing the picture by the convolutional neural network model comprises the following steps:
i. the input of size 640 x 640 passes through a plurality of convolution kernels to obtain a feature map of size 40 x 40; it then passes through a combination of a 1 x 1 convolution with stride 1 and padding 0 producing 256 feature maps, a 3 x 3 convolution with stride 2 and padding 1 producing 512 feature maps, and a residual block to obtain a first detection result, which is concatenated with the up-sampled output;
ii. the first detection result passes through a plurality of convolution kernels to obtain a feature map of size 5 x 5; it then passes through a combination of a 1 x 1 convolution with stride 1 and padding 0 producing 512 feature maps, a 3 x 3 convolution with stride 2 and padding 1 producing 1024 feature maps, and a residual block to obtain a second detection result;
iii. the second detection result passes through three consecutive combinations of a 1 x 1 convolution with stride 1 and padding 0 and a 3 x 3 convolution with stride 2 and padding 1, followed by one 1 x 1 convolution with stride 1 and padding 0, and is output as a third detection result of size 5 x 5;
iv. the second detection result, after being added to the first detection result, passes through two consecutive combinations of a 1 x 1 convolution with stride 1 and padding 0 and a 3 x 3 convolution with stride 2 and padding 1, followed by a 1 x 1 convolution with stride 1 and padding 0, and is output as a fourth detection result of size 40 x 40.
7. The method for detecting whether a safety belt is worn by a power plant operator according to any one of claims 1 to 6, characterized by:
in step 4, focal loss is used as the loss function, expressed by the following formula,
FL(p_t) = -α_t (1 - p_t)^γ log(p_t)
in the formula:
p_t represents the predicted probability,
α and γ are both hyperparameters.
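The formula can be sketched directly in NumPy; the defaults α = 0.25 and γ = 2 below are commonly used values for focal loss, not values stated in the claim.

```python
import numpy as np

def focal_loss(p_t, alpha=0.25, gamma=2.0):
    """Focal loss FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t).

    p_t is the model's probability for the true class; the
    (1 - p_t)**gamma factor down-weights easy, well-classified
    examples so training focuses on hard ones, which is how the
    class imbalance problem is addressed.
    """
    p_t = np.clip(p_t, 1e-7, 1.0)            # numerical safety
    return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)
```

A well-classified example (p_t near 1) contributes almost nothing, while a hard one (p_t near 0) dominates the loss.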
8. The method for detecting whether a safety belt is worn by a power plant operator according to claim 7, characterized by:
in step 4, Swish is used as the activation function, the formula of which is
f(x) = x / (1 + e^(-x))
in the formula:
x denotes the input image tensor,
e denotes the natural constant.
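A one-line NumPy sketch of this activation, for illustration:

```python
import numpy as np

def swish(x):
    """Swish activation: f(x) = x / (1 + exp(-x)) = x * sigmoid(x).

    Unlike ReLU it is smooth; unlike sigmoid it does not saturate
    for large positive inputs, which is the property the text
    relies on to avoid the saturation problem.
    """
    return x / (1.0 + np.exp(-x))
```

For large positive x the output approaches x, and for large negative x it approaches 0.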
9. The method for detecting whether a safety belt is worn by a power plant operator according to any one of claims 1 to 8, characterized by:
the step 5 specifically comprises the following steps:
step 5.1, acquiring video data from video monitoring equipment in a monitoring area in real time, and converting the video data into pictures;
step 5.2, the safety belt detection model judges whether a target exists in the acquired video picture;
step 5.3, segmenting the person and the background in each video picture in which an operator is present, acquiring the anchor-box position, and judging whether the operator wears a safety belt;
and 5.4, if the target without the safety belt is found, marking the target with a red rectangular frame and marking the target with a none typeface, and storing the marked image.
10. A detection system for detecting whether a safety belt is worn by a power plant operator by using the detection method according to any one of claims 1 to 9, comprising:
the video monitoring equipment is arranged in a power plant monitoring area and is used for acquiring video data;
the monitoring host comprises a video processing module and a safety belt detection module, and is used for converting the video data from the video monitoring equipment into pictures and judging, with the safety belt detection model, whether power plant operating personnel wear safety belts;
the communication network is connected with the video monitoring equipment and the monitoring host and is used for transmitting video data between the video monitoring equipment and the monitoring host;
the display and warning module is connected with the monitoring host and is used for displaying the video data and the result of judging whether safety belts are worn, and for giving a warning if a safety belt is not worn.
CN202010965975.3A 2020-09-15 2020-09-15 Detection method and detection system for whether power plant operating personnel wear safety belts Pending CN112633308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010965975.3A CN112633308A (en) 2020-09-15 2020-09-15 Detection method and detection system for whether power plant operating personnel wear safety belts

Publications (1)

Publication Number Publication Date
CN112633308A true CN112633308A (en) 2021-04-09

Family

ID=75300152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010965975.3A Pending CN112633308A (en) 2020-09-15 2020-09-15 Detection method and detection system for whether power plant operating personnel wear safety belts

Country Status (1)

Country Link
CN (1) CN112633308A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297910A (en) * 2021-04-25 2021-08-24 云南电网有限责任公司信息中心 Distribution network field operation safety belt identification method
CN113822197A (en) * 2021-09-23 2021-12-21 南方电网电力科技股份有限公司 Work dressing identification method and device, electronic equipment and storage medium
CN114627425A (en) * 2021-06-11 2022-06-14 珠海路讯科技有限公司 Method for detecting whether worker wears safety helmet or not based on deep learning
CN114742180A (en) * 2022-06-13 2022-07-12 中国南方电网有限责任公司超高压输电公司检修试验中心 High-altitude operation safety level determining method and device and computer equipment
WO2023070955A1 (en) * 2021-10-29 2023-05-04 北京航天自动控制研究所 Method and apparatus for detecting tiny target in port operation area on basis of computer vision
CN116229522A (en) * 2023-05-10 2023-06-06 广东电网有限责任公司湛江供电局 Substation operator safety protection equipment detection method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038424A (en) * 2017-11-27 2018-05-15 华中科技大学 A kind of vision automated detection method suitable for working at height
CN110163836A (en) * 2018-11-14 2019-08-23 宁波大学 Based on deep learning for the excavator detection method under the inspection of high-altitude
CN110287804A (en) * 2019-05-30 2019-09-27 广东电网有限责任公司 A kind of electric operating personnel's dressing recognition methods based on mobile video monitor
CN110310259A (en) * 2019-06-19 2019-10-08 江南大学 It is a kind of that flaw detection method is tied based on the wood for improving YOLOv3 algorithm
CN110929577A (en) * 2019-10-23 2020-03-27 桂林电子科技大学 Improved target identification method based on YOLOv3 lightweight framework
CN111144263A (en) * 2019-12-20 2020-05-12 山东大学 Construction worker high-fall accident early warning method and device
CN111160440A (en) * 2019-12-24 2020-05-15 广东省智能制造研究所 Helmet wearing detection method and device based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination