CN114821486B - Personnel identification method in power operation scene - Google Patents


Info

Publication number
CN114821486B
Authority
CN
China
Prior art keywords
convolution
module
feature map
output
image
Prior art date
Legal status
Active
Application number
CN202210745758.2A
Other languages
Chinese (zh)
Other versions
CN114821486A (en)
Inventor
刘军
姜明华
李会引
赵雅欣
朱佳龙
余锋
Current Assignee
Wuhan Textile University
Original Assignee
Wuhan Textile University
Priority date
Filing date
Publication date
Application filed by Wuhan Textile University
Priority to CN202210745758.2A
Publication of CN114821486A
Application granted
Publication of CN114821486B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for identifying personnel in an electric power operation scene, comprising the following steps: images of workers on a power construction site, images of non-construction personnel and images without detection targets are collected, and the different personnel are labeled and distinguished to form a training data set. The collected data set is expanded through techniques such as image splicing, image flipping and noise addition. A target detection network is trained with the expanded data set to obtain a person identification model. Images acquired on site are detected in real time with the trained model; when a person is detected, the person's identity is output, and a reminder is issued when the identified person is a non-construction worker. The invention can prevent non-construction personnel from mistakenly entering a power construction site, effectively supervises the construction personnel on the site, and strongly safeguards the life and property of the site.

Description

Personnel identification method in power operation scene
Technical Field
The invention relates to the field of target detection, in particular to a method for identifying a person in an electric power operation scene.
Background
In recent years, with the gradual maturation of computer vision, and especially the rapid development of neural network technology, deep learning has begun to be applied in many production environments. The concept of deep learning originated in the study of artificial neural networks by mathematicians and computer scientists. An artificial neural network is an algorithmic model that simulates the behavioral characteristics of biological neural networks and performs distributed, parallel information processing; it processes information by adjusting the interconnections among a large number of internal nodes. Deep learning now plays a significant role in daily life and is also being applied to electric power construction scenes.
Work sites in fields such as electric power and construction contain live wires and other dangerous equipment; the environment is complex, safety accidents occur easily, and casualties may result if people who do not meet electric power working requirements enter a power construction site. During daily power operations this phenomenon must be supervised to reduce its probability. However, because operators' safety awareness is often insufficient, non-construction personnel can easily enter the construction site, and a guardian must monitor the site constantly and intervene in time. Statistics show that identifying operator identities manually imposes high labor intensity on guardians, gives low identification efficiency, and reflects a low level of intelligent automation.
Chinese patent publication No. CN113378622A discloses "a method, device, system and medium for identifying a specific person", which determines whether a captured person is a specific person according to image feature values obtained by performing feature recognition on face images captured by a camera system. However, that technique places high requirements on the images and lacks flexibility; moreover, under the current epidemic situation, using the face as the feature for identifying specific personnel requires the person being checked to remove their mask, which does not suit present conditions.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a personnel identification method for an electric power operation scene, aiming to prevent non-construction personnel from mistakenly entering a power construction site, to effectively supervise the safety situation of the site, and to strongly safeguard the life and property of the construction personnel on it.
To achieve the above object, according to one aspect of the present invention, a method for identifying personnel in an electric power operation scene is provided, comprising the following steps:
step 1, collecting images of workers on a power construction site, images of non-construction personnel and images without detection targets, and labeling and distinguishing the different personnel to form a training data set;
step 2, expanding the collected data set;
step 3, training the target detection network by using the expanded data set to obtain a personnel identification model;
the object detection network comprises three parts: a feature extraction part, a feature fusion part and a result output part;
step 4, detecting the images acquired on site in real time using the trained person identification model, outputting the identity of a person when one is detected, and issuing a reminder when the identified person is a non-construction worker.
Further, the feature extraction part comprises 7 convolution modules, wherein the first convolution module comprises 1 3×3 convolution, 1 2×2 convolution and a channel-spatial attention module; the second convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the third convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the fourth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the fifth convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the sixth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; and the seventh convolution module comprises 1 2×2 convolution and a channel-spatial attention module. In addition, the output of the first convolution module is passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module; the output of the third convolution module is passed through a depthwise separable convolution and added to the output of the fourth convolution module to form the input of the fifth convolution module; and the output of the fifth convolution module is passed through a depthwise separable convolution and added to the output of the sixth convolution module to form the input of the seventh convolution module.
Further, the feature fusion part comprises 1 convolution operation module, 1 branch structure and 3 up-sampling modules, and its input is the output of the seventh convolution module. The convolution operation module comprises 2 1×1 convolutions and 1 3×3 convolution; the branch structure applies 3×3, 7×7 and 9×9 convolutions to the output of the convolution operation module respectively, and the outputs of the 3 branches are then concatenated with the output of the convolution operation module to form the input of the first up-sampling module; each up-sampling module comprises 2 1×1 convolutions, 1 3×3 convolution and 1 up-sampling operation. The result output part comprises three outputs: one for predicting large targets, one for predicting medium targets and one for predicting small targets. The output for predicting large targets is obtained by concatenating the seventh convolution module with the first up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution; the output for predicting medium targets is obtained by concatenating the fifth convolution module with the second up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution; and the output for predicting small targets is obtained by concatenating the third convolution module with the third up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution.
Further, the specific operation of the channel-spatial attention module is as follows;
firstly, a 3×3 convolution is applied to the input feature map without changing its height, width or number of channels, and the result is sent to a channel attention module. The channel attention module performs global average pooling and global max pooling on the feature map to obtain two vectors, adds the two vectors to obtain a weight vector over the channel dimension, and multiplies this weight vector with the original input feature map to obtain a feature map with channel attention;
then, a 3×3 convolution is applied to the feature map with channel attention, and the result is sent to a spatial attention module. The spatial attention module generates two feature maps of the same size as the input through two branches, one a 1×1 max pooling with stride 1 and the other a 1×1 average pooling with stride 1; the two feature maps are concatenated and then convolved to produce a feature map whose channel dimension is 1. The feature map with channel attention is multiplied by this 1-channel feature map to obtain the spatial attention feature map, and finally the spatial attention feature map is multiplied directly by the channel-dimension weight vector from the channel attention module to output a feature map carrying both channel attention and spatial attention.
Further, the specific processing procedure of the feature extraction part is as follows;
a training image is input into the first convolution module, where a 3×3 convolution first changes its depth to 32 layers, a 2×2 convolution with stride 2 then reduces the height and width of the feature map to 1/2 of the original with the number of channels unchanged, and a channel-spatial attention module extracts global features without changing the height, width or number of channels of the feature map. The feature map then enters the second convolution module: a 1×1 convolution adjusts its number of channels to 64, a 3×3 convolution changes the number of channels to 128, and a final 1×1 convolution adjusts the number of channels back to 64; the output of the first convolution module is then passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module;
the third through sixth convolution modules are executed in the same way, yielding a feature map whose height and width are 1/8 of the original image and whose depth is 256 layers; this feature map is sent into the seventh convolution module, where a 2×2 convolution with stride 2 reduces its height and width to 1/16 of the original, and it is then sent into a channel-spatial attention module without changing its height, width or number of channels;
after each convolution operation, a normalization operation and an activation operation are performed, where the activation function used in the activation operation is Leaky ReLU.
Furthermore, in step 1 the identities of the personnel in the collected images are labeled, the identity labels comprising power construction workers, work responsible persons, special responsible persons and non-construction personnel; in addition, some images without detection targets are collected as negative samples. Each image containing a detection target is labeled manually to obtain the position and category information of the target to be detected, the labeled target region being the clothed body of the person, excluding the head.
Further, in step 2 the collected data set images undergo image splicing, horizontal flipping, pixel shifting, random cropping, deformation scaling and the addition of Gaussian noise to generate new images, realizing the expansion of the data set.
Further, the specific implementation of step 4 is as follows;
the method comprises the steps of acquiring a monitoring video in real time, analyzing the video frame by frame to obtain an image of each frame, sending the analyzed image into a personnel identification model, outputting the identity category of a detection person through the personnel identification model, calling a voice broadcasting function to remind and store and log the image when the identity of the detection person is identified as a non-constructor, and normally displaying the identity information of other persons when the identity of the detection person is identified, wherein the standard for identifying different identities of the persons is clothes worn by the persons in an electric power scene.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) Negative sample images are collected and image data processing techniques are used to process the images in the data set and expand it, bringing the image information closer to the real environment and improving the robustness of the target detection model in real environments.
(2) Based on the attention mechanism and the target detection network with a branch structure, the output feature maps capture more global information and detail information, increasing person identification accuracy and reducing information loss.
(3) The method can prevent non-construction personnel from mistakenly entering a power construction site, effectively supervises the personnel on the site, and strongly safeguards the life and property of the site.
Drawings
Fig. 1 is a schematic block flow diagram of a method for identifying a person in an electric power operation scene according to an embodiment of the present invention.
Fig. 2 is a diagram of a target detection network structure of a person identification method in an electric power operation scene according to an embodiment of the present invention.
Fig. 3 is a diagram of the channel-spatial attention module structure of a person identification method in an electric power operation scene according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the flow of the person identification method in an electric power operation scene provided by this embodiment comprises two blocks: a detection network training module and a person identification module, where the detection network training module comprises a personnel data set creation module, a data set expansion module and a target detection network training module. The embodiment provides a method for identifying personnel in an electric power operation scene, comprising the following steps:
(1) Collecting images of workers on the power construction site, images of non-construction personnel and images without detection targets (i.e. images containing no people), and labeling and distinguishing the different personnel to form a training data set;
the identities of the collected on-site personnel are divided, the identity labels comprising power construction workers, work responsible persons, special responsible persons and non-construction personnel; in addition, some images without detection targets can be collected as negative samples. Each image containing a detection target is labeled manually to obtain the position and category information of the target to be detected, the labeled target region being the clothed body of the person, excluding the head.
(2) Expanding the collected data set through techniques such as image splicing, image flipping and noise addition;
the collected data set images are processed by image-data techniques such as image splicing, horizontal flipping, pixel shifting, random cropping, deformation scaling and the addition of Gaussian noise to generate new images, expanding the data set of site-personnel images.
(3) Training the target detection network with the expanded data set to obtain a person identification model;
fig. 2 shows the structure of the target detection network, based on an attention mechanism and a branch structure, used by the person identification method provided by this embodiment. The network can be divided into three parts: a feature extraction part, a feature fusion part and a result output part. The feature extraction part comprises 7 convolution modules. The first convolution module comprises 1 3×3 convolution, 1 2×2 convolution and a channel-spatial attention module; the second convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the third convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the fourth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the fifth convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the sixth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; and the seventh convolution module comprises 1 2×2 convolution and a channel-spatial attention module. In addition, the output of the first convolution module is passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module; the output of the third convolution module is passed through a depthwise separable convolution and added to the output of the fourth convolution module to form the input of the fifth convolution module; and the output of the fifth convolution module is passed through a depthwise separable convolution and added to the output of the sixth convolution module to form the input of the seventh convolution module. The feature fusion part comprises 1 convolution operation module, 1 branch structure and 3 up-sampling modules, and its input is the output of the seventh convolution module. The convolution operation module comprises 2 1×1 convolutions and 1 3×3 convolution; the branch structure applies 3×3, 7×7 and 9×9 convolutions to the output of the convolution operation module respectively, and the outputs of the 3 branches are then concatenated with the output of the convolution operation module to form the input of the first up-sampling module; each up-sampling module comprises 2 1×1 convolutions, 1 3×3 convolution and 1 up-sampling operation. The result output part comprises three outputs: one for predicting large targets, one for predicting medium targets and one for predicting small targets. The output for predicting large targets is obtained by concatenating the seventh convolution module with the first up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution; the output for predicting medium targets is obtained by concatenating the fifth convolution module with the second up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution; and the output for predicting small targets is obtained by concatenating the third convolution module with the third up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution.
After each convolution operation, a normalization operation and an activation operation are performed, where the activation function used in the activation operation is Leaky ReLU.
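For concreteness, the two basic building blocks just described could look as follows in PyTorch; this is a sketch under stated assumptions — the class names, the LeakyReLU slope and the 3×3 depthwise kernel are illustrative choices, not values given by the patent:

import torch.nn as nn

class ConvBNAct(nn.Module):
    """Convolution -> normalization -> Leaky ReLU activation,
    the pattern applied after every convolution in the network."""
    def __init__(self, in_ch, out_ch, k, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, stride=stride,
                      padding=(k - 1) // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution, used on the skip connections
    that add one convolution module's output to a later module's output."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

With these blocks, the 2×2 stride-2 down-sampling used by the odd-numbered modules would correspond to ConvBNAct(c, c, k=2, stride=2), which halves the height and width while keeping the channel count.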
Specifically, the first part of the network is the feature extraction part, whose processing is as follows: a training image is input into the first convolution module, where a 3×3 convolution first changes its depth to 32 layers, a 2×2 convolution with stride 2 then reduces the height and width of the feature map to 1/2 of the original with the number of channels unchanged, and a channel-spatial attention module extracts global features without changing the height, width or number of channels. The feature map then enters the second convolution module: a 1×1 convolution adjusts its number of channels to 64, a 3×3 convolution changes the number of channels to 128, and a final 1×1 convolution adjusts the number of channels back to 64; the output of the first convolution module is then passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module;
the third through sixth convolution modules are executed according to the operations above, yielding a feature map whose height and width are 1/8 of the original image and whose depth is 256 layers; this feature map is sent into the seventh convolution module, where a 2×2 convolution with stride 2 reduces its height and width to 1/16 of the original, and it is then sent into a channel-spatial attention module without changing its height, width or number of channels;
the second part is a feature fusion part of the target network, and the processing procedure is as follows: the feature map is sent into a convolution operation module, 1 × 1 convolution is carried out firstly to adjust the depth of the feature map to 512, then the height and the width of the feature map are reduced to 1/32 of the original height and width through 3 × 3 convolution, the depth is adjusted to 1024, and the depth of the feature map is adjusted to 512 through 1 × 1 convolution; and then entering a branch structure, dividing the three branches into three branches, and respectively performing 3 × 3 convolution, 7 × 7 convolution and 9 × 9 convolution without changing the size and the depth of the feature map, and splicing the outputs of the three branches and the feature map of the input branch structure to obtain the input of a first up-sampling module, wherein the height and the width of the feature map are 1/32 of the original height and the depth is 2048. The first up-sampling module firstly performs 1 × 1 convolution to adjust the depth of the feature map to 512, then adjusts the depth of the feature map to 1024 by 3 × 3 convolution, adjusts the depth of the feature map to 512 by 1 × 1 convolution, and then adjusts the feature map to 1/16 of the original depth by up-sampling operation, wherein the number of channels is unchanged; the second and third upsampling modules are then performed according to the operation of the first upsampling module above.
The third part is the result output part of the target detection network, whose specific operation is as follows: the seventh convolution module and the first up-sampling module are concatenated to obtain a feature map of 1/16 the original size with depth 768; a 3×3 convolution adjusts its depth to 512 and a 1×1 convolution adjusts the depth to 27, giving the output for predicting large targets. The fifth convolution module and the second up-sampling module are then concatenated to obtain a feature map of 1/8 the original size with depth 640; a 3×3 convolution adjusts its depth to 512 and a 1×1 convolution adjusts the depth to 27, giving the output for predicting medium targets. The third convolution module and the third up-sampling module are concatenated to obtain a feature map of 1/4 the original size with depth 576; a 3×3 convolution adjusts its depth to 512 and a 1×1 convolution adjusts the depth to 27, giving the output for predicting small targets;
the number of channels of each output feature map is 27, decomposed as (4 + 4 + 1) × 3: the first 4 dimensions correspond to the 4 worker categories, the next 4 dimensions correspond to the center-point coordinates and the height and width of the prediction box, the last dimension outputs a confidence value giving the probability that the prediction is a positive sample, and × 3 indicates that each position in the feature map outputs 3 groups of prediction-box information.
As shown in fig. 3, the channel-spatial attention module used in the target detection network operates as follows:
firstly, a 3×3 convolution is applied to the input feature map without changing its height, width or number of channels, and the result is sent to a channel attention module. The channel attention module performs global average pooling and global max pooling on the feature map to obtain two vectors, adds the two vectors to obtain a weight vector over the channel dimension, and multiplies this weight vector with the original input feature map to obtain a feature map with channel attention.
Then, a 3×3 convolution is applied to the feature map with channel attention, and the result is sent to a spatial attention module. The spatial attention module generates two feature maps of the same size as the input through two branches, one a 1×1 max pooling with stride 1 and the other a 1×1 average pooling with stride 1; the two feature maps are concatenated and then convolved to produce a feature map whose channel dimension is 1. The feature map with channel attention is multiplied by this 1-channel feature map to obtain the spatial attention feature map, and finally the spatial attention feature map is multiplied directly by the channel-dimension weight vector from the channel attention module to output a feature map carrying both channel attention and spatial attention.
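A PyTorch sketch of this channel-spatial attention module follows. The sigmoid gating, the 7×7 kernel fusing the two pooled maps, and the reading of the two pooling branches as per-pixel max/average over channels are assumptions in the spirit of common attention designs; the patent text fixes only the operations described above:

import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch of the channel-spatial attention module: channel attention
    from summed global average/max pooling, then spatial attention from a
    1-channel map, with a final re-weighting by the channel weights."""
    def __init__(self, channels):
        super().__init__()
        self.pre = nn.Conv2d(channels, channels, 3, padding=1)   # 3x3, size preserved
        self.mid = nn.Conv2d(channels, channels, 3, padding=1)   # 3x3 before the spatial part
        self.fuse = nn.Conv2d(2, 1, 7, padding=3)                # reduce to channel dimension 1

    def forward(self, x):
        x = self.pre(x)
        # Channel attention: global average and max pooling, added together
        avg = x.mean(dim=(2, 3), keepdim=True)           # (N, C, 1, 1)
        mx = x.amax(dim=(2, 3), keepdim=True)            # (N, C, 1, 1)
        channel_w = torch.sigmoid(avg + mx)              # weight vector on the channel dimension
        x_ca = x * channel_w                             # feature map with channel attention

        y = self.mid(x_ca)
        # Spatial attention: two pooled maps, concatenated and convolved to 1 channel
        mx_map = y.amax(dim=1, keepdim=True)             # (N, 1, H, W)
        avg_map = y.mean(dim=1, keepdim=True)            # (N, 1, H, W)
        spatial_w = torch.sigmoid(self.fuse(torch.cat([mx_map, avg_map], dim=1)))
        x_sa = x_ca * spatial_w                          # spatial attention feature map

        # Final multiplication by the channel weight vector, as described
        return x_sa * channel_w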
The specific operation of the branch structure in the target detection network is as follows:
the feature map is sent into the branch structure and convolved with 3×3, 7×7 and 9×9 kernels respectively, without changing its size or depth; finally the original input feature map is concatenated with the three convolved feature maps to obtain a feature map with multi-scale detail information.
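A minimal PyTorch sketch of this branch structure (the class name is illustrative):

import torch
import torch.nn as nn

class MultiScaleBranch(nn.Module):
    """Parallel 3x3, 7x7 and 9x9 convolutions whose outputs are
    concatenated with the input feature map. Each branch preserves size
    and depth, so the concatenation multiplies the depth by 4
    (e.g. 512 -> 2048, matching the fusion part described above)."""
    def __init__(self, channels):
        super().__init__()
        self.b3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.b7 = nn.Conv2d(channels, channels, 7, padding=3)
        self.b9 = nn.Conv2d(channels, channels, 9, padding=4)

    def forward(self, x):
        return torch.cat([x, self.b3(x), self.b7(x), self.b9(x)], dim=1)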
(4) Detecting the images acquired on site in real time using the trained person identification model, outputting the identity of a person when one is detected, and issuing a reminder when the identified person is a non-construction worker;
the identity of a person is recognized according to the clothing of power-construction-site personnel, and a reminder is issued when a non-construction worker is recognized, specifically as follows:
a surveillance video is acquired in real time and parsed frame by frame to obtain each frame image; each parsed image is sent into the person identification model, which outputs the identity category of each detected person. When a person is identified as a non-construction worker, a voice broadcast function is called to issue a reminder and the image is saved and logged; when other identities are recognized, the identity information is displayed normally. The criterion for distinguishing the different identities is the clothing worn in the electric power scene: workers on a power construction site wear corresponding clothing, so identities such as power construction worker, work responsible person and special responsible person are distinguished by recognizing their different clothing, and a person is classified as non-construction personnel when wearing no power-construction clothing or wearing only the power-construction jacket or only the trousers.
The invention provides a person identification method for electric power operation scenes which can not only identify the identities of personnel on a power construction site but also issue a reminder when non-construction personnel are recognized. In a specific application, the team cooperated with the national power grid: images of various field personnel were captured at a power construction site, a data set was produced by the method of this embodiment, and the person identification model was trained with the team's network. Verified under actual conditions, the model's person identification accuracy can exceed 90%; the method can thus prevent non-construction personnel from mistakenly entering a power construction site, effectively supervises the personnel on the site, and strongly safeguards the life and property of the people on it.
Various changes and modifications may be made to the disclosure by those skilled in the art without departing from the spirit and scope of the disclosure. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (5)

1. A method for identifying personnel in an electric power operation scene is characterized by comprising the following steps:
step 1, collecting images of workers, images of non-construction personnel and images without detection targets on a power construction site, and labeling and distinguishing the different personnel to form a training data set;
step 2, expanding the collected data set;
step 3, training the target detection network by using the expanded data set to obtain a personnel identification model;
the object detection network comprises three parts: a feature extraction part, a feature fusion part and a result output part;
the feature extraction part comprises 7 convolution modules, wherein the first convolution module comprises 1 3×3 convolution, 1 2×2 convolution and a channel-spatial attention module; the second convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the third convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the fourth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the fifth convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the sixth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; and the seventh convolution module comprises 1 2×2 convolution and a channel-spatial attention module; in addition, the output of the first convolution module is passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module, the output of the third convolution module is passed through a depthwise separable convolution and added to the output of the fourth convolution module to form the input of the fifth convolution module, and the output of the fifth convolution module is passed through a depthwise separable convolution and added to the output of the sixth convolution module to form the input of the seventh convolution module;
the feature fusion part comprises 1 convolution operation module, 1 branch structure and 3 up-sampling modules, and its input is the output of the seventh convolution module, wherein the convolution operation module comprises 2 1×1 convolutions and 1 3×3 convolution; the branch structure applies 3×3, 7×7 and 9×9 convolutions to the output of the convolution operation module respectively, and the outputs of the 3 branches are then concatenated with the output of the convolution operation module to form the input of the first up-sampling module; each up-sampling module comprises 2 1×1 convolutions, 1 3×3 convolution and 1 up-sampling operation; the result output part comprises three outputs, one for predicting large targets, one for predicting medium targets and one for predicting small targets, wherein the output for predicting large targets is obtained by concatenating the seventh convolution module with the first up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution, the output for predicting medium targets is obtained by concatenating the fifth convolution module with the second up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution, and the output for predicting small targets is obtained by concatenating the third convolution module with the third up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution;
the specific processing procedure of the feature extraction part is as follows;
a training image is input into the first convolution module, where a 3×3 convolution first changes its depth to 32 layers, a 2×2 convolution with stride 2 then reduces the height and width of the feature map to 1/2 of the original with the number of channels unchanged, and a channel-spatial attention module extracts global features without changing the height, width or number of channels of the feature map; the feature map then enters the second convolution module, where a 1×1 convolution adjusts its number of channels to 64, a 3×3 convolution changes the number of channels to 128, and a final 1×1 convolution adjusts the number of channels back to 64; the output of the first convolution module is then passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module;
the third through sixth convolution modules are executed in the same way, yielding a feature map whose height and width are 1/8 of the original image and whose depth is 256 layers; this feature map is sent into the seventh convolution module, where a 2×2 convolution with stride 2 reduces its height and width to 1/16 of the original, and it is then sent into a channel-spatial attention module without changing its height, width or number of channels;
after each convolution operation, performing a normalization operation and an activation operation, wherein an activation function used in the activation operation is Leaky ReLU;
step 4, detecting the images acquired on site in real time using the trained person identification model, outputting the identity of a person when one is detected, and issuing a reminder when the identified person is a non-construction worker.
2. The method for identifying personnel in the power operation scene as claimed in claim 1, wherein: the specific operation of the channel-spatial attention module is as follows;
firstly, a 3×3 convolution is applied to the input feature map without changing its height, width or number of channels, and the result is sent to a channel attention module, wherein the channel attention module performs global average pooling and global max pooling on the feature map to obtain two vectors, adds the two vectors to obtain a weight vector over the channel dimension, and multiplies this weight vector with the original input feature map to obtain a feature map with channel attention;
then, a 3×3 convolution is applied to the feature map with channel attention and the result is sent to a spatial attention module, wherein the spatial attention module generates two feature maps of the same size as the input through two branches, one a 1×1 max pooling with stride 1 and the other a 1×1 average pooling with stride 1; the two feature maps are concatenated and then convolved to produce a feature map whose channel dimension is 1; the feature map with channel attention is multiplied by this 1-channel feature map to obtain the spatial attention feature map, and finally the spatial attention feature map is multiplied directly by the channel-dimension weight vector from the channel attention module to output a feature map carrying both channel attention and spatial attention.
3. The method for identifying personnel in the power operation scene as claimed in claim 1, wherein: in step 1 the identities of the personnel in the collected images are labeled, the identity labels comprising power construction workers, work responsible persons, special responsible persons and non-construction personnel, and in addition some images without detection targets are collected as negative samples; each image containing a detection target is labeled manually to obtain the position and category information of the target to be detected, the labeled target region being the clothed body of the person, excluding the head.
4. The method for identifying personnel in the power operation scene as claimed in claim 1, wherein: in step 2 the collected data set images undergo image splicing, horizontal flipping, pixel shifting, random cropping, deformation scaling and the addition of Gaussian noise to generate new images, realizing the expansion of the data set.
5. The method for identifying personnel in the power operation scene as claimed in claim 1, wherein: the specific implementation of step 4 is as follows;
the method comprises the steps of acquiring a monitoring video in real time, analyzing the video frame by frame to obtain an image of each frame, sending the analyzed image into a personnel identification model, outputting the identity category of a detection person through the personnel identification model, calling a voice broadcasting function to remind and store and log the image when the identity of the detection person is identified as a non-constructor, and normally displaying the identity information of other persons when the identity of the detection person is identified, wherein the standard for identifying different identities of the persons is clothes worn by the persons in an electric power scene.
CN202210745758.2A 2022-06-29 2022-06-29 Personnel identification method in power operation scene Active CN114821486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210745758.2A CN114821486B (en) 2022-06-29 2022-06-29 Personnel identification method in power operation scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210745758.2A CN114821486B (en) 2022-06-29 2022-06-29 Personnel identification method in power operation scene

Publications (2)

Publication Number Publication Date
CN114821486A CN114821486A (en) 2022-07-29
CN114821486B true CN114821486B (en) 2022-10-11

Family

ID=82522379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210745758.2A Active CN114821486B (en) 2022-06-29 2022-06-29 Personnel identification method in power operation scene

Country Status (1)

Country Link
CN (1) CN114821486B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579616B (en) * 2023-07-10 2023-09-29 武汉纺织大学 Risk identification method based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270347A (en) * 2020-10-20 2021-01-26 西安工程大学 Medical waste classification detection method based on improved SSD
CN112990232A (en) * 2021-04-14 2021-06-18 广东工业大学 Safety belt wearing identification and detection method for various high-altitude operation construction sites
CN113610759A (en) * 2021-07-05 2021-11-05 金华电力设计院有限公司 A on-spot safe management and control system for roofbolter construction
CN114612813A (en) * 2020-12-09 2022-06-10 中兴通讯股份有限公司 Identity recognition method, model training method, device, equipment and storage medium
CN114612835A (en) * 2022-03-15 2022-06-10 中国科学院计算技术研究所 Unmanned aerial vehicle target detection model based on YOLOv5 network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11562500B2 (en) * 2019-07-24 2023-01-24 Squadle, Inc. Status monitoring using machine learning and machine vision

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270347A (en) * 2020-10-20 2021-01-26 西安工程大学 Medical waste classification detection method based on improved SSD
CN114612813A (en) * 2020-12-09 2022-06-10 中兴通讯股份有限公司 Identity recognition method, model training method, device, equipment and storage medium
CN112990232A (en) * 2021-04-14 2021-06-18 广东工业大学 Safety belt wearing identification and detection method for various high-altitude operation construction sites
CN113610759A (en) * 2021-07-05 2021-11-05 金华电力设计院有限公司 A on-spot safe management and control system for roofbolter construction
CN114612835A (en) * 2022-03-15 2022-06-10 中国科学院计算技术研究所 Unmanned aerial vehicle target detection model based on YOLOv5 network

Also Published As

Publication number Publication date
CN114821486A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN111814638B (en) Security scene flame detection method based on deep learning
CN110852222A (en) Campus corridor scene intelligent monitoring method based on target detection
CN113516076A (en) Improved lightweight YOLO v4 safety protection detection method based on attention mechanism
Cheng et al. Smoke detection and trend prediction method based on Deeplabv3+ and generative adversarial network
CN112669350A (en) Adaptive feature fusion intelligent substation human body target tracking method
CN113903081A (en) Visual identification artificial intelligence alarm method and device for images of hydraulic power plant
CN111126293A (en) Flame and smoke abnormal condition detection method and system
CN112287827A (en) Complex environment pedestrian mask wearing detection method and system based on intelligent lamp pole
KR20220024986A (en) Target tracking method and device, storage medium and computer program
CN111145222A (en) Fire detection method combining smoke movement trend and textural features
CN103106394A (en) Human body action recognition method in video surveillance
CN112183472A (en) Method for detecting whether test field personnel wear work clothes or not based on improved RetinaNet
CN104463869A (en) Video flame image composite recognition method
CN114821486B (en) Personnel identification method in power operation scene
CN111062373A (en) Hoisting process danger identification method and system based on deep learning
CN111860187A (en) High-precision worn mask identification method and system
CN111860457A (en) Fighting behavior recognition early warning method and recognition early warning system thereof
CN111401437A (en) Deep learning-based power transmission channel hidden danger early warning grade analysis method
WO2022222036A1 (en) Method and apparatus for determining parking space
CN113628172A (en) Intelligent detection algorithm for personnel handheld weapons and smart city security system
CN112488213A (en) Fire picture classification method based on multi-scale feature learning network
CN116778214A (en) Behavior detection method, device, equipment and storage medium thereof
CN115862128A (en) Human body skeleton-based customer abnormal behavior identification method
Lestari et al. Comparison of two deep learning methods for detecting fire
CN111127433B (en) Method and device for detecting flame

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant