CN114821486B - Personnel identification method in power operation scene - Google Patents


Info

Publication number
CN114821486B
Authority
CN
China
Prior art keywords
convolution
module
feature map
output
image
Prior art date
Legal status
Active
Application number
CN202210745758.2A
Other languages
Chinese (zh)
Other versions
CN114821486A (en)
Inventor
刘军
姜明华
李会引
赵雅欣
朱佳龙
余锋
Current Assignee
Wuhan Textile University
Original Assignee
Wuhan Textile University
Priority date
Filing date
Publication date
Application filed by Wuhan Textile University
Priority to CN202210745758.2A
Publication of CN114821486A
Application granted
Publication of CN114821486B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for identifying personnel in an electric power operation scene, comprising the following steps: images of workers on a power construction site, images of non-construction personnel and images without detection targets are collected, and the different personnel are labeled and distinguished to form a training data set. The collected data set is expanded through techniques such as image splicing, image flipping and noise addition. A target detection network is trained with the expanded data set to obtain a person identification model. Images acquired on site are detected in real time with the trained model; when a person is detected, the person's identity is output, and a reminder is issued when the identified person is a non-construction worker. The invention can prevent non-construction personnel from mistakenly entering a power construction site, effectively supervises the construction personnel on the site, and strongly safeguards the life and property of the site.

Description

Personnel identification method in power operation scene
Technical Field
The invention relates to the field of target detection, in particular to a method for identifying a person in an electric power operation scene.
Background
In recent years, with the gradual maturation of computer vision, and especially the rapid development of neural network technology, deep learning has begun to be applied in many production environments. The concept of deep learning originated in the study of artificial neural networks by mathematicians and computer scientists. An artificial neural network is an algorithmic model that simulates the behavioral characteristics of biological neural networks and performs distributed, parallel information processing; it processes information by adjusting the interconnections among a large number of internal nodes. Deep learning now plays a significant role in daily life and is also being applied to electric power construction scenes.
Work sites in fields such as electric power and construction contain live wires and other dangerous equipment; the environment is complex, safety accidents occur easily, and casualties may result if people who do not meet electric power working requirements enter a power construction site. During daily power operations this phenomenon must be supervised to reduce its probability. However, because operators' safety awareness is often insufficient, non-construction personnel can easily enter the construction site, and a guardian must monitor the site constantly and intervene in time. Statistics show that identifying operator identities manually imposes high labor intensity on guardians, gives low identification efficiency, and reflects a low level of intelligent automation.
Chinese patent publication No. CN113378622A discloses "a method, device, system and medium for identifying a specific person", which determines whether a captured person is a specific person according to image feature values obtained by performing feature recognition on face images captured by a camera system. However, that technique places high requirements on the images and lacks flexibility; moreover, under the current epidemic situation, using the face as the feature for identifying specific personnel requires the person being checked to remove their mask, which does not suit present conditions.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a personnel identification method for an electric power operation scene, aiming to prevent non-construction personnel from mistakenly entering a power construction site, to effectively supervise the safety situation of the site, and to strongly safeguard the life and property of the construction personnel on it.
To achieve the above object, according to one aspect of the present invention, a method for identifying personnel in an electric power operation scene is provided, comprising the following steps:
step 1, collecting images of workers on a power construction site, images of non-construction personnel and images without detection targets, and labeling and distinguishing the different personnel to form a training data set;
step 2, expanding the collected data set;
step 3, training the target detection network by using the expanded data set to obtain a personnel identification model;
the object detection network comprises three parts: a feature extraction part, a feature fusion part and a result output part;
step 4, detecting the images acquired on site in real time using the trained person identification model, outputting the identity of a person when one is detected, and issuing a reminder when the identified person is a non-construction worker.
Further, the feature extraction part comprises 7 convolution modules, wherein the first convolution module comprises 1 3×3 convolution, 1 2×2 convolution and a channel-spatial attention module; the second convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the third convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the fourth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the fifth convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the sixth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; and the seventh convolution module comprises 1 2×2 convolution and a channel-spatial attention module. In addition, the output of the first convolution module is passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module; the output of the third convolution module is passed through a depthwise separable convolution and added to the output of the fourth convolution module to form the input of the fifth convolution module; and the output of the fifth convolution module is passed through a depthwise separable convolution and added to the output of the sixth convolution module to form the input of the seventh convolution module.
Further, the feature fusion part comprises 1 convolution operation module, 1 branch structure and 3 up-sampling modules, and its input is the output of the seventh convolution module. The convolution operation module comprises 2 1×1 convolutions and 1 3×3 convolution; the branch structure applies 3×3, 7×7 and 9×9 convolutions to the output of the convolution operation module respectively, and the outputs of the 3 branches are then concatenated with the output of the convolution operation module to form the input of the first up-sampling module; each up-sampling module comprises 2 1×1 convolutions, 1 3×3 convolution and 1 up-sampling operation. The result output part comprises three outputs: one for predicting large targets, one for predicting medium targets and one for predicting small targets. The output for predicting large targets is obtained by concatenating the seventh convolution module with the first up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution; the output for predicting medium targets is obtained by concatenating the fifth convolution module with the second up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution; and the output for predicting small targets is obtained by concatenating the third convolution module with the third up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution.
Further, the specific operation of the channel-spatial attention module is as follows;
firstly, a 3×3 convolution is applied to the input feature map without changing its height, width or number of channels, and the result is sent to a channel attention module. The channel attention module performs global average pooling and global max pooling on the feature map to obtain two vectors, adds the two vectors to obtain a weight vector over the channel dimension, and multiplies this weight vector with the original input feature map to obtain a feature map with channel attention;
then, a 3×3 convolution is applied to the feature map with channel attention, and the result is sent to a spatial attention module. The spatial attention module generates two feature maps of the same size as the input through two branches, one a 1×1 max pooling with stride 1 and the other a 1×1 average pooling with stride 1; the two feature maps are concatenated and then convolved to produce a feature map whose channel dimension is 1. The feature map with channel attention is multiplied by this 1-channel feature map to obtain the spatial attention feature map, and finally the spatial attention feature map is multiplied directly by the channel-dimension weight vector from the channel attention module to output a feature map carrying both channel attention and spatial attention.
Further, the specific processing procedure of the feature extraction part is as follows;
a training image is input into the first convolution module, where a 3×3 convolution first changes its depth to 32 layers, a 2×2 convolution with stride 2 then reduces the height and width of the feature map to 1/2 of the original with the number of channels unchanged, and a channel-spatial attention module extracts global features without changing the height, width or number of channels of the feature map. The feature map then enters the second convolution module: a 1×1 convolution adjusts its number of channels to 64, a 3×3 convolution changes the number of channels to 128, and a final 1×1 convolution adjusts the number of channels back to 64; the output of the first convolution module is then passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module;
the third through sixth convolution modules are executed in the same way, yielding a feature map whose height and width are 1/8 of the original image and whose depth is 256 layers; this feature map is sent into the seventh convolution module, where a 2×2 convolution with stride 2 reduces its height and width to 1/16 of the original, and it is then sent into a channel-spatial attention module without changing its height, width or number of channels;
after each convolution operation, a normalization operation and an activation operation are performed, where the activation function used in the activation operation is Leaky ReLU.
Furthermore, in step 1 the identities of the personnel in the collected images are labeled, the identity labels comprising power construction workers, work responsible persons, special responsible persons and non-construction personnel; in addition, some images without detection targets are collected as negative samples. Each image containing a detection target is labeled manually to obtain the position and category information of the target to be detected, the labeled target region being the clothed body of the person, excluding the head.
Further, in step 2 the collected data set images undergo image splicing, horizontal flipping, pixel shifting, random cropping, deformation scaling and the addition of Gaussian noise to generate new images, realizing the expansion of the data set.
Further, the specific implementation of step 4 is as follows;
the method comprises the steps of acquiring a monitoring video in real time, analyzing the video frame by frame to obtain an image of each frame, sending the analyzed image into a personnel identification model, outputting the identity category of a detection person through the personnel identification model, calling a voice broadcasting function to remind and store and log the image when the identity of the detection person is identified as a non-constructor, and normally displaying the identity information of other persons when the identity of the detection person is identified, wherein the standard for identifying different identities of the persons is clothes worn by the persons in an electric power scene.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) Negative sample images are collected and image data processing techniques are used to process the images in the data set and expand it, bringing the image information closer to the real environment and improving the robustness of the target detection model in real environments.
(2) Based on the attention mechanism and the target detection network with a branch structure, the output feature maps capture more global information and detail information, increasing person identification accuracy and reducing information loss.
(3) The method can prevent non-construction personnel from mistakenly entering a power construction site, effectively supervises the personnel on the site, and strongly safeguards the life and property of the site.
Drawings
Fig. 1 is a schematic block flow diagram of a method for identifying a person in an electric power operation scene according to an embodiment of the present invention.
Fig. 2 is a diagram of a target detection network structure of a person identification method in an electric power operation scene according to an embodiment of the present invention.
Fig. 3 is a diagram of the channel-spatial attention module structure of a person identification method in an electric power operation scene according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the flow of the person identification method in an electric power operation scene provided by this embodiment comprises two blocks: a detection network training module and a person identification module, where the detection network training module comprises a personnel data set creation module, a data set expansion module and a target detection network training module. The embodiment provides a method for identifying personnel in an electric power operation scene, comprising the following steps:
(1) Collecting images of workers on the power construction site, images of non-construction personnel and images without detection targets (i.e. images containing no people), and labeling and distinguishing the different personnel to form a training data set;
the identities of the collected on-site personnel are divided, the identity labels comprising power construction workers, work responsible persons, special responsible persons and non-construction personnel; in addition, some images without detection targets can be collected as negative samples. Each image containing a detection target is labeled manually to obtain the position and category information of the target to be detected, the labeled target region being the clothed body of the person, excluding the head.
(2) Expanding the collected data set through techniques such as image splicing, image flipping and noise addition;
the collected data set images are processed by image-data techniques such as image splicing, horizontal flipping, pixel shifting, random cropping, deformation scaling and the addition of Gaussian noise to generate new images, expanding the data set of site-personnel images.
(3) Training the target detection network with the expanded data set to obtain a person identification model;
fig. 2 shows the structure of the target detection network, based on an attention mechanism and a branch structure, used by the person identification method provided by this embodiment. The network can be divided into three parts: a feature extraction part, a feature fusion part and a result output part. The feature extraction part comprises 7 convolution modules. The first convolution module comprises 1 3×3 convolution, 1 2×2 convolution and a channel-spatial attention module; the second convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the third convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the fourth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the fifth convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the sixth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; and the seventh convolution module comprises 1 2×2 convolution and a channel-spatial attention module. In addition, the output of the first convolution module is passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module; the output of the third convolution module is passed through a depthwise separable convolution and added to the output of the fourth convolution module to form the input of the fifth convolution module; and the output of the fifth convolution module is passed through a depthwise separable convolution and added to the output of the sixth convolution module to form the input of the seventh convolution module. The feature fusion part comprises 1 convolution operation module, 1 branch structure and 3 up-sampling modules, and its input is the output of the seventh convolution module. The convolution operation module comprises 2 1×1 convolutions and 1 3×3 convolution; the branch structure applies 3×3, 7×7 and 9×9 convolutions to the output of the convolution operation module respectively, and the outputs of the 3 branches are then concatenated with the output of the convolution operation module to form the input of the first up-sampling module; each up-sampling module comprises 2 1×1 convolutions, 1 3×3 convolution and 1 up-sampling operation. The result output part comprises three outputs: one for predicting large targets, one for predicting medium targets and one for predicting small targets. The output for predicting large targets is obtained by concatenating the seventh convolution module with the first up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution; the output for predicting medium targets is obtained by concatenating the fifth convolution module with the second up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution; and the output for predicting small targets is obtained by concatenating the third convolution module with the third up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution.
After each convolution operation, a normalization operation and an activation operation are performed, where the activation function used in the activation operation is Leaky ReLU.
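For concreteness, the two basic building blocks just described could look as follows in PyTorch; this is a sketch under stated assumptions — the class names, the LeakyReLU slope and the 3×3 depthwise kernel are illustrative choices, not values given by the patent:

import torch.nn as nn

class ConvBNAct(nn.Module):
    """Convolution -> normalization -> Leaky ReLU activation,
    the pattern applied after every convolution in the network."""
    def __init__(self, in_ch, out_ch, k, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, stride=stride,
                      padding=(k - 1) // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution, used on the skip connections
    that add one convolution module's output to a later module's output."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

With these blocks, the 2×2 stride-2 down-sampling used by the odd-numbered modules would correspond to ConvBNAct(c, c, k=2, stride=2), which halves the height and width while keeping the channel count.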
Specifically, the first part of the network is the feature extraction part, whose processing is as follows: a training image is input into the first convolution module, where a 3×3 convolution first changes its depth to 32 layers, a 2×2 convolution with stride 2 then reduces the height and width of the feature map to 1/2 of the original with the number of channels unchanged, and a channel-spatial attention module extracts global features without changing the height, width or number of channels. The feature map then enters the second convolution module: a 1×1 convolution adjusts its number of channels to 64, a 3×3 convolution changes the number of channels to 128, and a final 1×1 convolution adjusts the number of channels back to 64; the output of the first convolution module is then passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module;
the third through sixth convolution modules are executed according to the operations above, yielding a feature map whose height and width are 1/8 of the original image and whose depth is 256 layers; this feature map is sent into the seventh convolution module, where a 2×2 convolution with stride 2 reduces its height and width to 1/16 of the original, and it is then sent into a channel-spatial attention module without changing its height, width or number of channels;
the second part is a feature fusion part of the target network, and the processing procedure is as follows: the feature map is sent into a convolution operation module, 1 × 1 convolution is carried out firstly to adjust the depth of the feature map to 512, then the height and the width of the feature map are reduced to 1/32 of the original height and width through 3 × 3 convolution, the depth is adjusted to 1024, and the depth of the feature map is adjusted to 512 through 1 × 1 convolution; and then entering a branch structure, dividing the three branches into three branches, and respectively performing 3 × 3 convolution, 7 × 7 convolution and 9 × 9 convolution without changing the size and the depth of the feature map, and splicing the outputs of the three branches and the feature map of the input branch structure to obtain the input of a first up-sampling module, wherein the height and the width of the feature map are 1/32 of the original height and the depth is 2048. The first up-sampling module firstly performs 1 × 1 convolution to adjust the depth of the feature map to 512, then adjusts the depth of the feature map to 1024 by 3 × 3 convolution, adjusts the depth of the feature map to 512 by 1 × 1 convolution, and then adjusts the feature map to 1/16 of the original depth by up-sampling operation, wherein the number of channels is unchanged; the second and third upsampling modules are then performed according to the operation of the first upsampling module above.
The third part is the result output part of the target detection network, whose specific operation is as follows: the seventh convolution module and the first up-sampling module are concatenated to obtain a feature map of 1/16 the original size with depth 768; a 3×3 convolution adjusts its depth to 512 and a 1×1 convolution adjusts the depth to 27, giving the output for predicting large targets. The fifth convolution module and the second up-sampling module are then concatenated to obtain a feature map of 1/8 the original size with depth 640; a 3×3 convolution adjusts its depth to 512 and a 1×1 convolution adjusts the depth to 27, giving the output for predicting medium targets. The third convolution module and the third up-sampling module are concatenated to obtain a feature map of 1/4 the original size with depth 576; a 3×3 convolution adjusts its depth to 512 and a 1×1 convolution adjusts the depth to 27, giving the output for predicting small targets;
the number of channels of each output feature map is 27, decomposed as (4 + 4 + 1) × 3: the first 4 dimensions correspond to the 4 worker categories, the next 4 dimensions correspond to the center-point coordinates and the height and width of the prediction box, the last dimension outputs a confidence value giving the probability that the prediction is a positive sample, and × 3 indicates that each position in the feature map outputs 3 groups of prediction-box information.
As shown in fig. 3, the channel-spatial attention module used in the target detection network operates as follows:
firstly, a 3×3 convolution is applied to the input feature map without changing its height, width or number of channels, and the result is sent to a channel attention module. The channel attention module performs global average pooling and global max pooling on the feature map to obtain two vectors, adds the two vectors to obtain a weight vector over the channel dimension, and multiplies this weight vector with the original input feature map to obtain a feature map with channel attention.
Then, a 3×3 convolution is applied to the feature map with channel attention, and the result is sent to a spatial attention module. The spatial attention module generates two feature maps of the same size as the input through two branches, one a 1×1 max pooling with stride 1 and the other a 1×1 average pooling with stride 1; the two feature maps are concatenated and then convolved to produce a feature map whose channel dimension is 1. The feature map with channel attention is multiplied by this 1-channel feature map to obtain the spatial attention feature map, and finally the spatial attention feature map is multiplied directly by the channel-dimension weight vector from the channel attention module to output a feature map carrying both channel attention and spatial attention.
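A PyTorch sketch of this channel-spatial attention module follows. The sigmoid gating, the 7×7 kernel fusing the two pooled maps, and the reading of the two pooling branches as per-pixel max/average over channels are assumptions in the spirit of common attention designs; the patent text fixes only the operations described above:

import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch of the channel-spatial attention module: channel attention
    from summed global average/max pooling, then spatial attention from a
    1-channel map, with a final re-weighting by the channel weights."""
    def __init__(self, channels):
        super().__init__()
        self.pre = nn.Conv2d(channels, channels, 3, padding=1)   # 3x3, size preserved
        self.mid = nn.Conv2d(channels, channels, 3, padding=1)   # 3x3 before the spatial part
        self.fuse = nn.Conv2d(2, 1, 7, padding=3)                # reduce to channel dimension 1

    def forward(self, x):
        x = self.pre(x)
        # Channel attention: global average and max pooling, added together
        avg = x.mean(dim=(2, 3), keepdim=True)           # (N, C, 1, 1)
        mx = x.amax(dim=(2, 3), keepdim=True)            # (N, C, 1, 1)
        channel_w = torch.sigmoid(avg + mx)              # weight vector on the channel dimension
        x_ca = x * channel_w                             # feature map with channel attention

        y = self.mid(x_ca)
        # Spatial attention: two pooled maps, concatenated and convolved to 1 channel
        mx_map = y.amax(dim=1, keepdim=True)             # (N, 1, H, W)
        avg_map = y.mean(dim=1, keepdim=True)            # (N, 1, H, W)
        spatial_w = torch.sigmoid(self.fuse(torch.cat([mx_map, avg_map], dim=1)))
        x_sa = x_ca * spatial_w                          # spatial attention feature map

        # Final multiplication by the channel weight vector, as described
        return x_sa * channel_w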
The specific operation of the branch structure in the target detection network is as follows:
the feature map is sent into the branch structure and convolved with 3×3, 7×7 and 9×9 kernels respectively, without changing its size or depth; finally the original input feature map is concatenated with the three convolved feature maps to obtain a feature map with multi-scale detail information.
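A minimal PyTorch sketch of this branch structure (the class name is illustrative):

import torch
import torch.nn as nn

class MultiScaleBranch(nn.Module):
    """Parallel 3x3, 7x7 and 9x9 convolutions whose outputs are
    concatenated with the input feature map. Each branch preserves size
    and depth, so the concatenation multiplies the depth by 4
    (e.g. 512 -> 2048, matching the fusion part described above)."""
    def __init__(self, channels):
        super().__init__()
        self.b3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.b7 = nn.Conv2d(channels, channels, 7, padding=3)
        self.b9 = nn.Conv2d(channels, channels, 9, padding=4)

    def forward(self, x):
        return torch.cat([x, self.b3(x), self.b7(x), self.b9(x)], dim=1)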
(4) Detecting the images acquired on site in real time using the trained person identification model, outputting the identity of a person when one is detected, and issuing a reminder when the identified person is a non-construction worker;
the identity of a person is recognized according to the clothing of power-construction-site personnel, and a reminder is issued when a non-construction worker is recognized, specifically as follows:
a surveillance video is acquired in real time and parsed frame by frame to obtain each frame image; each parsed image is sent into the person identification model, which outputs the identity category of each detected person. When a person is identified as a non-construction worker, a voice broadcast function is called to issue a reminder and the image is saved and logged; when other identities are recognized, the identity information is displayed normally. The criterion for distinguishing the different identities is the clothing worn in the electric power scene: workers on a power construction site wear corresponding clothing, so identities such as power construction worker, work responsible person and special responsible person are distinguished by recognizing their different clothing, and a person is classified as non-construction personnel when wearing no power-construction clothing or wearing only the power-construction jacket or only the trousers.
The invention provides a person identification method for electric power operation scenes which can not only identify the identities of personnel on a power construction site but also issue a reminder when non-construction personnel are recognized. In a specific application, the team cooperated with the national power grid: images of various field personnel were captured at a power construction site, a data set was produced by the method of this embodiment, and the person identification model was trained with the team's network. Verified under actual conditions, the model's person identification accuracy can exceed 90%; the method can thus prevent non-construction personnel from mistakenly entering a power construction site, effectively supervises the personnel on the site, and strongly safeguards the life and property of the people on it.
Various changes and modifications may be made to the disclosure by those skilled in the art without departing from the spirit and scope of the disclosure. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (5)

1. A method for identifying personnel in an electric power operation scene is characterized by comprising the following steps:
step 1, collecting images of workers, images of non-construction personnel and images without detection targets on a power construction site, and labeling and distinguishing the different personnel to form a training data set;
step 2, expanding the collected data set;
step 3, training the target detection network by using the expanded data set to obtain a personnel identification model;
the object detection network comprises three parts: a feature extraction part, a feature fusion part and a result output part;
the feature extraction part comprises 7 convolution modules, wherein the first convolution module comprises 1 3×3 convolution, 1 2×2 convolution and a channel-spatial attention module; the second convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the third convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the fourth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; the fifth convolution module comprises 1 2×2 convolution and a channel-spatial attention module; the sixth convolution module comprises 2 1×1 convolutions and 1 3×3 convolution; and the seventh convolution module comprises 1 2×2 convolution and a channel-spatial attention module; in addition, the output of the first convolution module is passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module, the output of the third convolution module is passed through a depthwise separable convolution and added to the output of the fourth convolution module to form the input of the fifth convolution module, and the output of the fifth convolution module is passed through a depthwise separable convolution and added to the output of the sixth convolution module to form the input of the seventh convolution module;
the feature fusion part comprises 1 convolution operation module, 1 branch structure and 3 up-sampling modules, and its input is the output of the seventh convolution module, wherein the convolution operation module comprises 2 1×1 convolutions and 1 3×3 convolution; the branch structure applies 3×3, 7×7 and 9×9 convolutions to the output of the convolution operation module respectively, and the outputs of the 3 branches are then concatenated with the output of the convolution operation module to form the input of the first up-sampling module; each up-sampling module comprises 2 1×1 convolutions, 1 3×3 convolution and 1 up-sampling operation; the result output part comprises three outputs, one for predicting large targets, one for predicting medium targets and one for predicting small targets, wherein the output for predicting large targets is obtained by concatenating the seventh convolution module with the first up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution, the output for predicting medium targets is obtained by concatenating the fifth convolution module with the second up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution, and the output for predicting small targets is obtained by concatenating the third convolution module with the third up-sampling module and then performing 1 3×3 convolution and 1 1×1 convolution;
the specific processing procedure of the feature extraction part is as follows;
a training image is input into the first convolution module, where a 3×3 convolution first changes its depth to 32 layers, a 2×2 convolution with stride 2 then reduces the height and width of the feature map to 1/2 of the original with the number of channels unchanged, and a channel-spatial attention module extracts global features without changing the height, width or number of channels of the feature map; the feature map then enters the second convolution module, where a 1×1 convolution adjusts its number of channels to 64, a 3×3 convolution changes the number of channels to 128, and a final 1×1 convolution adjusts the number of channels back to 64; the output of the first convolution module is then passed through a depthwise separable convolution and added to the output of the second convolution module to form the input of the third convolution module;
the third through sixth convolution modules are executed in the same way, yielding a feature map whose height and width are 1/8 of the original image and whose depth is 256 layers; this feature map is sent into the seventh convolution module, where a 2×2 convolution with stride 2 reduces its height and width to 1/16 of the original, and it is then sent into a channel-spatial attention module without changing its height, width or number of channels;
after each convolution operation, performing a normalization operation and an activation operation, wherein an activation function used in the activation operation is Leaky ReLU;
step 4, detecting the images acquired on site in real time using the trained person identification model, outputting the identity of a person when one is detected, and issuing a reminder when the identified person is a non-construction worker.
2. The method for identifying personnel in the power operation scene as claimed in claim 1, wherein: the specific operation of the channel-spatial attention module is as follows;
firstly, a 3×3 convolution is applied to the input feature map without changing its height, width or number of channels, and the result is sent to a channel attention module, wherein the channel attention module performs global average pooling and global max pooling on the feature map to obtain two vectors, adds the two vectors to obtain a weight vector over the channel dimension, and multiplies this weight vector with the original input feature map to obtain a feature map with channel attention;
then, a 3×3 convolution is applied to the feature map with channel attention and the result is sent to a spatial attention module, wherein the spatial attention module generates two feature maps of the same size as the input through two branches, one a 1×1 max pooling with stride 1 and the other a 1×1 average pooling with stride 1; the two feature maps are concatenated and then convolved to produce a feature map whose channel dimension is 1; the feature map with channel attention is multiplied by this 1-channel feature map to obtain the spatial attention feature map, and finally the spatial attention feature map is multiplied directly by the channel-dimension weight vector from the channel attention module to output a feature map carrying both channel attention and spatial attention.
3. The method for identifying personnel in the power operation scene as claimed in claim 1, wherein: in step 1 the identities of the personnel in the collected images are labeled, the identity labels comprising power construction workers, work responsible persons, special responsible persons and non-construction personnel, and in addition some images without detection targets are collected as negative samples; each image containing a detection target is labeled manually to obtain the position and category information of the target to be detected, the labeled target region being the clothed body of the person, excluding the head.
4. The method for identifying personnel in the power operation scene as claimed in claim 1, wherein: in step 2 the collected data set images undergo image splicing, horizontal flipping, pixel shifting, random cropping, deformation scaling and the addition of Gaussian noise to generate new images, realizing the expansion of the data set.
5. The method for identifying personnel in the power operation scene as claimed in claim 1, wherein: the specific implementation of step 4 is as follows;
the method comprises the steps of acquiring a monitoring video in real time, analyzing the video frame by frame to obtain an image of each frame, sending the analyzed image into a personnel identification model, outputting the identity category of a detection person through the personnel identification model, calling a voice broadcasting function to remind and store and log the image when the identity of the detection person is identified as a non-constructor, and normally displaying the identity information of other persons when the identity of the detection person is identified, wherein the standard for identifying different identities of the persons is clothes worn by the persons in an electric power scene.
CN202210745758.2A 2022-06-29 2022-06-29 Personnel identification method in power operation scene Active CN114821486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210745758.2A CN114821486B (en) 2022-06-29 2022-06-29 Personnel identification method in power operation scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210745758.2A CN114821486B (en) 2022-06-29 2022-06-29 Personnel identification method in power operation scene

Publications (2)

Publication Number Publication Date
CN114821486A CN114821486A (en) 2022-07-29
CN114821486B true CN114821486B (en) 2022-10-11

Family

ID=82522379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210745758.2A Active CN114821486B (en) 2022-06-29 2022-06-29 Personnel identification method in power operation scene

Country Status (1)

Country Link
CN (1) CN114821486B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579616B (en) * 2023-07-10 2023-09-29 武汉纺织大学 Risk identification method based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270347A (en) * 2020-10-20 2021-01-26 西安工程大学 Medical waste classification detection method based on improved SSD
CN112990232A (en) * 2021-04-14 2021-06-18 广东工业大学 Safety belt wearing identification and detection method for various high-altitude operation construction sites
CN113610759A (en) * 2021-07-05 2021-11-05 金华电力设计院有限公司 A on-spot safe management and control system for roofbolter construction
CN114612813A (en) * 2020-12-09 2022-06-10 中兴通讯股份有限公司 Identity recognition method, model training method, device, equipment and storage medium
CN114612835A (en) * 2022-03-15 2022-06-10 中国科学院计算技术研究所 Unmanned aerial vehicle target detection model based on YOLOv5 network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11562500B2 (en) * 2019-07-24 2023-01-24 Squadle, Inc. Status monitoring using machine learning and machine vision

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270347A (en) * 2020-10-20 2021-01-26 西安工程大学 Medical waste classification detection method based on improved SSD
CN114612813A (en) * 2020-12-09 2022-06-10 中兴通讯股份有限公司 Identity recognition method, model training method, device, equipment and storage medium
CN112990232A (en) * 2021-04-14 2021-06-18 广东工业大学 Safety belt wearing identification and detection method for various high-altitude operation construction sites
CN113610759A (en) * 2021-07-05 2021-11-05 金华电力设计院有限公司 A on-spot safe management and control system for roofbolter construction
CN114612835A (en) * 2022-03-15 2022-06-10 中国科学院计算技术研究所 Unmanned aerial vehicle target detection model based on YOLOv5 network

Also Published As

Publication number Publication date
CN114821486A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN111814638B (en) Security scene flame detection method based on deep learning
CN110852222A (en) Campus corridor scene intelligent monitoring method based on target detection
CN113516076A (en) Improved lightweight YOLO v4 safety protection detection method based on attention mechanism
Cheng et al. Smoke detection and trend prediction method based on Deeplabv3+ and generative adversarial network
CN112669350A (en) Adaptive feature fusion intelligent substation human body target tracking method
CN113903081A (en) Visual identification artificial intelligence alarm method and device for images of hydraulic power plant
CN111126293A (en) Flame and smoke abnormal condition detection method and system
CN112287827A (en) Complex environment pedestrian mask wearing detection method and system based on intelligent lamp pole
KR20220024986A (en) Target tracking method and device, storage medium and computer program
CN111145222A (en) Fire detection method combining smoke movement trend and textural features
CN103106394A (en) Human body action recognition method in video surveillance
CN112183472A (en) Method for detecting whether test field personnel wear work clothes or not based on improved RetinaNet
CN104463869A (en) Video flame image composite recognition method
CN114821486B (en) Personnel identification method in power operation scene
CN111062373A (en) Hoisting process danger identification method and system based on deep learning
CN111860187A (en) High-precision worn mask identification method and system
CN111860457A (en) Fighting behavior recognition early warning method and recognition early warning system thereof
CN111401437A (en) Deep learning-based power transmission channel hidden danger early warning grade analysis method
WO2022222036A1 (en) Method and apparatus for determining parking space
CN113628172A (en) Intelligent detection algorithm for personnel handheld weapons and smart city security system
CN112488213A (en) Fire picture classification method based on multi-scale feature learning network
CN116778214A (en) Behavior detection method, device, equipment and storage medium thereof
CN115862128A (en) Human body skeleton-based customer abnormal behavior identification method
Lestari et al. Comparison of two deep learning methods for detecting fire
CN111127433B (en) Method and device for detecting flame

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant