CN115359360A - Power field operation scene detection method, system, equipment and storage medium - Google Patents
- Publication number
- CN115359360A (application number CN202211279083.3A)
- Authority
- CN
- China
- Prior art keywords
- feature map
- scene
- power field
- field operation
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a power field operation scene detection method, system, device and storage medium. The method comprises the following steps: collecting a plurality of different types of power field operation scene images and labeling each image; putting the labeled images into a training sample set; building a YOLO model and adding an MCBAM attention module to the Neck network to obtain an improved YOLO model, wherein the MCBAM attention module comprises an MB multi-scale information capture module, a CAM channel attention module and a SAM spatial attention module; the MB module captures multi-scale information of the input image, the CAM module performs channel attention adjustment, and the SAM module performs spatial attention adjustment; and training the improved YOLO model with the training sample set and detecting with the trained model.
Description
Technical Field
The invention relates to a power field operation scene detection method, system, device and storage medium, and belongs to the technical field of power field scene detection.
Background
On power facility construction sites the environment is complex and varied, and the hazards of power facilities are hard to control. If workers lack safety clothing, or the site lacks construction warning boards and fences, the consequences of an accident can be severe.
Existing field operation safety identification methods based on traditional image feature extraction are affected to some degree by the complex construction environment and struggle to achieve good precision and accuracy. In recent years, with the rapid development of artificial intelligence and deep learning, target detection algorithms based on deep learning have gradually entered the construction site. For example, the invention patent with publication number CN114419659A discloses a method for detecting helmet wearing in complex scenes: it introduces an attention mechanism into the backbone of the YOLOv5 network to reduce the loss of effective information during transmission; adds a fourth detection scale of 104 x 104 at the neck and head of the YOLOv5 network to strengthen small-target detection; and, after pre-training the CSPDarkNet53 model on a large dataset, transfers its feature extraction capability to the helmet-wearing detection model, alleviating the problem of an insufficient dataset.
YOLO is a target detection algorithm that balances speed and precision well, but in a complex power field operation environment there is still room for improvement in detecting whether a person wears a safety helmet, whether a worker wears work clothes, and whether construction warning boards and fences are present on site.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a power field operation scene detection method, system, device and storage medium that achieve better detection of whether a person wears a safety helmet, whether workers wear work clothes, and whether construction warning boards and fences are present on site in a complex power field operation environment.
The technical scheme of the invention is as follows:
In a first aspect, the invention provides a power field operation scene detection method, comprising the following steps:
collecting a plurality of power field operation scene images of different types, labeling each power field operation scene image, and marking the position frame and scene type of the features in each image;
putting the marked power field operation scene images into a training sample set;
establishing a YOLO model based on a Backbone network, a Neck network and a Head network, and adding an MCBAM attention module to the up-sampling and down-sampling parts of the PANet network in the Neck network to obtain an improved YOLO model; wherein the MCBAM attention module comprises an MB multi-scale information capture module, a CAM channel attention module and a SAM spatial attention module; the MB multi-scale information capture module captures multi-scale information of an input image through a plurality of convolution operations and outputs a combined first feature map; the CAM channel attention module performs channel attention adjustment on the first feature map and outputs a second feature map; the SAM spatial attention module performs spatial attention adjustment on the second feature map and outputs a third feature map;
carrying out iterative training on the improved YOLO model by using a training sample set to obtain a trained power field operation scene detection model;
and performing scene detection on power field operations using the power field operation scene detection model.
In a preferred embodiment, the scene types of the scene image of the power field operation include:
a scene in which a field worker wears a safety helmet, a scene in which a field worker does not wear a safety helmet, a scene in which a field worker wears work clothes, a scene in which a field worker does not wear work clothes, a scene in which the work site has a construction warning board or fence, and a scene in which the work site has no construction warning board or fence.
As a preferred embodiment, the method further includes a data enhancement step after the marked power field operation scene image is placed in a training sample set, specifically:
performing data enhancement on the labeled power field operation scene images, where the enhancement is one of, or a combination of two or more of: image flipping, image translation, image scaling, noise addition, contrast adjustment, brightness adjustment, random erasing of image blocks, and mosaic data enhancement.
As a preferred embodiment, the MB multi-scale information capture module comprises: a convolution operation with a 1 x 1 kernel and two dilated (atrous) convolution operations with 3 x 3 kernels and dilation rates of 6 and 12 respectively, each convolution operation followed by a corresponding BN batch normalization layer; and a multi-scale information feature output layer that adds the feature maps output by the three convolution operations and outputs the first feature map to the CAM channel attention module;
the CAM channel attention module comprises a first pooling layer, a shared fully connected layer and a channel attention output layer; the first pooling layer performs channel-wise maximum pooling and average pooling on the input first feature map to obtain two one-dimensional vectors;
the shared fully connected layer performs a fully connected operation on each of the two one-dimensional vectors, adds the results, and applies a sigmoid activation function to generate a one-dimensional channel attention feature map;
the channel attention output layer multiplies the channel attention feature map with the first feature map to obtain a second feature map and outputs it to the SAM spatial attention module;
the SAM spatial attention module comprises a second pooling layer, a convolution layer and a spatial attention output layer; the second pooling layer performs maximum pooling and average pooling on the input second feature map along the channel dimension to obtain two two-dimensional feature maps;
the convolution layer concatenates the two two-dimensional feature maps, performs a convolution operation to reduce them to a single channel, and applies a sigmoid activation function to generate a two-dimensional spatial attention feature map;
and the spatial attention output layer multiplies the spatial attention feature map with the second feature map to obtain and output a third feature map.
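As a concrete illustration of the CAM channel attention step described above, here is a minimal NumPy sketch; the tensor shapes, the channel reduction ratio and the ReLU between the two shared fully connected layers are illustrative assumptions, not details taken from the patent:

```python
import numpy as np

def cam_channel_attention(f1, w1, w2):
    """Sketch of CAM channel attention (hypothetical shapes and weights).

    f1 : first feature map, shape (C, H, W)
    w1 : shared FC weights, shape (C//r, C)  -- reduction
    w2 : shared FC weights, shape (C, C//r)  -- expansion
    Returns the second feature map: sigmoid(MLP(maxpool) + MLP(avgpool)) * f1.
    """
    c = f1.shape[0]
    # Channel-wise max- and average-pooling over spatial positions -> two C-dim vectors.
    v_max = f1.reshape(c, -1).max(axis=1)
    v_avg = f1.reshape(c, -1).mean(axis=1)

    def shared_mlp(v):
        # Both vectors pass through the SAME two fully connected layers.
        return w2 @ np.maximum(w1 @ v, 0.0)

    # Add the two MLP outputs, squash with sigmoid -> per-channel weights in (0, 1).
    s = 1.0 / (1.0 + np.exp(-(shared_mlp(v_max) + shared_mlp(v_avg))))
    # Rescale each channel of the input by its attention weight.
    return s[:, None, None] * f1

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
f1 = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
f2 = cam_channel_attention(f1, w1, w2)
```

Because the sigmoid weights lie strictly in (0, 1), the second feature map is an elementwise attenuation of the first, never an amplification.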
In a second aspect, the present invention provides a power field operation scene detection system, including:
the data set production module is used for collecting a plurality of different types of power field operation scene images, labeling each power field operation scene image, and labeling a position frame and a scene type of features in the power field operation scene images; putting the marked scene image of the power field operation into a training sample set;
the improved model building module, which establishes a YOLO model based on a Backbone network, a Neck network and a Head network, and adds an MCBAM attention module to the up-sampling and down-sampling parts of the PANet network in the Neck network to obtain an improved YOLO model; wherein the MCBAM attention module comprises an MB multi-scale information capture module, a CAM channel attention module and a SAM spatial attention module; the MB multi-scale information capture module captures multi-scale information of an input image through a plurality of convolution operations and outputs a combined first feature map; the CAM channel attention module performs channel attention adjustment on the first feature map and outputs a second feature map; the SAM spatial attention module performs spatial attention adjustment on the second feature map and outputs a third feature map;
the training module is used for carrying out iterative training on the improved YOLO model by utilizing a training sample set to obtain a trained power field operation scene detection model;
and the detection module, which performs scene detection on power field operations using the power field operation scene detection model.
In a preferred embodiment, the scene types of the power field operation scene image include:
a scene in which a field worker wears a safety helmet, a scene in which a field worker does not wear a safety helmet, a scene in which a field worker wears work clothes, a scene in which a field worker does not wear work clothes, a scene in which the work site has a construction warning board or fence, and a scene in which the work site has no construction warning board or fence.
As a preferred embodiment, the data set production module further includes a data enhancement unit, specifically configured to:
performing data enhancement on the labeled power field operation scene images, where the enhancement is one of, or a combination of two or more of: image flipping, image translation, image scaling, noise addition, contrast adjustment, brightness adjustment, random erasing of image blocks, and mosaic data enhancement.
As a preferred embodiment, the MB multi-scale information capture module comprises: a convolution operation with a 1 x 1 kernel and two dilated (atrous) convolution operations with 3 x 3 kernels and dilation rates of 6 and 12 respectively, each convolution operation followed by a corresponding BN batch normalization layer; and a multi-scale information feature output layer that adds the feature maps output by the three convolution operations and outputs the first feature map to the CAM channel attention module;
the CAM channel attention module comprises a first pooling layer, a shared fully connected layer and a channel attention output layer; the first pooling layer performs channel-wise maximum pooling and average pooling on the input first feature map to obtain two one-dimensional vectors;
the shared fully connected layer performs a fully connected operation on each of the two one-dimensional vectors, adds the results, and applies a sigmoid activation function to generate a one-dimensional channel attention feature map;
the channel attention output layer multiplies the channel attention feature map with the first feature map to obtain a second feature map and outputs it to the SAM spatial attention module;
the SAM spatial attention module comprises a second pooling layer, a convolution layer and a spatial attention output layer; the second pooling layer performs maximum pooling and average pooling on the input second feature map along the channel dimension to obtain two two-dimensional feature maps;
the convolution layer concatenates the two two-dimensional feature maps, performs a convolution operation to reduce them to a single channel, and applies a sigmoid activation function to generate a two-dimensional spatial attention feature map;
and the spatial attention output layer multiplies the spatial attention feature map with the second feature map to obtain and output a third feature map.
In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement a method for detecting a power field operation scenario according to any embodiment of the present invention.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for detecting a power field operation scenario according to any one of the embodiments of the present invention.
The invention has the following beneficial effects:
1. In the power field operation scene detection method, system, device and storage medium of the invention, an MCBAM attention module is added to the Neck network of the YOLO model to perform multi-scale information capture, channel attention adjustment and spatial attention adjustment on the input features, which effectively suppresses the influence of the complex power field background on target recognition and improves the accuracy of field operation scene detection.
2. The power field operation scene detection method, system, device and storage medium of the invention define several scene types not considered for identification in the prior art, so that the corresponding scenes can be detected and the safety of power field operation improved.
Drawings
FIG. 1 is a flow chart of a first embodiment of the present invention;
FIG. 2 is a network structure diagram of a conventional YOLOv4 model;
fig. 3 is a diagram of the Neck network structure incorporating the MCBAM attention module in an embodiment of the invention;
FIG. 4 is a schematic structural diagram of an MCBAM attention module according to an embodiment of the present invention;
FIG. 5 is a result image of actual testing of the improved YOLO model in an embodiment of the present invention;
fig. 6 is a result image of actual detection of the conventional YOLOv4 model.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments.
The first embodiment is as follows:
the YOLO model is a good target detection algorithm combining speed and precision. Attention mechanism refers to a method that can focus attention on important areas of an image and discard irrelevant areas in computer vision. Can be regarded as a dynamic selection process of important information of image input, and the process is realized by adaptive weight. In general, attention mechanisms are generally classified into the following four basic categories: channel attention, spatial attention, temporal attention, and branch attention. There are also two mixed attention mechanisms: channel and spatial attention mechanisms and spatial and temporal attention mechanisms. CBAM (Convolitional Block Attention Module) is a mixed Attention mechanism of channel Attention and spatial Attention.
Considering that the power operation site environment is complex and extremely dangerous, this embodiment introduces an improved CBAM attention mechanism into the enhanced feature extraction network part of the YOLO model, yielding a target detection algorithm with good detection performance for power field operation scenes in complex environments. The specific scheme is as follows:
referring to fig. 1, the present embodiment provides a method for detecting an operation scene of an electric power field, including the following steps:
s100, collecting a plurality of different types of power field operation scene images, labeling each power field operation scene image, labeling position information and size information of a characteristic target position frame in each power field operation scene image, and labeling a scene type corresponding to each power field operation scene image;
s200, putting the marked electric power field operation scene images into a training sample set, and dividing the electric power field operation scene images into a training set, a verification set and a test set according to the proportion of 8;
s300, establishing a YOLO model based on a Backbone feature extraction network, a Neck reinforced feature extraction network and a Head network; in this embodiment, a YOLOv4 model is specifically adopted, and a Network structure of the YOLOv4 model is specifically shown in fig. 2, a Backbone feature extraction Network of the YOLOv4 model adopts CSPDarknet53, a Neck enhanced feature extraction Network uses SPP (Spatial Pyramid Pooling) and PANet (Path Aggregation Network) structures, and a Head Network is used for converting extracted features into prediction results.
Adding an MCBAM attention module to the up-sampling and down-sampling parts of the PANet network in the Neck network of the YOLOv4 model yields the improved YOLO model. Specifically: the Backbone trunk feature extraction network built from CSPDarknet53 extracts features from the input image F, where R denotes the set of all image samples in the dataset; it continuously compresses the image width W and height H while expanding the channel number C. The network is constructed from DarknetConv2D_BN_Mish and Resblock_body units, where Resblock_body is a large convolutional block built from a series of residual blocks; the Resblock_body units are stacked 1, 2, 8, 8 and 4 times in sequence. Passing the feature map through the trunk feature extraction network produces 3 effective feature layers x0, x1 and x2, which are the outputs of the Resblock_body stages stacked eight, eight and four times respectively;
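The shape progression through the backbone described above can be checked numerically. The 416 x 416 input size and 32-channel stem below are conventional YOLOv4 values assumed for illustration; the patent does not state the input resolution:

```python
def backbone_shapes(h=416, w=416, c=32, stacks=(1, 2, 8, 8, 4)):
    """Shape progression through the five Resblock_body stages of CSPDarknet53.

    Each stage halves the spatial size (stride-2 downsampling) and doubles
    the channel count; `stacks` gives the residual-block count per stage.
    """
    shapes = []
    for n in stacks:
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((h, w, c, n))
    return shapes

stages = backbone_shapes()
# The last three stages yield the effective feature layers fed to the Neck.
effective = [s[:3] for s in stages[-3:]]
print(effective)  # [(52, 52, 256), (26, 26, 512), (13, 13, 1024)]
```

This also shows why the deepest layer (13 x 13 x 1024) is the one routed through the SPP structure: its small spatial extent makes multi-size max-pooling cheap while its channel depth carries the richest semantics.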
The Neck network with the added MCBAM attention module fuses the features extracted by the trunk feature extraction network and passes the fused features to the prediction layer; the MCBAM attention module makes the feature map focus more on the regions that deserve attention. x0 is convolved 3 times and passed through an SPP structure composed of max-pooling layers of different sizes to obtain new features, which are stacked and convolved 3 more times to obtain the feature map Xspp. The feature maps obtained by passing x1 and x2 through the MCBAM attention module, together with the convolved feature map Xspp, are fed into the up-sampling and down-sampling structure of the PANet with the MCBAM attention mechanism, finally producing the feature maps xt1, xt2 and xt3;
the enhanced feature extraction network after the MCBAM attention mechanism is added is shown in FIG. 3;
and predicting targets in the characteristic graphs xt1, xt2 and xt3 extracted by the Next network after the attention mechanism is added by the Head network by adopting a convolution of 3 × 3 and a convolution of 1 × 1 to obtain a final detection result graph.
The MCBAM attention module comprises an MB multi-scale information capture module, a CAM channel attention module and a SAM spatial attention module; the MB module captures multi-scale information of the input image through a plurality of convolution operations and outputs a combined first feature map; the CAM module performs channel attention adjustment on the first feature map and outputs a second feature map; the SAM module performs spatial attention adjustment on the second feature map and outputs a third feature map.
S400, iteratively training the improved YOLO model with the training sample set, finishing training when an iteration termination condition is reached, and obtaining the trained power field operation scene detection model; the termination condition may be a preset iteration count or a target model accuracy.
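The two termination conditions in S400 can be sketched as a generic loop; the `model_step` callable and all numeric values below are illustrative stand-ins, not the patent's actual training procedure:

```python
def train(model_step, max_iters=100, target_acc=0.95):
    """Iterative training with the two stopping rules named in S400:
    a preset iteration count, or a target model accuracy (early stop).
    """
    acc = 0.0
    for it in range(1, max_iters + 1):
        acc = model_step(it)      # one optimisation step; returns current accuracy
        if acc >= target_acc:     # accuracy-based termination
            return it, acc
    return max_iters, acc         # iteration-count termination

# Toy stand-in for a real training step: accuracy rises with iterations.
iters, acc = train(lambda it: 1.0 - 1.0 / it, max_iters=100, target_acc=0.95)
print(iters, acc)  # stops once 1 - 1/it >= 0.95, i.e. at iteration 20
```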
S500, performing scene detection on power field operations using the power field operation scene detection model.
As a preferred implementation manner of this embodiment, in step S100, the scene types of the power field operation scene image include:
a scene in which a field worker wears a safety helmet, a scene in which a field worker does not wear a safety helmet, a scene in which a field worker wears work clothes, a scene in which a field worker does not wear work clothes, a scene in which the work site has a construction warning board or fence, and a scene in which the work site has no construction warning board or fence.
As a preferred embodiment of this embodiment, in step S200, the method further includes a data enhancement step after placing the labeled power field operation scene image into the training sample set, specifically:
performing data enhancement on the marked power field operation scene image, wherein the data enhancement mode specifically comprises the following steps:
geometric transformation of the image, such as image flipping, image translation, image scaling, and the like;
or enhancement processing of the image, such as adjusting the contrast of the original image, adjusting the brightness of the original image, increasing the resolution of the original image, and the like;
or expansion of the image dataset, such as adding noise to the original image, randomly erasing image blocks in the original image, and mosaic data enhancement of the original image;
any one of the above data enhancement modes, or a combination of two or more of them, may be employed.
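A few of the enhancement modes above can be sketched in NumPy as follows. These operate on toy grayscale arrays for illustration; in a real labeled pipeline the annotated position frames must be transformed together with the pixels (e.g. mirrored boxes for flipped images):

```python
import numpy as np

rng = np.random.default_rng(0)

def flip(img):
    """Horizontal flip (a geometric transformation)."""
    return img[:, ::-1]

def add_noise(img, sigma=0.05):
    """Additive Gaussian noise, clipped back to [0, 1]."""
    return np.clip(img + rng.normal(0, sigma, img.shape), 0, 1)

def adjust_brightness(img, delta=0.1):
    """Uniform brightness shift, clipped to [0, 1]."""
    return np.clip(img + delta, 0, 1)

def random_erase(img, size=8):
    """Zero out a randomly placed size x size patch."""
    out = img.copy()
    y = rng.integers(0, img.shape[0] - size)
    x = rng.integers(0, img.shape[1] - size)
    out[y:y + size, x:x + size] = 0.0
    return out

img = rng.random((64, 64))  # toy grayscale "scene image" in [0, 1)
augmented = [f(img) for f in (flip, add_noise, adjust_brightness, random_erase)]
```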
As a preferred implementation of this embodiment, referring to fig. 4, in step S300 the MB multi-scale information capture module comprises: a convolution operation with a 1 x 1 kernel and two dilated (atrous) convolution operations with 3 x 3 kernels and dilation rates of 6 and 12 respectively, each convolution operation followed by a corresponding BN batch normalization layer; and a multi-scale information feature output layer that adds the feature maps output by the three convolution operations and outputs the first feature map F' to the CAM channel attention module;
the CAM channel attention module comprises a first pooling layer, a shared fully connected layer and a channel attention output layer; the first pooling layer performs channel-wise maximum pooling and average pooling on the input first feature map F' to obtain two one-dimensional vectors;
the shared fully connected layer performs a fully connected operation on each of the two one-dimensional vectors, adds the results, and applies a sigmoid activation function to generate the one-dimensional channel attention feature map Fc;
the channel attention output layer multiplies the channel attention feature map Fc with the first feature map F' to obtain the second feature map F'' and outputs it to the SAM spatial attention module;
the SAM spatial attention module comprises a second pooling layer, a convolution layer and a spatial attention output layer; the second pooling layer performs maximum pooling and average pooling on the input second feature map F'' along the channel dimension to obtain two two-dimensional feature maps;
the convolution layer concatenates the two two-dimensional feature maps, performs a convolution operation to reduce them to a single channel, and applies a sigmoid activation function to generate the two-dimensional spatial attention feature map Fs;
and the spatial attention output layer multiplies the spatial attention feature map Fs with the second feature map F'' to obtain and output the third feature map F'''.
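The SAM step just described can be sketched in NumPy as follows. The 3 x 3 kernel size and the naive convolution loop are illustrative assumptions; the patent does not state the SAM kernel size:

```python
import numpy as np

def sam_spatial_attention(f2, kernel):
    """Sketch of SAM spatial attention (hypothetical kernel size).

    f2     : second feature map, shape (C, H, W)
    kernel : conv weights over the 2 pooled maps, shape (2, k, k), k odd
    Returns the third feature map: sigmoid(conv([max, avg])) * f2.
    """
    m_max = f2.max(axis=0)    # (H, W): per-position maximum over channels
    m_avg = f2.mean(axis=0)   # (H, W): per-position average over channels
    stacked = np.stack([m_max, m_avg])              # concatenate -> (2, H, W)
    k = kernel.shape[1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    h, w = m_max.shape
    conv = np.zeros((h, w))
    for i in range(h):          # naive 2-in / 1-out "same" convolution
        for j in range(w):
            conv[i, j] = (padded[:, i:i + k, j:j + k] * kernel).sum()
    fs = 1.0 / (1.0 + np.exp(-conv))                # sigmoid -> spatial weights
    return fs[None] * f2                            # broadcast over channels

rng = np.random.default_rng(1)
f2 = rng.standard_normal((8, 6, 6))
kernel = rng.standard_normal((2, 3, 3)) * 0.1
f3 = sam_spatial_attention(f2, kernel)
```

As with the channel step, the sigmoid keeps the spatial weights in (0, 1), so F''' is an elementwise attenuation of F'' concentrated on the attended positions.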
In order to verify the superiority of the power field scene detection method provided by this embodiment, a comparative test is also performed in this embodiment, which specifically includes the following steps:
the detection results of 6 scene types of field workers wearing safety helmets (aqm), field workers not wearing safety helmets (wdaqm), field workers wearing working clothes (gzf), field workers not wearing working clothes (wcgzf), working sites with construction warning boards (signals) or fences (fenges) in the power field operation image and the detection results of the prior-art yov 4 model are compared by the power field scene detection model based on the improved YOLO model obtained by applying the method of the embodiment, wherein the detection results comprise F1-score and Ap values detected for each type in 6 types and mAP values of 6 types, and the YOLO 4 algorithm and the actual detection result image of the invention are checked to analyze the detection results.
The F1-score detection results are shown in Table 1 below:
table 1: f1-score value of each class detection in the dataset
As the comparison in Table 1 shows, on the power field operation dataset in a complex environment, the F1-score values of the improved YOLO model proposed in this embodiment reach 90%, 84% and 77% for detecting field workers not wearing safety helmets, construction warning boards on the work site, and fences respectively, improvements of 3%, 2% and 1% over YOLOv4; the other 3 classes (field workers wearing safety helmets, wearing work clothes, and not wearing work clothes) can also be effectively detected.
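Since Table 1 reports F1-scores, recall that F1 is the harmonic mean of precision and recall; the numeric values below are illustrative only, as the per-class precision and recall are not given in the text:

```python
def f1_score(precision, recall):
    """F1 = harmonic mean of precision and recall, the metric in Table 1."""
    return 2 * precision * recall / (precision + recall)

# Illustrative: a detector with 80% precision and perfect recall still
# only reaches an F1 of about 0.889, since F1 penalises the weaker term.
print(round(f1_score(0.8, 1.0), 3))
```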
The AP and mAP detection results are shown in Table 2 below:
Table 2: AP and mAP values detected for the dataset
As can be seen from the comparison results in Table 2, the improved YOLO model provided in this embodiment achieves the best mAP value of 87.99%, an improvement of 0.97% over YOLOv4; its AP values in detecting a field operator not wearing a safety helmet, a field operator not wearing work clothes, and a construction warning board on the work site reach 92.43%, 88.33% and 86.93% respectively; and it also achieves better detection results than YOLOv4 in the 3 categories of a field operator wearing a safety helmet, a field operator wearing work clothes, and a fence on the work site, with improvements of 3.67%, 1.55% and 2.74%.
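The per-class AP values in Table 2 are conventionally computed as the area under the precision-recall curve, and the mAP is the mean of the per-class APs. A minimal sketch under that standard definition (the precision-recall points and the AP values fed to the mean are illustrative, not the measured data):

```python
# Hedged sketch of per-class AP via all-point interpolation of the
# precision-recall curve; mAP is the mean of the per-class APs.

def average_precision(recalls, precisions):
    """Area under the PR curve, with precision made monotonically
    non-increasing from right to left before integrating."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):  # right-to-left precision envelope
        p[i] = max(p[i], p[i + 1])
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))

def mean_ap(aps):
    """mAP over a list of per-class AP values."""
    return sum(aps) / len(aps)

print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
print(round(mean_ap([0.9, 0.8, 0.85]), 2))        # 0.85
```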
Referring specifically to fig. 5 and fig. 6, which are respectively an actual detection result image of the improved YOLO model proposed in this embodiment and an actual detection result image of the existing YOLOv4 model;
As can be seen from the actual detection results of the YOLOv4 model and of the improved YOLO model provided in this embodiment, the 6 categories in power field operation images, namely a field worker wearing a safety helmet (aqm), a field worker not wearing a safety helmet (wdaqm), a field worker wearing work clothes (gzf), a field worker not wearing work clothes (wcgzf), and a work site with a construction warning board (signal) or a fence (fence), can all be effectively detected, and compared with the YOLOv4 algorithm, the method provided by the invention has higher detection precision and a better detection effect.
The second embodiment:
This embodiment provides a power field operation scene detection system, comprising:
the data set production module is used for collecting a plurality of different types of power field operation scene images, labeling each power field operation scene image, and labeling a position frame and a scene type of features in the power field operation scene images; putting the marked power field operation scene images into a training sample set; this module is used to implement the functions of steps S100 and S200 in the above-mentioned first embodiment, which are not described herein again;
the improved model building module is used for building a YOLO model based on a Backbone network, a Neck network and a Head network, and adding an MCBAM attention module into an up-sampling part and a down-sampling part of a PANet network in the Neck network to obtain an improved YOLO model; wherein the MCBAM attention module comprises an MB multi-scale information capture module, a CAM channel attention module and a SAM space attention module; the MB multi-scale information capturing module is used for capturing multi-scale information of an input image through a plurality of convolution operations, and combining and outputting a first feature map; the CAM channel attention module is used for adjusting the channel attention of the first feature map and outputting a second feature map; the SAM space attention module is used for carrying out space attention adjustment on the second feature map and outputting a third feature map; this module is used to implement the function of step S300 in the above-mentioned first embodiment, which is not described herein again;
the training module is used for carrying out iterative training on the improved YOLO model by utilizing a training sample set to obtain a trained power field operation scene detection model; this module is used to implement the function of step S400 in the above-mentioned first embodiment, which is not described herein again;
a detection module, configured to perform scene detection of the power field operation by using the power field operation scene detection model, where the module is configured to implement the function of step S500 in the above-mentioned first embodiment, and details are not repeated here.
As a preferred embodiment of this embodiment, the scene types of the scene image of the power field operation include:
a scene in which a field worker wears a safety helmet, a scene in which a field worker does not wear a safety helmet, a scene in which a field worker wears work clothes, a scene in which a field worker does not wear work clothes, a scene in which the work site has a construction warning board or a fence, and a scene in which the work site has no construction warning board or fence.
As a preferred embodiment of this embodiment, the data set production module further includes a data enhancement unit, specifically configured to:
perform data enhancement on the marked power field operation scene images, wherein the data enhancement mode comprises one or a combination of two or more of image flipping processing, image translation processing, image scaling processing, noise adding processing, contrast adjustment processing, brightness adjustment processing, random image block erasing processing and mosaic data enhancement processing.
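As an illustration of the listed enhancement modes, the following sketch implements three of them (image flipping, image translation, and noise adding) on an H×W×3 uint8 image array; the function names and parameter values are our own illustrative choices, not specified by this embodiment:

```python
import numpy as np

def flip_horizontal(img: np.ndarray) -> np.ndarray:
    """Image flipping processing: mirror the image left-right."""
    return img[:, ::-1].copy()

def translate(img: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Image translation processing: shift by (dx, dy) with zero padding."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    src = img[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
    out[max(dy, 0):max(dy, 0) + src.shape[0],
        max(dx, 0):max(dx, 0) + src.shape[1]] = src
    return out

def add_gaussian_noise(img: np.ndarray, sigma: float = 10.0,
                       seed: int = 0) -> np.ndarray:
    """Noise adding processing: additive Gaussian noise, clipped to uint8."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

In practice, one or more of these transforms would be sampled at random for each training image before it enters the training sample set.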
As a preferred embodiment of this embodiment, the MB multi-scale information capturing module comprises: a convolution operation with a convolution kernel size of 1×1 and two dilated (atrous) convolution operations with a convolution kernel size of 3×3 and dilation rates of 6 and 12 respectively, each convolution operation being followed by a corresponding BN batch normalization layer; and a multi-scale information feature output layer used for performing an element-wise addition operation on the feature maps output by the three convolution operations and outputting the first feature map to the CAM channel attention module;
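A minimal PyTorch sketch of the MB multi-scale information capturing module as described (a 1×1 convolution plus two 3×3 dilated convolutions with dilation rates 6 and 12, each followed by BN, combined by element-wise addition); the channel counts and the padding choice that preserves spatial size are our assumptions:

```python
import torch
import torch.nn as nn

class MBBlock(nn.Module):
    """Sketch of the MB module: three parallel convolution branches whose
    BN outputs are summed element-wise into the first feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.b1 = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels))
        # padding = dilation keeps spatial size for a 3x3 kernel
        self.b2 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=6, dilation=6, bias=False),
            nn.BatchNorm2d(channels))
        self.b3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=12, dilation=12, bias=False),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        # element-wise addition of the three branches -> first feature map
        return self.b1(x) + self.b2(x) + self.b3(x)

x = torch.randn(1, 16, 32, 32)
print(MBBlock(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```

The different dilation rates give the three branches different receptive fields, which is how the module captures multi-scale information without changing the feature map's resolution.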
the CAM channel attention module comprises a first pooling layer, a shared fully-connected layer and a channel attention output layer; the first pooling layer is used for performing maximum pooling and average pooling operations on the input first feature map per channel to obtain two one-dimensional vectors;
the shared fully-connected layer performs a fully-connected operation on each of the two one-dimensional vectors, adds the results, and then generates a one-dimensional channel attention feature map through sigmoid activation function activation;
the channel attention output layer is used for multiplying the channel attention feature map and the first feature map to obtain a second feature map and outputting the second feature map to the SAM space attention module;
the SAM space attention module comprises a second pooling layer, a convolution layer and a spatial attention output layer, wherein the second pooling layer is used for performing maximum pooling and average pooling operations on the input second feature map along the channel dimension to obtain two two-dimensional feature maps;
the convolution layer is used for concatenating the two two-dimensional feature maps, performing a convolution operation to reduce them to a single channel, and then generating a two-dimensional spatial attention feature map through sigmoid activation function activation;
and the spatial attention output layer is used for multiplying the spatial attention feature map and the second feature map to obtain a third feature map and outputting the third feature map.
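The CAM and SAM modules described above follow the CBAM pattern. A hedged PyTorch sketch of both (the reduction ratio of the shared fully-connected layers and the 7×7 spatial convolution kernel size are illustrative assumptions not fixed by this embodiment):

```python
import torch
import torch.nn as nn

class CAM(nn.Module):
    """Channel attention as described: per-channel max and average pooling
    over the spatial dimensions, a shared fully-connected path, the two
    results summed, then sigmoid. The reduction ratio r is an assumption."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r), nn.ReLU(),
            nn.Linear(channels // r, channels))

    def forward(self, f):
        b, c = f.shape[:2]
        mx = self.mlp(f.amax(dim=(2, 3)))  # max-pooled 1-D vector
        av = self.mlp(f.mean(dim=(2, 3)))  # average-pooled 1-D vector
        w = torch.sigmoid(mx + av).view(b, c, 1, 1)
        return f * w                       # second feature map

class SAM(nn.Module):
    """Spatial attention: per-pixel max and mean over channels concatenated,
    reduced to one channel by a convolution (7x7 kernel is an assumption),
    then sigmoid."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, 7, padding=3, bias=False)

    def forward(self, f):
        mx = f.amax(dim=1, keepdim=True)
        av = f.mean(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([mx, av], dim=1)))
        return f * w                       # third feature map

f = torch.randn(2, 32, 8, 8)               # first feature map
print(SAM()(CAM(32, r=4)(f)).shape)        # torch.Size([2, 32, 8, 8])
```

Both modules leave the feature map's shape unchanged, which is what allows the MCBAM block to be dropped into the up-sampling and down-sampling paths of the PANet without altering the surrounding network.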
The third embodiment:
This embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the power field operation scene detection method according to any embodiment of the present invention.
The fourth embodiment:
This embodiment provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the power field operation scene detection method according to any embodiment of the present invention.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may each be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of singular or plural items. For example, at least one of a, b and c may represent: a; b; c; a and b; a and c; b and c; or a, b and c, where a, b and c may each be singular or plural.
Those of ordinary skill in the art will appreciate that the various units and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or a combination of the two. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, any function, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The above description is only an embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present invention or directly or indirectly applied to other related technical fields are included in the scope of the present invention.
Claims (10)
1. A power field operation scene detection method is characterized by comprising the following steps:
collecting a plurality of power field operation scene images of different types, labeling each power field operation scene image, and labeling a position frame and a scene type of features in the power field operation scene images;
putting the marked power field operation scene images into a training sample set;
establishing a YOLO model based on a Backbone network, a Neck network and a Head network, and adding an MCBAM attention module in an up-sampling part and a down-sampling part of a PANet network in the Neck network to obtain an improved YOLO model; wherein the MCBAM attention module comprises an MB multi-scale information capture module, a CAM channel attention module and a SAM space attention module; the MB multi-scale information capturing module is used for capturing multi-scale information of an input image through a plurality of convolution operations, and combining and outputting a first feature map; the CAM channel attention module is used for adjusting the channel attention of the first feature map and outputting a second feature map; the SAM space attention module is used for carrying out space attention adjustment on the second feature map and outputting a third feature map;
carrying out iterative training on the improved YOLO model by using a training sample set to obtain a trained power field operation scene detection model;
and carrying out scene detection on the power field operation by using the power field operation scene detection model.
2. The power field operation scene detection method according to claim 1, wherein the scene categories of the power field operation scene image comprise:
a scene in which a field worker wears a safety helmet, a scene in which a field worker does not wear a safety helmet, a scene in which a field worker wears work clothes, a scene in which a field worker does not wear work clothes, a scene in which the work site has a construction warning board or a fence, and a scene in which the work site has no construction warning board or fence.
3. The method for detecting the power field operation scene according to claim 1, wherein the method further comprises a data enhancement step after the marked power field operation scene image is placed in a training sample set, and specifically comprises the following steps:
performing data enhancement on the marked power field operation scene images, wherein the data enhancement mode comprises one or a combination of two or more of image flipping processing, image translation processing, image scaling processing, noise adding processing, contrast adjustment processing, brightness adjustment processing, random image block erasing processing and mosaic data enhancement processing.
4. The electric power field operation scene detection method according to claim 1, characterized in that:
the MB multi-scale information capturing module comprises: a convolution operation with a convolution kernel size of 1×1 and two dilated convolution operations with a convolution kernel size of 3×3 and dilation rates of 6 and 12 respectively, each convolution operation being followed by a corresponding BN batch normalization layer; and a multi-scale information feature output layer used for performing an element-wise addition operation on the feature maps output by the three convolution operations and outputting the first feature map to the CAM channel attention module;
the CAM channel attention module comprises a first pooling layer, a shared fully-connected layer and a channel attention output layer; the first pooling layer is used for performing maximum pooling and average pooling operations on the input first feature map per channel to obtain two one-dimensional vectors;
the shared fully-connected layer performs a fully-connected operation on each of the two one-dimensional vectors, adds the results, and then generates a one-dimensional channel attention feature map through sigmoid activation function activation;
the channel attention output layer is used for multiplying the channel attention feature map and the first feature map to obtain a second feature map and outputting the second feature map to the SAM space attention module;
the SAM space attention module comprises a second pooling layer, a convolution layer and a spatial attention output layer, wherein the second pooling layer is used for performing maximum pooling and average pooling operations on the input second feature map along the channel dimension to obtain two two-dimensional feature maps;
the convolution layer is used for concatenating the two two-dimensional feature maps, performing a convolution operation to reduce them to a single channel, and then generating a two-dimensional spatial attention feature map through sigmoid activation function activation;
and the spatial attention output layer is used for multiplying the spatial attention feature map and the second feature map to obtain a third feature map and outputting the third feature map.
5. An electric power field operation scene detection system, characterized by comprising:
the data set production module is used for collecting a plurality of different types of power field operation scene images, labeling each power field operation scene image, and labeling a position frame and a scene type of features in the power field operation scene images; putting the marked scene image of the power field operation into a training sample set;
the improved model building module is used for building a YOLO model based on a Backbone network, a Neck network and a Head network, and adding an MCBAM attention module into an up-sampling part and a down-sampling part of a PANet network in the Neck network to obtain an improved YOLO model; wherein the MCBAM attention module comprises an MB multi-scale information capture module, a CAM channel attention module and a SAM space attention module; the MB multi-scale information capturing module is used for capturing multi-scale information of an input image through a plurality of convolution operations, and combining and outputting a first feature map; the CAM channel attention module is used for adjusting the channel attention of the first feature map and outputting a second feature map; the SAM space attention module is used for carrying out space attention adjustment on the second feature map and outputting a third feature map;
the training module is used for carrying out iterative training on the improved YOLO model by utilizing a training sample set to obtain a trained power field operation scene detection model;
and the detection module is used for carrying out scene detection on the electric field operation by utilizing the electric field operation scene detection model.
6. The power field operation scene detection system according to claim 5, wherein the scene categories of the power field operation scene image comprise:
a scene in which a field worker wears a safety helmet, a scene in which a field worker does not wear a safety helmet, a scene in which a field worker wears work clothes, a scene in which a field worker does not wear work clothes, a scene in which the work site has a construction warning board or a fence, and a scene in which the work site has no construction warning board or fence.
7. The system according to claim 5, wherein the data set production module further includes a data enhancement unit, specifically configured to:
perform data enhancement on the marked power field operation scene images, wherein the data enhancement mode comprises one or a combination of two or more of image flipping processing, image translation processing, image scaling processing, noise adding processing, contrast adjustment processing, brightness adjustment processing, random image block erasing processing and mosaic data enhancement processing.
8. The system for detecting the power field operation scene according to claim 5, wherein:
the MB multi-scale information capturing module comprises: a convolution operation with a convolution kernel size of 1×1 and two dilated convolution operations with a convolution kernel size of 3×3 and dilation rates of 6 and 12 respectively, each convolution operation being followed by a corresponding BN batch normalization layer; and a multi-scale information feature output layer used for performing an element-wise addition operation on the feature maps output by the three convolution operations and outputting the first feature map to the CAM channel attention module;
the CAM channel attention module comprises a first pooling layer, a shared fully-connected layer and a channel attention output layer; the first pooling layer is used for performing maximum pooling and average pooling operations on the input first feature map per channel to obtain two one-dimensional vectors;
the shared fully-connected layer performs a fully-connected operation on each of the two one-dimensional vectors, adds the results, and then generates a one-dimensional channel attention feature map through sigmoid activation function activation;
the channel attention output layer is used for multiplying the channel attention feature map and the first feature map to obtain a second feature map and outputting the second feature map to the SAM space attention module;
the SAM space attention module comprises a second pooling layer, a convolution layer and a spatial attention output layer, wherein the second pooling layer is used for performing maximum pooling and average pooling operations on the input second feature map along the channel dimension to obtain two two-dimensional feature maps;
the convolution layer is used for concatenating the two two-dimensional feature maps, performing a convolution operation to reduce them to a single channel, and then generating a two-dimensional spatial attention feature map through sigmoid activation function activation;
and the spatial attention output layer is used for multiplying the spatial attention feature map and the second feature map to obtain a third feature map and outputting the third feature map.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements a power field operation scene detection method according to any one of claims 1 to 4 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements a power field operation scene detection method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211279083.3A CN115359360B (en) | 2022-10-19 | 2022-10-19 | Power field operation scene detection method, system, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211279083.3A CN115359360B (en) | 2022-10-19 | 2022-10-19 | Power field operation scene detection method, system, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115359360A true CN115359360A (en) | 2022-11-18 |
CN115359360B CN115359360B (en) | 2023-04-18 |
Family
ID=84008154
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211279083.3A Active CN115359360B (en) | 2022-10-19 | 2022-10-19 | Power field operation scene detection method, system, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115359360B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116229319A (en) * | 2023-03-01 | 2023-06-06 | 广东宜教通教育有限公司 | Multi-scale feature fusion class behavior detection method and system |
CN116309502A (en) * | 2023-03-27 | 2023-06-23 | 江苏科技大学 | Ship coating defect detection method based on improved attention module |
CN116596904A (en) * | 2023-04-26 | 2023-08-15 | 国网江苏省电力有限公司泰州供电分公司 | Power transmission detection model construction method and device based on adaptive scale sensing |
CN117132914A (en) * | 2023-10-27 | 2023-11-28 | 武汉大学 | Method and system for identifying large model of universal power equipment |
CN117437535A (en) * | 2023-09-11 | 2024-01-23 | 国网山西省电力公司晋中供电公司 | Detection method and device for illegal operation scene, computer equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084210A (en) * | 2019-04-30 | 2019-08-02 | 电子科技大学 | The multiple dimensioned Ship Detection of SAR image based on attention pyramid network |
CN114187543A (en) * | 2021-11-29 | 2022-03-15 | 国网福建省电力有限公司建设分公司 | Safety belt detection method and system in high-altitude power operation scene |
WO2022083784A1 (en) * | 2020-10-23 | 2022-04-28 | 西安科锐盛创新科技有限公司 | Road detection method based on internet of vehicles |
CN114663309A (en) * | 2022-03-23 | 2022-06-24 | 山东大学 | Image defogging method and system based on multi-scale information selection attention mechanism |
CN114782982A (en) * | 2022-03-10 | 2022-07-22 | 福建工程学院 | Marine organism intelligent detection method based on deep learning |
- 2022-10-19: CN application CN202211279083.3A, patent CN115359360B, status Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084210A (en) * | 2019-04-30 | 2019-08-02 | 电子科技大学 | The multiple dimensioned Ship Detection of SAR image based on attention pyramid network |
WO2022083784A1 (en) * | 2020-10-23 | 2022-04-28 | 西安科锐盛创新科技有限公司 | Road detection method based on internet of vehicles |
CN114187543A (en) * | 2021-11-29 | 2022-03-15 | 国网福建省电力有限公司建设分公司 | Safety belt detection method and system in high-altitude power operation scene |
CN114782982A (en) * | 2022-03-10 | 2022-07-22 | 福建工程学院 | Marine organism intelligent detection method based on deep learning |
CN114663309A (en) * | 2022-03-23 | 2022-06-24 | 山东大学 | Image defogging method and system based on multi-scale information selection attention mechanism |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116229319A (en) * | 2023-03-01 | 2023-06-06 | 广东宜教通教育有限公司 | Multi-scale feature fusion class behavior detection method and system |
CN116309502A (en) * | 2023-03-27 | 2023-06-23 | 江苏科技大学 | Ship coating defect detection method based on improved attention module |
CN116596904A (en) * | 2023-04-26 | 2023-08-15 | 国网江苏省电力有限公司泰州供电分公司 | Power transmission detection model construction method and device based on adaptive scale sensing |
CN116596904B (en) * | 2023-04-26 | 2024-03-26 | 国网江苏省电力有限公司泰州供电分公司 | Power transmission detection model construction method and device based on adaptive scale sensing |
CN117437535A (en) * | 2023-09-11 | 2024-01-23 | 国网山西省电力公司晋中供电公司 | Detection method and device for illegal operation scene, computer equipment and storage medium |
CN117132914A (en) * | 2023-10-27 | 2023-11-28 | 武汉大学 | Method and system for identifying large model of universal power equipment |
CN117132914B (en) * | 2023-10-27 | 2024-01-30 | 武汉大学 | Method and system for identifying large model of universal power equipment |
Also Published As
Publication number | Publication date |
---|---|
CN115359360B (en) | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115359360B (en) | Power field operation scene detection method, system, equipment and storage medium | |
CN108537743B (en) | Face image enhancement method based on generation countermeasure network | |
CN106650615B (en) | A kind of image processing method and terminal | |
CN111860400B (en) | Face enhancement recognition method, device, equipment and storage medium | |
CN114463677B (en) | Safety helmet wearing detection method based on global attention | |
CN112001241B (en) | Micro-expression recognition method and system based on channel attention mechanism | |
CN114783024A (en) | Face recognition system of gauze mask is worn in public place based on YOLOv5 | |
CN112580434B (en) | Face false detection optimization method and system based on depth camera and face detection equipment | |
CN113486860A (en) | YOLOv 5-based safety protector wearing detection method and system | |
CN115797970B (en) | Dense pedestrian target detection method and system based on YOLOv5 model | |
CN114241522A (en) | Method, system, equipment and storage medium for field operation safety wearing identification | |
CN109949200A (en) | Steganalysis framework establishment method based on filter subset selection and CNN | |
CN110503157B (en) | Image steganalysis method of multitask convolution neural network based on fine-grained image | |
CN115578624A (en) | Agricultural disease and pest model construction method, detection method and device | |
CN117671540A (en) | Method and system for detecting small target of attention aerial image based on multispectral frequency channel | |
CN117670791A (en) | Road disease detection method and device based on multiscale fusion strategy and improved YOLOv5 | |
CN111539434B (en) | Infrared weak and small target detection method based on similarity | |
CN108664906A (en) | The detection method of content in a kind of fire scenario based on convolutional network | |
CN117197543A (en) | Network anomaly detection method and device based on GMD imaging and improved ResNeXt | |
CN111881803A (en) | Livestock face recognition method based on improved YOLOv3 | |
CN111047546A (en) | Infrared image super-resolution reconstruction method and system and electronic equipment | |
CN113989906A (en) | Face recognition method | |
CN116778214A (en) | Behavior detection method, device, equipment and storage medium thereof | |
CN113538199B (en) | Image steganography detection method based on multi-layer perception convolution and channel weighting | |
CN114120050A (en) | Method, device and equipment for extracting surface ecological data and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |