CN117523437A - Real-time risk identification method for substation near-electricity operation site - Google Patents
- Publication number
- CN117523437A (application number CN202311419123.4A)
- Authority
- CN
- China
- Prior art keywords
- real
- frame
- target detection
- detection model
- steps
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a real-time risk identification method for the near-electricity operation site of a transformer substation, comprising the following steps: collecting video image data of the transformer substation and marking each object to be detected with a rectangular frame to produce a training data set; pre-training, and improving the target detection model with a hybrid neural network; introducing an AFPN feature fusion network and computing the overlap between predicted and ground-truth boxes in the detection task with an S-HIoU loss function; obtaining an optimal target detection model; and using the optimal model, together with a preset on-site operation safety area, to identify and warn against violation risk behaviors of operators in the video images in real time. By making the target detection model lightweight and introducing the AFPN progressive feature pyramid network, the invention improves the robustness of target recognition and adapts better to the working environment of a substation site. The method provides a reliable, efficient and convenient intelligent means for the safety control of near-electricity operation at substation sites.
Description
Technical Field
The invention relates to the technical field of substation operation, in particular to a real-time risk identification method for a substation near-electricity operation site.
Background
At present, when construction work is carried out near live equipment in a transformer substation, the many complex pieces of live equipment around the working face have led to a number of serious personal-safety accidents in recent years, so safety control of operators during on-site construction is particularly important. These problems arise mainly from internal grid faults and from hidden violation hazards on the production site; safety accidents caused by external violation factors, such as the weak safety awareness of on-site operators and insufficient safety prevention and control measures, account for a considerable proportion.
In the prior art, conventional risk identification methods are generally based on improvements to the YOLOv4, YOLOv5, YOLOv6 and YOLOv7 models, using lightweight networks such as EfficientNet and ConvNeXt to improve the target detection model. FasterViT achieves comparable accuracy to EfficientNet and ConvNeXt while offering roughly 50% and 500% higher image throughput respectively, so it runs more efficiently and provides better real-time performance on mobile terminal devices with limited computing resources. The prior art also improves the YOLOv4 model with efficient channel attention mechanisms and the lightweight network GhostNet, which enables real-time risk identification; however, problems such as insufficient recognition accuracy and strong sensitivity to the complex on-site environment remain, so the challenges of real-time risk identification for on-site substation operation are not fully addressed.
In summary, the existing technology for identifying risks in near-electricity substation operation consists mainly of on-site identification methods based on target detection models, but it has the following defects:
(1) Insufficient identification accuracy: on-site substation operation involves a complex environment, numerous devices and many risk factors, and existing real-time near-electricity risk identification algorithms are not accurate enough at recognizing risk behaviors on the substation work site, which increases the risk of accidents;
(2) Insufficient adaptivity: target recognition results in the prior art are strongly affected by the scene environment, illumination and camera angle of substation operation, so the recognition results are unstable and adapt poorly to the on-site environment;
(3) Slow and imprecise regression: existing IoU bounding-box loss functions have poor recognition precision and high miss rates for tiny substation targets, as well as low regression precision and speed, and can suffer from adverse effects such as gradient explosion and gradient vanishing during network training.
For the problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
The invention aims to provide a real-time risk identification method for substation near-electricity operation sites that is fast, accurate and highly adaptable.
The invention adopts the technical scheme that:
a real-time risk identification method for a near-electricity operation site of a transformer substation comprises the following steps:
s1, acquiring video image data of a substation near-electricity operation site based on different scenes and operation tasks, dividing a video stream in the acquired video image data into video segments, performing self-adaptive frequency division, and marking an object to be detected by adopting a rectangular frame to manufacture a training data set;
s2, inputting video image data with labels in a training data set into a deep neural network for pre-training, and improving a target detection model by utilizing a hybrid neural network;
s3, introducing an AFPN feature fusion network to improve the self-adaptability of the target detection model, and calculating the overlapping degree of a prediction frame and a real frame in the target detection task by using an S-HIoU loss function;
s4, iteratively optimizing parameters of the target detection model according to training results of each round, and utilizing a high-performance calculation acceleration card to assist in calculation until the model parameters are converged to obtain an optimal target detection model;
s5, utilizing the optimal target detection model to combine with a preset field operation safety area to identify and alarm the illegal risk behaviors of the operators in the video image in real time.
The labeling of objects to be detected with rectangular frames in step S1 specifically comprises the following steps:
based on the safety standards for electric power field operation, behaviors covered by the standards, namely non-standard dress, non-standard use of safety ropes, toppled fences, failure to wear insulating gloves and leaving the on-site operation safety area, are extracted as the corresponding labels and matched with objects in the training data set.
Step S2 specifically comprises: using FasterViT, a neural network that mixes a CNN with a Vision Transformer, as the feature extraction network, and, based on the introduced hierarchical attention method, decomposing global self-attention of quadratic complexity into multiple levels of attention with reduced computation cost.
The FasterViT network architecture in step S2 adopts a multi-scale form: convolution layers operate on high-resolution layers in the early stages, the second half of the model relies on a novel window attention mechanism to perform spatial reasoning over the whole feature map, and the first half of the network and the downsampling blocks use dense convolution kernels to weight each channel of the original feature map so as to enhance the response to useful information. In operation, the FasterViT network first preprocesses the input image with two consecutive 3×3 convolutions of stride 2, then maps the pixels of the input image into a D-dimensional embedding space, and finally performs initial feature extraction through batch normalization and a ReLU activation, providing a more expressive input for subsequent processing.
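The shape bookkeeping of the stem described above can be sketched in a few lines. This is a minimal sketch, not the patent's implementation; the padding value of 1 is an assumption (the text only specifies 3×3 kernels with stride 2).

```python
# Shape arithmetic for a FasterViT-style stem: two consecutive 3x3
# stride-2 convolutions reduce the input to 1/4 resolution before the
# pixels are embedded in a D-dimensional space.

def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution (padding=1 is an assumption)."""
    return (size - kernel + 2 * padding) // stride + 1

def stem_shape(h, w, embed_dim):
    """(H, W, 3) image -> (H/4, W/4, D) feature map after the two stem convs."""
    for _ in range(2):
        h, w = conv_out(h), conv_out(w)
    return (h, w, embed_dim)
```

For a 224×224 input and D = 64, `stem_shape(224, 224, 64)` gives a 56×56×64 feature map.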
Introducing the AFPN feature fusion network in step S3 to improve the adaptivity of the target detection model specifically comprises: the AFPN feature fusion network starts the fusion process by combining two adjacent low-level features, progressively brings higher-level features into the fusion, and finally fuses the top-level features of the backbone; in this process the low-level features absorb semantic information from the high-level features, while the high-level features absorb detail information from the low-level features;
in the multi-level feature fusion process, ASFF is used to assign different spatial weights to features at different levels, enhancing the importance of key levels and reducing the influence of contradictory information from different targets.
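The per-position weighting that ASFF performs can be illustrated with a minimal sketch. This is an assumption-laden simplification: it fuses scalar feature values at one spatial position with softmax-normalized weights, whereas the real ASFF learns the weight logits and operates on whole feature maps.

```python
import math

def asff_fuse(features, logits):
    """Adaptively fuse per-level feature values at one spatial position.

    `features`: one scalar per pyramid level (already resized to a common
    scale). `logits`: the learned per-level weight scores. Softmax turns
    the logits into spatial weights that sum to 1, so a level carrying
    contradictory information can be suppressed at that position.
    """
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * f for w, f in zip(weights, features)), weights
```

With equal logits every level contributes equally; training would push the logits so that the most informative level dominates.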
In step S3, the S-HIoU loss function is used to calculate the overlap between the predicted frame and the ground-truth frame in the target detection task, specifically as follows: the S-HIoU loss function introduces the minimum enclosing rectangle area and the aspect ratios of the predicted and ground-truth frames, and introduces into the base error of the penalty term the distance from the top-left key point of the ground-truth frame to the top-left key point of the predicted frame; the distance and shape information are normalized to keep the loss function scale-invariant, so that the model can recognize and localize multi-scale targets.
The calculation formula for the intersection area of the predicted frame and the ground-truth frame in step S3, and the calculation formula for the base error of the penalty term, appear as figures in the original patent and are not reproduced here.
In those formulas, A and B are the ground-truth frame and the predicted frame respectively, r denotes the Euclidean distance from the top-left key point of the ground-truth frame to the top-left key point of the predicted frame, k1 and k2 denote the diagonal lengths of the ground-truth frame and the predicted frame respectively, v measures the consistency of the aspect ratios, and α is a balance parameter.
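The geometric quantities defined above can be computed directly from the box coordinates. The sketch below returns the standard IoU together with r, k1 and k2; how S-HIoU combines them into the final penalty is the patent's own design and is not reproduced, so this helper is illustrative only.

```python
import math

def iou_terms(true_box, pred_box):
    """Boxes are (x1, y1, x2, y2). Returns the IoU plus the quantities
    named in the text: r, the distance between the top-left corners of
    the true and predicted boxes, and k1, k2, their diagonal lengths."""
    ax1, ay1, ax2, ay2 = true_box
    bx1, by1, bx2, by2 = pred_box
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union else 0.0
    r = math.hypot(ax1 - bx1, ay1 - by1)           # top-left keypoint distance
    k1 = math.hypot(ax2 - ax1, ay2 - ay1)          # true-box diagonal
    k2 = math.hypot(bx2 - bx1, by2 - by1)          # predicted-box diagonal
    return iou, r, k1, k2
```

Normalizing r by the diagonals, as the text describes, is what keeps the penalty scale-invariant across target sizes.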
The step S5 specifically comprises the following steps:
s51, judging whether the operator exceeds the field operation safety area by combining the coordinates of the field operator in the video image and the preset field operation safety area, which are output by the optimal target detection model, and carrying out voice alarm on the operator exceeding the field operation safety area;
s52, detecting and identifying the violation risk of the operator in the video image by utilizing the optimal target detection model, and carrying out voice alarm on the operator with the violation risk behavior.
The method further comprises a step S6, wherein the step S6 is a logic step performed after the step S5, and specifically comprises the following steps:
the optimal target detection model is deployed in a movable lightweight intelligent device and used for safely monitoring operators on site, verifying the identification effect of the real-time risk identification method for the substation near-electricity operation site, and optimally adjusting the real-time risk identification method for the substation near-electricity operation site based on the identification result.
The invention improves the target detection model with a hierarchical-attention hybrid neural network to address the insufficient accuracy of on-site substation risk identification, and deploys the model in a movable intelligent terminal. It introduces the AFPN progressive feature pyramid network to improve the robustness of target recognition and adapt better to the substation site environment, and designs a novel loss function which, combined with the algorithm model, improves detection performance and speed. The work site is monitored in real time for safety and risk, identified personnel risk behaviors trigger voice alarms, and the requirements of on-site real-time risk monitoring are met. This provides a reliable, efficient and convenient intelligent means for the safety control of near-electricity substation operation, reduces the probability of safety accidents and improves risk prevention capability.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a functional block diagram of the present invention;
FIG. 3 is a schematic diagram of a FasterViT network according to the present invention;
FIG. 4 is a schematic diagram of a convolutional layer according to the present invention;
FIG. 5 is a schematic diagram of a pooling layer according to the invention;
FIG. 6 is a schematic diagram of the ASFF method of the present invention;
fig. 7 is a parameter diagram of the S-HIoU loss function of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, 2 and 3, the present invention includes the steps of:
s1, acquiring video image data of a substation near-electricity operation site based on different scenes and operation tasks, and preprocessing the acquired video image data to manufacture a training data set; the method for acquiring video image data of the substation near-electricity operation site based on different scenes and operation tasks and preprocessing the acquired video image data to prepare a training data set comprises the following steps:
s11, acquiring video image data of a substation near-electricity operation site based on different scenes and operation tasks;
s12, segmenting a video stream in video image data into video segments, then performing self-adaptive frequency division, and marking an object to be detected by adopting a rectangular frame to manufacture a training data set.
Specifically, firstly, collecting video image data of a near-electricity operation site of a transformer substation, carrying out video acquisition aiming at different scenes and different operation tasks, segmenting a video stream into video segments, carrying out self-adaptive framing, and marking an object to be detected by using a rectangular frame to manufacture a data set required by model training.
In this embodiment, the scene selection and the objects to be marked during video preprocessing and image annotation follow the substation field operation specification, highlighting the dress safety, working-area safety and operating-behavior safety of operators. Frames are extracted from representative video clips to obtain picture data of uniform size, and rectangular-frame annotation is performed according to a pre-designed label library.
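The frame-extraction step can be sketched as a stride over frame indices. A uniform stride is an assumption; the patent says only that clips are adaptively split into frames, and the function name is hypothetical.

```python
def frame_indices(total_frames, fps, target_fps):
    """Indices of frames to keep when downsampling a clip recorded at
    `fps` to roughly `target_fps` for annotation."""
    stride = max(1, round(fps / target_fps))
    return list(range(0, total_frames, stride))
```

For a 4-second, 25 fps clip sampled at 5 fps this keeps every fifth frame, yielding 20 uniformly sized pictures for labelling.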
S2, inputting the labeled video image data of the training data set into a deep neural network for pre-training, and improving the target detection model with a hybrid neural network. Producing the training data set by marking objects to be detected with rectangular frames comprises the following step:
based on the safety standards for electric power field operation, behaviors covered by the standards, namely non-standard dress, non-standard use of safety ropes, toppled fences, failure to wear insulating gloves and leaving the on-site operation safety area, are extracted as the corresponding labels and matched with objects in the training data set.
The method for inputting the video image data with the labels in the training data set into the deep neural network for pre-training and improving the target detection algorithm by utilizing the hybrid neural network comprises the following steps:
s21, inputting video image data with labels in a training data set into a deep neural network for pre-training;
s22, taking a neural network FaterViT mixed with CNN and Vision Transformer as a feature extraction network, and decomposing the global self-attention with secondary complexity into multiple layers of attention with reduced calculation cost based on an introduced hierarchical attention method.
Specifically, the marked data is input into a deep neural network for pre-training, a neural network FasterViT of a mixed CNN (convolutional neural network) and ViT (Vision Transformer) is used as a feature extraction network, and based on an introduced hierarchical attention HAT method, the global self-attention with secondary complexity is decomposed into multiple layers of attention with reduced calculation cost, and the feature extraction expression capability and the processing speed are improved.
The FasterViT network used to build the model in this step combines the advantages of CNNs in local feature learning with the global modeling capability of Transformers. The network uses a method called Hierarchical Attention (HAT) to decompose the global self-attention mechanism of quadratic complexity into multiple levels of attention; on the high-level output features, the global self-attention mechanism achieves efficient communication among windows at low cost. Compared with other feature extraction networks, FasterViT has the performance advantage of processing high-resolution images more quickly and accurately. FIG. 3 is a schematic diagram of the FasterViT network.
Preprocessing of the input image is performed by two consecutive 3×3 convolutions with stride 2, which map the pixels of the input image into a D-dimensional embedding space; the image features are then initially extracted through batch normalization (improving the stability of model training) and the ReLU function (formula 1), providing a more expressive input for subsequent processing. The FasterViT architecture adopts a multi-scale form: convolution layers operate on high-resolution layers in the early stages, while the second half of the model relies on the HAT method to perform spatial reasoning over the whole feature map. The first half of the network and the downsampling blocks use dense convolution kernels to weight each channel of the original feature map, enhancing the response to useful information.
f(x)=max(0,x) (1)
where x is the input to the neuron from the previous layer of the network.
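Formula (1) is the element-wise ReLU activation; a direct sketch:

```python
def relu(x):
    """Formula (1): f(x) = max(0, x)."""
    return max(0.0, x)

def relu_vec(v):
    """Apply ReLU element-wise to a layer's output vector."""
    return [relu(x) for x in v]
```

Negative activations are zeroed while positive ones pass through unchanged, which is what mitigates gradient vanishing in deep feature extractors.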
Step S2 above introduces a novel window attention mechanism, HAT, which promotes the exchange of local and global information at low computational cost.
The HAT architecture first partitions the input feature map into local windows and obtains CTs (carrier tokens) through pooling and convolution operations; the CTs perform global feature extraction over the entire set of local windows. Feature extraction is performed with a convolution layer. For example, for an input image of 32×32×3 (3 being its depth, i.e. R, G, B), a convolution layer applies a 5×5×3 filter (receptive field), whose depth matches the depth of the input image. Convolving one filter with the input image yields a 28×28×1 feature map, as shown in fig. 4. The size of the feature map is calculated as follows:
the input dimensions are W1×H1×D1 (width, height, depth);
the required hyperparameters are: the number of filters (K), the filter size (F), the stride (S) and the zero-padding (P);
the output dimensions W2×H2×D2 are:
W2 = (W1 − F + 2P)/S + 1, H2 = (H1 − F + 2P)/S + 1, D2 = K.
Pooling layer: the input feature maps are compressed, which on the one hand shrinks the feature maps and simplifies the computational complexity of the network, and on the other hand extracts the main features, as shown in fig. 5.
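The feature-map size calculation above can be checked with a small helper, using the standard convolution output-size formula:

```python
def conv_output_shape(w1, h1, d1, k, f, s, p):
    """Output W2 x H2 x D2 of a convolution layer, given K filters of
    size F x F x D1, stride S and zero-padding P."""
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k  # output depth equals the number of filters
```

Applying it to the worked example in the text, a 32×32×3 input with one 5×5×3 filter, stride 1 and no padding, reproduces the 28×28×1 feature map.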
The first attention block is applied to summarize and transfer global information, after which the local-window features and the global features are fused by a Concat operation, giving a one-to-one correspondence so that each local window has its own unique CTs. A second attention mechanism is then applied to the fused tokens to further promote information exchange and extraction from the fused features. The fused features are subsequently separated again, and the model can continue to alternate global and local attention mechanisms over windows carrying higher-level feature information, so as to better capture information at different levels. The HAT module thus realizes information exchange between local windows and global features and effectively strengthens spatial reasoning throughout the feature-map hierarchy.
As shown in fig. 3, the neural network FasterViT first preprocesses the input image through two consecutive 3×3 convolutions with stride 2, then maps the pixels of the input image into a D-dimensional embedding space, and finally performs batch normalization and ReLU activation to extract preliminary image features, providing a more expressive input for subsequent processing;
the FasterViT network architecture adopts a multi-scale form: convolution layers run on the high-resolution stages in the early part of the network, the second half of the model relies on the novel window attention mechanism to perform spatial reasoning over the whole feature map, and the first half of the network and the downsampling blocks use dense convolution kernels to weight each channel of the original feature map, strengthening the response to useful information.
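The preprocessing stem described above can be sketched as follows; the channel widths (64, 96) and the padding are illustrative assumptions, not the official FasterViT configuration:

```python
import torch
import torch.nn as nn

# Two consecutive 3x3 convolutions with stride 2, each followed by batch
# normalization and ReLU, mapping pixels into a D-dimensional embedding
# space (D = 96 assumed here for illustration).
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),   # first 3x3 conv, stride 2
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 96, kernel_size=3, stride=2, padding=1),  # second 3x3 conv, stride 2
    nn.BatchNorm2d(96),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 224, 224)   # one RGB input image
feat = stem(x)                    # 4x downsampled feature map with D channels
```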
The method is based on a target detection model with the hybrid neural network FasterViT and introduces the hierarchical attention (HAT) method, which improves the feature extraction and expression capability for images acquired on site, identifies the risk behaviors of field operation more accurately, improves the safety guarantee of field operation, and offers high recognition precision.
S3, introducing an AFPN feature fusion network to improve the self-adaptability of the target detection model, and calculating the overlapping degree of the prediction frame and the real frame in the target detection task by using the S-HIoU loss function; the AFPN feature fusion network starts the fusion process by combining two adjacent Low-Level features, gradually brings High-Level features into the fusion, and finally fuses the top-level features of the backbone; during this process, the Low-Level features absorb semantic information from the High-Level features, and the High-Level features absorb detail information from the Low-Level features;
the AFPN network is used as the feature fusion network, avoiding information loss or degradation in multi-stage transmission and improving the feature fusion performance and self-adaptability of the target detection model. The S-HIoU loss function is used to calculate the overlapping degree of the prediction frame and the real frame in the target detection task, improving the regression speed of the target detection model.
Aiming at the problem of insufficient adaptability caused by the interference of the complex environment of the transformer substation site, the invention introduces a progressive feature pyramid network (AFPN) in the feature fusion process, thereby effectively preventing the loss or degradation of feature information in the transmission and interaction processes, reducing the influence of interference factors on the risk identification effect, improving the adaptability of a target detection model to the environment and being suitable for the complex environment.
The AFPN network used in step S3 is a feature fusion network: it starts the fusion process by combining two adjacent Low-Level features, gradually brings High-Level features into the fusion, and finally fuses the top-level features of the backbone; during this process, the Low-Level features absorb semantic information from the High-Level features, and the High-Level features absorb detail information from the Low-Level features. In the multi-level feature fusion process, ASFF (Adaptively Spatial Feature Fusion) assigns different spatial weights to features of different levels, which strengthens the importance of key levels and mitigates the influence of contradictory information from different targets. As shown in fig. 6, features from 3 levels are fused. Let x_ij^(n→l) denote the feature vector at position (i, j) mapped from Level n to Level l. The resulting feature vector, denoted y_ij^l, is obtained by adaptive spatial fusion of the multi-level features as a linear combination of the feature vectors x_ij^(1→l), x_ij^(2→l) and x_ij^(3→l), as follows:
y_ij^l = α_ij^l · x_ij^(1→l) + β_ij^l · x_ij^(2→l) + γ_ij^l · x_ij^(3→l);
wherein α_ij^l, β_ij^l and γ_ij^l denote the spatial weights of the features of the 3 levels at Level l, subject to the constraint α_ij^l + β_ij^l + γ_ij^l = 1. Considering that the number of features fused differs at each stage of the AFPN, an adaptive spatial fusion module with a stage-specific number of inputs is implemented.
The S-HIoU loss function introduces the minimum circumscribed rectangle area and the aspect ratio of the predicted frame and the real frame, and introduces, in the base error of the penalty term, the distance from the top-left keypoint of the real frame to the top-left keypoint of the predicted frame; the distance information and the shape information are normalized so that the loss function keeps scale invariance, which satisfies the recognition and localization of multi-scale targets by the model, as shown in fig. 7;
the calculation formula of S-HIoU is:
the calculation formula of the basic error of the penalty term is as follows:
wherein A and B are the real frame and the predicted frame respectively, r denotes the Euclidean distance from the top-left keypoint of the real frame to the top-left keypoint of the predicted frame, k1 and k2 denote the diagonal lengths of the real frame and the predicted frame respectively, v measures the consistency of the aspect ratio, and α is a balance parameter.
In view of the problems in existing IoU loss function applications and in substation target identification, the invention designs a novel staged IoU loss function, the S-HIoU loss function, by fusing frame information such as the minimum circumscribed rectangle area, the aspect ratio, and the keypoints of the predicted and real frames. This loss function exploits the error information to the greatest extent, effectively improving the regression accuracy and regression speed of the target prediction frame.
In practical use, in order to better reflect the overlap between the target prediction frame and the real frame and to speed up their convergence, information about the minimum circumscribed rectangle area and the aspect ratio of the two frames is introduced. The overlap of the real frame and the prediction frame is expressed in formula (3), where A and B are the real frame and the prediction frame respectively, and C is the minimum circumscribed rectangle of the two frames; the regions denoted by A, B and C are shown in fig. 7.
S-HIoU = IoU - P_e (3)
Loss_S-HIoU = 1 - S-HIoU (4)
where Loss_S-HIoU is the loss corresponding to S-HIoU.
Secondly, in order to accelerate the convergence of the prediction frame, and to alleviate the loss of convergence direction that arises when the prediction frame encloses the real frame (or vice versa) and the offset is hard to measure, the distance from the top-left keypoint of the real frame to the top-left keypoint of the prediction frame is introduced into the base error of the penalty term. The base error of the penalty term is given in formula (5), where r is the Euclidean distance from the top-left keypoint of the real frame to the top-left keypoint of the prediction frame, k1 and k2 are the diagonal lengths of the real frame and the prediction frame respectively, v measures the consistency of the aspect ratio, and α is a balance parameter. The distance and shape information are thereby normalized so that the loss function keeps scale invariance, satisfying the recognition and localization of multi-scale targets by the model.
r = ρ(b, b^gt) (8)
wherein A and B are the real frame and the predicted frame respectively; r denotes the Euclidean distance from the top-left keypoint of the real frame to the top-left keypoint of the predicted frame; k1 and k2 denote the diagonal lengths of the real frame and the predicted frame respectively; v measures the consistency of the aspect ratio; α is a balance parameter; ρ denotes the Euclidean distance between b and b^gt, where b is the coordinate of the top-left keypoint of the prediction frame and b^gt that of the real frame; w^gt and h^gt denote the width and height of the real frame, and w and h the width and height of the predicted frame.
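Since the exact penalty formula is not reproduced in this text, the following is only a plausible sketch assembling the ingredients the description names (IoU, the top-left keypoint distance r normalised by the diagonals k1 and k2, the aspect-ratio term v, and the balance parameter α) in a CIoU-style form; the true S-HIoU penalty P_e may differ in detail:

```python
import math
import torch

def s_hiou_loss(pred, target):
    """Hedged S-HIoU-style loss for boxes given as (x1, y1, x2, y2).
    Computes Loss = 1 - (IoU - P_e) with an assumed penalty P_e."""
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)
    # plain IoU of predicted frame B and real frame A
    inter = ((torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0) *
             (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0))
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    iou = inter / (area_p + area_t - inter + 1e-9)
    # r: Euclidean distance between the two top-left keypoints,
    # normalised by the squared diagonals k1 (real) and k2 (predicted)
    r2 = (px1 - tx1) ** 2 + (py1 - ty1) ** 2
    k1_sq = (tx2 - tx1) ** 2 + (ty2 - ty1) ** 2
    k2_sq = (px2 - px1) ** 2 + (py2 - py1) ** 2
    dist_term = r2 / (k1_sq + k2_sq + 1e-9)
    # v: aspect-ratio consistency; alpha: balance parameter (CIoU-style)
    v = (4 / math.pi ** 2) * (torch.atan((tx2 - tx1) / (ty2 - ty1 + 1e-9)) -
                              torch.atan((px2 - px1) / (py2 - py1 + 1e-9))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    pe = dist_term + alpha * v     # assumed form of the penalty term P_e
    return 1 - (iou - pe)          # formula (4): Loss = 1 - S-HIoU
```

With identical boxes the loss vanishes, and it grows as the frames separate, which matches the intent of formulas (3) and (4).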
In summary, the target detection model constructed in steps S2-S3 adopts the YOLOv8 network, uses the novel hybrid neural network FasterViT as the feature extraction backbone, attaches the progressive feature pyramid AFPN module to fuse features of different scales, and designs a novel staged IoU loss function, the S-HIoU loss function, by fusing frame information such as the minimum circumscribed rectangle area, the aspect ratio, and the keypoints of the predicted and real frames. This loss function exploits the error information to the greatest extent, improving the regression accuracy and regression speed of the target prediction frame.
S4, iteratively optimizing the parameters of the target detection model according to the training results of each round, with a high-performance compute accelerator card assisting the calculation, until the model parameters converge, so as to obtain the optimal target detection model;
s5, utilizing the optimal target detection model to combine with a preset field operation safety area to identify and alarm the illegal risk behaviors of the operators in the video image in real time.
The real-time identification and alarm of the illegal risk behaviors of the operators in the video image by combining the optimal target detection model with a preset field operation safety area comprises the following steps:
s51, judging whether an operator exceeds the field operation safety area by combining the coordinates of field operators in the video image, as output by the optimal target detection model, with the preset field operation safety area, and issuing a voice alarm to any operator who exceeds the field operation safety area;
s52, detecting and identifying the violation risks of operators in the video image by using the optimal target detection model, and issuing a voice alarm to any operator exhibiting violation risk behavior.
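Step S51 can be illustrated with a minimal sketch. The rectangular safety-area format, the variable names, and the containment test are all assumptions for illustration; the patent only specifies that detected coordinates are compared against a preset safety area:

```python
# Sketch of the safety-area check: a worker triggers an alarm when the
# detected bounding box is not fully contained in the preset safe rectangle.
def outside_safe_area(box, safe):
    """box, safe: (x1, y1, x2, y2). True if box is not fully inside safe."""
    return not (safe[0] <= box[0] and safe[1] <= box[1]
                and box[2] <= safe[2] and box[3] <= safe[3])

# hypothetical detector output: worker id -> bounding box in image coordinates
workers = {"w1": (120, 80, 180, 260), "w2": (400, 50, 470, 240)}
safe = (100, 40, 300, 300)   # preset field operation safety area
alarms = [wid for wid, b in workers.items() if outside_safe_area(b, safe)]
# w2 extends past x = 300, so it is flagged for a voice alarm
```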
After the real-time identification and alarm of the violation risk behaviors of operators in the video image, performed by combining the optimal target detection model with the preset field operation safety area, the method further includes step S6:
the optimal target detection model is deployed in a movable lightweight intelligent device, which is used to collect new video data of field operation, to safely monitor operators on site, to verify the recognition effect of the real-time risk identification method for the substation near-electricity operation site, and to optimize and adjust the method based on the recognition results, so as to obtain the optimal method.
The invention provides a fast, accurate, and highly adaptable real-time risk identification method for substation sites. The target detection model is improved in a lightweight manner based on the hierarchical-attention hybrid neural network, which addresses the insufficient accuracy of on-site risk identification at substations, and the model is deployed in a movable intelligent terminal. The AFPN progressive feature pyramid network is introduced to improve the robustness of target recognition and better adapt to the working environment of the substation site. Meanwhile, designing a novel loss function and combining it with the algorithm model improves both detection performance and speed, enabling real-time safety monitoring and risk identification on the substation site; identified personnel risk behaviors trigger a voice alarm, meeting the requirements of on-site real-time risk monitoring. The method provides a reliable, efficient, and convenient intelligent means for the safety control of near-electricity operation on substation sites, reduces the probability of safety accidents, and improves risk prevention capability.
In order to solve the problem of insufficient accuracy of on-site risk identification at substations, the invention improves the target detection model in a lightweight manner based on the hybrid neural network FasterViT and introduces the hierarchical attention HAT method, improving the feature extraction and expression capability for images acquired on site, identifying the risk behaviors of field operation more accurately, and improving the safety guarantee of field operation;
further, in order to solve the problem of insufficient environmental adaptability of the transformer substation in the prior art, based on a novel feature fusion network AFPN, feature information of different levels is better fused, and the adaptability of a target detection model to complex field environments is improved;
in view of the problems in existing IoU loss function applications and in substation target identification, the invention designs a novel staged IoU loss function, the S-HIoU loss function, by fusing frame information such as the minimum circumscribed rectangle area, the aspect ratio, and the keypoints of the predicted and real frames. This loss function exploits the error information to the greatest extent, improving the regression accuracy and regression speed of the target prediction frame.
In the description of the present invention, it should be noted that orientation words such as "center", "lateral", "longitudinal", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", and "counterclockwise" indicate orientations or positional relationships based on those shown in the drawings; they are used merely for convenience and simplicity of description, and are not to be construed as limiting the scope of protection of the present invention by requiring that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation.
It should be noted that the terms "comprises" and "comprising," along with any variations thereof, in the description and claims of the present application are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed.
Note that the above is only a preferred embodiment of the present invention and an illustration of the technical principles used. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions may be made without departing from the scope of the invention. Therefore, while the present invention has been described in connection with the above embodiments, it is to be understood that the invention is not limited to the specific embodiments disclosed, and that many other and equally effective embodiments may be devised without departing from the spirit of the invention, the scope of which is determined by the appended claims.
Claims (9)
1. The real-time risk identification method for the near-electricity operation site of the transformer substation is characterized by comprising the following steps of:
s1, acquiring video image data of a substation near-electricity operation site based on different scenes and operation tasks, dividing a video stream in the acquired video image data into video segments, performing self-adaptive frequency division, and marking an object to be detected by adopting a rectangular frame to manufacture a training data set;
s2, inputting video image data with labels in a training data set into a deep neural network for pre-training, and improving a target detection model by utilizing a hybrid neural network;
s3, introducing an AFPN feature fusion network to improve the self-adaptability of the target detection model, and calculating the overlapping degree of a prediction frame and a real frame in the target detection task by using an S-HIoU loss function;
s4, iteratively optimizing parameters of the target detection model according to training results of each round, and utilizing a high-performance calculation acceleration card to assist in calculation until the model parameters are converged to obtain an optimal target detection model;
s5, utilizing the optimal target detection model to combine with a preset field operation safety area to identify and alarm the illegal risk behaviors of the operators in the video image in real time.
2. The real-time risk identification method for a substation near-electric operation site according to claim 1, wherein the method comprises the following steps: the step S1 of labeling the object to be detected by adopting a rectangular frame specifically comprises the following steps of:
based on the safety standards for electric power field operation, behaviors covered by the standards (non-standard dressing, improper use of safety ropes, toppled fences, failure to wear insulating gloves, and exceeding the field operation safety area) are extracted as corresponding labels, and the labels are matched with the objects in the training data set.
3. The real-time risk identification method for a substation near-electric operation site according to claim 1, wherein the method comprises the following steps: the step S2 specifically comprises: the neural network FasterViT, a hybrid of CNN and Vision Transformer, is used as the feature extraction network, and the global self-attention with quadratic complexity is decomposed, based on the introduced hierarchical attention method, into multi-level attention with reduced computational cost.
4. The real-time risk identification method for a substation near-electric operation site according to claim 1, wherein the method comprises the following steps: the FasterViT network architecture in the step S2 adopts a multi-scale form: convolution layers run on the high-resolution stages in the early part of the network, the second half of the model relies on the novel window attention mechanism to perform spatial reasoning over the whole feature map, and the first half of the network and the downsampling blocks use dense convolution kernels to weight each channel of the original feature map, strengthening the response to useful information; in operation, the neural network FasterViT first preprocesses the input image through consecutive 3×3 convolutions with stride 2, then maps the pixels of the input image into a D-dimensional embedding space, and finally performs batch normalization and ReLU activation to extract preliminary image features, providing a more expressive input for subsequent processing.
5. The real-time risk identification method for a substation near-electric operation site according to claim 1, wherein the method comprises the following steps: in the step S3, the AFPN feature fusion network is introduced to improve the self-adaptability of the target detection model, specifically: the AFPN feature fusion network starts the fusion process by combining two adjacent Low-Level features, gradually brings High-Level features into the fusion, and finally fuses the top-level features of the backbone; during this process, the Low-Level features absorb semantic information from the High-Level features, and the High-Level features absorb detail information from the Low-Level features; in the multi-level feature fusion process, ASFF is used to assign different spatial weights to features of different levels, strengthening the importance of key levels and reducing the influence of contradictory information from different targets.
6. The real-time risk identification method for a substation near-electric operation site according to claim 1, wherein the method comprises the following steps: in the step S3, the overlapping degree of the prediction frame and the real frame in the target detection task is calculated by using the S-HIoU loss function, specifically: the S-HIoU loss function introduces the minimum circumscribed rectangle area and the aspect ratio of the predicted frame and the real frame, and introduces, in the base error of the penalty term, the distance from the top-left keypoint of the real frame to the top-left keypoint of the predicted frame; the distance information and the shape information are normalized so that the loss function keeps scale invariance, which satisfies the recognition and localization of multi-scale targets by the model.
7. The real-time risk identification method for a substation near-electric operation site according to claim 1, wherein the method comprises the following steps: the calculation formula of the intersection area of the prediction frame and the real frame in the step S3 is as follows:
the calculation formula of the basic error of the penalty term is as follows:
wherein A and B are the real frame and the predicted frame respectively, r represents the Euclidean distance from the top-left keypoint of the real frame to the top-left keypoint of the predicted frame, k1 and k2 represent the diagonal lengths of the real frame and the predicted frame respectively, v measures the consistency of the aspect ratio, and α represents the balance parameter.
8. The real-time risk identification method for a substation near-electric operation site according to claim 1, wherein the method comprises the following steps: the step S5 specifically comprises the following steps:
s51, judging whether the operator exceeds the field operation safety area by combining the coordinates of the field operator in the video image and the preset field operation safety area, which are output by the optimal target detection model, and carrying out voice alarm on the operator exceeding the field operation safety area;
s52, detecting and identifying the violation risk of the operator in the video image by utilizing the optimal target detection model, and carrying out voice alarm on the operator with the violation risk behavior.
9. The real-time risk identification method for a substation near-electric operation site according to claim 1, wherein the method comprises the following steps: the method further comprises a step S6, wherein the step S6 is a logic step performed after the step S5, and specifically comprises the following steps:
the optimal target detection model is deployed in a movable lightweight intelligent device and used for safely monitoring operators on site, verifying the identification effect of the real-time risk identification method for the substation near-electricity operation site, and optimally adjusting the real-time risk identification method for the substation near-electricity operation site based on the identification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311419123.4A CN117523437B (en) | 2023-10-30 | 2023-10-30 | Real-time risk identification method for substation near-electricity operation site |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117523437A true CN117523437A (en) | 2024-02-06 |
CN117523437B CN117523437B (en) | 2024-09-27 |
Family
ID=89759770
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311419123.4A Active CN117523437B (en) | 2023-10-30 | 2023-10-30 | Real-time risk identification method for substation near-electricity operation site |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117523437B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPWO2021059748A1 (en) * | 2019-09-27 | 2021-04-01 | ||
WO2021139069A1 (en) * | 2020-01-09 | 2021-07-15 | 南京信息工程大学 | General target detection method for adaptive attention guidance mechanism |
KR102277099B1 (en) * | 2020-02-26 | 2021-07-15 | 광운대학교 산학협력단 | A watermark-adaptive and resolution-adaptive image watermarking system using deep learning |
CN114359102A (en) * | 2022-01-10 | 2022-04-15 | 天津大学 | Image depth restoration evidence obtaining method based on attention mechanism and edge guide |
CN114638784A (en) * | 2022-02-17 | 2022-06-17 | 中南大学 | Method and device for detecting surface defects of copper pipe based on FE-YOLO |
CN115527095A (en) * | 2022-10-29 | 2022-12-27 | 西安电子科技大学 | Multi-scale target detection method based on combined recursive feature pyramid |
CN115965807A (en) * | 2022-09-27 | 2023-04-14 | 北京吉利学院 | TransCNN medical fundus image classification algorithm based on hyper-parameter optimization |
CN116152650A (en) * | 2022-12-06 | 2023-05-23 | 宁波大学 | Marine organism detection method based on CNN and Transformer bidirectional collaborative guidance network |
CN116188359A (en) * | 2022-11-05 | 2023-05-30 | 河南理工大学 | Detection method for aerial photographing insulator at any angle based on neighborhood information interaction attention |
CN116403081A (en) * | 2023-04-06 | 2023-07-07 | 喀什地区电子信息产业技术研究院 | Multi-scale detection method for bidirectional self-adaptive feature fusion |
CN116579616A (en) * | 2023-07-10 | 2023-08-11 | 武汉纺织大学 | Risk identification method based on deep learning |
CN116740539A (en) * | 2023-07-19 | 2023-09-12 | 浙江师范大学 | Visual SLAM method and system based on lightweight target detection network |
2023-10-30: CN202311419123.4A patent granted as CN117523437B (status: active)
Non-Patent Citations (1)
Title |
---|
Xu Zhenjie; Chen Qingkui: "A skew image calibration method based on object detection", Journal of Chinese Computer Systems (小型微型计算机系统), no. 05, 15 May 2020 (2020-05-15) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117876800A (en) * | 2024-03-11 | 2024-04-12 | 成都千嘉科技股份有限公司 | Method for identifying potential safety hazard of flue of gas water heater |
CN117876800B (en) * | 2024-03-11 | 2024-05-17 | 成都千嘉科技股份有限公司 | Method for identifying potential safety hazard of flue of gas water heater |
Also Published As
Publication number | Publication date |
---|---|
CN117523437B (en) | 2024-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110781838A (en) | Multi-modal trajectory prediction method for pedestrian in complex scene | |
CN103069434A (en) | Multi-mode video event indexing | |
CN108710913A (en) | A kind of switchgear presentation switch state automatic identification method based on deep learning | |
CN111222478A (en) | Construction site safety protection detection method and system | |
CN117523437B (en) | Real-time risk identification method for substation near-electricity operation site | |
CN114399734A (en) | Forest fire early warning method based on visual information | |
CN110096981A (en) | A kind of video big data traffic scene analysis method based on deep learning | |
CN111414807A | Tidal water identification and crisis early warning method based on YOLO technology | |
CN109993061A (en) | A kind of human face detection and tracing method, system and terminal device | |
CN110532937A (en) | Method for distinguishing is known to targeting accuracy with before disaggregated model progress train based on identification model | |
CN113537226A (en) | Smoke detection method based on deep learning | |
CN116846059A (en) | Edge detection system for power grid inspection and monitoring | |
CN117541964A (en) | Cloud video processing system and method thereof | |
CN114997279A (en) | Construction worker dangerous area intrusion detection method based on improved Yolov5 model | |
CN113158954B (en) | Automatic detection method for zebra crossing region based on AI technology in traffic offsite | |
CN114419565A (en) | Special vehicle operation collision early warning method and system based on YOLOv4 | |
CN113837001A (en) | Method and device for detecting abnormal intruding object in real time under monitoring scene | |
CN117351409A (en) | Intelligent concrete dam face operation risk identification method | |
CN110659585B (en) | Pedestrian detection method based on interactive attribute supervision | |
CN117423157A (en) | Mine abnormal video action understanding method combining migration learning and regional invasion | |
CN115171006B (en) | Detection method for automatically identifying person entering electric power dangerous area based on deep learning | |
CN116385962A (en) | Personnel monitoring system in corridor based on machine vision and method thereof | |
CN113869239B (en) | Traffic signal lamp countdown recognition system, construction method and application method thereof | |
CN116052035A (en) | Power plant personnel perimeter intrusion detection method based on convolutional neural network | |
CN113591773A (en) | Power distribution room object detection method, device and equipment based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||