CN116168352A - Power grid obstacle recognition processing method and system based on image processing - Google Patents


Info

Publication number
CN116168352A
CN116168352A
Authority
CN
China
Prior art keywords
image
sequence
training
classifier
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310461859.1A
Other languages
Chinese (zh)
Other versions
CN116168352B (en)
Inventor
李佩剑
邓清凤
伍强
黄渠洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Ruitong Technology Co ltd
Original Assignee
Chengdu Ruitong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Ruitong Technology Co ltd filed Critical Chengdu Ruitong Technology Co ltd
Priority to CN202310461859.1A priority Critical patent/CN116168352B/en
Publication of CN116168352A publication Critical patent/CN116168352A/en
Application granted granted Critical
Publication of CN116168352B publication Critical patent/CN116168352B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/467Encoded features or binary features, e.g. local binary patterns [LBP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

A method and a system for power grid obstacle recognition and processing based on image processing are provided. A monitoring image collected by a camera deployed on a power line tower is acquired and subjected to image blocking processing to obtain a sequence of image blocks. Each image block is passed through an automatic-codec-based image resolution enhancer to obtain a sequence of enhanced image blocks; each enhanced image block is passed through a convolutional neural network model serving as a filter to obtain a sequence of image local semantic feature vectors; the sequence of image local semantic feature vectors is passed through a transformer-based ViT model to obtain an image global context semantic understanding feature vector; and the feature vector is passed through a classifier to obtain a classification result indicating whether birds are contained in the monitoring range. When birds are detected, compressed air is controlled to expel them, ensuring the stability of power transmission.

Description

Power grid obstacle recognition processing method and system based on image processing
Technical Field
The present disclosure relates to the field of image recognition technologies, and in particular, to a method and a system for recognizing and processing a power grid obstacle based on image processing.
Background
In modern society the use of electric power is ubiquitous, and power transmission is of vital significance to the normal operation of the national economy and to people's daily life. Power line towers are important facilities in power transmission; however, many line towers stand in open country, and the junction of the tower body and the cross arm forms a fork-like structure. Birds are therefore easily attracted to nest and perch at the junction of the cross arm and the tower body, which affects the stability of power transmission.
When bird-repelling means are deployed, accurate and effective identification and detection of birds is the key to guaranteeing the repelling effect.
Chinese patent application No. 202110405605.9 discloses a method for identifying images of bird species associated with bird-related faults on power transmission lines. First, information on bird species around the transmission line is collected, an image database of fault-related bird species is established, and background-removal preprocessing is applied to the bird images based on a class activation map method. Then, learning models are built from four deep convolutional neural networks and pre-trained on the ImageNet dataset; the pre-trained network structures are fine-tuned, the fine-tuned models are retrained on the preprocessed training set of bird images, and the test set is classified and identified. Finally, an ensemble bird-species image recognition model integrating the four convolutional networks is built by linear weighting according to the classification accuracy of each network, and the bird images are classified and identified. The method gives transmission-line operation and maintenance personnel a means of correctly identifying birds, supports differentiated control of bird-related faults, and reduces the bird-fault tripping rate.
Further, Chinese patent application No. 201910775414.4 discloses a method, apparatus, device and storage medium for bird image recognition. The method comprises: acquiring an image to be identified that contains a bird target; performing local area positioning on the image with a preset positioning algorithm to obtain the region where the bird target is located; extracting features of that region with a multi-part feature extraction model to obtain a plurality of part features of the bird target; identifying each part feature against verification part features with a classifier, the verification part features corresponding one-to-one with the part features, to obtain a similarity score for each part feature; and computing the recognition result for the bird target in the image from all similarity scores. This addresses the technical problem that conventional image recognition methods have low recognition efficiency.
However, the above identification and bird-repelling techniques have the following drawbacks. Birds are small-sized objects in actual detection, so the traditional approach of relying on manual identification is error-prone, the precision of bird identification and detection is low, and the repelling effect suffers. Conventional methods are also slow, and considerable effort is required to perform bird identification and detection at the various positions of a power line tower.
Therefore, an optimized image processing-based power grid obstacle recognition processing scheme is desired.
Disclosure of Invention
The present application has been made to solve the above technical problems. Embodiments of the application provide a method and a system for power grid obstacle recognition based on image processing. A monitoring image collected by a camera deployed on a power line tower is acquired, and artificial intelligence technology based on deep learning is used to mine the implicit feature information about birds in the monitoring image, so that birds are identified and detected; when birds are detected, compressed air is controlled to expel them, ensuring the stability of power transmission.
In a first aspect, there is provided an image processing-based power grid obstacle recognition processing method, including:
acquiring a monitoring image acquired by a camera deployed on a power line tower;
performing image blocking processing on the monitoring image to obtain a sequence of image blocks;
passing each image block in the sequence of image blocks through an automatic codec-based image resolution enhancer, respectively, to obtain a sequence of enhanced image blocks;
passing each enhanced image block in the sequence of enhanced image blocks through a convolutional neural network model serving as a filter to obtain a sequence of image local semantic feature vectors;
passing the sequence of image local semantic feature vectors through a transformer-based ViT model to obtain an image global context semantic understanding feature vector;
passing the image global context semantic understanding feature vector through a classifier to obtain a classification result, the classification result indicating whether birds are contained in a monitoring range; and
controlling compressed air to expel birds in response to the classification result indicating that birds are contained in the monitoring range.
In the above image processing-based power grid obstacle recognition processing method, passing each image block in the sequence of image blocks through the automatic-codec-based image resolution enhancer to obtain the sequence of enhanced image blocks includes: performing explicit spatial encoding on each image block with a convolutional layer, by the image resolution encoder of the automatic codec, to obtain each image feature; and performing deconvolution processing on each image feature with a deconvolution layer, by the image resolution decoder of the automatic codec, to obtain the sequence of enhanced image blocks.
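The encoder/decoder pair described above can be sketched at the level of its two core operations: a strided convolution for encoding and a transposed (de)convolution for decoding. The single channel, the 2×2 averaging kernel, and the stride of 2 are illustrative assumptions not fixed by the application.

```python
import numpy as np

def conv2d(x, k, stride=1):
    """Valid 2-D convolution (single channel) with the given stride."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*stride:i*stride+kh, j*stride:j*stride+kw] * k)
    return out

def deconv2d(x, k, stride=2):
    """Transposed convolution: zero-insertion upsampling followed by a
    padded convolution -- the standard decoder upsampling step."""
    h, w = x.shape
    up = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1))
    up[::stride, ::stride] = x
    p = k.shape[0] - 1
    return conv2d(np.pad(up, p), k, stride=1)

block = np.ones((32, 32))                   # toy image block
kernel = np.full((2, 2), 0.25)              # averaging kernel (assumption)
code = conv2d(block, kernel, stride=2)      # encoder: strided convolution, 16x16
restored = deconv2d(code, np.ones((2, 2)))  # decoder: back to 32x32
```

With a 2×2 kernel and stride 2, the transposed convolution output size is (16 − 1) · 2 + 2 = 32, restoring the spatial resolution of the input block.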
In the above method for identifying and processing a grid obstacle based on image processing, passing each enhanced image block in the sequence of enhanced image blocks through the convolutional neural network model serving as a filter to obtain the sequence of image local semantic feature vectors includes performing, in each layer of the model during its forward pass: convolving the input data to obtain a convolution feature map; applying feature-matrix-based mean pooling to the convolution feature map to obtain a pooled feature map; and applying a nonlinear activation to the pooled feature map to obtain an activation feature map. The output of the last layer of the model is the sequence of image local semantic feature vectors, and the input to the first layer is each enhanced image block in the sequence of enhanced image blocks.
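The per-layer forward pass described above (convolution, mean pooling, nonlinear activation) can be sketched for a single channel as follows; the 2×2 pooling window and ReLU as the unspecified nonlinear activation are assumptions.

```python
import numpy as np

def conv(x, k):
    """Valid 2-D convolution of a single-channel block with kernel k."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def cnn_layer(x, k, pool=2):
    f = conv(x, k)                           # convolution feature map
    h = f.shape[0] // pool * pool
    w = f.shape[1] // pool * pool
    pooled = f[:h, :w].reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))
    return np.maximum(pooled, 0.0)           # nonlinear activation (ReLU, assumed)

out = cnn_layer(np.ones((9, 9)), np.ones((2, 2)))  # toy enhanced image block
```

Stacking several such layers and flattening the final activation map yields the local semantic feature vector for one enhanced image block.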
In the above method for identifying and processing a grid obstacle based on image processing, passing the sequence of image local semantic feature vectors through the transformer-based ViT model to obtain the image global context semantic understanding feature vector includes: arranging the sequence of image local semantic feature vectors one-dimensionally to obtain a global image semantic feature vector; computing the product between the global image semantic feature vector and the transpose of each image local semantic feature vector in the sequence to obtain a plurality of self-attention association matrices; normalizing each of the self-attention association matrices to obtain a plurality of normalized self-attention association matrices; passing each normalized self-attention association matrix through a Softmax classification function to obtain a plurality of probability values; and weighting each image local semantic feature vector by the corresponding probability value to obtain the image global context semantic understanding feature vector.
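A literal, simplified reading of the five listed steps might look as follows. The use of Frobenius-norm scaling for the normalization step and the reduction of each normalized matrix to a scalar softmax logit by summation are assumptions, since the application does not pin down either choice.

```python
import numpy as np

def fuse_local_features(locals_):
    """locals_: list of d-dimensional image local semantic feature vectors."""
    g = np.concatenate(locals_)                        # one-dimensional arrangement
    mats = [np.outer(g, v) for v in locals_]           # self-attention association matrices
    norm = [m / (np.linalg.norm(m) + 1e-12) for m in mats]  # normalisation (assumed)
    logits = np.array([m.sum() for m in norm])         # scalar summary per matrix (assumed)
    e = np.exp(logits - logits.max())
    p = e / e.sum()                                    # softmax -> probability values
    fused = sum(w * v for w, v in zip(p, locals_))     # probability-weighted local vectors
    return fused, p

out, p = fuse_local_features([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
```

The probability values sum to one, so the fused vector is a convex combination of the local semantic feature vectors, mirroring the attention-weighted aggregation the text describes.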
In the above method for identifying and processing a grid obstacle based on image processing, passing the image global context semantic understanding feature vector through the classifier to obtain the classification result, the classification result indicating whether birds are contained in the monitoring range, includes: performing fully connected encoding of the image global context semantic understanding feature vector with a plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and passing the encoded classification feature vector through the Softmax classification function of the classifier to obtain the classification result.
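The two-stage classifier described above (fully connected encoding followed by Softmax) can be sketched as follows; the layer sizes, the ReLU between fully connected layers, and the toy weights are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(x, layers):
    """layers: list of (W, b) fully connected layers; the final layer
    produces two logits (birds present / no birds)."""
    for W, b in layers[:-1]:
        x = np.maximum(W @ x + b, 0.0)   # fully connected encoding + ReLU (assumed)
    W, b = layers[-1]
    return softmax(W @ x + b)            # class probabilities

layers = [(np.eye(4), np.zeros(4)),                      # toy encoding layer
          (np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0]]), np.zeros(2))]  # toy output layer
probs = classify(np.array([2.0, 0.0, 0.0, 0.0]), layers)
```

The two output probabilities sum to one; the index of the larger one is the classification result.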
The image processing-based power grid obstacle recognition processing method further includes training the convolutional neural network model serving as a filter, the transformer-based ViT model and the classifier. The training comprises: acquiring training data, the training data including training monitoring images and the true value of whether birds are contained in the monitoring range; performing image blocking processing on each training monitoring image to obtain a sequence of training image blocks; passing each training image block through the automatic-codec-based image resolution enhancer to obtain a sequence of training enhanced image blocks; passing each training enhanced image block through the convolutional neural network model serving as a filter to obtain a sequence of training image local semantic feature vectors; passing that sequence through the transformer-based ViT model to obtain a training image global context semantic understanding feature vector; passing the training image global context semantic understanding feature vector through the classifier to obtain a classification loss function value; and training the three models by back-propagation of gradient descent based on the classification loss function value, wherein in each iteration of the training a feature affinity spatial affine learning iteration is performed on the weight matrix of the classifier.
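The training procedure can be illustrated, in heavily reduced form, by gradient descent on the cross-entropy loss of a softmax classifier over precomputed feature vectors; the upstream enhancer, CNN and ViT stages are omitted, and the learning rate, epoch count and toy data are assumptions.

```python
import numpy as np

def train_classifier(X, y, epochs=200, lr=0.5):
    """Softmax-regression stand-in for the pipeline's classifier, trained
    by per-sample gradient descent on the cross-entropy loss."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(2, X.shape[1]))
    b = np.zeros(2)
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = W @ x + b
            p = np.exp(z - z.max()); p /= p.sum()
            g = p.copy(); g[t] -= 1.0          # gradient of cross-entropy w.r.t. logits
            W -= lr * np.outer(g, x)
            b -= lr * g
    return W, b

X = np.array([[1.0, 0.0], [0.0, 1.0]])         # toy feature vectors
y = [0, 1]                                     # true values (birds / no birds)
W, b = train_classifier(X, y)
pred = [int(np.argmax(W @ x + b)) for x in X]
```

In the full method, the same loss gradient would be propagated further back through the ViT, CNN and enhancer stages rather than stopping at the classifier weights.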
In the image processing-based power grid obstacle recognition processing method, passing the training image global context semantic understanding feature vector through the classifier to obtain a classification loss function value includes: processing, by the classifier, the training image global context semantic understanding feature vector with a classification formula to generate a training classification result, wherein the classification formula is:

$O = \mathrm{softmax}\{(W_n, B_n) : \cdots : (W_1, B_1) \mid x\}$

wherein $x$ represents the training image global context semantic understanding feature vector, $W_1$ to $W_n$ are the weight matrices of the fully connected layers, and $B_1$ to $B_n$ are the corresponding bias matrices; and calculating the cross-entropy value between the training classification result and the true value as the classification loss function value.
In the above method for identifying and processing the grid obstacle based on image processing, in each iteration of the training, a feature affinity spatial affine learning iteration is performed on the weight matrix of the classifier according to the following optimization formula:

$M' = \dfrac{\|M\|_2}{\|M\|_*}\, M \odot \exp\!\left(\dfrac{M M^\top M}{\log_2 d}\right)$

wherein $M$ represents the weight matrix of the classifier, $M^\top$ the transpose of the weight matrix of the classifier, $\|M\|_2$ the two-norm of the weight matrix of the classifier, $\|M\|_*$ the kernel norm of the weight matrix of the classifier, $d$ the scale of the weight matrix of the classifier, $\log_2$ the logarithmic function with base 2, $\exp(\cdot)$ the exponential operation of a matrix, which computes the natural exponential of the value at each position in the matrix, $\odot$ position-wise multiplication, and $M'$ the weight matrix of the classifier after the iteration.
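One dimensionally consistent reading of this update can be sketched as follows. The exact arrangement of the terms is a reconstruction from the listed symbols, not a verbatim transcription of the original formula, so treat it as an assumption.

```python
import numpy as np

def affine_iteration(M):
    """One plausible reading of the feature-affinity spatial affine update
    applied to the classifier weight matrix in each training iteration."""
    d = M.shape[1]                               # scale of the weight matrix
    two_norm = np.linalg.norm(M, 2)              # two-norm (largest singular value)
    nuclear = np.linalg.norm(M, 'nuc')           # kernel (nuclear) norm
    core = (M @ M.T @ M) / np.log2(d)            # affinity term, same shape as M
    return (two_norm / nuclear) * M * np.exp(core)  # '*' is position-wise here

M_new = affine_iteration(np.eye(2))              # toy 2x2 weight matrix
```

Since the two-norm/kernel-norm ratio and the position-wise exponential both preserve the matrix shape, the update can be applied in place to the classifier weights between gradient steps.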
In a second aspect, there is provided an image processing-based power grid obstacle recognition processing system, including:
the image acquisition module is used for acquiring a monitoring image acquired by a camera arranged on the power line tower;
the image blocking processing module is used for carrying out image blocking processing on the monitoring image to obtain a sequence of image blocks;
the automatic encoding and decoding module is used for passing each image block in the sequence of image blocks through an automatic-codec-based image resolution enhancer to obtain a sequence of enhanced image blocks;
the feature extraction module is used for passing each enhanced image block in the sequence of enhanced image blocks through a convolutional neural network model serving as a filter to obtain a sequence of image local semantic feature vectors;
the global coding module is used for passing the sequence of image local semantic feature vectors through a transformer-based ViT model to obtain an image global context semantic understanding feature vector;
the monitoring result generation module is used for passing the image global context semantic understanding feature vector through a classifier to obtain a classification result, the classification result indicating whether birds are contained in a monitoring range; and
the control module is used for controlling compressed air to expel birds in response to the classification result indicating that birds are contained in the monitoring range.
In the above system for identifying and processing a power grid obstacle based on image processing, the automatic encoding and decoding module includes: an encoding unit, configured to perform explicit spatial encoding on each image block in the sequence of image blocks with a convolutional layer, by the image resolution encoder of the automatic codec, to obtain each image feature; and a decoding unit, configured to perform deconvolution processing on each image feature with a deconvolution layer, by the image resolution decoder of the automatic codec, to obtain the sequence of enhanced image blocks.
Compared with the prior art, the method and system for power grid obstacle recognition based on image processing acquire a monitoring image collected by a camera deployed on a power line tower and use artificial intelligence technology based on deep learning to mine the implicit feature information about birds in the monitoring image, identifying and detecting birds and improving the accuracy of bird image recognition; when birds are detected, compressed air is controlled to expel them, ensuring the stability of power transmission.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and other drawings may be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic view of a scenario of a power grid obstacle recognition processing method based on image processing according to an embodiment of the application.
Fig. 2 is a flowchart of a method for identifying and processing a grid obstacle based on image processing according to an embodiment of the application.
Fig. 3 is a schematic architecture diagram of an image processing-based power grid obstacle recognition processing method according to an embodiment of the present application.
Fig. 4 is a flowchart of the sub-steps of step 130 in the image processing-based grid obstacle recognition processing method according to an embodiment of the present application.
Fig. 5 is a flowchart of the sub-steps of step 150 in the image processing-based grid obstacle recognition processing method according to an embodiment of the present application.
Fig. 6 is a flowchart of the sub-steps of step 160 in the image processing-based grid obstacle recognition processing method according to an embodiment of the present application.
Fig. 7 is a flowchart of the sub-steps of step 180 in the image processing-based grid obstacle recognition processing method according to an embodiment of the present application.
Fig. 8 is a block diagram of an image processing-based grid obstacle recognition processing system according to an embodiment of the present application.
Description of the embodiments
The technical solutions in the embodiments of the present application are described below with reference to the drawings. Obviously, the described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without undue burden fall within the scope of protection of the present disclosure.
Unless defined otherwise, all technical and scientific terms used in the examples of this application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
In the description of the embodiments of the present application, unless otherwise indicated and defined, the term "connected" should be construed broadly: it may be an electrical connection, communication between two elements, a direct connection, or an indirect connection via an intermediary. A person skilled in the art will understand the specific meaning of the term according to the specific circumstances.
It should be noted that the terms "first", "second" and "third" in the embodiments of the present application merely distinguish similar objects and do not imply a specific order; where permitted, the objects so distinguished may be interchanged, so that the embodiments described herein can be implemented in sequences other than those illustrated or described.
As described above, current tower bird-repelling technology has the following drawbacks. Birds are small-sized objects in actual detection, so the traditional approach of relying on manual identification is error-prone, the precision of bird identification and detection is low, and the repelling effect suffers. Conventional methods are also slow, and considerable effort is required to perform bird identification and detection at the various positions of a power line tower. Therefore, an optimized image processing-based power grid obstacle recognition processing scheme is desired.
Accordingly, considering that the transmission lines of a power line tower are long and that birds appear as small-sized objects during actual detection, bird information at every position of the tower cannot be detected effectively and accurately by manual means. On this basis, the technical solution of the present application performs image analysis on the monitoring images collected by cameras deployed on the power line tower to identify and detect birds. However, the amount of information in an image is large while birds constitute only small-scale feature information within it, which makes their features difficult to capture and extract; moreover, the complex field environment can degrade image resolution during acquisition and thus the representation accuracy of bird-related feature information. The difficulty therefore lies in how to mine the implicit feature information about birds in the monitoring image so as to identify and detect them, and then, when birds are detected, control compressed air to expel them and ensure the stability of power transmission.
In recent years, deep learning and neural networks have been widely used in the fields of computer vision, natural language processing, text signal processing, and the like. The development of deep learning and neural networks provides new solutions and schemes for mining implicit characteristic information about birds in the monitored images.
Specifically, in the technical scheme of the application, a monitoring image is first acquired by a camera deployed at the power line tower. It should be understood that, since power line towers are often erected in the wild, the monitoring image is easily disturbed by the external environment or by equipment factors during acquisition, lowering the resolution of the image and blurring the feature information about birds, which affects subsequent bird identification. In addition, since birds are small-sized objects in the monitoring image, directly applying preprocessing such as image filtering to the whole image would lose the bird targets and impair subsequent bird identification and detection.
On this basis, in order to improve the expression of bird features in the monitoring image and thereby the accuracy of bird identification and detection, the technical scheme of the application performs image blocking processing on the monitoring image to obtain a sequence of image blocks. It should be appreciated that the dimensions of each image block in the sequence are reduced compared with the original image, so the small-sized implicit bird features in the monitoring image are no longer small-sized objects within the individual image blocks, which facilitates subsequent bird identification.
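The blocking step can be sketched minimally as follows; the 64-pixel block size and the image dimensions are illustrative assumptions, not values fixed by the application.

```python
import numpy as np

def image_to_blocks(image: np.ndarray, block: int = 64):
    """Split an H x W x C image into a row-major sequence of
    non-overlapping block x block patches (ragged edges cropped)."""
    h, w = image.shape[:2]
    rows, cols = h // block, w // block
    return [image[r * block:(r + 1) * block, c * block:(c + 1) * block]
            for r in range(rows) for c in range(cols)]

img = np.zeros((256, 320, 3), dtype=np.uint8)  # toy monitoring image
blocks = image_to_blocks(img, 64)              # 4 x 5 = 20 blocks
```

Each block then passes independently through the resolution enhancer and the downstream feature extractors.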
Then, each image block in the sequence of image blocks is passed through an automatic-codec-based image resolution enhancer to enhance its resolution, thereby obtaining a sequence of enhanced image blocks. In particular, the automatic codec here comprises an image resolution encoder and an image resolution decoder: the encoder explicitly spatially encodes each image block with a convolutional layer to obtain each image feature, and the decoder deconvolves each image feature with a deconvolution layer to obtain the sequence of enhanced image blocks.
Further, since each enhanced image block in the sequence is image data, in order to express the bird feature information in each enhanced image block, the technical scheme of the application performs feature mining on each enhanced image block with a convolutional neural network model serving as a filter, which has excellent performance in implicit feature extraction from images, so as to extract the implicit feature distribution information about birds in each enhanced image block and obtain a sequence of image local semantic feature vectors.
Next, further consider that although the individual image local semantic feature vectors in the sequence carry bird feature information relating to the whole monitored image, a pure CNN approach has difficulty learning explicit global and long-range semantic interactions due to the inherent locality of convolution operations. Therefore, in the technical scheme of the application, the sequence of image local semantic feature vectors is encoded in a transformer-based ViT model to extract the contextual semantic association features of the implicit bird features in each image block, so as to obtain the image global context semantic understanding feature vector. It should be appreciated that ViT can process the individual image blocks directly through a Transformer-style self-attention mechanism to extract the contextual semantic association feature information about the implicit bird features in each image block.
Then, the image global context semantic understanding feature vector is used as a classification feature vector for classification processing in a classifier so as to obtain a classification result indicating whether birds are contained in the monitoring range. That is, birds in the image are recognized and detected by classifying with the contextual semantic association features of the implicit bird features in the respective image blocks of the monitoring image, and the compressed air is controlled to expel the birds in response to the classification result that birds are contained in the monitoring range.
That is, in the technical solution of the present application, the labels of the classifier include "birds are contained in the monitoring range" (first label) and "birds are not contained in the monitoring range" (second label), and the classifier determines which classification label the classification feature vector belongs to through a softmax function. It should be noted that the first label p1 and the second label p2 do not carry any human-defined concept: during training, the computer model has no notion of "whether birds are contained in the monitoring range"; there are only the two classification labels and the probabilities of the output feature under them, where p1 and p2 sum to one. The classification result is therefore actually a classification probability distribution over the two labels that conforms to a natural law, and what is used is the physical meaning of that probability distribution rather than the linguistic meaning of the label text. It should be understood that, since the classification label of the classifier is a detection label for whether birds are contained in the monitoring range, bird recognition and detection in the image can be performed based on the classification result once it is obtained; accordingly, in response to the classification result that birds are contained in the monitoring range, the compressed air is controlled to expel the birds, thereby ensuring the stability of power transmission.
In particular, consider the image global context semantic understanding feature vector: it is obtained by directly concatenating the plurality of contextual image local semantic feature vectors produced by the transformer-based ViT model, and although the ViT model promotes the contextual relevance of those vectors, explicit differences among their feature distributions still exist. Consequently, when the directly concatenated image global context semantic understanding feature vector passes through the classifier, the correlation among the individual local weight value distributions of the classifier's weight matrix is insufficient, which slows the training of the classifier.
Based on the above, in the technical solution of the present application, at each iteration of the weight matrix M of the classifier, feature affinity spatial affine learning is performed on M, expressed as:

M' = log2(d · ||M||_2 / ||M||_*) · exp(M Mᵀ / d) M ⊙ M

where ||M||_2 denotes the two-norm of the weight matrix, i.e. its maximum eigenvalue, ||M||_* denotes the kernel norm of the weight matrix, i.e. the sum of its eigenvalues, d is the scale of the weight matrix, i.e. width times height, exp(·) denotes the element-wise exponential of a matrix, ⊙ denotes position-wise multiplication, and M' is the weight matrix after the iteration.
Here, the feature affinity spatial affine learning performs a detailed structured information expression in a low-dimensional eigen-subspace for the high-resolution information representation within the weight value distribution space of the weight matrix, that is, an affine migration based on a spatial transformation of the relatively low-resolution information representation. In this way, a super-resolution, weight-by-weight activation of each local weight distribution is realized on the basis of a dense simulation of the affinity between weight value representations, which enhances the correlation between the local weight distributions of the weight matrix and thereby improves the training speed of the classifier. As a result, birds near the power line tower can be accurately recognized and detected, and when birds are detected, the compressed air is controlled to expel them, ensuring the stability of power transmission.
Fig. 1 is a schematic view of a scenario of a power grid obstacle recognition processing method based on image processing according to an embodiment of the application. As shown in fig. 1, in this application scenario, first, a monitoring image (e.g., C as illustrated in fig. 1) acquired by a camera disposed at a power line tower (e.g., M as illustrated in fig. 1) is acquired; then, the acquired monitoring image is input into a server (e.g., S as illustrated in fig. 1) in which an image processing-based grid obstacle recognition processing algorithm is deployed, wherein the server is capable of processing the monitoring image based on the image processing-based grid obstacle recognition processing algorithm to generate a classification result indicating whether birds are contained in the monitoring range, and controlling compressed air to repel birds in response to the classification result being that birds are contained in the monitoring range.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
In one embodiment of the present application, fig. 2 is a flowchart of a method for identifying and processing a grid obstacle based on image processing according to an embodiment of the present application. As shown in fig. 2, the method 100 for identifying and processing a power grid obstacle based on image processing according to an embodiment of the present application includes: 110, acquiring a monitoring image acquired by a camera deployed on a power line tower; 120, performing image blocking processing on the monitoring image to obtain a sequence of image blocks; 130, passing each image block in the sequence of image blocks through an automatic codec based image resolution enhancer, respectively, to obtain a sequence of enhanced image blocks; 140, passing each enhanced image block in the sequence of enhanced image blocks through a convolutional neural network model serving as a filter to obtain a sequence of image local semantic feature vectors; 150, passing the sequence of image local semantic feature vectors through a transformer-based ViT model to obtain an image global context semantic understanding feature vector; 160, passing the image global context semantic understanding feature vector through a classifier to obtain a classification result, wherein the classification result is used for indicating whether birds are contained in the monitoring range; and 170, controlling compressed air to expel birds in response to the classification result that birds are contained in the monitoring range.
Fig. 3 is a schematic architecture diagram of an image processing-based power grid obstacle recognition processing method according to an embodiment of the present application. As shown in fig. 3, in the network architecture, first, a monitoring image acquired by a camera disposed at a power line tower is acquired; then, image blocking processing is performed on the monitoring image to obtain a sequence of image blocks; next, each image block in the sequence of image blocks is passed through an image resolution enhancer based on an automatic codec to obtain a sequence of enhanced image blocks; then, each enhanced image block in the sequence of enhanced image blocks is passed through a convolutional neural network model serving as a filter to obtain a sequence of image local semantic feature vectors; the sequence of image local semantic feature vectors is then passed through a transformer-based ViT model to obtain an image global context semantic understanding feature vector; the image global context semantic understanding feature vector is then passed through a classifier to obtain a classification result indicating whether birds are contained in the monitoring range; and finally, the compressed air is controlled to expel birds in response to the classification result that birds are contained in the monitoring range.
Specifically, in step 110, a monitoring image acquired by a camera deployed at a power line tower is acquired. As described above, the current tower bird-repellent technology has the following drawbacks: because birds are small-sized objects in actual detection, errors easily occur in the traditional mode of relying on manual identification, so that the recognition and detection precision of birds is low and the bird-repelling effect is affected. Moreover, the conventional method has poor timeliness in bird identification, and a great deal of effort is required to perform bird recognition and detection at the various positions of the power line tower. Therefore, an optimized image processing-based power grid obstacle recognition processing scheme is desired.
Accordingly, considering that in the actual bird-recognition process the power transmission line of a power line tower is long and birds are small-sized objects during detection, bird information at each position of the power line tower cannot be effectively and accurately detected by manpower alone. Based on this, in the technical solution of the present application, it is desirable to perform image analysis on a monitoring image collected by a camera disposed at the power line tower to realize recognition and detection of birds. However, the amount of information in an image is large while birds constitute only small-scale feature information within it, making them difficult to capture and extract; moreover, because the field environment is complex, the image resolution may be degraded during acquisition, which affects the accuracy with which the bird feature information is represented in the image. Therefore, the difficulty of this process lies in how to mine the implicit feature information about birds in the monitoring image so as to recognize and detect them, and then, when birds are detected, control the compressed air to expel them so as to ensure the stability of power transmission.
In recent years, deep learning and neural networks have been widely used in the fields of computer vision, natural language processing, text signal processing, and the like. The development of deep learning and neural networks provides new solutions and schemes for mining implicit characteristic information about birds in the monitored images.
Specifically, in the technical scheme of the application, first, a monitoring image is acquired through a camera deployed at a power line tower.
Specifically, in step 120, the monitoring image is subjected to image blocking processing to obtain a sequence of image blocks. It should be understood that, because power line towers are often erected in the wild, the monitoring image is easily disturbed by the external environment or equipment factors during acquisition, so the image resolution may be low and the feature information about birds in the monitoring image may become blurred, which affects subsequent bird identification. In addition, since birds are small-sized objects in the monitoring image, directly applying preprocessing such as image filtering to the monitoring image may lose the small-sized bird targets and affect subsequent bird recognition and detection.
Based on the above, in order to improve the expression of bird features in the monitoring image and thereby improve the accuracy of bird recognition and detection, in the technical scheme of the application, image blocking processing is performed on the monitoring image to obtain a sequence of image blocks. It should be appreciated that the dimensions of each image block in the sequence of image blocks are reduced compared with the original image, so birds that appear as small-sized objects in the full monitoring image are no longer small-sized objects within an individual image block, which facilitates subsequent bird identification.
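As a minimal sketch of this blocking step (assuming a single-channel image whose sides divide evenly by the patch size; the function name and the sizes are illustrative, not taken from the application):

```python
import numpy as np

def block_image(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W) image into a row-major sequence of (patch, patch) blocks."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image sides must divide evenly"
    blocks = (image
              .reshape(h // patch, patch, w // patch, patch)
              .swapaxes(1, 2)               # (rows, cols, patch, patch)
              .reshape(-1, patch, patch))   # flatten the grid into a sequence
    return blocks

# A 640x640 monitoring image cut into 16x16 patches yields 1600 blocks;
# a bird spanning a dozen pixels is no longer a "small object" inside a block.
image = np.zeros((640, 640), dtype=np.float32)
patches = block_image(image, 16)
print(patches.shape)  # (1600, 16, 16)
```

The row-major ordering keeps neighboring blocks adjacent in the sequence, which matters later when the sequence is treated as context by the ViT stage.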
Specifically, in step 130, each image block in the sequence of image blocks is passed through an image resolution enhancer based on an automatic codec to perform resolution enhancement, thereby obtaining a sequence of enhanced image blocks. In particular, the automatic codec here includes an image resolution encoder and an image resolution decoder: the encoder performs explicit spatial encoding of each image block using convolution layers to obtain the respective image features, and the decoder deconvolves those image features using deconvolution layers to obtain the sequence of enhanced image blocks.
Fig. 4 is a flowchart of the sub-steps of step 130 in the image processing-based grid obstacle recognition processing method according to an embodiment of the present application. As shown in fig. 4, passing each image block in the sequence of image blocks through the image resolution enhancer based on an automatic codec to obtain a sequence of enhanced image blocks includes: 131, performing explicit spatial encoding on each image block in the sequence of image blocks by the image resolution encoder of the automatic codec using convolution layers to obtain the respective image features; and 132, performing deconvolution processing on the image features by the image resolution decoder of the automatic codec using deconvolution layers to obtain the sequence of enhanced image blocks.
It should be appreciated that the automatic codec includes an encoder and a decoder, the encoder having two convolutional layers. In one example, the first convolution layer has 1 input channel, 2 output channels, a convolution kernel size of 10, a sliding step of 10 and a zero-padding width of 1, followed by a normalization layer and a ReLU nonlinear activation layer; the second convolution layer has 25 input channels, 50 output channels, a convolution kernel size of 3, a sliding step of 3 and a zero-padding width of 0, followed by a normalization layer and a ReLU nonlinear activation layer; the end of the encoder is a fully connected layer with 10 neurons. The decoder head is a fully connected layer with 850 neurons, followed by two deconvolution layers: the first deconvolution layer has 50 input channels, 25 output channels, a convolution kernel size of 4, a sliding step of 3 and a zero-padding width of 1, followed by a normalization layer and a ReLU nonlinear activation layer; the second deconvolution layer has 25 input channels, 1 output channel, a convolution kernel size of 10, a sliding step of 10 and a zero-padding width of 1, followed by a Sigmoid nonlinear activation layer.
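To illustrate how a deconvolution (transposed convolution) layer enlarges spatial resolution, here is a hedged single-channel sketch; the kernel, stride and sizes are illustrative and do not reproduce the exact layer configuration above:

```python
import numpy as np

def conv_transpose2d(x: np.ndarray, k: np.ndarray, stride: int) -> np.ndarray:
    """Naive single-channel transposed convolution (no padding).

    Each input pixel stamps a scaled copy of the kernel into the output,
    spaced by `stride`, which is what enlarges the spatial resolution.
    """
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((stride * (h - 1) + kh, stride * (w - 1) + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * k
    return out

x = np.random.rand(8, 8)           # a low-resolution feature map
k = np.ones((4, 4)) / 16.0         # stand-in for a learned kernel
y = conv_transpose2d(x, k, stride=2)
print(y.shape)  # (18, 18): a stride-2 deconvolution roughly doubles resolution
```

The output size follows the usual transposed-convolution formula, stride * (in - 1) + kernel, before any padding adjustment.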
Specifically, in step 140, each enhanced image block in the sequence of enhanced image blocks is passed through a convolutional neural network model serving as a filter to obtain a sequence of image local semantic feature vectors. Since each enhanced image block is image data, in order to express the bird feature information it contains, the technical solution of the present application performs feature mining on each enhanced image block using a convolutional neural network model as a filter. Convolutional neural networks have excellent performance in extracting implicit image features, so this step extracts the implicit feature distribution information about birds in each enhanced image block.
Wherein, passing each enhanced image block in the sequence of enhanced image blocks through a convolutional neural network model as a filter to obtain a sequence of image local semantic feature vectors, respectively, comprising: each layer of the convolutional neural network model used as the filter performs the following steps on input data in forward transfer of the layer: carrying out convolution processing on the input data to obtain a convolution characteristic diagram; carrying out mean pooling treatment based on a feature matrix on the convolution feature map to obtain a pooled feature map; performing nonlinear activation on the pooled feature map to obtain an activated feature map; wherein the output of the last layer of the convolutional neural network model as a filter is a sequence of local semantic feature vectors of the image, and the input of the first layer of the convolutional neural network model as a filter is each enhanced image block in the sequence of enhanced image blocks.
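The per-layer recipe above (convolution, feature-matrix mean pooling, nonlinear activation) can be sketched for a single channel as follows; the sizes and kernels are illustrative only:

```python
import numpy as np

def conv2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Valid-mode 2-D cross-correlation of a single-channel input."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def mean_pool2d(x: np.ndarray, s: int = 2) -> np.ndarray:
    """Feature-matrix mean pooling with an s x s window (edges cropped)."""
    h, w = x.shape
    return x[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def cnn_layer(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """One filter layer: convolution -> mean pooling -> ReLU activation."""
    return np.maximum(mean_pool2d(conv2d_valid(x, k)), 0.0)

block = np.random.rand(18, 18)     # an enhanced image block
kernel = np.random.randn(3, 3)     # stand-in for a learned convolution kernel
feat = cnn_layer(block, kernel)
print(feat.shape)  # (8, 8)
```

Stacking several such layers and flattening the final map would yield the image local semantic feature vector for one block.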
The convolutional neural network (Convolutional Neural Network, CNN) is an artificial neural network and has wide application in the fields of image recognition and the like. The convolutional neural network may include an input layer, a hidden layer, and an output layer, where the hidden layer may include a convolutional layer, a pooling layer, an activation layer, a full connection layer, etc., where the previous layer performs a corresponding operation according to input data, outputs an operation result to the next layer, and obtains a final result after the input initial data is subjected to a multi-layer operation.
The convolutional neural network model has excellent performance in the aspect of image local feature extraction by taking a convolutional kernel as a feature filtering factor, and has stronger feature extraction generalization capability and fitting capability compared with the traditional image feature extraction algorithm based on statistics or feature engineering.
Specifically, in step 150, the sequence of image local semantic feature vectors is passed through a transformer-based ViT model to obtain the image global context semantic understanding feature vector. Consider that although the individual image local semantic feature vectors in the sequence carry bird feature information relating to the whole monitored image, a pure CNN approach has difficulty learning explicit global and long-range semantic interactions due to the inherent locality of convolution operations. Therefore, the sequence of image local semantic feature vectors is encoded in the transformer-based ViT model to extract the contextual semantic association features of the implicit bird features in each image block. It should be appreciated that ViT can process the individual image blocks directly through a Transformer-style self-attention mechanism to extract the contextual semantic association feature information about the implicit bird features in each image block.
Fig. 5 is a flowchart of the sub-steps of step 150 in the image processing-based grid obstacle recognition processing method according to an embodiment of the present application. As shown in fig. 5, passing the sequence of image local semantic feature vectors through a transformer-based ViT model to obtain an image global context semantic understanding feature vector includes: 151, performing one-dimensional arrangement on the sequence of image local semantic feature vectors to obtain a global image semantic feature vector; 152, calculating the product between the global image semantic feature vector and the transpose vector of each image local semantic feature vector in the sequence to obtain a plurality of self-attention correlation matrices; 153, respectively performing standardization processing on each of the plurality of self-attention correlation matrices to obtain a plurality of standardized self-attention correlation matrices; 154, obtaining a plurality of probability values by applying a Softmax classification function to each standardized self-attention correlation matrix; and 155, weighting each image local semantic feature vector in the sequence by using each of the plurality of probability values as a weight to obtain the image global context semantic understanding feature vector.
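The dimensional details of sub-steps 151 to 155 are loose as stated; the sketch below takes one plausible reading, assuming the global image semantic feature vector is a pooled summary of the local vectors so that the products of step 152 reduce to ordinary dot products (all names are illustrative):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def global_context(local_vecs: np.ndarray):
    """Weight each local semantic vector by a self-attention-style score.

    local_vecs: (N, D) sequence of image local semantic feature vectors.
    Returns the (N, D) weighted sequence and the (N,) attention weights.
    """
    g = local_vecs.mean(axis=0)                              # pooled "global" vector
    scores = local_vecs @ g / np.sqrt(local_vecs.shape[1])   # standardized dot products
    w = softmax(scores)                                      # probability values
    return w[:, None] * local_vecs, w

vecs = np.random.randn(6, 32)          # 6 image blocks, 32-D local features
weighted, attn = global_context(vecs)
print(round(float(attn.sum()), 6))     # 1.0: the weights form a probability distribution
```

Blocks whose features align with the pooled summary receive larger weights, which is the intended effect of the attention-based context encoding.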
The context encoder aims to mine the hidden patterns between contexts in the sequence. Optional encoders include CNN (Convolutional Neural Network), Recursive Neural Network, Language Model, and the like. CNN-based methods extract local features well but handle Long-Term Dependency in sentences poorly, so Bi-LSTM (Long Short-Term Memory)-based encoders are widely used. The recursive neural network processes a sentence as a tree structure rather than a sequence and is theoretically more expressive, but it suffers from high sample-labeling difficulty, vanishing gradients when deep, and difficulty of parallel computation, so it is rarely used in practice. The Transformer is a widely applied network structure that combines characteristics of CNN and RNN (Recurrent Neural Network): it extracts global features well and has an advantage over RNN in parallel computation.
Specifically, in step 160 and step 170, the image global context semantic understanding feature vector is passed through a classifier to obtain a classification result indicating whether birds are contained in the monitoring range, and the compressed air is controlled to repel birds in response to the classification result that birds are contained in the monitoring range. That is, the image global context semantic understanding feature vector is used as a classification feature vector for classification processing in the classifier, and birds in the image are recognized and detected by classifying with the contextual semantic association features of the implicit bird features in the respective image blocks of the monitoring image.
That is, in the technical solution of the present application, the labels of the classifier include "birds are contained in the monitoring range" (first label) and "birds are not contained in the monitoring range" (second label), and the classifier determines which classification label the classification feature vector belongs to through a softmax function. It should be noted that the first label p1 and the second label p2 do not carry any human-defined concept: during training, the computer model has no notion of "whether birds are contained in the monitoring range"; there are only the two classification labels and the probabilities of the output feature under them, where p1 and p2 sum to one. The classification result is therefore actually a classification probability distribution over the two labels that conforms to a natural law, and what is used is the physical meaning of that probability distribution rather than the linguistic meaning of the label text.
It should be understood that, in the technical scheme of the present application, the classification label of the classifier is a detection label for whether birds are contained in the monitoring range, so after the classification result is obtained, bird recognition and detection in the image can be performed based on it; accordingly, in response to the classification result that birds are contained in the monitoring range, the compressed air is controlled to expel the birds, thereby ensuring the stability of power transmission.
Fig. 6 is a flowchart of a sub-step of step 160 in an image processing-based power grid obstacle recognition processing method according to an embodiment of the present application, as shown in fig. 6, the image global context semantic understanding feature vector is passed through a classifier to obtain a classification result, where the classification result is used to indicate whether birds are contained in a monitoring range, and the method includes: 161, performing full-connection coding on the image global context semantic understanding feature vector by using a plurality of full-connection layers of the classifier to obtain a coding classification feature vector; and 162, passing the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
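Sub-steps 161 and 162 can be sketched as follows; the layer sizes and random weights are illustrative only (a real classifier would use trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(v: np.ndarray, layers) -> np.ndarray:
    """Fully-connected encoding followed by a Softmax over the two labels."""
    for W, b in layers[:-1]:
        v = np.maximum(W @ v + b, 0.0)   # hidden fully connected layers with ReLU
    W, b = layers[-1]
    return softmax(W @ v + b)            # (p1, p2): birds / no birds in range

v = rng.standard_normal(64)              # image global context understanding vector
layers = [(rng.standard_normal((16, 64)), np.zeros(16)),
          (rng.standard_normal((2, 16)), np.zeros(2))]
p = classify(v, layers)
print(round(float(p.sum()), 6))          # 1.0: the two label probabilities sum to one
```

The decision is then simply whichever of p1 and p2 is larger, which drives the compressed-air control step.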
Further, the image processing-based power grid obstacle recognition processing method further comprises training the convolutional neural network model serving as a filter, the transformer-based ViT model, and the classifier. Fig. 7 is a flowchart of the sub-steps of step 180 in the image processing-based power grid obstacle recognition processing method according to an embodiment of the present application. As shown in fig. 7, training the convolutional neural network model as a filter, the transformer-based ViT model, and the classifier includes: 181, acquiring training data, wherein the training data comprises training monitoring images and the true values of whether birds are contained in the monitoring range; 182, performing image blocking processing on the training monitoring image to obtain a sequence of training image blocks; 183, passing each training image block in the sequence of training image blocks through the automatic codec based image resolution enhancer, respectively, to obtain a sequence of training enhanced image blocks; 184, passing each training enhanced image block in the sequence of training enhanced image blocks through the convolutional neural network model as a filter to obtain a sequence of training image local semantic feature vectors; 185, passing the sequence of training image local semantic feature vectors through the transformer-based ViT model to obtain a training image global context semantic understanding feature vector; 186, passing the training image global context semantic understanding feature vector through the classifier to obtain a classification loss function value; and 187, training the convolutional neural network model as a filter, the transformer-based ViT model, and the classifier based on the classification loss function value and by back-propagation along the direction of gradient descent, wherein, in each iteration of the training, a feature affinity spatial affine learning iteration is performed on the weight matrix of the classifier.
Wherein passing the training image global context semantic understanding feature vector through the classifier to obtain a classification loss function value comprises: the classifier processes the training image global context semantic understanding feature vector with a classification formula to generate a training classification result, wherein the classification formula is:

O = softmax{(Wn, Bn) : ... : (W1, B1) | V}

where V represents the training image global context semantic understanding feature vector, W1 to Wn are the weight matrices of the fully connected layers, and B1 to Bn are the corresponding bias matrices; and then a cross entropy value between the training classification result and the true value is calculated as the classification loss function value.
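The cross-entropy computation in the last step can be sketched as follows (the numeric values are hypothetical):

```python
import numpy as np

def cross_entropy(pred: np.ndarray, true_label: int) -> float:
    """Cross entropy between a softmax output and the ground-truth label index."""
    return float(-np.log(pred[true_label]))

pred = np.array([0.9, 0.1])              # classifier output (p1, p2)
loss_correct = cross_entropy(pred, 0)    # ground truth: birds are in range
loss_wrong = cross_entropy(pred, 1)      # ground truth: no birds
print(loss_correct < loss_wrong)         # True: a confident correct prediction costs less
```

Minimizing this loss by gradient descent pushes the predicted probability distribution toward the true label, which is exactly what step 187 performs.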
In particular, consider the image global context semantic understanding feature vector: it is obtained by directly concatenating the plurality of contextual image local semantic feature vectors produced by the transformer-based ViT model, and although the ViT model promotes the contextual relevance of those vectors, explicit differences among their feature distributions still exist. Consequently, when the directly concatenated image global context semantic understanding feature vector passes through the classifier, the correlation among the individual local weight value distributions of the classifier's weight matrix is insufficient, which slows the training of the classifier.
Based on the above, in the technical solution of the present application, in each iteration of the training, a feature affinity spatial affine learning iteration is performed on the weight matrix of the classifier according to the following optimization formula:

M' = log2(d · ||M||_2 / ||M||_*) · exp(M Mᵀ / d) M ⊙ M

where M denotes the weight matrix of the classifier, Mᵀ denotes the transpose of the weight matrix of the classifier, ||M||_2 denotes the two-norm of the weight matrix of the classifier, ||M||_* denotes the kernel norm of the weight matrix of the classifier, d is the scale of the weight matrix of the classifier, i.e. width times height, log2 denotes the logarithm with base 2, exp(·) denotes the element-wise exponential operation of a matrix, i.e. computing the natural exponential of the value at each position in the matrix, ⊙ denotes position-wise multiplication, and M' denotes the weight matrix of the classifier after the iteration.
Here, the feature affinity space affine learning performs a detailed structured information expression in a low-dimensional eigen-subspace on the high-resolution information representation in the weight value distribution space of the weight matrix, and then carries out an affine migration based on a spatial transformation of the relatively low-resolution information representation. In this way, a super-resolution (i.e., weight-by-weight) activation of each local weight distribution is achieved through dense affinity modeling between weight value representations, which strengthens the correlation between the local weight distributions of the weight matrix and thereby improves the training speed of the classifier. As a result, birds at the power line tower can be accurately identified and detected, and when birds are detected, compressed air is controlled to drive them away, ensuring the stability of power transmission.
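Numerically, a per-iteration weight-matrix update of this kind can be sketched as follows. The exact formula is only partially recoverable from the text, so the code assumes one plausible reading, M′ = exp(log₂(‖M‖_*/d) · (M Mᵀ M)/‖M‖₂) ⊙ M, assembled from the quantities the passage names (two-norm, kernel norm, scale d, base-2 logarithm, element-wise exponential, position-wise multiplication); it is an illustration, not the patented update.

```python
import numpy as np

def affinity_affine_step(M, d=None):
    """One 'feature affinity space affine learning' iteration on a weight
    matrix M, under the assumed update
        M' = exp(log2(||M||_* / d) * (M M^T M) / ||M||_2) ⊙ M."""
    if d is None:
        d = M.size                           # "scale" of the matrix (assumed)
    two_norm = np.linalg.norm(M, 2)          # two-norm (largest singular value)
    nuc_norm = np.linalg.norm(M, 'nuc')      # kernel (nuclear) norm
    scale = np.log2(nuc_norm / d)            # base-2 logarithm
    A = scale * (M @ M.T @ M) / two_norm     # same shape as M
    return np.exp(A) * M                     # element-wise exp, ⊙ by position

rng = np.random.default_rng(1)
M = 0.1 * rng.normal(size=(2, 16))           # classifier weight matrix (toy)
M_next = affinity_affine_step(M)
```

Note that M Mᵀ M has the same shape as M, so the position-wise product is well defined; the choice of d = M.size is an assumption.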
In summary, the image processing-based power grid obstacle recognition processing method 100 according to an embodiment of the present application has been illustrated. It acquires a monitoring image collected by a camera deployed at a power line tower, and adopts an artificial intelligence technology based on deep learning to mine the implicit feature information about birds in the monitoring image, so as to identify and detect birds; when birds are detected, compressed air is controlled to drive them away, thereby ensuring the stability of power transmission.
In one embodiment of the present application, fig. 8 is a block diagram of an image processing-based grid obstacle recognition processing system according to an embodiment of the present application. As shown in fig. 8, the image processing-based power grid obstacle recognition processing system 200 according to the embodiment of the present application includes: an image acquisition module 210 for acquiring a monitoring image acquired by a camera disposed at the power line tower; the image blocking processing module 220 is configured to perform image blocking processing on the monitoring image to obtain a sequence of image blocks; an automatic codec module 230 for passing each image block in the sequence of image blocks through an automatic codec-based image resolution enhancer, respectively, to obtain a sequence of enhanced image blocks; the feature extraction module 240 is configured to pass each enhanced image block in the sequence of enhanced image blocks through a convolutional neural network model serving as a filter to obtain a sequence of image local semantic feature vectors; the global encoding module 250 is configured to pass the sequence of image local semantic feature vectors through a ViT model based on a converter to obtain an image global context semantic understanding feature vector; the monitoring result generating module 260 is configured to pass the image global context semantic understanding feature vector through a classifier to obtain a classification result, where the classification result is used to indicate whether birds are contained in a monitoring range; and a control module 270 for controlling the compressed air to expel birds in response to the classification result being that birds are contained in the monitoring range.
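The image blocking processing module simply cuts the monitoring image into a regular grid of patches. A minimal sketch (the patch size and image shape are arbitrary choices, not values from the application; edge remainders are dropped for simplicity):

```python
import numpy as np

def block_image(img, patch=32):
    """Split an H x W x C image into a sequence of patch x patch blocks.
    Edge regions that do not fill a whole patch are discarded."""
    h, w = img.shape[:2]
    blocks = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            blocks.append(img[y:y + patch, x:x + patch])
    return blocks

frame = np.zeros((96, 128, 3), dtype=np.uint8)  # stand-in for a camera frame
seq = block_image(frame, patch=32)              # 3 rows x 4 cols = 12 blocks
```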
In a specific example, in the above image processing-based power grid obstacle recognition processing system, the automatic codec module includes: an encoding unit, configured to perform explicit spatial encoding on each image block in the sequence of image blocks by using a convolutional layer through an image resolution encoder of the automatic codec to obtain each image feature; and a decoding unit for performing deconvolution processing on the respective image features by an image resolution decoder of the automatic codec using a deconvolution layer to obtain the sequence of enhanced image blocks.
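The encoder's convolutional spatial encoding and the decoder's deconvolution (transposed convolution) can be illustrated with toy single-channel implementations (kernel sizes, the stride, and the naive loop-based form are assumptions chosen for clarity, not the patented design):

```python
import numpy as np

def conv2d(x, k):
    # Valid convolution (cross-correlation) of a 2-D image x with kernel k.
    kh, kw = k.shape
    H, W = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def deconv2d(x, k, stride=2):
    # Transposed convolution: scatter each input value through the kernel,
    # producing an upsampled (resolution-enhanced) output.
    kh, kw = k.shape
    H = (x.shape[0] - 1) * stride + kh
    W = (x.shape[1] - 1) * stride + kw
    out = np.zeros((H, W))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * k
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
feat = conv2d(img, np.ones((3, 3)) / 9.0)  # encode: 6x6 -> 4x4 feature map
up = deconv2d(feat, np.ones((2, 2)), 2)    # decode: 4x4 -> 8x8 enhanced block
```

The output of the decoder is larger than its input, which is the sense in which the codec acts as a resolution enhancer.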
In a specific example, in the above image processing-based power grid obstacle recognition processing system, the feature extraction module includes: each layer of the convolutional neural network model used as the filter performs the following steps on input data in forward transfer of the layer: carrying out convolution processing on the input data to obtain a convolution characteristic diagram; carrying out mean pooling treatment based on a feature matrix on the convolution feature map to obtain a pooled feature map; performing nonlinear activation on the pooled feature map to obtain an activated feature map; wherein the output of the last layer of the convolutional neural network model as a filter is a sequence of local semantic feature vectors of the image, and the input of the first layer of the convolutional neural network model as a filter is each enhanced image block in the sequence of enhanced image blocks.
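The per-layer pooling and activation steps can be isolated in a small sketch (a 2×2 pooling window and a ReLU activation are assumptions; the application does not fix the window size or the activation function):

```python
import numpy as np

def mean_pool(fm, size=2):
    # Non-overlapping mean pooling over size x size windows of a feature map.
    H, W = fm.shape
    fm = fm[:H - H % size, :W - W % size]
    return fm.reshape(H // size, size, W // size, size).mean(axis=(1, 3))

def relu(fm):
    # Element-wise nonlinear activation.
    return np.maximum(fm, 0.0)

conv_map = np.array([[1., -2., 3., -4.],
                     [5., -6., 7., -8.],
                     [-1., 2., -3., 4.],
                     [-5., 6., -7., 8.]])
pooled = mean_pool(conv_map)   # 4x4 convolution feature map -> 2x2
activated = relu(pooled)       # pooled feature map -> activated feature map
```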
In a specific example, in the above image processing-based grid obstacle recognition processing system, the global encoding module includes: the one-dimensional arrangement unit is used for one-dimensionally arranging the sequence of the image local semantic feature vectors to obtain global image semantic feature vectors; the self-attention unit is used for calculating the product between the global image semantic feature vector and the transpose vector of each image local semantic feature vector in the sequence of the image local semantic feature vectors to obtain a plurality of self-attention association matrices; the normalization processing unit is used for respectively performing normalization processing on each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of normalized self-attention correlation matrices; the classification function unit is used for obtaining a plurality of probability values through a Softmax classification function by each normalized self-attention correlation matrix in the normalized self-attention correlation matrices; and the weighting unit is used for weighting each image local semantic feature vector in the sequence of the image local semantic feature vectors by taking each probability value in the plurality of probability values as a weight so as to obtain the image global context semantic understanding feature vector.
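As a rough sketch of this attention-style weighting (dimensions are invented, and each self-attention association matrix is collapsed to a scalar score so that Softmax yields one probability per local vector, which is a simplification of the matrix form described in the module):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def global_context(local_vecs):
    """Reweight image local semantic feature vectors with attention scores.
    Each association matrix is collapsed to its mean so the Softmax
    produces one probability value per local feature vector."""
    g = np.concatenate(local_vecs)                 # one-dimensional arrangement
    assoc = [np.outer(g, v) for v in local_vecs]   # association matrices
    assoc = [a / (np.linalg.norm(a) + 1e-8) for a in assoc]  # normalization
    scores = np.array([a.mean() for a in assoc])   # collapse to scalars
    p = softmax(scores)                            # probability values
    weighted = [pi * v for pi, v in zip(p, local_vecs)]
    return np.concatenate(weighted), p             # global context vector

rng = np.random.default_rng(2)
local_vecs = [rng.normal(size=4) for _ in range(3)]
ctx, p = global_context(local_vecs)
```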
In a specific example, in the above-mentioned grid obstacle recognition processing system based on image processing, the monitoring result generating module includes: the coding unit is used for carrying out full-connection coding on the image global context semantic understanding feature vector by using a plurality of full-connection layers of the classifier so as to obtain a coding classification feature vector; and the classification unit is used for passing the coding classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
In a specific example, in the above image processing-based power grid obstacle recognition processing system, the system further includes a training module that trains the convolutional neural network model as a filter, the converter-based ViT model, and the classifier; wherein, training module includes: the training image acquisition unit is used for acquiring training data, wherein the training data comprises training monitoring images and whether the monitoring range contains the true value of birds or not; the training image blocking processing unit is used for carrying out image blocking processing on the training monitoring image to obtain a sequence of training image blocks; the training automatic coding and decoding unit is used for respectively passing each training image block in the sequence of training image blocks through the image resolution enhancer based on the automatic coder and decoder so as to obtain a sequence of training enhancement image blocks; the training feature extraction unit is used for respectively passing each training enhancement image block in the sequence of training enhancement image blocks through the convolutional neural network model serving as a filter to obtain a sequence of training image local semantic feature vectors; the training global coding unit is used for enabling the sequence of the training image local semantic feature vectors to pass through the ViT model based on the converter to obtain training image global context semantic understanding feature vectors; the classification loss function value calculation unit is used for enabling the training image global context semantic understanding feature vector to pass through the classifier to obtain a classification loss function value; and a training iteration unit for training the convolutional neural network model as a filter, the ViT model based on the converter and the classifier based on the classification loss function value and traveling in the direction of 
gradient descent, wherein in each round of the training, a feature affinity space affine learning iteration is performed on a weight matrix of the classifier.
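The training module's overall loop can be illustrated for the classifier head alone (random features stand in for the extractor output; the per-round weight-matrix iteration is exposed only as a hook, here a no-op, since its exact formula is not fully recoverable; all sizes and the two-class setup are assumptions):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def train(X, y, classes=2, lr=0.1, rounds=200, weight_hook=None):
    """Gradient-descent training of a softmax classifier on features X.
    weight_hook, if given, is applied to the weight matrix every round,
    standing in for the feature affinity space affine learning iteration."""
    rng = np.random.default_rng(0)
    W = 0.01 * rng.normal(size=(classes, X.shape[1]))
    b = np.zeros(classes)
    for _ in range(rounds):
        P = softmax(X @ W.T + b)            # forward pass
        G = P.copy()
        G[np.arange(len(y)), y] -= 1.0      # d(cross-entropy)/d(logits)
        W -= lr * (G.T @ X) / len(y)        # gradient descent step
        b -= lr * G.mean(axis=0)
        if weight_hook is not None:
            W = weight_hook(W)              # per-round weight-matrix iteration
    return W, b

rng = np.random.default_rng(3)
X0 = rng.normal(size=(40, 8)) - 1.0         # "no bird" features (synthetic)
X1 = rng.normal(size=(40, 8)) + 1.0         # "bird" features (synthetic)
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)
W, b = train(X, y, weight_hook=lambda W: W) # no-op hook for illustration
acc = (softmax(X @ W.T + b).argmax(axis=1) == y).mean()
```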
In a specific example, in the above-described image processing-based power grid obstacle recognition processing system, the classification loss function value calculation unit includes: the training classification subunit is configured to process the training image global context semantic understanding feature vector by using the classifier according to the following classification formula to generate a training classification result, where the classification formula is:
O = softmax{ (Wₙ, Bₙ) : ⋯ : (W₁, B₁) | V }, where V represents the training image global context semantic understanding feature vector, W₁ to Wₙ represent the weight matrices of the fully connected layers of the classifier, and B₁ to Bₙ represent the corresponding bias matrices; and a calculation subunit for calculating a cross entropy value between the training classification result and a true value as the classification loss function value.
In a specific example, in the above image processing-based power grid obstacle recognition processing system, the training iteration unit is configured to: in each iteration of the training, carrying out characteristic affinity space affine learning iteration on the weight matrix of the classifier according to the following optimization formula; wherein, the optimization formula is:
M′ = exp( log₂( ‖M‖_* / d ) · ( M Mᵀ M ) / ‖M‖₂ ) ⊙ M

wherein M represents the weight matrix of the classifier, Mᵀ represents the transpose of the weight matrix of the classifier, ‖M‖₂ represents the two-norm of the weight matrix of the classifier, ‖M‖_* represents the kernel norm of the weight matrix of the classifier, d is the scale of the weight matrix of the classifier, log₂ represents the logarithmic function with base 2, exp(·) represents the exponential operation of a matrix, which calculates the natural exponential function value raised to the power of the value at each position in the matrix, ⊙ represents multiplication by position, and M′ represents the weight matrix of the classifier after the iteration.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described image processing-based power grid obstacle recognition processing system have been described in detail in the above description of the image processing-based power grid obstacle recognition processing method with reference to fig. 1 to 7, and thus, repetitive descriptions thereof will be omitted.
As described above, the image processing-based power grid obstacle recognition processing system 200 according to the embodiment of the present application may be implemented in various terminal devices, for example, a server or the like for image processing-based power grid obstacle recognition processing. In one example, the image processing-based grid obstacle recognition processing system 200 according to embodiments of the present application may be integrated into the terminal device as one software module and/or hardware module. For example, the image processing-based grid obstacle recognition processing system 200 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the image processing-based grid obstacle recognition processing system 200 may also be one of a plurality of hardware modules of the terminal device.
Alternatively, in another example, the image processing-based power grid obstacle recognition processing system 200 and the terminal device may be separate devices, and the image processing-based power grid obstacle recognition processing system 200 may be connected to the terminal device through a wired and/or wireless network and transmit the interactive information in a contracted data format.
The present application also provides a computer program product comprising instructions which, when executed, cause an apparatus to perform operations corresponding to the above-described methods.
In one embodiment of the present application, there is also provided a computer readable storage medium storing a computer program for executing the above-described method.
It should be appreciated that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
Methods, systems, and computer program products of embodiments of the present application are described in terms of flow diagrams and/or block diagrams. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The basic principles of the present application have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not intended to be limited to the details disclosed herein as such.
The block diagrams of the devices, apparatuses, and systems referred to in this application are only illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to" and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatus, devices and methods of the present application, the components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered as equivalent to the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (10)

1. The method for identifying and processing the power grid obstacle based on the image processing is characterized by comprising the following steps of:
acquiring a monitoring image acquired by a camera deployed on a power line tower;
performing image blocking processing on the monitoring image to obtain a sequence of image blocks;
passing each image block in the sequence of image blocks through an automatic codec-based image resolution enhancer, respectively, to obtain a sequence of enhanced image blocks;
each enhanced image block in the sequence of enhanced image blocks is respectively passed through a convolutional neural network model serving as a filter to obtain a sequence of image local semantic feature vectors;
passing the sequence of image local semantic feature vectors through a ViT model based on a converter to obtain image global context semantic understanding feature vectors;
the image global context semantic understanding feature vector is passed through a classifier to obtain a classification result, and the classification result is used for indicating whether birds are contained in a monitoring range; and
controlling the compressed air to drive away birds in response to the classification result being that birds are contained in the monitoring range.
2. The image processing-based power grid obstacle recognition processing method according to claim 1, wherein passing each image block in the sequence of image blocks through an automatic codec-based image resolution enhancer to obtain a sequence of enhanced image blocks, respectively, comprises:
performing explicit spatial coding on each image block in the sequence of image blocks by an image resolution encoder of the automatic codec using a convolutional layer to obtain each image feature; and
performing deconvolution processing on the image features by an image resolution decoder of the automatic codec using a deconvolution layer to obtain the sequence of enhanced image blocks.
3. The method for identifying and processing the grid obstacle based on image processing according to claim 2, wherein the step of passing each enhanced image block in the sequence of enhanced image blocks through a convolutional neural network model as a filter to obtain the sequence of image local semantic feature vectors comprises: each layer of the convolutional neural network model used as the filter performs the following steps on input data in forward transfer of the layer:
Carrying out convolution processing on the input data to obtain a convolution characteristic diagram;
carrying out mean pooling treatment based on a feature matrix on the convolution feature map to obtain a pooled feature map; and
non-linear activation is carried out on the pooled feature map so as to obtain an activated feature map;
wherein the output of the last layer of the convolutional neural network model as a filter is a sequence of local semantic feature vectors of the image, and the input of the first layer of the convolutional neural network model as a filter is each enhanced image block in the sequence of enhanced image blocks.
4. A method of image processing based grid obstacle recognition processing according to claim 3, wherein passing the sequence of image local semantic feature vectors through a converter based ViT model to obtain image global context semantic understanding feature vectors comprises:
one-dimensional arrangement is carried out on the sequence of the image local semantic feature vectors so as to obtain global image semantic feature vectors;
calculating the product between the global image semantic feature vector and the transpose vector of each image local semantic feature vector in the sequence of image local semantic feature vectors to obtain a plurality of self-attention association matrices;
Respectively carrying out standardization processing on each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of standardized self-attention correlation matrices;
obtaining a plurality of probability values by using a Softmax classification function through each normalized self-attention correlation matrix in the normalized self-attention correlation matrices; and
weighting each image local semantic feature vector in the sequence of image local semantic feature vectors by taking each probability value in the plurality of probability values as a weight so as to obtain the image global context semantic understanding feature vector.
5. The method for identifying and processing the grid obstacle based on image processing according to claim 4, wherein the step of passing the image global context semantic understanding feature vector through a classifier to obtain a classification result, wherein the classification result is used for indicating whether birds are contained in a monitoring range, and the method comprises the following steps:
performing full-connection coding on the image global context semantic understanding feature vector by using a plurality of full-connection layers of the classifier to obtain a coding classification feature vector; and
passing the coding classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
6. The image processing-based power grid obstacle recognition processing method according to claim 5, further comprising training the convolutional neural network model as a filter, the converter-based ViT model, and the classifier;
wherein training the convolutional neural network model as a filter, the converter-based ViT model, and the classifier comprises:
acquiring training data, wherein the training data comprises training monitoring images and whether the monitoring range contains the true value of birds or not;
performing image blocking processing on the training monitoring image to obtain a sequence of training image blocks;
respectively passing each training image block in the sequence of training image blocks through the image resolution enhancer based on the automatic codec to obtain a sequence of training enhancement image blocks;
respectively passing each training enhancement image block in the sequence of training enhancement image blocks through the convolutional neural network model serving as a filter to obtain a sequence of training image local semantic feature vectors;
passing the sequence of training image local semantic feature vectors through the converter-based ViT model to obtain training image global context semantic understanding feature vectors;
Passing the training image global context semantic understanding feature vector through the classifier to obtain a classification loss function value; and
training the convolutional neural network model as a filter, the ViT model based on a converter and the classifier based on the classification loss function value and traveling in the direction of gradient descent, wherein in each round of iteration of the training, a feature affinity space affine learning iteration is performed on a weight matrix of the classifier.
7. The image processing-based power grid obstacle recognition processing method as recited in claim 6, wherein passing the training image global context semantic understanding feature vector through the classifier to obtain a classification loss function value, comprises:
the classifier processes the training image global context semantic understanding feature vector with a classification formula to generate a training classification result, wherein the classification formula is as follows:
O = softmax{ (Wₙ, Bₙ) : ⋯ : (W₁, B₁) | V }, where V represents the training image global context semantic understanding feature vector, W₁ to Wₙ represent the weight matrices of the fully connected layers of the classifier, and B₁ to Bₙ represent the corresponding bias matrices; and
calculating a cross entropy value between the training classification result and a true value as the classification loss function value.
8. The image processing-based power grid obstacle recognition processing method according to claim 7, wherein in each iteration of the training, feature affinity space affine learning is iterated on the weight matrix of the classifier with the following optimization formula;
wherein, the optimization formula is:
M′ = exp( log₂( ‖M‖_* / d ) · ( M Mᵀ M ) / ‖M‖₂ ) ⊙ M

wherein M represents the weight matrix of the classifier, Mᵀ represents the transpose of the weight matrix of the classifier, ‖M‖₂ represents the two-norm of the weight matrix of the classifier, ‖M‖_* represents the kernel norm of the weight matrix of the classifier, d is the scale of the weight matrix of the classifier, log₂ represents the logarithmic function with base 2, exp(·) represents the exponential operation of a matrix, which calculates the natural exponential function value raised to the power of the value at each position in the matrix, ⊙ represents multiplication by position, and M′ represents the weight matrix of the classifier after the iteration.
9. An image processing-based power grid obstacle recognition processing system is characterized by comprising:
the image acquisition module is used for acquiring a monitoring image acquired by a camera arranged on the power line tower;
the image blocking processing module is used for carrying out image blocking processing on the monitoring image to obtain a sequence of image blocks;
An automatic encoding and decoding module, configured to obtain a sequence of enhanced image blocks by respectively passing each image block in the sequence of image blocks through an image resolution enhancer based on an automatic encoder and decoder;
the feature extraction module is used for enabling each enhanced image block in the sequence of enhanced image blocks to pass through a convolutional neural network model serving as a filter respectively to obtain a sequence of image local semantic feature vectors;
the global coding module is used for enabling the sequence of the image local semantic feature vectors to pass through a ViT model based on a converter to obtain image global context semantic understanding feature vectors;
the monitoring result generation module is used for enabling the image global context semantic understanding feature vector to pass through a classifier to obtain a classification result, and the classification result is used for indicating whether birds are contained in a monitoring range; and
and the control module is used for controlling the compressed air to drive birds in response to the classification result that birds are contained in the monitoring range.
10. The image processing-based power grid obstacle recognition processing system of claim 9, wherein the automatic codec module comprises:
an encoding unit, configured to perform explicit spatial encoding on each image block in the sequence of image blocks by using a convolutional layer through an image resolution encoder of the automatic codec to obtain each image feature; and
And the decoding unit is used for carrying out deconvolution processing on the image features by using a deconvolution layer through an image resolution decoder of the automatic coder so as to obtain the sequence of the enhanced image blocks.
CN202310461859.1A 2023-04-26 2023-04-26 Power grid obstacle recognition processing method and system based on image processing Active CN116168352B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310461859.1A CN116168352B (en) 2023-04-26 2023-04-26 Power grid obstacle recognition processing method and system based on image processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310461859.1A CN116168352B (en) 2023-04-26 2023-04-26 Power grid obstacle recognition processing method and system based on image processing

Publications (2)

Publication Number Publication Date
CN116168352A true CN116168352A (en) 2023-05-26
CN116168352B CN116168352B (en) 2023-06-27

Family

ID=86414986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310461859.1A Active CN116168352B (en) 2023-04-26 2023-04-26 Power grid obstacle recognition processing method and system based on image processing

Country Status (1)

Country Link
CN (1) CN116168352B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019232830A1 (en) * 2018-06-06 2019-12-12 平安科技(深圳)有限公司 Method and device for detecting foreign object debris at airport, computer apparatus, and storage medium
AU2020101229A4 (en) * 2020-07-02 2020-08-06 South China University Of Technology A Text Line Recognition Method in Chinese Scenes Based on Residual Convolutional and Recurrent Neural Networks
US20210166350A1 (en) * 2018-07-17 2021-06-03 Xi'an Jiaotong University Fusion network-based method for image super-resolution and non-uniform motion deblurring
CN113255661A (en) * 2021-04-15 2021-08-13 南昌大学 Bird species image identification method related to bird-involved fault of power transmission line
WO2022047625A1 (en) * 2020-09-01 2022-03-10 深圳先进技术研究院 Image processing method and system, and computer storage medium
US20220230302A1 (en) * 2019-06-24 2022-07-21 Zhejiang University Three-dimensional automatic location system for epileptogenic focus based on deep learning
WO2022182353A1 (en) * 2021-02-26 2022-09-01 Hewlett-Packard Development Company, L.P. Captured document image enhancement
WO2022242029A1 (en) * 2021-05-18 2022-11-24 广东奥普特科技股份有限公司 Generation method, system and apparatus capable of visual resolution enhancement, and storage medium


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665134A (en) * 2023-07-28 2023-08-29 南京兴沧环保科技有限公司 Production monitoring equipment and method for radio frequency filter
CN116758359A (en) * 2023-08-16 2023-09-15 腾讯科技(深圳)有限公司 Image recognition method and device and electronic equipment
CN116844217A (en) * 2023-08-30 2023-10-03 成都睿瞳科技有限责任公司 Image processing system and method for generating face data
CN116844217B (en) * 2023-08-30 2023-11-14 成都睿瞳科技有限责任公司 Image processing system and method for generating face data
CN116864140A (en) * 2023-09-05 2023-10-10 天津市胸科医院 Intracardiac branch of academic or vocational study postoperative care monitoring data processing method and system thereof
CN116912831A (en) * 2023-09-15 2023-10-20 东莞市将为防伪科技有限公司 Method and system for processing acquired information of letter code anti-counterfeiting printed matter
CN117608283A (en) * 2023-11-08 2024-02-27 浙江孚宝智能科技有限公司 Autonomous navigation method and system for robot
CN117372528A (en) * 2023-11-21 2024-01-09 南昌工控机器人有限公司 Visual image positioning method for modularized assembly of mobile phone shell
CN117372528B (en) * 2023-11-21 2024-05-28 南昌工控机器人有限公司 Visual image positioning method for modularized assembly of mobile phone shell
CN117540935A (en) * 2024-01-09 2024-02-09 上海银行股份有限公司 DAO operation management method based on block chain technology
CN117540935B (en) * 2024-01-09 2024-04-05 上海银行股份有限公司 DAO operation management method based on block chain technology
CN117994253A (en) * 2024-04-03 2024-05-07 国网山东省电力公司东营供电公司 High-voltage distribution line ground fault identification method
CN117994253B (en) * 2024-04-03 2024-06-11 国网山东省电力公司东营供电公司 High-voltage distribution line ground fault identification method

Also Published As

Publication number Publication date
CN116168352B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN116168352B (en) Power grid obstacle recognition processing method and system based on image processing
Zhang et al. Multi-scale attention with dense encoder for handwritten mathematical expression recognition
CN111325323B (en) Automatic power transmission and transformation scene description generation method integrating global information and local information
CN113936339B (en) Fighting identification method and device based on double-channel cross attention mechanism
CN109524006B (en) Chinese mandarin lip language identification method based on deep learning
CN110580292A (en) Text label generation method and device and computer readable storage medium
CN116245513B (en) Automatic operation and maintenance system and method based on rule base
CN109840322A (en) It is a kind of based on intensified learning cloze test type reading understand analysis model and method
Ma et al. Multi-feature fusion deep networks
CN115951883B (en) Service component management system of distributed micro-service architecture and method thereof
CN115471216B (en) Data management method of intelligent laboratory management platform
CN117058622A (en) Intelligent monitoring system and method for sewage treatment equipment
CN116152611B (en) Multistage multi-scale point cloud completion method, system, equipment and storage medium
CN117006654A (en) Air conditioner load control system and method based on edge calculation
Safdari et al. A hierarchical feature learning for isolated Farsi handwritten digit recognition using sparse autoencoder
Sharm et al. Deformable and Structural Representative Network for Remote Sensing Image Captioning.
CN116954113B (en) Intelligent robot driving sensing intelligent control system and method thereof
Wu CNN-Based Recognition of Handwritten Digits in MNIST Database
CN113868414A (en) Interpretable legal dispute focus summarizing method and system
Li et al. Supervised classification of plant image based on attention mechanism
CN112836752A (en) Intelligent sampling parameter control method based on feature map fusion of depth values
CN116524416B (en) Animal serum draws equipment for medical treatment experiments
Dutta et al. Sign Language Detection Using Action Recognition in Python
CN116258504B (en) Bank customer relationship management system and method thereof
CN111158640B (en) One-to-many demand analysis and identification method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant